Hi, can you force YaCy to access a curated URL database of 25,000 websites?
Easy: just convert those 25,000 websites to a text-based URL list, one URL per line, and paste that list into the Crawl Start URL window. I tried this a few weeks ago.
It will take about one hour until YaCy has ingested that huge bunch of URLs in the Crawl Start, but it works.
You can adjust the crawl start with some proper or crazy settings, like (proper) crawl depth = 0 to index only the given URLs. Or set a greater depth if you want a full crawl for each of those 25,000 URLs.
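For the conversion step, here is a minimal Python sketch of how the list could be built. It is only an illustration: the function names `to_url_list` and `write_url_list` are made up for this post, and it assumes that database entries without a scheme are plain hostnames that should get `https://`. It does not talk to YaCy at all; it only produces the one-URL-per-line text to paste into Crawl Start.

```python
from urllib.parse import urlsplit


def to_url_list(sites):
    """Return cleaned, de-duplicated URLs in first-seen order."""
    seen = set()
    urls = []
    for raw in sites:
        site = raw.strip()
        if not site:
            continue  # skip blank rows
        # Assumption: an entry without a scheme is a bare hostname.
        if "://" not in site:
            site = "https://" + site
        parts = urlsplit(site)
        if not parts.netloc:
            continue  # skip entries that have no host part
        # Treat "https://example.org" and "https://example.org/" as the same URL.
        key = (parts.scheme.lower(), parts.netloc.lower(),
               parts.path.rstrip("/"), parts.query)
        if key in seen:
            continue
        seen.add(key)
        urls.append(site)
    return urls


def write_url_list(sites, path):
    """Write one URL per line to `path` and return how many were written."""
    urls = to_url_list(sites)
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")
    return len(urls)
```

Feed it whatever your database export gives you (a list of strings, one per website), then open the resulting file and paste its contents into the URL window.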
Let us know how long it takes. I am interested.
Do you know if you can push URLs into the URL window from a database and update the pasted crawl list, i.e. remove and add entries?
Thank you