Delete domain info from index

Hello, I’m new here and new to using YaCy.

I launched the crawler on a few websites. On one particular site, the crawler has been stuck in a loop all day. It found a URL, then [the same URL] + [a suffix like %25252525…Redirect], then [URL] + [the suffix twice], and so on. All of these URLs, with or without the suffix, lead to the same page. It ran all day long, and YaCy is now using around 18 GB of disk space.
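A URL filter that rejects repeated percent-encoding might keep a crawl out of this kind of loop. Each round of re-encoding turns `%` into `%25`, so the runaway URLs contain `%` followed by `25` several times in a row. The minimal Python sketch below assumes that the crawler's "must-not-match" URL filter takes a Java-style regular expression that must match the whole URL. The pattern, the threshold of three repeats, and the `should_skip` helper are hypothetical illustrations, not part of YaCy.

```python
import re

# Hypothetical "must-not-match" pattern: "%" followed by "25" three or more
# times, i.e. a "%" that has been percent-encoded at least three times over.
# Wrapped in ".*" because the filter is assumed to match the whole URL.
RUNAWAY_ENCODING = re.compile(r".*%(25){3,}.*")


def should_skip(url: str) -> bool:
    """Return True if the URL looks like a repeated-encoding crawl loop."""
    return RUNAWAY_ENCODING.fullmatch(url) is not None


for u in [
    "http://example.com/page",
    "http://example.com/page%25Redirect",
    "http://example.com/page%25252525Redirect",
]:
    print(u, should_skip(u))
```

Only the last URL is flagged. A single `%25` can appear in legitimate URLs, which is why the sketch waits for several repeats before rejecting.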

So I guess my index is now badly corrupted. I stopped the crawler and tried to remove that website from the index via “Index Administration” > Get top X domains from all URLs > Delete all (for that domain).

But this didn’t free any disk space. How can I remove all the data from that site?

Sorry, English is not my first language, and I’m very new to YaCy.

I would also like to know if there is a way to purge specific domains that have been crawled. The Delete Subpaths option in IndexBrowser_p.html is not enough to clear a domain entirely. I figured out blacklisting, but only after my crawler had already indexed some unwanted content.

Although it’s not very ergonomic, I just discovered that you can use “Delete by URL Matching” at http://localhost:8090/IndexDeletion_p.html

This will remove the entire domain (or whatever you specify) from the local index.
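To target one whole domain, the text entered for URL matching could be a regular expression covering every URL on that host. The minimal Python sketch below assumes that the "Delete by URL Matching" field accepts a Java-compatible regular expression matched against the full URL. Check what the page actually expects before deleting anything. The helper `domain_url_pattern` is hypothetical; it only builds and tests the pattern locally.

```python
import re


def domain_url_pattern(domain: str) -> str:
    """Build a regex matching http(s) URLs on `domain` or any of its subdomains."""
    escaped = re.escape(domain)  # "example.com" -> "example\.com"
    # optional subdomains, the domain itself, optional port,
    # then optionally a path, query string or fragment
    return rf"https?://([^/]*\.)?{escaped}(:\d+)?([/?#].*)?"


pattern = domain_url_pattern("example.com")
print(pattern)

matcher = re.compile(pattern)
for u in [
    "http://example.com",
    "https://www.example.com/a/b?c=1",
    "http://notexample.com/",
    "http://example.com.evil.org/",
]:
    print(u, matcher.fullmatch(u) is not None)
```

For `example.com` this prints `https?://([^/]*\.)?example\.com(:\d+)?([/?#].*)?`, which matches the first two URLs and rejects the look-alike hosts. Testing the pattern against a few known URLs like this before running a deletion helps avoid removing more than intended.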

Hope this helps!