My crawler keeps stopping every few minutes. I can then start it again, and it continues for a few minutes, only to pause again.
Why does it constantly pause? How can I keep it crawling until it actually finishes the crawl?
Hi,
Have a look around here: http://localhost:8090/PerformanceQueues_p.html
Check the memory and system load limits and compare them with the available resources.
System load can easily go above the default limit (2.0), which pauses the crawl for a while.
I tried to increase the limit there from 3.0 to 4.0. It does not seem to make much difference. Once it pauses, it won’t continue anymore. Looking at the system status right now, it shows the load as 0.18, or even below that. Yet, the crawler stays paused.
Take a look at the log file DATA/LOG/yacy00.log to see if there are any relevant messages.
It seems it has too little disk space…
I 2021/10/18 14:47:34 RESOURCE OBSERVER Volume /opt/yacy_search_server/DATA: free space (3921 MB) is low, but nominal (< 4096 MB)
I 2021/10/18 14:47:34 RESOURCE OBSERVER pausing local crawls
W 2021/10/18 14:47:34 SWITCHBOARD Crawl job '50_localcrawl' is paused: resource observer: not enough disk space, 4111728640
I 2021/10/18 14:47:34 RESOURCE OBSERVER pausing remote triggered crawls
W 2021/10/18 14:47:34 SWITCHBOARD Crawl job '62_remotetriggeredcrawl' is paused: resource observer: not enough disk space, 4111728640
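For anyone else hunting for the same thing, here is a rough Python sketch that pulls the pause messages out of the log. It assumes the messages look exactly like the SWITCHBOARD lines quoted above; the function name find_pause_reasons is made up for this example and is not part of YaCy.

```python
import re

# Matches SWITCHBOARD pause warnings in the form seen in the log excerpt above.
# Assumption: other YaCy versions may word these messages differently.
PAUSE_RE = re.compile(r"Crawl job '(?P<job>[^']+)' is paused: (?P<reason>.+)$")


def find_pause_reasons(lines):
    """Return (job, reason) tuples for every pause message in the given lines."""
    results = []
    for line in lines:
        match = PAUSE_RE.search(line.rstrip("\n"))
        if match:
            results.append((match.group("job"), match.group("reason")))
    return results


if __name__ == "__main__":
    # Log path as mentioned in this thread, relative to the YaCy directory.
    with open("DATA/LOG/yacy00.log", encoding="utf-8", errors="replace") as fh:
        for job, reason in find_pause_reasons(fh):
            print(f"{job}: {reason}")
```

Run it from the YaCy installation directory so the relative log path resolves.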
It seems to use quite a bit of disk space then…
Yep.
This could be adjusted under ‘RAM/Disk Usage & Updates’
/Performance_p.html
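Before changing the limit, it may help to compare the free space on the DATA volume with the threshold from the log. Below is a minimal Python sketch; the 4096 MB value and the path are taken from the log lines in this thread, the helper names are invented for the example, and your configured limit may differ.

```python
import shutil

# Threshold in MB as reported by the RESOURCE OBSERVER log line in this thread.
# Assumption: your installation may be configured with a different value.
THRESHOLD_MB = 4096


def bytes_to_mb(num_bytes):
    """Convert a byte count to whole mebibytes (rounded down)."""
    return num_bytes // (1024 * 1024)


def is_below_threshold(free_bytes, threshold_mb=THRESHOLD_MB):
    """Return True if the free space is under the given threshold."""
    return bytes_to_mb(free_bytes) < threshold_mb


if __name__ == "__main__":
    # Volume path as it appears in the log; adjust to your installation.
    free = shutil.disk_usage("/opt/yacy_search_server/DATA").free
    print(f"free: {bytes_to_mb(free)} MB, below threshold: {is_below_threshold(free)}")
```

The byte count in the pause message (4111728640) works out to 3921 MB, which matches the free space reported by the resource observer.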
There should be a ‘pause reason’ message on the crawler monitor page which explains what kind of resource limitation has paused (not stopped) the crawl.