There are no crawls running. What could be making YaCy use so much of the CPU for so long (hours and hours)?
If the computer is shut down and YaCy restarted, the same situation occurs again.
A wild guess is that it is the "Postprocessing Progress".
Does anybody have any suggestions?
Running on Windows? Most likely it's the Windows updater.
If not, try starting the machine w/o YaCy. If you're sure it's YaCy - look into yacy00.txt.
I have had the same issue with YaCy (java at ~500% CPU). I am running it on Fedora Linux. Do you think it is just building the index? Do you recommend that I let it continue, or do you think there is something wrong?
UPDATE: I let it run for 30-40 minutes and the server is unresponsive. It returns the error "GC overhead limit exceeded". Maybe I need to adjust the memory limits?
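"GC overhead limit exceeded" is the JVM telling you it is spending almost all its time in garbage collection, so raising the heap usually helps. A minimal sketch, assuming the stock `javastart_Xmx`/`javastart_Xms` keys in `DATA/SETTINGS/yacy.conf` (verify the key names in your install; the same value can also be changed on the Performance page at http://localhost:8090/Performance_p.html):

```
# DATA/SETTINGS/yacy.conf - raise the JVM heap, then restart YaCy
javastart_Xmx=Xmx2048m
javastart_Xms=Xms2048m
```

The exact size depends on your machine; 2048m is just an illustrative value larger than the small default.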
If this is caused by the postprocessing, then it is working as designed - the postprocessing calculates the PageRank, which is computational madness.
Therefore the postprocessing is disabled in recent releases. If it is not disabled in yours, please disable it.
If no crawl is running, postprocessing is off, and CPU usage is still high, then we should investigate this further.
May I ask where the postprocessing option is located? I've been looking for it but can't find it.
It's really hidden - you must switch on a specific index field ("process_sxt") in the index schema, which you can find here: http://localhost:8090/IndexSchema_p.html
Then freshly crawled content can be processed - but postprocessing starts only after the complete crawl has finished and the crawl stack is completely empty. Postprocessing does not start instantly, but only when the cleanup job runs - which it does every 10 minutes.
Another condition is that the Web Structure Index is switched on, which you can find at http://localhost:8090/IndexFederated_p.html - but that should be on by default.