Hello, I’ve read the “Story of YaCy Grid” article before, and I found this part interesting:

> incompleteness: we distribute our search index to a set of remote peers and we don’t have any control over the lifetime of the storage of that index. This causes a bad recall.
I was running two Senior peers, “reinhart1010-scavenger” (on that Raspberry Pi and OpenWrt device) and “reinhart1010-hub” (on another spare Linux laptop), to help crawl some undiscovered websites such as gojek.com and vidio.com, and I found this statement to be true.
After most of the crawling was done, I didn’t see any significant change in the YaCy search results on other peers, not even between my own peers on the same local network! And since the IndexNow protocol has been introduced to let websites signal search engines about updated content, I think maybe we can use it to improve the quality of YaCy’s P2P search results.
My idea here is to provide a single, persistent endpoint that responds to IndexNow requests. When a website sends an IndexNow request, the server converts it into a batch of crawl requests to the P2P network (say, contacting 200 peers over the course of 48 hours for a single URL), hoping that those peers will crawl and index the URL for themselves and nearby peers.
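To make the fan-out part concrete, here is a minimal sketch of how the endpoint could spread one submitted URL over many peers within a time window, so the network isn’t hit all at once. The function name, the peer addresses, and the 200-peers/48-hours numbers are just the assumptions from my example above, not anything YaCy actually provides:

```python
import random

def schedule_crawl_requests(url, peers, window_hours=48):
    """Spread one IndexNow-triggered crawl request for `url` across
    the given peers over a time window.  Returns a list of
    (delay_in_seconds, peer_address) pairs, earliest first."""
    window_seconds = window_hours * 3600
    schedule = []
    for peer in peers:
        # Pick a random point inside the window for each peer, so
        # requests trickle out instead of arriving in one burst.
        delay = random.uniform(0, window_seconds)
        schedule.append((delay, peer))
    return sorted(schedule)

# Example: fan a single URL out to 200 hypothetical peer addresses.
peers = [f"peer-{i}.example:8090" for i in range(200)]
plan = schedule_crawl_requests("https://example.com/new-page", peers)
```

A real implementation would also need retries and some awareness of which peers are actually online, but the scheduling idea itself is this simple.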
I believe this idea could be feasible; however, it means we might need to create a separate server implementation (for the IndexNow endpoint) that is still compatible with the existing YaCy network. Peer distribution for the crawling requests could be a challenge, too.
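For the endpoint itself, the request shape is already fixed by the IndexNow spec (a JSON POST with `host`, `key`, and `urlList`), so the YaCy-specific part would mostly be validation and hand-off to the crawler. A rough sketch of just the validation step, assuming a hypothetical `parse_indexnow` helper and skipping the spec’s key-file verification against `keyLocation`:

```python
import json

def parse_indexnow(body: bytes):
    """Validate an IndexNow batch submission (JSON POST body) and
    return the URLs that should become crawl requests.
    Raises ValueError on malformed submissions."""
    data = json.loads(body)
    host = data.get("host")
    key = data.get("key")
    urls = data.get("urlList", [])
    if not host or not key or not urls:
        raise ValueError("host, key and urlList are required")
    # IndexNow requires submitted URLs to belong to the submitting
    # host; drop anything that doesn't.
    accepted = [
        u for u in urls
        if u.startswith(f"https://{host}/") or u.startswith(f"http://{host}/")
    ]
    if not accepted:
        raise ValueError("no URLs match the submitting host")
    return accepted

# Example submission where one URL is on the wrong host and gets dropped.
body = json.dumps({
    "host": "example.com",
    "key": "abc123",
    "urlList": ["https://example.com/new", "https://other.com/x"],
}).encode()
accepted_urls = parse_indexnow(body)
```

The interesting open question is what happens after this step: how the accepted URLs get translated into crawl-start calls that existing YaCy peers will accept.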