How does YaCy work?

swetepete · 3 November 2022 15:47

Hey,

I have read some of the FAQ and will read more, but it doesn't directly answer some of my questions, and some of it is a little over my head. I'd really appreciate someone helping me understand the very basics so I can dive in a bit deeper.

I understand YaCy is a P2P (peer-to-peer) search engine.

There are many fundamentals about YaCy which I don’t have a clear grasp on.

When you install the YaCy server to contribute to the index, how much computing resources does it consume, in terms of storage, CPU, and bandwidth?

How does YaCy crawl and index the web? Is each server running a small crawl operation? How is it determined which sites each server will crawl?

What is the format of an entry in the YaCy web index? Apart from standard metadata such as page title and description, does YaCy have purpose-built methods for identifying keywords or storing page content as well?

What kind of search algorithm does YaCy use over its index? Is it a common keyword search? Does YaCy have anything akin to Google’s PageRank?

Thanks very much

Sviatoslav · 21 December 2022 11:37

It needs a lot of resources. YaCy is written in Java, and Java will consume as much as you give it.
It can start with 600 MB of RAM, but that is only enough to start it. To index, you need at least 800 MB, and that is the bare minimum. With that much memory, YaCy can handle an index of no more than 8 GB (roughly 200,000 pages), and it will frequently stop because it runs out of memory.
With 2 GB of RAM it runs more stably, and to run without any constraints you need 16 GB of RAM, as I read here on the forum (I haven't tested that myself because I didn't have those resources; my node is small).
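For a rough sense of scale, the figures above (about 8 GB of index for roughly 200,000 pages) imply an average of about 40 KB of index per page. The sketch below does that back-of-envelope arithmetic and extrapolates it to other page counts. It assumes "GB" means decimal gigabytes and that index size grows linearly with page count; both are assumptions, and the numbers are one user's observation, not official YaCy figures.

```python
# Rough figures from the forum reply (one user's observation, not an official spec).
# Assumption: "GB" is read as decimal gigabytes (10**9 bytes).
OBSERVED_INDEX_BYTES = 8 * 10**9   # "an index of no more than 8 GB"
OBSERVED_PAGES = 200_000           # "roughly 200,000 pages"


def bytes_per_page(index_bytes: int = OBSERVED_INDEX_BYTES,
                   pages: int = OBSERVED_PAGES) -> float:
    """Average index size per page implied by the observed figures."""
    return index_bytes / pages


def estimate_index_gb(pages: int) -> float:
    """Assumed linear extrapolation of index size (decimal GB) for a page count."""
    return pages * bytes_per_page() / 10**9


if __name__ == "__main__":
    print(f"~{bytes_per_page():,.0f} bytes per page")
    print(f"~{estimate_index_gb(1_000_000):.0f} GB for 1,000,000 pages")
```

Real index growth likely depends on page content and YaCy's settings, so treat the output as an order-of-magnitude guess only.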

You specify manually which sites to index (you can enter a whole list at once).

Sorry, I've only answered what I know. I'm feeling my way around too. There isn't any detailed information available.