Index: Kelondro vs. Lucene (Solr)? Which is in use, why, and what is the advantage?

bci · 13 June 2023 12:15

Hi. I am trying to find the index.
The other search engines I have worked on use Lucene at their core (mainly via Elasticsearch).

In the first version of YaCy (which I downloaded from GitHub), I cannot find Lucene, only the kelondro structure (a structure invented for YaCy, including the name, which is a made-up word; got it). By the way, what is the advantage of this structure over Lucene? If I remember correctly, Lucene already existed at that time, or did it not? Thank you.

In the current version (also from GitHub), there is Lucene, or rather Solr; however, files from the kelondro package are still used. Some are utilities, OK.

… My question: is the kelondro data store structure still used for the index? If so, why? And if Lucene is used as well, why both?

Thank you

okybaca · 16 June 2023 08:58

I don’t understand that either.
After several years of using YaCy, I’m still not sure what exactly the role of kelondro vs. Solr is, which data are stored where, and whether both are necessary.

I noticed that @Sviatoslav even wrote scripts to delete kelondro files as unnecessary. But I’m still curious whether the kelondro files have some function, or whether I can switch them off with no loss (and how).

@Orbiter , could you please explain? Thanks!

bci · 22 June 2023 13:41

Hi @okybaca, thank you for this comment, OK!

Look at the class net.yacy.search.query.SearchEvent, in the protected SearchEvent constructor, starting at around line 450 (I hope I have the current version). The excerpt is quoted at the end of this post.

… So both indices are searched. I am not sure what the difference is: the RWI holds calculated values such as word counts, but I think those are available from Solr as well.

… Maybe it would be useful to have some debug info here showing what was found in each index. Writing some test classes would be possible, but it might take some effort.

… Maybe I will try to do so later.

        // start a local solr search
        if (!Switchboard.getSwitchboard().getConfigBool(SwitchboardConstants.DEBUG_SEARCH_LOCAL_SOLR_OFF, false)) {
            final boolean useSolrFacets = true;
            this.localsolrsearch = RemoteSearch.solrRemoteSearch(this,
                    this.query.solrQuery(this.query.contentdom, this.query.isStrictContentDom(), useSolrFacets, this.excludeintext_image), this.query.offset,
                    this.query.itemsPerPage, null /* this peer */, 0, Switchboard.urlBlacklist, useSolrFacets, true);
        }
        this.localsolroffset = this.query.offset + this.query.itemsPerPage;

        // start a local RWI search concurrently
        this.rwiProcess = null;
        if (query.getSegment().connectedRWI() && !Switchboard.getSwitchboard().getConfigBool(SwitchboardConstants.DEBUG_SEARCH_LOCAL_DHT_OFF, false)) {
            // we start the local search only if this peer is doing a remote search or when it is doing a local search and the peer is old
            this.rwiProcess = new RWIProcess(this.localsolrsearch);
            this.rwiProcess.start();
        }
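
The per-index debug info mentioned above could look roughly like the following sketch. It is plain Java and does not touch any YaCy classes: `IndexHitReport`, `countHits`, and the two lambda "searches" are hypothetical stand-ins for the local Solr search and the local RWI search, and the URLs are made up. It only illustrates the idea of running both lookups concurrently and reporting how many hits each index contributed.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical helper, not part of YaCy.
class IndexHitReport {

    /** Runs every named search concurrently and returns hit counts keyed by index name. */
    static Map<String, Integer> countHits(Map<String, Supplier<List<String>>> searches) {
        // start all searches first so that they run in parallel
        Map<String, CompletableFuture<List<String>>> running = new TreeMap<>();
        searches.forEach((name, search) -> running.put(name, CompletableFuture.supplyAsync(search)));

        // then wait for each one and record how many results it produced
        Map<String, Integer> counts = new TreeMap<>();
        running.forEach((name, future) -> counts.put(name, future.join().size()));
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Supplier<List<String>>> searches = new TreeMap<>();
        // made-up result lists standing in for the two real index lookups
        searches.put("solr", () -> List.of("http://example.org/a", "http://example.org/b"));
        searches.put("rwi", () -> List.of("http://example.org/b"));
        System.out.println(countHits(searches)); // prints {rwi=1, solr=2}
    }
}
```

In the real SearchEvent the two searches are already started by the code quoted above, so an actual patch would only need the counting and logging part.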

Orbiter · 2 September 2023 08:07

We have both indexes in YaCy: Lucene/Solr and kelondro. The kelondro index is used only when YaCy operates in p2p mode, where it manages the index fragments that are exchanged in p2p transmissions. Think of it as a “dissolvable” index: when a p2p transmission takes place, fragments are taken out of the local index and transmitted to other peers.
A search in p2p mode is done on both indexes: Solr/Lucene (locally) and kelondro (locally and remotely). Searching the local kelondro index accesses those parts that other peers have transmitted to your own peer.
So having two indexes (where one “dissolves” over time) ensures that you always own your own data (what you crawled, in the Solr/Lucene index) while having distributed access to the p2p index (through what has been moved to your peer).
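
As a rough illustration of the two-index idea, here is a toy model in plain Java. It is not YaCy code: `TwoIndexPeer` and all of its methods are hypothetical, the indexes are simple in-memory maps, and it assumes that a crawled page is written to both indexes before any fragment is handed over. The sketch only shows why a peer still finds its own crawl results after the matching p2p fragment has moved to another peer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not part of YaCy.
class TwoIndexPeer {
    // word -> URLs this peer crawled itself; never handed away (role of Solr/Lucene)
    private final Map<String, Set<String>> ownIndex = new HashMap<>();
    // word -> URLs in fragments that may move between peers (role of kelondro)
    private final Map<String, Set<String>> p2pIndex = new HashMap<>();

    /** Assumption of this sketch: a crawled page is written to both indexes. */
    void crawl(String url, String... words) {
        for (String w : words) {
            ownIndex.computeIfAbsent(w, k -> new TreeSet<>()).add(url);
            p2pIndex.computeIfAbsent(w, k -> new TreeSet<>()).add(url);
        }
    }

    /** Takes the fragment for one word out of this peer and adds it to the target peer. */
    void transfer(String word, TwoIndexPeer target) {
        Set<String> fragment = p2pIndex.remove(word);
        if (fragment != null) {
            target.p2pIndex.computeIfAbsent(word, k -> new TreeSet<>()).addAll(fragment);
        }
    }

    /** Local search: union of the own index and whatever the p2p index currently holds. */
    Set<String> search(String word) {
        Set<String> hits = new TreeSet<>(ownIndex.getOrDefault(word, Set.of()));
        hits.addAll(p2pIndex.getOrDefault(word, Set.of()));
        return hits;
    }

    /** True while this peer still holds the p2p fragment for the word. */
    boolean holdsFragment(String word) {
        return p2pIndex.containsKey(word);
    }

    public static void main(String[] args) {
        TwoIndexPeer a = new TwoIndexPeer();
        TwoIndexPeer b = new TwoIndexPeer();
        a.crawl("http://example.org/yacy", "search", "p2p");
        a.transfer("p2p", b);                         // the fragment "dissolves" out of peer a
        System.out.println(a.holdsFragment("p2p"));   // false
        System.out.println(a.search("p2p"));          // still found through a's own index
        System.out.println(b.search("p2p"));          // found through the received fragment
    }
}
```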

okybaca · 8 September 2023 06:04

Thanks for the explanation, this is finally clear! I’d include that in the FAQ.