I’ve been running YaCy in a few modes, public and intranet, for a year or so. It’s been great as an indexing and search server for public content.
I’d like YaCy to index the contents of documents such as DOCX, PDF, JPG, etc., basically everything Apache Tika handles.
Is there a way to include Apache Tika for unknown file types?
I see lots of errors like:
Podcast.csv' file extension is not supported and indexing of linked non-parsable documents is disabled.
The content of the CSV is plain text, so it should be easy to parse, index, categorize, and make searchable.
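To illustrate what I mean by "easy to parse", here is a minimal sketch of flattening a CSV into plain text that an indexer could consume. This is not YaCy code and does not use any YaCy API; the function name `csv_to_text` and the sample data are made up for the example.

```python
import csv
import io


def csv_to_text(raw: str) -> str:
    """Flatten CSV content into one line of plain text per row (hypothetical helper)."""
    reader = csv.reader(io.StringIO(raw))
    # Join the non-empty cells of each row with spaces.
    lines = (" ".join(cell.strip() for cell in row if cell.strip()) for row in reader)
    # Drop rows that ended up empty.
    return "\n".join(line for line in lines if line)


# Made-up sample data, standing in for a file like Podcast.csv.
sample = "title,author\nEpisode 1,Jane Doe\n"
print(csv_to_text(sample))
```

Something along these lines, applied to files with a `.csv` extension, would already give the indexer searchable text.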
I experimented with the Open Semantic Search project, https://www.opensemanticsearch.org/, but it’s not quite the same.