I saw in the GUI that there is a button to show different YaCy interfaces, for example JSON. But I don’t understand how to retrieve that JSON from the command line. Do I just copy the URL I saw in my browser and fetch it with something like Python’s requests.get()? The URL was localhost:8 or something like that.
I also need advice on parsing the returned JSON. How can I learn about its structure so I can extract the content I want, for example URLs and website descriptions?
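For what it’s worth, here is a minimal sketch of what I have in mind, assuming a local peer on the default port 8090 and the yacysearch.json endpoint (the exact URL and the channels/items layout are assumptions taken from my own browser’s URL bar, so verify them against yours). The demo at the bottom parses a hand-made payload offline so it runs without a peer:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

def yacy_search(query, base="http://localhost:8090"):
    """Fetch search results from a local YaCy peer.
    The base URL and port are assumptions -- use whatever your browser showed."""
    url = f"{base}/yacysearch.json?query={quote(query)}"
    with urlopen(url) as resp:
        return json.load(resp)

def extract_results(payload):
    """Pull (link, description) pairs out of the nested response.
    The channels[0]['items'] layout is what my peer returned -- check yours."""
    items = payload.get("channels", [{}])[0].get("items", [])
    return [(item.get("link"), item.get("description")) for item in items]

# Offline demo with a hand-made payload shaped like a YaCy response:
sample = {"channels": [{"items": [
    {"link": "https://example.org", "title": "Example",
     "description": "An example site"},
]}]}
print(extract_results(sample))  # [('https://example.org', 'An example site')]
```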
Also, what factors are relevant in YaCy’s speed? What are some ways to make it faster?
You will find plenty of examples of JSON parsing on the internet. Have a look here, for instance: Python JSON
You can copy the API link into your web browser; it will display the returned JSON so you can check its structure for yourself.
Regarding speed, it depends on your use case. I am very happy with YaCy’s speed, but I have a 1 Gbit/s internet connection, lots of memory, low-latency/high-bandwidth hard disks, and I crawl domain-specific websites, not the entire internet.
I was already able to retrieve the JSON as a Python object. What I meant was that it is a deeply nested structure, and I would like to understand its layout so I know which entries contain the information I am looking for. I did not see this discussed in the docs, but perhaps it’s there; I’ll look again.
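In the meantime, one way to explore an unfamiliar nested response is to print its key/type skeleton. Here is a small helper I sketched for that; the toy payload at the bottom is illustrative only, not YaCy’s exact schema:

```python
def outline(obj, indent=0, max_depth=4):
    """Return the key/type skeleton of a nested JSON value as lines of text,
    so you can spot where fields like 'link' or 'description' live."""
    pad = "  " * indent
    lines = []
    if indent >= max_depth:
        lines.append(pad + "...")
    elif isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            lines.extend(outline(value, indent + 1, max_depth))
    elif isinstance(obj, list) and obj:
        lines.append(f"{pad}[{len(obj)} items]")
        # Assume list items share one shape and only outline the first:
        lines.extend(outline(obj[0], indent + 1, max_depth))
    return lines

# Demo on a toy payload (shape is illustrative, not YaCy's exact schema):
payload = {"channels": [{"items": [{"link": "https://example.org"}]}]}
print("\n".join(outline(payload)))
```

Feeding it the actual object you got back from the API should show you the path down to the entries you want.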
Could you please explain what the most important factors in YaCy’s speed are? Would it be possible to get search times down to 0.1 seconds? How much memory do you have? Do you specifically mean RAM, or also long-term storage and even the CPU? Why would RAM be more important than the CPU, for example? Which high-bandwidth hard disk do you use? And how do we set up YaCy to crawl only certain sites?
I’m curious, has anyone ever formally evaluated the quality of YaCy’s results as compared to Google?
Thanks very much. If you can point me in a good direction on these questions, I look forward to researching them further.