How to use YaCy API

julkhami · 6 August 2021 00:56

I saw in the GUI there is a button to show different YaCy interfaces, for example JSON. But I don’t understand how I could retrieve that JSON from the command line. Do I just copy the URL string I saw in my browser and use something like Python requests.get()? The URL was localhost:8 or something like that.

Thanks very much.

ian · 6 August 2021 06:56

Yes, most of the API is described here
https://wiki.yacy.net/index.php/Dev:API#API_reference

You may google “python rest api” to see examples on how to call such kind of APIs from Python

julkhami · 7 August 2021 13:59

Thanks so much.

I need advice on parsing the JSON returned. How can we learn about its structure to extract the desired content, for example, URLs and website descriptions?

Also, what factors are relevant in YaCy’s speed? What are some ways to make it faster?

Thanks very much.

ian · 8 August 2021 08:07

You will find plenty of examples on json parsing on the internet. Have a look here for instance: Python JSON

You may copy the api link in your web browser and you will see the returned json to check the structure for your self.

Regarding speed, it depends on your use case. I am very happy with yacy speed, however I have 1gbit internet, lots of memory, low latency/high bandwidth hard disks and I crawl domain specific websites, not the entire internet.

julkhami · 9 August 2021 10:52

Thanks very much.

I was already able to retrieve the JSON as a Python list. What I meant was that this is a complex nested list and I would like to know more about its structure so I know where the entries are containing the information I am seeking. I did not see this discussed in the docs but perhaps it’s there; I’ll look again.

Could you please explain what the most important factors are in YaCy speed? Would it be possible to get the search down to 0.1 seconds? How much memory do you have? Do you specifically mean RAM or also your longterm memory and even CPU? Why would RAM be more important than the CPU, for example? Which high bandwidth hard disk do you use? And how do we set up YaCy to only crawl certain sites?

I’m curious, has anyone ever formally evaluated the quality of YaCy’s results as compared to Google?

Thanks very much. If you can point me in a good direction to these questions I look forward to researching them further.

Best regards.

Orbiter · 9 August 2021 18:21

The old wiki as plenty of API pages, see: Kategorie:API – YaCyWiki

The most important ones are:
search: Dev:APIyacysearch – YaCyWiki
crawl: Dev:APICrawler – YaCyWiki

Yes the wiki is broken, but can still be read.
It would be great to move over the documentation to yacy_net_homepage/docs/api at master · yacy/yacy_net_homepage · GitHub
every help is welcome!