YaCy on a flash drive

Yes, if someone has some clue what they are doing. If, though (like me, unfortunately), they have never done much if any computing outside an MS Windows graphical interface, there is not much hope.

I’ve probably got a dozen or so “small Linux” systems up and running, using some Windows utility (Rufus, Easy2Boot, etc.) only to find myself staring at a blinking cursor, not knowing what to do next.

The main issue with installing from Windows seems to be that, if “successful”, the result is a USB drive with an underlying FAT32 file system, which limits any single file (and therefore the persistence file) to 4 Gig at best, and in some setups apparently 2 Gig. With that, YaCy, with the default settings, will quickly start complaining about “low disk space” and shut everything down, even with 100+ Gigs of free space on the USB.
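As far as I understand it, the 4 Gig ceiling comes from FAT32 recording each file's size in a 32-bit field, so no single file (a persistence file included) can reach 4 GiB, however large the drive is. A minimal sketch of that arithmetic follows; the function names are my own and purely illustrative.

```python
# FAT32 stores each file's size in a 32-bit field, so the largest
# possible single file is 2**32 - 1 bytes (one byte short of 4 GiB).
FAT32_MAX_FILE_BYTES = 2**32 - 1


def gib(n: int) -> int:
    """Convert a whole number of GiB to bytes."""
    return n * 1024**3


def fits_on_fat32(file_size_bytes: int) -> bool:
    """Return True if a single file of this size can exist on FAT32."""
    return 0 <= file_size_bytes <= FAT32_MAX_FILE_BYTES


if __name__ == "__main__":
    for size in (2, 4, 100):
        verdict = "fits" if fits_on_fat32(gib(size)) else "too large"
        print(f"{size} GiB persistence file: {verdict}")
```

This only models the per-file limit; it says nothing about how a particular live distribution sizes its persistence file.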

Apparently ext4 (Linux file system) will allow utilization of the entire USB drive, but so far I have not found any means of formatting a USB to ext4 directly from Windows.

Also, any older 32-bit hardware is similarly limited in terms of drive space utilization, apparently.

As I write this, I have just a few minutes ago found, while reading “Tiny Core Linux on a USB stick | MiViLiSNet”, a Windows utility (SourceForge download link: “core2usb download | SourceForge.net”) which apparently CAN format a USB to ext4 (and other Linux formats) from Windows.

Tiny Core Plus sounds like it might be something I would be more comfortable with.

Alternatively I’m also looking into booting up some Linux system from an Easy2Boot multi-boot drive then following the instructions on the page you linked to previously: Create a Bootable USB - Alpine Linux

MX Linux may not be the way to go because it itself, apparently, cannot utilize an entire large USB for persistent storage.

Slax is another one that looks promising, IF I can figure out a way to install it on a USB formatted to ext4. So far I’ve been able to boot Slax from Easy2Boot but without persistence.

“Unfortunately FAT is limited to 4GB file size; for that reason, persistent changes can’t grow more. In case you need to save more, please format your storage drive with some Linux filesystem such as EXT4 or BTRFS and install Slax to it. Slax will be able to save changes natively and will be limited only by the actual capacity of your device.”

There are currently many USB drives in the 1 or 2 terabyte class available, but as yet I’ve not managed to cobble together anything that would allow a Linux+YaCy USB to utilize more than a few Gigs.

I’m diligently hammering away at it though as I’m able to find the time. Recently however, that’s been difficult as we’ve had a blizzard of snow here. I’m having to keep sidewalks shoveled, the temperature has dropped and pipes froze, so today I’m a plumber working on fixing broken pipes and praying the power doesn’t go out as then I’ll need to chop wood and build a fire as well.

Thanks for the encouragement and suggestions!

I’m going to have to take the plunge and start doing things from Linux itself on the command line but the main problem there is all the laptops I have that I can boot Linux on are mostly junk with malfunctioning keyboards, so I end up frustrated, banging on some key that is stuck.

I need to get a new actually NEW computer sooner or later I think. I really like the idea of utilizing old hardware, but for initially hacking this out, I need to at least have a computer with a functional keyboard.

Well, the Tiny Core installer, it seems, has not been updated since 2012, and it didn’t seem to work.

I have found just now that “AntiX”, the system that MX Linux is based on, has a “core” version that is 450 MB as well as a “Base” version at 730 MB (approximately).

On these pages:
https://download.tuxfamily.org/antix/docs-antiX-13/FAQ/persistence.html
and
https://download.tuxfamily.org/antix/docs-antiX-17/FAQ/persistence.html
and
https://download.tuxfamily.org/antix/docs-antiX-19/FAQ/persistence.html

there are some important details, that may or may not apply to the “core” but to AntiX in general at least.

AntiX (and MX Linux as well) offers several ways of setting up a few different types of persistent storage. I’m just noticing now that some of those ways could be better or worse in terms of running YaCy, and there have been some changes over the years.

Home Persistence

Home persistence is the simplest and safest. (…) Home persistence is also ideal if you want to download and save a lot of data. It is the safest precisely because you can’t save any system changes with it. Even if your system gets compromised, it will be very difficult for the bad guys to make any permanent changes to your system.

The homefs and rootfs persistence files can be created when booting at the live desktop menu. Press F5, choose the persist option and on boot you will be prompted to set it up. You will be first asked where you want to save the persistent file(s). You just need to decide how large you want to make the file(s). The ext4 file system will be used. Once the homefs and rootfs files have been created, you will be prompted to make changes to the root and user passwords. This is for security.

As I’ve always used MX Linux in “Live” mode, for the most part, with the default password, I was not aware of some of these options and did not have them configured optimally.

An older version might actually be better, though I’m not certain, but the newest FAQ for the latest version states:

“Unlike previous versions of antiX, there is no longer an option to use a entire partition for root or home persistence”

That seems like it could be a worthwhile feature. If YaCy was in HOME with an ext4 file system and the partition was the bulk of a good size flash drive and root did not have persistence then the operating system would be isolated and less vulnerable, while giving YaCy plenty of room to breathe.

Static Root Persistence

Static root persistence is another way to use persistence. Static root persistence saves file system changes directly in the rootfs file. It does not use any RAM so the only size limit is the size of the rootfs file.

Another option to try, which I had not previously been aware of.

I actually liked the core version of Antix. The most informative and dummy friendly text/command line I’ve ever seen. If you try to type a command that makes no sense it offers hints and suggestions. Unfortunately the newest version does not support my WiFi card, but I tried the latest Full version and that didn’t work either on my older laptop, so I could not get online to download YaCy, but the old Full version from 2017 worked wonderfully! Got online almost instantly, no problem, so I’m going back to the Sourceforge archives to download a 2017 core. https://sourceforge.net/projects/antix-linux/files/old/

Incidentally, I’ve written programs in Perl (since about 1998 or so?), but on my own.

I have hardly any idea what Docker is; a collaborative development platform of some sort, I assume.

I’ve generally had no real need for that sort of thing as my programs in Perl are often just a few lines or at most a few pages of code.

Not so complicated as to need a platform for version tracking and such.

Many of my programs were originally written out with pencil and paper, corrections being made with an eraser, before being typed into word pad. Then copy/pasted to a web hosting server online somewhere for testing.

Anyway, Docker, now that I’m looking into it, looks interesting. I notice it’s free, apparently, for open source projects.

Java, “containers” and “virtual machines” are like a whole different world for a guy who has always scratched out code on a literal note pad.

For previous versions, I just look in one of my old notebooks, on the book shelf.

These aren’t criticisms. I’m just trying to open a window into how little it registers in my brain when you mention “Docker images”.

Is a “Docker image” something like an ISO image?

I have, very gradually, been trying to work my way through this:

http://www.linuxfromscratch.org/lfs/view/stable/

Mostly as a read, but haven’t gotten very far with reading and haven’t begun as far as practical application. Finding time for such things is the main problem. Also trying to work through several Java books, but finding the time to study is the most difficult part.

I highly recommend looking into Docker. I will certainly move toward building add-on functionality for YaCy by adding connectors to min.io and graylog.org; both should be installed with Docker. I will also add convenient start scripts to run helper software in Docker, as I did for other projects like susi.ai. This one, for example, starts an Etherpad in Docker: https://github.com/fossasia/susi_server/blob/development/bin/start_etherpad_docker.sh

Docker is elegant if you wanna deploy several identical production instances of packaged software, e.g. for load-balancing or multi-tier setups, but you have to prepare the docker files first.

I’ll definitely look into Docker. Actually I have, but only cursorily.

Right now what I found interesting is that I booted up the Kiosk from USB (YaCy on Porteus Kiosk - Live USB) and YaCy posted this warning:

Access is unrestricted from localhost (this includes administration features). Please check the accounts configuration page to ensure that the settings match the security level you need.

OK, well, localhost is not accessible except from the local hardware right? So I thought anyway. Not really a worry. YaCy was online before I could set an admin password or anything. But still. Also, booted right into full senior mode.

To get to the point though: In admin under System Status there is:

Protection

password-protected
Unrestricted access from localhost. [Configure]

Address

Host: :8090
Public Address: http://45.46.121.27:8090
YaCy Address: http://agent-tatesak-ufe-150.yacy

Proxy

Transparent off URL off

Remote: not used

I typed the address http://45.46.121.27:8090 into the browser on my smart phone, and the YaCy Kiosk search interface is live and accessible at that address. Absolutely. Admin area, everything.

I went into the other room and interrupted my girlfriend at her desk on the computer and asked her to type in the address.

Same thing. The YaCy search page came right up immediately, got right into the admin as well.

I think this is rather interesting.

There is no hard drive on this computer and, I believe, the ISO is read only, so there really isn’t much permanent damage that could be done.

I would appreciate it though if anyone reading this within, say, the next hour or so would visit the URL - Kiosk at that address.

http://45.46.121.27:8090

(I’d like to know if it is really accessible from the outside or just on my home network, inside the router firewall.)

If you get to the YaCy search page please let me know by posting a response here.

Also, is this normal?

Previously, running YaCy, I never worried about setting an admin password as I did not think localhost was accessible without direct access to the hardware.

YaCy, I believe, has some sort of rudimentary built-in HTTP server, does it not, where a website can be stored or hosted? I recall stumbling across that at some point while browsing around in the file system.

Hmmm… in the remote proxy there is an address and port # set.

Use remote proxy for https is checked.

remote proxy host: 192.168.2.2
remote proxy port: 4239

Is this normally set by default? I did not enter these settings.

Edit: that appears to be my home router address(?)

Why port 4239?
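One way to see why 192.168.2.2 looks like a home-network address while 45.46.121.27 looks like an outside one is to classify them with Python's standard `ipaddress` module. This is only a sketch: `describe` is a name I made up, and it says nothing about why YaCy chose that proxy host or port.

```python
import ipaddress


def describe(addr: str) -> str:
    """Classify an IP address as loopback, private (LAN) or public."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private (LAN)"
    return "public"


if __name__ == "__main__":
    # Addresses taken from the status output quoted above.
    for addr in ("127.0.0.1", "192.168.2.2", "45.46.121.27"):
        print(addr, "->", describe(addr))
```

An address in the 192.168.x.x range is only meaningful inside the local network, which fits the guess that it belongs to the router or another local device.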

Edit: YaCy reports:

[ok!] Your peer can be reached by other peers

Peer Port:

with SSL (https enabled)

Configure your router for YaCy using UPnP:

[warning] Configuration was not successful. This may take a moment.

Tentatively answering my own question: after shutting off WiFi and switching to data (getting an ‘outside’ cell tower connection with my phone), the YaCy Kiosk was NOT accessible via that URL.

Further, my home router has two different connection modes: normal and ‘5G’.

Turning WiFi back on, my phone connected to 5G while the Kiosk was on the other, 3G(?) or whatever it is, regular connection. Still not accessible.

I had to disconnect and reconnect on the same router mode to get phone access to the YaCy-Porteus Kiosk again.

Actually that is not an undesirable result.

It seems, probably, people in a coffee shop or whatever with WiFi access, through a router, with the Kiosk on the same router, could have access to YaCy on their phones through the WiFi.

That was, actually, the result I was looking for.

It is nice to have YaCy on my phone, at least while home, throughout the house, without having to sit in front of a computer. Or, any other phone or device connected to the router.

I’ve just closed the lid on the Kiosk/laptop and left it running for access on any device, throughout the house. Convenient, and apparently secure.
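The reachability checks described above (same WiFi, other WiFi band, mobile data) can also be scripted instead of typed into a phone browser. Below is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper, and the answer depends entirely on which network the script is run from. Some routers also answer requests to their own public address from inside the LAN, so an inside test does not prove outside access.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8090 is the YaCy port mentioned in this thread; the host is an
    # assumption and should be replaced with the address under test.
    host, port = "127.0.0.1", 8090
    state = "reachable" if can_connect(host, port) else "not reachable"
    print(f"{host}:{port} is {state} from this machine")
```

A successful TCP connection only shows that something is listening on that port; it does not confirm that the listener is YaCy.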

As this is your external IP and you have port 8090 open and routed it to your YaCy installation … it works!

But: This sounds to me that you really plan to run a YaCy instance on each tablet and make it publicly available through the café’s internal firewall. How would you distinguish all the tablets? You would need to manage all the port configurations on the firewall - which I think is a no-go.

You can run many YaCy instances internally, but only one of them can share the index P2P as you have only one port 8090.

The usual way today is that whenever you connect your mobile phone to a local WiFi, you get a landing page where you must register first. You could have a link to your local YaCy engine(s), or you could redirect google.com to your local search engine by using a local unbound DNS.

To achieve this you have to maintain every local site’s WiFi implementation separately, which is probably not what you want.

So… by “It works!” are you saying you tried it (followed the link http://45.46.121.27:8090) and opened my YaCy search page?

I think that is the purpose of the Porteus Kiosk server. Each kiosk is a “client”, and the “kiosk server” is the management console that all the clients are connected to via VPN and… what is it called? I’ll have to look it up… SSH.

Each Kiosk is configured with its own ID# and can be logged onto and managed through the remote server.

There would only be, probably 1 kiosk in each location, though I don’t see why there couldn’t be more. DHCP would assign a different IP to each, I think.

If there were more than one in any one location, then people could log onto either, (or not?), I don’t really know. This is all experimental, but, here at home, I’m able to log on with my phone, other phones, my other laptop etc. simultaneously.

Suppose I boot up two kiosks here at home and see what happens? I’m guessing a network conflict warning.

I’m not even sure how you could have accessed MY YaCy page. I just NOW set up (or tried to set up) persistence, burned a new kiosk ISO with persistence, and booted up. Then, out of curiosity, I logged onto my router: port forwarding was NOT activated and there was no UPnP support (not that I could find, anyway). Mystery.

My router also uses IPv6 and there is also the MAC address, I’m not sure what Porteus uses to distinguish each kiosk across the internet from a central control station.

Somehow YaCy running within Porteus has opened up a wormhole of some sort. Who knows.

It came as a surprise to me that I could access YaCy, which was running on a kiosk (live boot Porteus Kiosk + YaCy on a USB), from any other device in the house (without even attempting to configure any kind of home network). That YOU (or anyone in the world) could access it, if that is what you are saying, from where? The other side of the world somewhere? That seems to border on the impossible.

Anyway, perhaps each kiosk (in one location) could use a different port with the router’s port forwarding. Port forwarding behind a router is machine specific these days, I believe.
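The "different port per kiosk" idea can be written down as a small table before touching any router. In the sketch below the LAN addresses and the helper name are made up for illustration; only port 8090 comes from this thread, and a real router would need these rules entered through its own interface.

```python
def plan_port_forwards(kiosk_ips, base_external_port=8090, internal_port=8090):
    """Map one external port per kiosk to that kiosk's YaCy port.

    Returns {external_port: (kiosk_ip, internal_port)}.
    """
    return {
        base_external_port + offset: (ip, internal_port)
        for offset, ip in enumerate(kiosk_ips)
    }


if __name__ == "__main__":
    # Hypothetical LAN addresses handed out by DHCP to two kiosks.
    plan = plan_port_forwards(["192.168.2.10", "192.168.2.11"])
    for external, (ip, internal) in sorted(plan.items()):
        print(f"router:{external} -> {ip}:{internal}")
```

Each kiosk would then need a fixed (reserved) LAN address, otherwise the table goes stale whenever DHCP hands out a new one.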

I’m rambling now, not really knowing what I’m talking about.

My intention (originally at least) was really NOT to have kiosks accessible from anywhere beyond the local WiFi connection, or router, in the café or wherever the location is. I thought that would be really difficult to set up.

However it does not seem the Kiosk is actually being accessed.

On another device, the web browser does not have the same bookmarks and tabs I have set up on the kiosk machine itself.

I would be interested to know, if I set up a Kiosk with black- or white-listing, whether your browser is also so limited when accessing the kiosk-YaCy remotely, whether you can access my tabs and bookmarks on the kiosk, or whether your browser is functioning independently.

The kiosk is running a “locked down” Firefox browser, though the degree of limitation or freedom is whatever the kiosk is set up for. But other devices are apparently using their own browser with whatever personal settings.

My phone is running Chrome and has its own native bookmarks, tabs, etc.

So it seems what is being accessed is YaCy, not the kiosk.

At least I hope you were not able to access my open tabs, web-mail, bank-account, or whatever open tabs on the kiosk browser which I’m posting/typing from now.

Anyway, all very interesting.

The kiosk can actually be set up to boot into ANY page.

I set the kiosk up to boot with YaCy (localhost:8090) as the start and home page for the kiosk browser. What if I configure it to boot up into some other start/home splash page?

Time to find out I guess.

@Tom_Booth Just in case you’re not doing so already…

Instead of writing/reading everything YaCy does to the live stick’s storage space, perhaps a ramdisk/tmpfs (plus some nifty symlinking at boot time of certain things that perhaps don’t need to be persistent) could be beneficial? Both for speed of operation and for the longevity of the flash stick.

And having swap (at least a bit of high-priority swap, if not all of it) on zram might also be beneficial, instead of swapping to the flash.

Though this of course hinges on the amount of RAM available on the thingy the stick gets stuck into :smile:
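Since the ramdisk idea depends on how much RAM the machine has, a quick look at `/proc/meminfo` gives a starting number. The sketch below parses the `MemTotal` line and proposes a tmpfs size; the 25% fraction is an arbitrary assumption of mine, not a recommendation from YaCy or Porteus.

```python
def mem_total_kib(meminfo_text: str) -> int:
    """Extract the MemTotal value (in KiB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def suggested_tmpfs_mib(total_kib: int, fraction: float = 0.25) -> int:
    """Suggest a tmpfs size in MiB as a fraction of total RAM (assumed 25%)."""
    return int(total_kib * fraction) // 1024


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            total = mem_total_kib(handle.read())
        print(f"Total RAM: {total // 1024} MiB")
        print(f"Possible tmpfs size: {suggested_tmpfs_mib(total)} MiB")
    except FileNotFoundError:
        print("/proc/meminfo not available (not a Linux system?)")
```

Anything kept only in tmpfs is lost at power-off, so this would suit caches and temporary files rather than the index itself.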

Thanks.

I had YaCy running all night and day today. Then I did some additional changes and I guess I did too much, but either YaCy or Firefox or the whole system froze.

My understanding is that Porteus does not write anything to disk until rebooted. I was not able to reboot, shutdown normally or do anything. Finally after a long wait, hoping something would free up on its own, I ended up just leaning on the power button to force a shutdown. Bad idea.

The kiosk rebooted but YaCy did not load.

I have been concerned about burning through flash drives, but I’m not sure whether, with this setup, YaCy is writing to the drive, or whether it is using RAM and everything then gets written to the drive at shutdown.

Unfortunately, I was not able to shut down properly. There is a warning in the documentation to the effect that because persistent memory is stored at shutdown, an improper shutdown or crash can trash the persistent storage.

Porteus does have a setting for zram, which I’ve tried using, and not using. I’m not even sure if I had it turned on this time around or not.

Every failure is a learning experience I suppose.

I should not have had a dozen browser windows open doing edits on a live system, css tutorials, color pickers, websites, search engines going, three or more boxes/computers/phones accessing this thing and who knows how many others remotely all emanating from a cheap USB drive.

But, all in all, I’m very pleased with the progress of this project. Now, though, I have to reformat and reburn the ISO, unless there is some way to restart YaCy from the command line or something, somehow, from within Porteus.

I booted a Porteus (not kiosk) system (also running live), then inserted the drive. It was mounted (I think) fine and I was able to browse the files, but I have no real idea where to look for anything or what to do. Generally, there was no obvious or apparent damage, such as warnings about not being able to read a file.

The kiosk boots, but YaCy does not start. I wish I could somehow rescue all my work, but I’m afraid it was running in ram and was lost.

I’m actually posting this from the running kiosk NOW, as everything else is working fine, but the YaCy/localhost browser tab just says “Problem loading page”.

Anyway, I WAS at least able to confirm, to myself, that the Kiosk was accessible through my phone via the internet, outside of my home network.

I turned off WiFi on the phone and turned on Data access, which I don’t use because it is expensive/slow and limited in bandwidth, compared with the cable wifi router, but, it did work! Which is interesting, given I haven’t set up any dynamic dns or anything.

Last time I tried Alpine Linux, I was looking for something to use on a very old 32-bit laptop and scrounged around for an old archived version of Alpine. The computer itself had no WiFi card, so that did not go well.

The computer I’m working with now is a not too shabby Acer “Gaming” computer, from whenever i3 was considered cutting edge, maybe 10 years ago. It supports USB 3.0, so now I can use the latest version of Alpine, which should be less problematic.

Downloaded, and, hopefully getting ready to boot it up now.

EDIT:

I keep running into the same problem with Alpine. Virtually every installation video includes doing the following:

vi /etc/apk/repositories

Then the instructions are to un-comment the “community” repository. (delete the “#”)

I get to that point, then I get “stuck”. The cursor will not move down to the bottom of the page, there is no command prompt, the escape key does nothing. I’ve tried everything I can think of to get out of or off that screen. Hitting return does nothing. And none of the video tutorials explain how they get out of there and back to a command prompt.

Maybe this repositories “window” or whatever it is, is supposed to close automatically? I’ve tried just waiting, but nothing ever happens.

The “official” text instructions similarly do not explain how to exit that window.

Using community repositories of my release version

The community repository was introduced with Alpine Linux version 3.3. To enable the repository, edit the file /etc/apk/repositories using an editor (nano for instance) and add (or uncomment) a line that points to the “community” directory, formatted as in:

https://<mirror-server>/alpine/<version>/community

After enabling the community repository, one needs to obtain the latest index of available packages with:

apk update

https://wiki.alpinelinux.org/wiki/Enable_Community_Repository
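For what it is worth, vi is a modal editor: it normally starts in command mode, where `i` begins inserting text, Esc returns to command mode (with no visible effect), and typing `:wq` followed by Enter saves and quits. As an alternative that avoids the editor entirely, the "uncomment the community line" step can be expressed as a small text transformation. The sketch below is only that, a sketch: the function name and the sample mirror URLs are hypothetical, it deliberately leaves any `edge` lines commented, and it assumes Python is available on the system.

```python
def enable_community(repositories_text: str) -> str:
    """Uncomment release 'community' lines, leaving 'edge' lines alone."""
    result = []
    for line in repositories_text.splitlines():
        candidate = line.strip()
        is_commented_community = (
            candidate.startswith("#")
            and candidate.endswith("/community")
            and "/edge/" not in candidate
        )
        if is_commented_community:
            # Drop the leading '#' (and any spaces after it).
            result.append(candidate.lstrip("#").strip())
        else:
            result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents; a real file names a real mirror and version.
    sample = (
        "https://mirror.example/alpine/v3.0/main\n"
        "#https://mirror.example/alpine/v3.0/community\n"
        "#https://mirror.example/alpine/edge/community\n"
    )
    print(enable_community(sample), end="")
```

Writing the result back to /etc/apk/repositories would need root, and `apk update` would still have to be run afterwards, as the wiki text above says.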

Anyway, until I find some way to proceed to the next step, which would be:

apk update

I’m settling for using SLAX.

Though, as well as that has been going, in my haste to install SLAX on a flash drive, I neglected to reformat the drive for “unlimited” persistent storage. As the drive was formatted with FAT32, it is as though the 128 Gig drive is just 4 Gig, which for running Java/YaCy is inadequate if connected to the global YaCy network for any length of time.

slax_on_Fat32

That is what a 128 Gig drive looks like after installing SLAX, if the drive is FAT32 formatted.

Something called “Disk Genius” appears to be able to format to Ext4 from Windows.

I have one last new 256 Gig drive for the attempt.

The SLAX installation guide for installing on a USB stick states:

Unfortunately FAT is limited to 4GB file size; for that reason, persistent changes can’t grow more. In case you need to save more, please format your storage drive with some Linux filesystem such as EXT4 or BTRFS and install Slax to it. Slax will be able to save changes natively and will be limited only by the actual capacity of your device.

Sounds good, but I have yet to get it to work.

The problem is mainly a kind of catch-22: if the drive is formatted to ext4, Windows cannot read the disk to copy the files over, and to copy the files over, Windows needs a file system it can read. My laptop, booting Linux from a USB, does not have enough USB ports to plug in another drive while the OS is running and make the transfer. Maybe I can boot SLAX to RAM only, then remove the boot disk, then download a fresh SLAX ISO to burn to a new USB stick.

Sorry, just “thinking out loud”.

SLAX + YaCy on a USB worked wonderfully for a few days, with no problems; persistent memory worked great, until the “memory” usage (mostly for “Java” according to the SLAX “Task Manager”) began to exceed 4 Gig. Then simply performing a search would cause the system to freeze, and CPU use by “Java” would be nearly pinned at 90% or so.

EDIT:

I’m keeping the original SLAX setup for now, as I put quite a lot of work into it, but I’m only using it in Local Portal mode for a limited number of local websites. If I leave it connected to the YaCy world-wide peer network, with only 4 Gigs of “disk space” it gets bogged down rather quickly (if left running overnight, for example).
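To keep an eye on how close the persistent storage is to filling up, free space can be checked with the standard library. This is a rough sketch: the 1 GiB threshold is an arbitrary value I picked for illustration, not YaCy's actual low-disk limit, and the path would need to point at wherever YaCy's data actually lives.

```python
import shutil

GIB = 1024**3


def free_gib(path: str = ".") -> float:
    """Return the free space, in GiB, on the file system holding path."""
    return shutil.disk_usage(path).free / GIB


def is_low_on_space(free_bytes: int, min_free_bytes: int) -> bool:
    """Return True when free space has dropped below the chosen minimum."""
    return free_bytes < min_free_bytes


if __name__ == "__main__":
    path = "."  # assumption: replace with the YaCy data directory
    free_bytes = shutil.disk_usage(path).free
    print(f"Free space at {path!r}: {free_gib(path):.2f} GiB")
    if is_low_on_space(free_bytes, 1 * GIB):  # hypothetical 1 GiB threshold
        print("Warning: free space is below the chosen threshold")
```

Run periodically, something like this could give a warning before YaCy itself starts pausing crawls.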

In Portal mode, everything, including the YaCy web server with the custom pages, Online Kiosk search page, access from mobile, and all, still works great and consumes almost zero resources. I have it all up and running again now, and this is the report from Task Manager after just running several crawls, searches etc. to confirm it is all up and running. I also set my phone on Data and turned off WiFi to confirm the Online Kiosk is actually online and accessible:

There is an enormous difference in resource consumption between Portal mode and Community. For my purposes, portal mode is ideal, as I want to limit the Local Information Center Kiosk thing to indexing websites in the immediate central New York State region of the USA.

Also I want to save this setup so that when I get FULL persistence activated on another drive, I may be able to copy some files over to the new disk without having to start over from scratch.

In order to make this work without YaCy shutting down crawling and DHT, the memory resource settings need to be kept within reasonable limits for a system with just 4 Gigs available.

These are my settings, which seem to be working nicely currently: I don’t know if these are the best possible settings, but they seem to work for me, given the constraints imposed on a flash drive with a Fat32 file system.