
Hello! I have a server that runs 24/7, and I've recently started doing some things that require scraping the web. The websites detect that the server's IP isn't residential, though, and that's causing issues.

I'd like to host a proxy server on the small server I have running 24/7 in my house, so that all the traffic for that one site can be proxied through it. Does anyone have an idea how I'd set up a server like that? Thanks.

[–] Anafroj@sh.itjust.works 9 points 1 year ago

Max-P already provided good options, but I have to ask what I, and probably other people, wonder: why don't you just run that scraping program from your home server, then?

[–] neoney@lemmy.neoney.dev 4 points 1 year ago

The scraping program saves large files, which I don't have space for on the tiny raspi.

[–] Anafroj@sh.itjust.works 3 points 1 year ago

(sorry for the double post, the instance I'm on was throwing errors)

Gotcha, thanks for satisfying my curiosity. :) Of course, you can plug a USB drive into the Pi, but you know your needs best. Good luck!

[–] neoney@lemmy.neoney.dev 2 points 1 year ago

I'm honestly planning to stop using the Pi today. It's been unstable and I don't like Raspbian, and after getting three corrupted SD cards I decided it wasn't worth reinstalling, so I just bought a used thin client to replace it.

[–] Anafroj@sh.itjust.works 1 points 1 year ago

I feel you, been there. :) I now use Gentoo on my Pi and it is stable, but I can't recommend that to anyone who isn't already used to Gentoo; it's challenging to install on its own.

Regarding the SD cards, I've had no problems since I stopped using the cheapest brands. I now use only SanDisk Ultra microSDXC, and the oldest ones have been working for four years without issue. It's still basically NAND (the same stuff as in SSDs) soldered onto pins, though, so it's very fragile. Care should be taken to never bend them: they look flexible, but the NAND really isn't.

It's also a good idea to back up the whole card. Since SD cards usually hold far less data than hard drives, it's easy to keep an image on your system and flash it back, plugging the card into your desktop/laptop:

lsblk                                            # find the device name, let's say it's mmcblk1
sudo umount /dev/mmcblk1p*                       # make sure none of its partitions are mounted first
sudo dd if=/dev/mmcblk1 of=./backup-file bs=1G   # making a backup
sudo dd if=./backup-file of=/dev/mmcblk1 bs=1G   # restoring the backup

if means "input file", of means "output file", and bs is the block size (how many bytes are copied at once: the bigger, the faster, but it will use that amount of RAM on each iteration). dd just copies input to output, bs bytes at a time.

If you do that regularly, even using cheap SD cards that fail after a year becomes less of a setback: you can just flash the last saved version of the system onto a new card. It's probably better, though, to keep only the OS on the SD card and store important, frequently updated data on a USB drive or key.

[–] neoney@lemmy.neoney.dev 1 points 1 year ago

I have a better solution for that problem now: NixOS.

[–] karlthemailman@sh.itjust.works 6 points 1 year ago

How much are you scraping? You may end up getting your home IP blocked.

[–] neoney@lemmy.neoney.dev 2 points 1 year ago

Nah, it’s not much. Maybe 10 pages a day.

[–] Max_P@lemmy.max-p.me 5 points 1 year ago

You can pretty easily install Squid; it's fairly simple to configure and works well for most use cases. Just a plain simple HTTP proxy.
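
Something like this minimal squid.conf is the general idea; the address is a placeholder for wherever your scraping server will connect from (e.g. its VPN IP):

http_port 3128                # the port your scraper will use as its proxy
acl scraper src 10.8.0.1/32   # placeholder: the address your scraping server connects from
http_access allow scraper     # let only that machine proxy through
http_access deny all          # refuse everyone else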

You could also set up a VPN to your home to achieve something similar, binding some requests to the VPN IP. It's a bit harder to set up, however, as it involves routing tables, route metrics, and conditionally binding the outgoing connection to a specific interface.
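
A rough sketch of that conditional-binding idea, assuming a wg0 tunnel that's already allowed to carry internet traffic (AllowedIPs = 0.0.0.0/0 on this side, NAT on the home end) and, as a placeholder, a dedicated scraper user with uid 1001:

ip rule add uidrange 1001-1001 table 100   # send that user's traffic to routing table 100
ip route add default dev wg0 table 100     # table 100 routes everything through the tunnel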

[–] neoney@lemmy.neoney.dev 2 points 1 year ago

Thanks, sounds like Squid will be perfect. I'll just need to figure out some way to connect. I wish I could just open a port, but that hasn't been working since I enabled IPv6 on my router. Do you think I could make it accessible through Cloudflare Tunnels?

[–] InverseParallax@lemmy.world 5 points 1 year ago

Probably not; look into WireGuard or Tailscale.

[–] Max_P@lemmy.max-p.me 1 points 1 year ago

Cloudflare Tunnels won't work, as Cloudflare won't tunnel HTTP proxy traffic, at least as far as I know.

What you can do, however, is have your home server VPN into your remote server; then your remote server will have no problem connecting to Squid over the VPN link. WireGuard is very simple to configure like that, probably 5-10 lines of config on each end. You don't need any routing or forwarding, just a plain VPN with two peers that can ping each other, so no ip_forward or iptables -j MASQUERADE or anything else most guides would include. You can also use something like Tailscale; anything that lets the two machines talk to each other will do.
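
For reference, a minimal pair of wg-quick configs along those lines might look like this; the keys, endpoint, and the 10.8.0.0/24 subnet are all placeholders:

# /etc/wireguard/wg0.conf on the remote server
[Interface]
Address = 10.8.0.1/24
ListenPort = 51820
PrivateKey = <remote server's private key>

[Peer]
PublicKey = <home server's public key>
AllowedIPs = 10.8.0.2/32

# /etc/wireguard/wg0.conf on the home server
[Interface]
Address = 10.8.0.2/24
PrivateKey = <home server's private key>

[Peer]
PublicKey = <remote server's public key>
Endpoint = vps.example.com:51820   # placeholder for the remote server's public address
AllowedIPs = 10.8.0.1/32
PersistentKeepalive = 25           # keeps the NAT mapping open from behind the home router

Bring both up with wg-quick up wg0, and the remote server can then reach Squid at the home server's tunnel address (10.8.0.2).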

Depending on your performance and reliability needs, you could even just forward a port with SSH. Connect to your remote server from the home server with something like ssh -N -R localhost:8088:localhost:8080 $remoteServer, and port 8088 on the remote will forward to port 8080 on the home server for as long as that SSH connection is up. -N simply makes SSH not open a shell on the remote, dedicating the session to the forwarding. Nice and easy, especially for prototyping.

[–] neoney@lemmy.neoney.dev 1 points 1 year ago

That honestly seems overcomplicated for me, but it just occurred to me that I can actually host the scraper on the home server, as the scraper itself only collects simple data; the downloads are handled by a separate program.

[–] neoney@lemmy.neoney.dev 1 points 1 year ago

The downloader talks to the scraper over HTTP, which I can publish through CF Tunnels, so it's perfect.
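
For what it's worth, a quick ad-hoc tunnel is enough for that; assuming the scraper's API listens on local port 8080 (a placeholder), it's one command:

cloudflared tunnel --url http://localhost:8080   # prints a public trycloudflare.com URL to use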

[–] Illecors@lemmy.cafe 4 points 1 year ago

Besides the other answers, you could just use SSH port forwarding. The remote end would be your home server, the local end your "cloud" server. You should initiate the connection from the cloud server to your home server. Playing with local ports would let you filter which domains go through the proxy.

I rarely use it, so the exact syntax has slipped my memory. It is a bit tricky at first, but definitely not rocket science to figure out.

Once the connection is established, you would point your scraper at http://localhost:localport
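
From memory, so double-check the man page, it's roughly this, run on the cloud server (hostname and ports are placeholders; this assumes Squid listens on 3128 on the home server):

ssh -N -L 3128:localhost:3128 user@home.example.com   # forward local port 3128 to the home server's Squid
curl -x http://localhost:3128 https://example.com/    # the scraper then uses the forwarded port as its proxy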

[–] lynny@lemmy.world 1 points 1 year ago

This is the simplest solution and probably a lot safer than the alternatives. Another good option would be OpenVPN.

[–] neoney@lemmy.neoney.dev 1 points 1 year ago

I ended up hosting the scraper service on my home server and exposing it through Cloudflare Tunnels, as the service is pretty much just an API that doesn't move much data.