Background
My batchmate and I have been working on reducing hallucination in the output of large language models. Recently, we were informed that we would have to present our work to the university panel. We had computed automated metrics like ROUGE and BERTScore, but those raw numbers weren't very exciting. So, we decided to build a web app that would summarize large paragraphs using our model and highlight the hallucinated parts.
At this point, we faced a problem that looked relatively simple, but the solution wasn't. In order to run the web application, we had to host it somewhere. This part was easy — there are a number of online hosting providers like Render and Railway, along with big names like AWS and GCP, which also offer free tiers for their EC2 and Compute Engine services.
The problem arose when we tried to run the models. The server quickly exhausted its memory, and even for some very small models, the processing time was too long. Although I had a GPU machine at home, I couldn't access it because I was in a completely different city.
During our research phase, we heavily used Kaggle and Google Colab notebooks to run and test the models. But this was something different — we knew we needed a GPU on our server. We looked up GPU prices, and oh man, they cost anywhere from $400 to $2,000 and beyond, with some going up to $40,000, all well out of our budget.
The other option we considered was using Kaggle as the compute backend for the app. But it simply wasn't well documented. On top of that, during our testing, we found that Kaggle's long-running mode mostly just persisted results, without giving the kind of interactive access or control over the session that a web server requires. So we had to ditch the idea.
Trying to connect to my own setup
I came across the term homeserver while scrolling through YouTube videos. I found many tutorials explaining how to set up port forwarding in routers to expose local services on the internet. At first glance, it seemed like all I needed — if I could use Remote Desktop Protocol or even SSH into the system with the GPU, I could run the server there without physically working on the machine (it's called remote access for a reason). Even more fortunately, my ISP provides a static IP, so exposing a port felt like the logical option. You either add A records with the static IP to set up a domain name, or use the IP itself; then you configure some firewall rules, map some ports in your router, and you're done, right?
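For the record, the naive version of that plan looks roughly like this. It is only a sketch: home.example.com, 203.0.113.10, and the LAN address are placeholders, and every router's port-forwarding UI is different.

```bash
# 1. DNS: at the registrar, point an A record at the ISP-assigned static IP:
#      home.example.com.   IN  A   203.0.113.10
# 2. Router: forward external port 22 to the GPU machine, e.g. 192.168.1.50:22.
# 3. Host firewall: allow SSH through (openSUSE uses firewalld):
sudo firewall-cmd --permanent --add-service=ssh
sudo firewall-cmd --reload
# 4. From anywhere on the internet:
ssh user@home.example.com
```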
Turns out, it's not that simple.
The problem of port forwarding
Opening ports like SSH (port 22) or HTTP (port 80) directly to the internet greatly increases your system's exposure to security threats. Automated crawlers constantly scan the internet for open ports and known vulnerabilities. Brute-force attacks try to guess login credentials for services like SSH (you can mitigate this by disabling password login and using public-key authentication). Even more disruptive are Distributed Denial of Service (DDoS) attacks, where malicious actors flood your open ports with traffic, potentially overwhelming your internet connection and making your services unavailable. By directly exposing these ports, you essentially lay out a welcome mat for harmful activity, making it easier for unauthorized parties to gain access or disrupt your system.
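That parenthetical is worth spelling out. If you do expose SSH, the minimal hardening pass looks something like this; it uses standard OpenSSH settings, though the service may be named ssh rather than sshd depending on the distro.

```bash
# On the client: generate a key pair and install the public key on the server.
ssh-keygen -t ed25519
ssh-copy-id user@server

# On the server, in /etc/ssh/sshd_config, disable password logins:
#   PasswordAuthentication no
#   PubkeyAuthentication yes

# Reload the SSH daemon so the change takes effect.
sudo systemctl reload sshd
```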
VPNs, Tunnels, and WireGuard
Like many, I initially thought VPNs, or Virtual Private Networks, were mainly for bypassing location restrictions to access content. In reality, they're much more capable. They can create secure private connections between devices across the globe — sort of like pretending your devices are all in the same room, connected via LAN.
Digging into the details, I came across the WireGuard protocol. Developed by Jason A. Donenfeld, it is a fast, modern VPN protocol that utilizes state-of-the-art cryptography. WireGuard is around four thousand lines of code and has been part of the Linux kernel since version 5.6.
But setting up raw WireGuard between multiple machines, handling keys, firewalls, NAT traversal — it's doable, but not beginner-friendly.
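To give a sense of what "doable, but not beginner-friendly" means, here is the shape of a minimal two-peer setup. All keys and addresses are placeholders, and the other machine needs a mirror-image config pointing back at this one's public key and endpoint.

```bash
# Generate a key pair on each peer.
wg genkey | tee privatekey | wg pubkey > publickey

# /etc/wireguard/wg0.conf on the home machine:
#   [Interface]
#   PrivateKey = <home-private-key>
#   Address    = 10.0.0.1/24
#   ListenPort = 51820
#
#   [Peer]
#   PublicKey  = <laptop-public-key>
#   AllowedIPs = 10.0.0.2/32

# Bring the interface up (then repeat everything on the other peer).
sudo wg-quick up wg0
```

Multiply that bookkeeping by every pair of machines, add a router port forward for whichever peer sits behind NAT, and the appeal of something more automated becomes obvious.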
Cloudflare Tunnels
Parallel to VPNs, I was also looking into Cloudflare Tunnels, which are excellent for exposing web services securely to the internet without opening ports on the router. Cloudflare's Zero Trust network sets up a secure tunnel from your system to their edge network and then serves the content over TLS. You need a dedicated domain name set up with Cloudflare nameservers, and your subdomains can point to internal services on your local machine. Keep in mind that TLS termination happens at Cloudflare's servers, so Cloudflare can inspect the network traffic.
To use it, you need to run Cloudflare's daemon, cloudflared, on your local system (they have setup guides for every major OS, but the Docker one was the easiest — it's literally just one command). Since the connection is initiated from inside the network (i.e., outbound traffic), no firewall changes are required. I found plenty of resources on the internet, both articles and YouTube tutorials, explaining this technique. You can also set up Caddy or Traefik as a proxy in front of your services (they run inside your home network and therefore have access to all resources, so one tunnel can be used to reach multiple services) in case you want load balancing or end-to-end TLS from your system to Cloudflare.
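For reference, that one command is the docker run line the Zero Trust dashboard hands you when you create a tunnel; the token below is a placeholder issued per tunnel.

```bash
# The token is generated in the Cloudflare Zero Trust dashboard.
docker run -d --name cloudflared cloudflare/cloudflared:latest \
  tunnel --no-autoupdate run --token <your-tunnel-token>
```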
But for full machine access — especially for something like SSH or file transfers — Cloudflare Tunnel felt a bit limited.
That’s where Tailscale came in.
Enter Tailscale
Tailscale is a layer over WireGuard that makes all the painful parts disappear. It uses some clever NAT traversal techniques, with a fallback to its relay servers (called DERP). The system requires a coordination server for key exchange, but after that, peers communicate over a mesh topology. You still have to trust their servers for coordination, but apart from identity, data exchanged between peers is end-to-end encrypted. The source code is available on GitHub, so you can inspect it yourself. And if you're still in doubt, there is an open-source, self-hosted implementation of the Tailscale control server named Headscale.
Honestly, the setup was so simple on both Linux and Windows that I felt like something was wrong. They have two really informative blog posts, on NAT traversal and on how Tailscale works, which explain the nitty-gritty details. I had to run a single command (depending on your OS) to install the tailscale CLI, and another to start the tailscaled daemon. After authenticating with an OAuth provider (Google, GitHub, and quite a few other options), the devices showed up under my Tailscale network (tailnet) almost instantly. I never had to mess with keys or firewall rules. Traffic between any two peers is end-to-end encrypted. With MagicDNS, I could also set up custom hostnames for the machines. They also provide a nice dashboard from which you can manage the devices that are currently part of your network.
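Concretely, on Linux the whole thing boiled down to the following; the first line is Tailscale's official convenience script, and Windows uses a regular installer instead.

```bash
# Install the tailscale CLI and the tailscaled daemon.
curl -fsSL https://tailscale.com/install.sh | sh

# Connect this machine to the tailnet; this prints a browser login URL.
sudo tailscale up

# Sanity check: list the peers currently in the tailnet.
tailscale status
```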
Final Setup
While at home, I occasionally used to SSH into the desktop, so that side was already set up. I installed Tailscale on my laptop (running Windows 11 Home) and on my desktop (running openSUSE Tumbleweed, with the GPU). Within minutes, I was able to SSH into my home setup over a secure WireGuard connection.
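With MagicDNS enabled, that connection is just the machine's name. Here, desktop is a hypothetical hostname, and the tail-example part of the full name varies per tailnet.

```bash
# Short MagicDNS name, resolvable only from inside the tailnet:
ssh user@desktop

# Or the full tailnet FQDN:
ssh user@desktop.tail-example.ts.net
```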
Tailscale also offers two options for exposing services. tailscale serve allows services running on a local machine to be exposed only to other peer devices within the tailnet. For instance, when running a React application locally, using tailscale serve makes it accessible to peer machines via the host's Tailscale name (remember, MagicDNS). The other option is tailscale funnel, which creates a tunnel to expose internal services securely to the public internet.
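The two commands mirror each other. Note that the serve/funnel syntax has changed across Tailscale releases, so the form below assumes a reasonably recent client.

```bash
# Share localhost:3000 with tailnet peers only, over HTTPS.
tailscale serve 3000

# Share the same service with the public internet instead.
tailscale funnel 3000
```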
I didn't try tailscale funnel, as my primary requirement was only to access the home machine from my laptop securely from within the tailnet.
I kept Tailscale running as a background service — it auto-connected to the tailnet on boot. I ran the backend server on my desktop, exposed it to peers using tailscale serve <port-number>, and accessed it from my laptop using its tailnet hostname.
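So the whole "deployment" reduced to one command on each end; desktop and port 8000 are stand-ins here.

```bash
# On the desktop: expose the backend to tailnet peers.
tailscale serve 8000

# On the laptop: reach it through the tailnet hostname.
curl https://desktop.tail-example.ts.net/
```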
Concluding Notes
With this setup, my GPU system felt like it was right next to me. In a way, it was, thanks to the mesh topology between peers. The connection was fast, stable, and encrypted end to end. I could run the model in real time from our web app without any noticeable delay, and honestly, it was surprisingly effortless compared to the complexity of traditional VPN setups.