What is a VPN for?

Virtual Private Networks are a powerful tool for network configuration and security. But they’re frequently misunderstood and misused. Does buying access to a commercial VPN service protect your privacy? What are VPNs for? When do you need one?

What is a VPN?

VPN stands for Virtual Private Network. Let’s break that down.

A network is some computers that are connected to each other. Software running on the computers can send messages to software on the others.

You can think of the internet as one big network, but the word “inter-net” more correctly describes a collection of many interconnected local networks.

And those local networks can be described as either public networks or private networks. On a public network, the machines have public addresses and can be reached directly from anywhere on the internet. On a private network, the machines have private addresses and can’t be reached directly from the internet.

The distinction between private and public addresses is a technical one specified by internet standards. Specifically, with IPv4, private addresses start with (10.) or (192.168.) or (172.16. through 172.31.), and each network administrator can assign them however they want. Most of the rest of the possible IPv4 addresses are public addresses, allocated through a global registry system coordinated by the Internet Assigned Numbers Authority (IANA).
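As a quick illustration, the private ranges can be checked with Python’s standard ipaddress module. This is only a minimal sketch: the function name is_rfc1918 is made up for this example, and the CIDR notation (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) is just another way of writing the three prefixes listed above.

```python
import ipaddress

# The three private IPv4 blocks described above, written in CIDR notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),   # covers 172.16. through 172.31.
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the private blocks.

    The name is hypothetical; it is not part of the ipaddress module.
    """
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)

if __name__ == "__main__":
    for addr in ["192.168.1.101", "10.0.0.5", "172.31.255.255",
                 "172.32.0.1", "20.20.20.20"]:
        print(addr, "private" if is_rfc1918(addr) else "not private")
```

Addresses outside these blocks are labeled “not private” rather than “public” because a few other ranges are reserved for special purposes.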

For example, if you have a home router you most likely have a private network. If you plug a computer into the router, the router will probably assign your computer a private internet address like 192.168.1.101. That means computers on the public internet can’t send messages directly to your computer - every home router uses the same private address blocks, so if you try to send a message to that address through the internet there’s no way to know which 192.168.1.101 to deliver it to.

Note that this setup with a router and a private network is also common in other scenarios: corporate networks, school networks, WiFi pretty much anywhere, etc. Working on a computer with a public internet address in 2020 is rare.

To let your computer on a private network communicate with machines on the internet, the router needs to do something called Network Address Translation (NAT). The router gets a public address from your ISP (say… 20.20.20.20). If your computer sends a request to a public webserver (say… at 30.30.30.30), then your router will rewrite the return address in the outgoing message to its own address and remember that you made that request. The webserver responds to 20.20.20.20, and when the response arrives the router looks up which computer made the request and forwards the response to it.
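The bookkeeping the router does can be modeled in a few lines of Python. This is a toy sketch, not how any particular router is implemented: the class name ToyNat, the use of port numbers to tell requests apart, and the starting port 40000 are all assumptions made for this example.

```python
import itertools

class ToyNat:
    """A toy model of the router's translation table (hypothetical, not a real NAT)."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self._next_port = itertools.count(40000)  # arbitrary starting port (assumption)
        self._table = {}  # public port -> (private ip, private port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite the return address and remember who made the request."""
        public_port = next(self._next_port)
        self._table[public_port] = (src_ip, src_port)
        return (self.public_ip, public_port, dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Forward a reply to the machine that made the request, if there is one."""
        if dst_ip != self.public_ip or dst_port not in self._table:
            return None  # unsolicited traffic: no machine to deliver it to
        private_ip, private_port = self._table[dst_port]
        return (src_ip, src_port, private_ip, private_port)

if __name__ == "__main__":
    nat = ToyNat("20.20.20.20")
    # Your computer asks the webserver for a page.
    out = nat.outbound("192.168.1.101", 51000, "30.30.30.30", 443)
    print("leaves the router as:", out)
    # The webserver replies to the router's public address.
    print("delivered as:", nat.inbound("30.30.30.30", 443, out[0], out[1]))
    # A message nobody asked for has no table entry and is dropped.
    print("unsolicited:", nat.inbound("30.30.30.30", 443, "20.20.20.20", 12345))
```

The last call shows why machines on the public internet can’t start a conversation with your computer: without a remembered request, the router has nowhere to send the message.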

But sometimes you want to send messages within the local network. For example, you might plug in a network printer. Your router will give it a local address (like 192.168.1.102). Then your computer can find the printer by sending local broadcast queries (hi, any printers?) and can send print jobs directly to it using local addresses.
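To make “local” a bit more concrete, here is a small sketch using Python’s ipaddress module. It assumes the router hands out addresses from 192.168.1.0/24, which is a common default but not something stated above, and the helper name on_same_lan is made up for this example. Real printer discovery protocols vary, so this only shows the addressing side.

```python
import ipaddress

# Assumption: the home router hands out addresses from 192.168.1.0/24.
LAN = ipaddress.ip_network("192.168.1.0/24")

def on_same_lan(a: str, b: str, lan=LAN) -> bool:
    """Return True if both addresses belong to the given local network."""
    return ipaddress.ip_address(a) in lan and ipaddress.ip_address(b) in lan

if __name__ == "__main__":
    # The address that reaches every machine on this local network.
    print("broadcast address:", LAN.broadcast_address)
    # Computer and printer from the example: both local.
    print(on_same_lan("192.168.1.101", "192.168.1.102"))
    # Computer and a public webserver: not on the same local network.
    print(on_same_lan("192.168.1.101", "30.30.30.30"))
```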

But what if you want to print on the printer at the office while you’re at home?

Your computer can’t discover the printer, because it’s not on the local network. You could give the printer a public internet address and then manually add it to your computer, but then anyone in the world could print to that printer if they knew or could guess the public address.

This sort of problem is why VPNs were invented. A VPN allows devices that aren’t necessarily on the same physical network to work as if they were on the same local network. So if you connect to the office VPN, you’ll be able to discover and print to the office printer even if you’re at home or on another continent.

A local private network provides some innate security compared to sending data over the internet. Messages sent over physical wires within a single building are effectively immune to interception and forgery by an external attacker. In order to provide similar behavior, VPNs work by creating an encrypted tunnel. They’re not quite as secure as a physical wire - an attacker could still monitor the patterns of network traffic and try to figure out something from that - but the encrypted VPN is generally safe enough to do stuff you’d normally only do on a local network like using a printer.

So, in summary, you want to set up a VPN when:

  • You have programs running on two computers that want to talk to each other.
  • Those programs would just work if the computers were in the same room connected to the same LAN.
  • The computers are not on the same LAN.

Because VPNs exist, you might want to set up a program like an email server to only allow access from a local network for security, and then provide a VPN to let users get onto that local network. If you require cryptographic authentication to access the VPN, this can make things much more secure than direct access to the server.

A VPN can enable some pretty powerful stuff, like being able to make encrypted VOIP phone calls within an organization from anywhere in the world.

What about commercial VPNs for privacy?

Several companies currently sell a “VPN” privacy product. You run VPN software to connect to their private network, and then all your internet traffic is routed through their server.

This is a weird use of VPN technology - you don’t actually care about connecting to other machines on the same private network - but it has a couple potential privacy benefits:

Every website you visit sees your source IP address. With a VPN, this is the IP of the VPN server rather than the IP assigned by your internet service provider.

The most common concrete reason why you’d want that is for region locked services. If you want to stream a video that’s only available in the UK but you’re in Kansas, routing through a VPN server in London might let you access it.

The next most common reason is for mildly illegal activity, like downloading pirated movies. If the copyright holder gets your VPN IP rather than your real IP, they probably won’t be able to use that to track you down.

Unfortunately, this style of VPN doesn’t do one of the things that some of the providers advertise it for: protecting you from tracking, profiling, and surveillance by large tech companies.

From the perspective of a big tech company, a commercial VPN just looks like any other shared private network - there are several users with the same source IP because they’re behind a router doing NAT. Again, corporate networks, school networks, public WiFi, and even home networks with multiple users all work this way.

The tracking systems that big tech companies use wouldn’t work at all if they couldn’t handle multiple users sharing a single IP. A commercial VPN isn’t a cloak of secrecy. Instead, it’s almost exactly like using the internet from another physical place like work or school.

That being said, Google can’t identify you based on a single visit to Google search in a freshly installed privacy browser from behind a VPN. That first “clean” visit is effectively anonymous.

So what does Google do when someone they don’t recognize visits one of their sites? They create a new user session, and start tracking it. When they start tracking a new session like this, there are two possibilities:

  • This is a new user they’ve never seen before.
  • This is an existing user and they just haven’t figured out who yet.

Now, Google would love to have found a new user to track. But they know that the second case is significantly more likely, so they put a lot of effort into handling that case.

In fact, the model they use for tracking people almost certainly works by probabilistically linking individual sessions together into user profiles.

Some sessions almost certainly identify a specific human. For example, that time you logged into your Google Adwords account and typed in your legal name and social security number.

Other sessions aren’t linked to any identifying information, like the single Google search from a clean browser over a VPN.

But most sessions have some identifying information. Like the account information from when you logged into YouTube in the same browser. Or the same source IP address at the same time as a software update request from an Android phone with a logged-in Google account. Or the same unique browser fingerprint as another session.

With very careful operational security, a commercial VPN could be used as part of a strategy to maintain two separate digital identities. If you always have the VPN enabled for one identity and disabled for the other, then your IP address can’t be used to link your two separate profiles.

But that’s an extremely fragile plan. Simply forgetting to switch the VPN on or off when changing identities can result in both sessions sending requests from the same IP. Then the two profiles are linked. With one mistake they won’t be marked as the same person with 100% certainty, but the tracking system will suspect that possibility. As more op-sec mistakes give the system more clues, the two profiles will merge into one.
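To get a feel for how clues could add up, here is a deliberately simple toy model in Python. Nothing here reflects how any real tracking system works: the signal names, the weights, and the “noisy-OR” way of combining them are all invented for illustration.

```python
# Hypothetical weights: how strongly each shared signal suggests "same person".
# These numbers are invented for illustration only.
SIGNAL_WEIGHT = {
    "ip_address": 0.3,
    "browser_fingerprint": 0.6,
    "account_login": 0.99,
}

def link_probability(session_a: dict, session_b: dict) -> float:
    """Combine matching signals with a noisy-OR.

    Each matching signal is treated as independent evidence; the result is
    1 minus the chance that every match is a coincidence.
    """
    p_coincidence = 1.0
    for signal, weight in SIGNAL_WEIGHT.items():
        a, b = session_a.get(signal), session_b.get(signal)
        if a is not None and a == b:
            p_coincidence *= 1.0 - weight
    return 1.0 - p_coincidence

if __name__ == "__main__":
    vpn_identity = {"ip_address": "40.40.40.40", "browser_fingerprint": "fp-a"}
    home_identity = {"ip_address": "20.20.20.20", "browser_fingerprint": "fp-b"}
    print("careful:", link_probability(vpn_identity, home_identity))

    # One slip: the home identity is used with the VPN still enabled.
    slip = {"ip_address": "40.40.40.40", "browser_fingerprint": "fp-b"}
    print("one mistake:", link_probability(vpn_identity, slip))

    # Two slips: same IP and the same browser fingerprint.
    two_slips = {"ip_address": "40.40.40.40", "browser_fingerprint": "fp-a"}
    print("two mistakes:", link_probability(vpn_identity, two_slips))
```

In this toy model a single shared IP only raises suspicion, while each additional matching signal pushes the two sessions closer to being treated as one profile.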

And the rest of the strategy to create two profiles isn’t trivial. You’ll need to use two separate browsers. You’ll want to have the profiles active at different times. You’ll need to make sure to avoid cross-browser fingerprinting (e.g. two browsers on the same system will still report the same set of installed fonts, the same screen resolution, etc). You won’t want to type too much text, so that automatic stylometry doesn’t give you away by patterns in your writing style, etc.

Using a VPN without an opsec-intensive effort to create separate user profiles will have basically zero effect on big tech surveillance. I used Google here as an example, but the same applies to the other big tech companies like Amazon, Facebook, Microsoft, Apple, etc. The only realistic way to reduce the data these companies are collecting about you is to reduce how much you use their products and services.

When do you need a VPN?

If you want to securely access a private network, then VPN software is exactly the correct tool. Using VPN software to access the public internet only really makes sense in a few specific cases like spoofing your location to access region-limited content.