On the MASQUE list, Töma Gavrichenkov, claiming some inside knowledge, suggests a way to detect VPN users. They are the ones that have long-lived high-traffic connections, but never connect directly to web trackers (Google Analytics etc.). The idea is that because web tracking is so ubiquitous, non-VPN users are constantly connecting to tracking hosts, but VPN users who have all their connections tunnelled do not.
Basically their product works as follows. First, it has a collection of IP addresses and some behavioral patterns for on the order of hundreds to thousands of typical external resources Web sites use: CDNs, trackers (Google Analytics, New Relic, jQuery, reCAPTCHA, to name a few, but there are more), etc. The list is meant to be updated every few days, if not hours.
Then, if a client establishes a number of active bandwidth-heavy connections to remote servers but doesn’t connect to a statistically significant number of those trackers within some timeframe (the thresholds are also being regularly updated I think), then it is assumed to be using a VPN. All the established sessions (no matter if it’s TCP or UDP) are dropped and the former endpoints (except some) are greylisted and reported, and the subsequent HTTP[S] connection establishment attempts get a redirect to a Web page which tells the user to switch off the VPN connection.
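To make the description concrete, here is a minimal sketch of how such a heuristic might look. It is my own guess at the logic, not the product's actual code: the `Flow` record, the `looks_tunnelled` function, and every threshold value are assumptions made up for illustration, and a real system would presumably use a proper statistical test rather than a fixed count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One observed connection (hypothetical record layout)."""
    client: str        # client IP address
    remote: str        # remote IP address
    last_seen: float   # time of last activity, in seconds
    bytes_total: int   # bytes transferred in both directions


def looks_tunnelled(flows, client, tracker_ips, now,
                    window=600.0,             # assumed observation window (s)
                    heavy_bytes=50_000_000,   # assumed "bandwidth-heavy" cutoff
                    min_heavy=1,              # assumed number of heavy flows
                    min_trackers=3):          # assumed tracker threshold
    """Return True if the client has heavy flows but too few tracker contacts."""
    # Keep only this client's flows that were active inside the window.
    recent = [f for f in flows
              if f.client == client and now - window <= f.last_seen <= now]
    # Flows that moved a lot of data are the "bandwidth-heavy" ones.
    heavy = [f for f in recent if f.bytes_total >= heavy_bytes]
    # Distinct known tracker/CDN addresses the client contacted directly.
    trackers_seen = {f.remote for f in recent if f.remote in tracker_ips}
    return len(heavy) >= min_heavy and len(trackers_seen) < min_trackers
```

A client with one heavy flow and no tracker contacts in the window is flagged. A client with the same heavy flow plus direct contacts to at least `min_trackers` listed addresses is not.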
This is an interesting idea. The subsequent discussion on the list seems a little overblown, though. I think I agree with Ted Hardie and Ben Schwartz that there’s nothing fundamental about this traffic analysis attack that’s different from other traffic analysis attacks, and no reason to freak out about this one in particular.
No one mentioned it on the list, but it seems to me that you could defeat this kind of attack anyway, by running a non-tunnelled web browser alongside your VPN. Let it refresh itself periodically, just to ensure that enough trackers are contacted to stay above the threshold while you’re using the VPN.
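As a rough sanity check of that idea, the following sketch simulates a decoy browser that cycles through a few pages and counts how many distinct trackers it touches inside the detector's window. Everything here is hypothetical: the function names, the host names, the page contents, and the assumption that the detector counts distinct tracker hosts per window are mine, not something stated on the list.

```python
def decoy_contacts(pages, refresh_every, start, end):
    """Simulate a decoy browser cycling through pages.

    pages is a list of lists: the tracker hosts each page loads.
    Returns (timestamp, host) pairs for every simulated contact.
    """
    contacts = []
    t = start
    i = 0
    while t <= end:
        # Each refresh loads the next page and contacts all its trackers.
        for host in pages[i % len(pages)]:
            contacts.append((t, host))
        t += refresh_every
        i += 1
    return contacts


def distinct_trackers(contacts, tracker_hosts, now, window):
    """Count distinct listed trackers contacted within the window."""
    return len({host for ts, host in contacts
                if host in tracker_hosts and now - window <= ts <= now})


# Hypothetical tracker list and decoy pages.
tracker_hosts = {"t1.example", "t2.example", "t3.example"}
pages = [["t1.example", "t2.example"], ["t3.example"]]

# Refreshing every two minutes over a ten-minute window.
contacts = decoy_contacts(pages, refresh_every=120.0, start=0.0, end=600.0)
seen = distinct_trackers(contacts, tracker_hosts, now=600.0, window=600.0)
```

Under these assumptions the decoy reaches all three trackers within the window, whereas refreshing less often than the window length only ever loads the first page and stays at two. The point is only that the refresh interval and the variety of pages both matter, since reloading one page keeps hitting the same trackers.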