Paper summary: MassBrowser: Unblocking the Censored Web for the Masses, by the Masses (NDSS 20)

MassBrowser: Unblocking the Censored Web for the Masses, by the Masses
Milad Nasr, Hadi Zolfaghari, Amir Houmansadr, Amirhossein Ghafari
https://censorbib.nymity.ch/#Nasr2020a
https://massbrowser.cs.umass.edu/

MassBrowser is a multi-modal circumvention system that aims to overcome the deficiencies of other systems by combining many circumvention techniques: selective proxying, CacheBrowsing (Holowczak and Houmansadr 2015, Zolfaghari and Houmansadr 2016), domain fronting, volunteer proxies, and user-to-user proxying. It is designed to be difficult to block, provide high quality of service, be easy to deploy and cheap to operate, and enable users to control their level of privacy. The main design principle of MassBrowser is that circumvention systems should concentrate on providing blocking resistance only, with anonymity and privacy being optional features. The system has operated as an invitation-only beta for more than a year.

The system consists of censored Clients, volunteer proxies called Buddies, and a collection of backend infrastructure called the Operator (Fig. 1). Whenever a Client needs to connect to some destination, it considers a prioritized list of connection options, preferring options that have lower cost and higher performance (Fig. 4):

  • If the destination is known to be unblocked, just access it directly, without any circumvention.
  • If the destination can be reached by CacheBrowsing (i.e., is hosted on certain CDNs), use CacheBrowsing.
  • If the destination belongs to a whitelisted content category (Table III), consult the Operator to get matched up with a Buddy or another Client, and access the destination using the Buddy or Client as a proxy.
  • Otherwise, access the destination over a Tor tunnel, using a Buddy that also acts as an obfuscated Tor bridge.
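The prioritized list above can be sketched as a simple decision function that returns the cheapest option that applies. This is a minimal illustration, not MassBrowser's actual code: the names Route, DestinationInfo, and choose_route are hypothetical, and PROXYABLE_CATEGORIES is a placeholder standing in for the whitelisted categories of Table III.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DIRECT = "direct"
    CACHEBROWSE = "cachebrowse"
    BUDDY_PROXY = "buddy-proxy"
    TOR_VIA_BUDDY = "tor-via-buddy"


@dataclass(frozen=True)
class DestinationInfo:
    # Hypothetical per-destination facts that a Client would look up
    # in its local copy of the Operator's lists.
    blocked: bool
    cachebrowsable: bool
    category: str


# Placeholder whitelist; the real categories are listed in the paper's Table III.
PROXYABLE_CATEGORIES = {"Gaming", "News", "Social"}


def choose_route(info: DestinationInfo) -> Route:
    """Return the first applicable option, cheapest and fastest first."""
    if not info.blocked:
        return Route.DIRECT          # no circumvention needed
    if info.cachebrowsable:
        return Route.CACHEBROWSE     # hosted on a supported CDN
    if info.category in PROXYABLE_CATEGORIES:
        return Route.BUDDY_PROXY     # Operator matches a Buddy or another Client
    return Route.TOR_VIA_BUDDY       # fall back to Tor through a Buddy bridge
```

Ordering the checks this way means the expensive Tor path is taken only when every cheaper option has been ruled out.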

The Operator is the arbiter of which destinations are considered blocked or CacheBrowseable. The Operator sources this information from ICLab and GreatFire, together with its own web crawls. Clients download this information from the Operator and periodically refresh their local cache of it. Clients’ communication with the Operator is protected by domain fronting, though any other unblockable channel (even a low-bandwidth or high-latency one) would work. Because a Client’s routing decisions depend on which destinations are being accessed, the MassBrowser Client software needs to be able to inspect traffic, even encrypted traffic. To that end, the Client installs a local root TLS certificate and performs TLS interception on everything that flows through the Client software.
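The Client's local cache of the Operator's lists can be sketched as an object that refetches once its data is older than a time-to-live. This is an assumption-laden sketch: the class name PolicyCache, the one-hour default TTL, and the shape of the data returned by fetch are all hypothetical, and fetch merely stands in for the domain-fronted (or otherwise unblockable) channel to the Operator.

```python
import time
from typing import Callable, Dict, Optional, Set

Lists = Dict[str, Set[str]]  # {"blocked": {...}, "cachebrowsable": {...}}


class PolicyCache:
    """Hypothetical Client-side copy of the Operator's destination lists."""

    def __init__(self, fetch: Callable[[], Lists],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # stand-in for the domain-fronted channel
        self._ttl = ttl_seconds
        self._clock = clock
        self._lists: Lists = {"blocked": set(), "cachebrowsable": set()}
        self._fetched_at: Optional[float] = None

    def _maybe_refresh(self) -> None:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return                   # cached copy is still fresh
        try:
            self._lists = self._fetch()
            self._fetched_at = now
        except OSError:
            # Channel failed: keep serving the stale copy and retry next lookup.
            pass

    def is_blocked(self, domain: str) -> bool:
        self._maybe_refresh()
        return domain in self._lists["blocked"]

    def is_cachebrowsable(self, domain: str) -> bool:
        self._maybe_refresh()
        return domain in self._lists["cachebrowsable"]
```

Keeping stale data on a failed fetch fits the observation above that even a low-bandwidth or high-latency channel to the Operator would suffice: the lists only need to reach the Client eventually.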

To become a Buddy, a person downloads and runs the standalone MassBrowser Buddy software. Communication between Clients and Buddies is encrypted and obfuscated using an obfsproxy-like modular transport; because the Buddy software is not a browser extension, it is not limited to web protocols like WebRTC and has more freedom in its obfuscation. Clients may also use other censored Clients as Buddies; the intuition is that what is blocked in one censored network is usually not blocked in another. A Buddy is a one-hop proxy: it has the ability to inspect traffic, and any outgoing connections will be attributed to the Buddy. Buddies can specify a whitelist of content categories they are willing to proxy. For example, a Client contacts the Operator and says “I need to access a Gaming destination,” and the Operator matches the Client with a Buddy that has whitelisted the Gaming category. Certain content categories (pornography) are never proxied through one-hop Buddies but instead always go through a Tor tunnel. Besides content categories, the Operator considers NAT compatibility and the current load on each Buddy when matching Clients with Buddies, and uses the Enemy at the Gateways proxy distribution mechanism to mitigate the risk of Buddy-discovery attacks.
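The Operator's matching step can be sketched as a filter over candidate Buddies followed by a least-loaded choice. This is a simplified illustration under stated assumptions: the Buddy fields, the nat_compatible rule, and the least-loaded tie-breaking are hypothetical, and the sketch leaves out the Enemy at the Gateways distribution mechanism entirely. Returning None here simply means "use the Tor option instead."

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Categories never proxied through a one-hop Buddy (per the summary, pornography).
TOR_ONLY_CATEGORIES = frozenset({"Pornography"})


@dataclass
class Buddy:
    buddy_id: str
    categories: FrozenSet[str]  # content categories this Buddy has whitelisted
    nat_type: str               # hypothetical label, e.g. "open" or "symmetric"
    active_sessions: int
    capacity: int


def nat_compatible(client_nat: str, buddy_nat: str) -> bool:
    # Hypothetical simplified rule: two symmetric NATs cannot reach each other.
    return not (client_nat == "symmetric" and buddy_nat == "symmetric")


def match_buddy(category: str, client_nat: str,
                buddies: List[Buddy]) -> Optional[Buddy]:
    """Pick the least-loaded compatible Buddy, or None to fall back to Tor."""
    if category in TOR_ONLY_CATEGORIES:
        return None
    candidates = [
        b for b in buddies
        if category in b.categories
        and nat_compatible(client_nat, b.nat_type)
        and b.active_sessions < b.capacity  # also guarantees capacity > 0
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.active_sessions / b.capacity)
```

Choosing by load fraction rather than raw session count lets a high-capacity Buddy absorb more Clients before it is passed over.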

Thanks to Amir Houmansadr for commenting on a draft of this summary.

How are the domains categorized? Is it a manual process? Do they use a third party category list?

Good question. I don’t know the answer. I searched the repository for “Gaming” but found only localization files. I’m guessing categorization is done by the Operator and clients download the categories, but I don’t know if the Operator source code is online.