Measuring the Deployment of Network Censorship Filters at Global Scale
Ram Sundara Raman, Adrian Stoll, Jakub Dalek, Reethika Ramesh, Will Scott, Roya Ensafi
The paper describes FilterMap, a framework for remotely detecting network filters and categorizing them according to their HTTP/HTTPS blockpages. The project aims to have a global reach (not targeted at any particular country) and to be repeatable over time (the paper describes 3 months of semi-weekly measurements). FilterMap extends, systematizes, and automates much of what earlier work did by manual analysis, such as “A Method for Identifying and Confirming the Use of URL Filtering Products for Censorship”, “Automated Detection and Fingerprinting of Censorship Block Pages”, and “Planet Netsweeper”.
FilterMap collects data from three complementary sources: Quack, Hyperquack, and OONI web connectivity tests. Quack (summarized previously) uses echo servers (port 7) to reflect HTTP/HTTPS requests back out through the firewall. Hyperquack (new in this work, Section III-A) does not reflect, but rather sends HTTP/HTTPS requests from the outside to web servers inside the firewall, relying on the network filter to treat traffic equally in both directions. OONI differs from the other two sources in that it does not use remote measurement, but rather direct HTTP/HTTPS fetches from volunteer-operated probes. Although OONI’s measurement scope is smaller overall, it finds some filters that the remote techniques do not. In all cases, the authors strive to minimize potential harm to remote users by selecting “organizational” servers as the targets for Quack and Hyperquack, and making use of OONI’s informed-consent recruitment process for new probes. The raw data of Quack and Hyperquack measurements is available at https://censoredplanet.org/data/raw.
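The echo-based idea behind Quack can be illustrated with a small sketch. It is not the authors' code: the function names `build_probe` and `classify_echo` and the three outcome labels are hypothetical, and the sketch leaves out the network I/O, retries, and control measurements a real system would need. It only shows the comparison at the core of the technique: an echo server should return exactly the bytes it was sent, so anything else suggests interference on the path.

```python
from typing import Optional


def build_probe(domain: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request whose Host header names the test domain."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {domain}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def classify_echo(sent: bytes, received: Optional[bytes]) -> str:
    """Compare what an echo server returned with what was sent.

    The labels are illustrative assumptions, not taken from the paper:
      "echoed"    - the payload came back unchanged (no visible interference)
      "disrupted" - nothing came back (e.g. timeout or connection reset)
      "injected"  - different content came back (e.g. a blockpage)
    """
    if received is None:
        return "disrupted"
    if received == sent:
        return "echoed"
    return "injected"
```

In an actual measurement, `received` would be whatever was read back from a TCP connection to port 7 of the echo server, and an "injected" body would be kept as a candidate blockpage for the later analysis phase.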
By testing a variety of presumed-benign and potentially filtered domains, FilterMap obtains a list of network paths that contain a filter, as well as the blockpage that the filter returns. A data analysis phase follows, in which blockpages are clustered into categories by similarity, and assigned labels according to the vendor of the filter or the identity of its operator. A first “iterative clustering” step reduces the mass of the data by identifying the most common HTML responses. Any responses that remain after iterative clustering are clustered by visual similarity; i.e., by the bitmap of the rendered page. This process resulted in about 200 groups of blockpages, which then were manually labeled by inspecting one exemplar of each group. There’s a catalog of blockpages and manually identified signatures at https://censoredplanet.org/filtermap/results.
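One way to picture the first step, peeling off the most common HTML responses, is the simplified sketch below. It is an assumption-laden illustration rather than the paper's algorithm: it groups responses only by exact match after whitespace and case normalization, and the names `normalize` and `iterative_cluster` and the `min_size` threshold are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def normalize(html: str) -> str:
    """Collapse whitespace runs and lowercase, so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", html).strip().lower()


def iterative_cluster(
    responses: List[str], min_size: int = 2
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Repeatedly remove the most common normalized response body.

    Returns (clusters, leftover): clusters is a list of (body, count) pairs in
    the order they were peeled off; leftover holds the responses that never
    reached min_size and would go on to a later stage (in the paper, clustering
    by visual similarity of the rendered page).
    """
    remaining = [normalize(r) for r in responses]
    clusters = []
    while remaining:
        body, count = Counter(remaining).most_common(1)[0]
        if count < min_size:
            break  # nothing common enough is left to peel off
        clusters.append((body, count))
        remaining = [r for r in remaining if r != body]
    return clusters, remaining
```

Exact matching is the crudest possible similarity measure; a real pipeline would need something more tolerant of per-request details embedded in blockpages, such as timestamps or the blocked URL.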
In 3 months of operation, FilterMap found network filter installations in 103 countries and identified dozens of blockpage groups. The most common commercial filter vendor is FortiGuard, with a presence in 60 countries. The content categories most commonly blocked are pornography and gambling. Bahrain, Iran, Saudi Arabia, and South Korea use a common blockpage across the country, while in India and Russia each ISP has its own blockpage. China does not use blockpages, preferring instead to reset connections.
Thanks to Ram Sundara Raman for commenting on a draft of this summary.