Paper summary: Too Close for Comfort: Morasses of (Anti-) Censorship in the Era of CDNs (PETS 2021)

Too Close for Comfort: Morasses of (Anti-) Censorship in the Era of CDNs
Devashish Gosain, Mayank Mohindra, Sambuddho Chakravarty

This paper is about geolocating popular web sites (from the perspective of measurement points actually located in a selected country) and the implications for anti-censorship systems that depend on covert proxies being located outside a censor’s zone of control. In the current age of CDNs, web sites may be multiply homed. Their effective location is not an origin web server, but a CDN front-end server—and which front-end server you get depends on where you ask from. Given a country and a web site, this paper provides a method for determining whether the web site is hosted within that country’s borders, as seen by users in that country. Case studies in five countries show that the majority of country-specific top sites are effectively hosted within those respective countries.

The core measurement technique is something the authors call Region Specific Constraint Based Geolocation (R-CBG). The speed at which packets travel in a network is limited, which means you can estimate distance from round-trip time (RTT). Packet speed is bounded physically by the speed of light, but in practice effects like queuing delays mean that packets move more slowly than that. To deal with these other sources of delay, Constraint Based Geolocation involves a preliminary calibration phase, in which measurement nodes ping each other to establish empirical bounds on the relation between distance and RTT, in the context of that set of measurement nodes. After calibration, a target IP address is geolocated by measuring its RTT from each of the measurement nodes. Each measurement node’s RTT translates to a distance, which is the radius of a circle on the surface of the earth, centered on that measurement node. The intersection of all such circles is the predicted geolocation area of the target.
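The calibrate-then-intersect idea can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the function names are hypothetical, the calibration is reduced to a single distance-per-millisecond factor per node (the largest ratio seen during calibration, capped by an assumed propagation speed in fiber of about two thirds the speed of light), and the intersection is tested point by point instead of being computed as a region.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: signals in fiber cover roughly 200 km per millisecond one way,
# which is at most about 100 km of distance per millisecond of round-trip time.
MAX_KM_PER_RTT_MS = 100.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def calibrate(samples):
    """Return a km-per-RTT-ms factor for one measurement node.

    samples is a list of (known_distance_km, measured_rtt_ms) pairs to the
    other measurement nodes. Taking the largest observed ratio means that
    rtt * factor is never smaller than the true distance for any calibration
    pair (a simplification of the calibration described in the text).
    """
    ratio = max(dist / rtt for dist, rtt in samples)
    return min(ratio, MAX_KM_PER_RTT_MS)


def radius_km(rtt_ms, km_per_rtt_ms):
    """Translate a measured RTT into the radius of that node's circle."""
    return rtt_ms * km_per_rtt_ms


def in_all_circles(point, circles):
    """True if point lies inside every (center, radius_km) circle."""
    return all(haversine_km(point, center) <= r for center, r in circles)
```

A candidate location is consistent with the measurements only if `in_all_circles` returns true for it; the set of all such locations is the predicted geolocation area.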

What distinguishes Region Specific Constraint Based Geolocation is that it uses only measurement nodes that are located in or near the country under investigation. The authors found this heuristic necessary to get high accuracy in ground-truth evaluations. For the purposes of this research, the desired output of geolocation is not necessarily a latitude and longitude, but a determination of whether the target IP address is likely located inside or outside the country. After finding the area of intersection of the restricted set of measurement nodes, the inside/outside prediction is made as follows: if the centroid of the intersection is within the country’s borders, and the intersection is not so large that it encompasses the entire country, the target is inside; otherwise it is outside. Some additional care is needed to deal with anycast addresses, which may correspond to different places in the network, depending on the source. The authors applied R-CBG in five countries (Brazil, India, Iran, Saudi Arabia, and the United States) over five months and found that between 60% and 90% of the Alexa top 1000 sites specific to each country are effectively hosted within that country. Their measurement nodes were RIPE Atlas probes, at least 15 per country.
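The inside/outside rule can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the paper's code: coordinates are planar (x, y) in kilometers instead of points on the globe, the country is a simple polygon, the intersection is approximated by grid sampling, and all function names are hypothetical. The source does not say how an empty intersection is handled; this sketch treats it as "outside".

```python
import math


def in_all_circles(p, circles):
    """True if planar point p lies inside every (center, radius) circle."""
    return all(math.dist(p, c) <= r for c, r in circles)


def point_in_polygon(pt, poly):
    """Ray-casting test: is pt inside the polygon given as a vertex list?"""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_intersection(circles, step):
    """Grid points inside all circles, sampled over the smallest circle's box."""
    (cx, cy), r = min(circles, key=lambda c: c[1])
    n = int(r // step)
    grid = [(cx + i * step, cy + j * step)
            for i in range(-n, n + 1) for j in range(-n, n + 1)]
    return [p for p in grid if in_all_circles(p, circles)]


def classify(circles, country_poly, step=10.0):
    """Return "inside" or "outside" following the rule described above."""
    region = sample_intersection(circles, step)
    if not region:
        return "outside"  # assumption: no consistent estimate counts as outside
    centroid = (sum(x for x, _ in region) / len(region),
                sum(y for _, y in region) / len(region))
    # An intersection of planar circles is convex, so it contains the whole
    # polygon exactly when it contains every vertex of the polygon.
    covers_country = all(in_all_circles(v, circles) for v in country_poly)
    if point_in_polygon(centroid, country_poly) and not covers_country:
        return "inside"
    return "outside"
```

For example, with a square "country" 1000 km on a side, a small intersection near its center is classified as inside, while an intersection wide enough to cover the whole square is classified as outside because it cannot distinguish in-country from out-of-country locations.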

The implication of this research for anti-censorship is that ostensibly “foreign” network destinations may actually be hosted within the country, and therefore within the censor’s sphere of influence, which may have consequences for network detection or legal exposure. For example, decoy routing systems commonly require passing specially tagged traffic through a relay station on the way to an overt site; this is not possible if the path to the overt site is short-circuited by an in-country CDN front-end server. Servers used for domain fronting or audiovisual tunneling in the style of CovertCast, even if their traffic is not directly visible to a censor, are possibly at greater risk of coercion if they are located within the censor’s legal jurisdiction. The CacheBrowser research (Section 3.2.1) showed that CDN front-end servers themselves may apply different censorship rules inside and outside a country; but some CDN architectures (particularly anycast-based ones) may make it hard to actually get packets routed to an external front-end server. The observation that many popular destinations are located close to the user also challenges the assumption that a few powerful ASes have visibility over a large fraction of Internet traffic.

Thanks to the authors for reviewing a draft of this summary.