Measuring and Evading Turkmenistan’s Internet Censorship: A Case Study in Large-Scale Measurements of a Low-Penetration Country
Sadia Nourin, Van Tran, Xi Jiang, Kevin Bock, Nick Feamster, Nguyen Phong Hoang, Dave Levin
https://censorbib.nymity.ch/#Nourin2023a
https://github.com/breakerspace/turkmenistan-censorship
This is a study of DNS, HTTP, and TLS censorship in Turkmenistan, notably covering every IP address in the country. Turkmenistan poses a challenge for censorship measurement because of its small population and low Internet penetration, which make it difficult to take direct measurements from inside the country. This study instead uses remote measurement techniques, taking advantage of the bidirectionality of the firewall to run experiments without controlling a vantage point in Turkmenistan. The paper covers data collected in September and October 2022. The team has continued testing since then and publishes the results in a dashboard at https://tmc.np-tokumei.net/.
Bidirectionality means the firewall filters incoming packets as well as outgoing ones. Sending a DNS query for a filtered domain name into the country elicits an injected DNS response with a false IP address, sent back to the sender just as if the query had been sent out of the country. Similarly, an HTTP request with a filtered Host header, or a TLS Client Hello with a filtered SNI, elicits an injected TCP RST packet, regardless of direction. In the case of HTTP and TLS, censorship persists for 30 seconds: any packet with the same source–destination 4-tuple within that interval gets another RST. Injected packets are easy to identify because they have a distinctive IP ID and initial TTL. In a change from what was reported in "Bidirectional DNS, HTTPS, HTTP injection in Turkmenistan" (net4people/bbs issue #80, August 2021), injection now happens on all port numbers.
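The 30-second residual censorship can be modeled as a per-flow timer. The following is a toy Python sketch, not the firewall's actual implementation. It assumes the window is keyed on the exact 4-tuple and that packets arriving inside the window do not extend it; the triggers flag stands in for whatever content match (a filtered Host header or SNI) opens the window. The class and method names and the documentation-range addresses are hypothetical.

```python
from dataclasses import dataclass, field

# A flow is identified by its 4-tuple: (src_ip, src_port, dst_ip, dst_port).
FourTuple = tuple[str, int, str, int]

# Residual-censorship window for HTTP and TLS, as reported in the summary.
RESIDUAL_SECONDS = 30.0


@dataclass
class ResidualCensorModel:
    """Toy model of per-flow residual censorship (hypothetical, for illustration)."""

    # Maps a 4-tuple to the time at which its residual window expires.
    blocked_until: dict[FourTuple, float] = field(default_factory=dict)

    def on_packet(self, flow: FourTuple, now: float, triggers: bool) -> bool:
        """Return True if this packet would draw an injected RST."""
        # Assumption: any packet on the same 4-tuple inside the window gets a RST,
        # and such packets do not extend the window.
        if now < self.blocked_until.get(flow, float("-inf")):
            return True
        # A packet whose content matches a filter (e.g. a blocked Host or SNI)
        # gets a RST and opens a new 30-second window for this flow.
        if triggers:
            self.blocked_until[flow] = now + RESIDUAL_SECONDS
            return True
        return False


if __name__ == "__main__":
    model = ResidualCensorModel()
    flow = ("198.51.100.7", 40000, "203.0.113.10", 443)  # documentation addresses
    print(model.on_packet(flow, now=0.0, triggers=True))    # filtered SNI: RST
    print(model.on_packet(flow, now=10.0, triggers=False))  # inside window: RST
    print(model.on_packet(flow, now=31.0, triggers=False))  # window expired: no RST
```

A different port (and therefore a different 4-tuple) starts with no state, which is why each probe in a measurement needs either a fresh 4-tuple or a pause longer than the window.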
Two complications make a straightforward application of the bidirectionality property insufficient for large-scale measurement. The first is that, in what seems to be a first, source IP addresses that send many probes into the country may eventually stop receiving injected responses, as if the censor were deliberately trying to frustrate analysis. To deal with this, the measurement system uses a diverse and changing set of source IP addresses from commercial VPSes.

The second complication is that not all IP addresses in Turkmenistan are equal with respect to whether they trigger injection when they appear as the destination address of a probe. Different networks, and even neighboring addresses, differ in whether they trigger censorship responses. For this reason, the authors set out to test every IP address in the country, some 22,700 addresses across 6 ASes.

This raises another challenge: DNS probes do not require the probed IP address to be live, but the HTTP and TLS tests take place in the context of a TCP connection, which requires a live, responsive host at the destination. To work around this, the authors found a new sequence of probes that detects TCP-based censorship injection without an established TCP connection: send a PSH+ACK packet containing the probe payload (i.e., the HTTP request or TLS Client Hello), wait 5 to 29 seconds, then send another packet on the same 4-tuple. Because the wait is shorter than the 30-second residual-censorship window, a RST in response to the second packet means the probe was recognized as one to censor. By combining these techniques, the authors were able to scan every IP address in Turkmenistan for DNS, HTTP, and TLS censorship.
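The handshake-free probe sequence can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' tool: send is a hypothetical callable that emits one TCP segment with the given flags on a fixed 4-tuple and reports the flags of any reply, the follow-up packet is assumed to be a bare ACK, and FakeCensor is a hypothetical toy stand-in for the path to an unresponsive address, used only to exercise the logic without raw sockets.

```python
import random
import time
from typing import Callable, Optional

# Hypothetical transport: send(flags, payload) emits one TCP segment on a fixed
# 4-tuple and returns the flags of a reply ("R" for RST), or None if none arrives.
SendFn = Callable[[str, bytes], Optional[str]]


def probe_without_handshake(
    send: SendFn,
    probe: bytes,
    sleep: Callable[[float], None] = time.sleep,
    wait: Optional[float] = None,
) -> bool:
    """Return True if `probe` appears to trigger TCP-based censorship."""
    # 1. PSH+ACK carrying the HTTP request or TLS Client Hello, with no handshake.
    send("PA", probe)
    # 2. Wait 5 to 29 s, i.e. less than the 30 s residual-censorship window.
    sleep(wait if wait is not None else random.uniform(5, 29))
    # 3. Follow-up packet on the same 4-tuple (assumed here to be a bare ACK).
    #    A RST means the first packet was recognized as one to censor.
    return send("A", b"") == "R"


class FakeCensor:
    """Toy stand-in for the path to one unresponsive address (hypothetical)."""

    def __init__(self, blocked: bytes) -> None:
        self.blocked = blocked
        self.clock = 0.0
        self.until = float("-inf")

    def sleep(self, seconds: float) -> None:
        self.clock += seconds  # simulated time instead of real sleeping

    def send(self, flags: str, payload: bytes) -> Optional[str]:
        if self.clock < self.until:
            return "R"  # residual censorship on this 4-tuple
        if self.blocked in payload:
            self.until = self.clock + 30.0
            return "R"  # injected RST for the censored payload
        return None  # no live host, so nothing answers


if __name__ == "__main__":
    blocked_req = b"GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n"
    censor = FakeCensor(blocked=b"Host: blocked.example")
    print(probe_without_handshake(censor.send, blocked_req, sleep=censor.sleep))
    benign_req = b"GET / HTTP/1.1\r\nHost: ok.example\r\n\r\n"
    benign = FakeCensor(blocked=b"Host: blocked.example")
    print(probe_without_handshake(benign.send, benign_req, sleep=benign.sleep))
```

Passing sleep in as a parameter keeps the decision logic testable with simulated time. Against a real network it would default to time.sleep, and the RST check would come from packets captured on the measurement machine.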
The measurement process began with a pre-scan of all the IP addresses using a small number of domains, to find which ones were susceptible to censorship at all. The authors filtered out hosts that were found to be responsive during the pre-scan, to avoid sending them heavy traffic in later phases. About 7,500 addresses (33%) could trigger injection. Using the addresses in this smaller set, they probed 15.5 million domain names over DNS, HTTP, and TLS and found 122,000 blocked domains in total. Blocklists differed by protocol, with HTTP having the most censored domains and DNS the fewest. From the list of blocked domains and further probing, they inferred regular-expression blocking rules. Over-broad expressions like .*\.cyou.* and doh\..* cause a high degree of overblocking.
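To see why such rules overblock, the Python sketch below applies the two quoted expressions to a few hostnames. It assumes each rule must match the entire hostname (re.fullmatch); the censor's exact matching semantics are not stated here, and the example hostnames are hypothetical.

```python
import re

# The two over-broad rules quoted above. Assumption: a rule blocks a hostname
# only if it matches the whole string (the censor's real semantics may differ).
RULES = [re.compile(r".*\.cyou.*"), re.compile(r"doh\..*")]


def is_blocked(hostname: str) -> bool:
    """Return True if any rule matches the entire hostname."""
    return any(rule.fullmatch(hostname) for rule in RULES)


if __name__ == "__main__":
    # Hypothetical hostnames chosen to illustrate collateral blocking.
    for host in ["news.cyou", "shop.cyoung.org", "doh.example.net", "example.com"]:
        print(f"{host:20} {'BLOCKED' if is_blocked(host) else 'allowed'}")
```

Under this assumption, shop.cyoung.org is blocked even though it is not under the .cyou TLD, because the first rule only requires ".cyou" to appear somewhere after a dot. Likewise, the second rule blocks any hostname whose first label is doh, whether or not it actually serves DNS over HTTPS.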
Finally, the authors use Geneva to find new circumvention strategies at the TCP/IP and application layers. These include setting one of the COUNT fields in a DNS query to 25 or greater, breaking the HTTP-version in an HTTP request across TCP segments, and inserting whitespace into the HTTP Host header.
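The three strategies operate on ordinary packet payloads, so they can be sketched with the Python standard library alone. The helpers below are hypothetical illustrations, not Geneva's strategy syntax or the paper's exact parameters. Which COUNT field to raise, where exactly to split the HTTP-version, and where to put the extra whitespace are all assumptions chosen for the example. Actually emitting the two HTTP pieces as separate TCP segments is out of scope here.

```python
import struct

# Byte offsets of the four 16-bit COUNT fields in the 12-byte DNS header (RFC 1035).
COUNT_OFFSETS = {"qdcount": 4, "ancount": 6, "nscount": 8, "arcount": 10}


def dns_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a plain DNS query for an A record with recursion desired."""
    header = struct.pack("!6H", qid, 0x0100, 1, 0, 0, 0)  # ID, flags (RD), QD=1, AN, NS, AR
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!2H", 1, 1)  # QTYPE=A, QCLASS=IN


def with_count(query: bytes, field: str, value: int) -> bytes:
    """Overwrite one COUNT field in the header, e.g. set it to 25 or greater."""
    off = COUNT_OFFSETS[field]
    return query[:off] + struct.pack("!H", value) + query[off + 2:]


def split_http_version(request: bytes) -> list[bytes]:
    """Cut the request inside the HTTP-version so it spans two TCP segments."""
    line_end = request.index(b"\r\n")
    start = request.rindex(b" HTTP/", 0, line_end) + 1  # version in the request line
    cut = start + len(b"HTTP")  # hypothetical cut point: just before "/1.1"
    return [request[:cut], request[cut:]]


def pad_host_header(request: bytes, pad: bytes = b" ") -> bytes:
    """Insert extra whitespace after "Host:" (placement is an assumption)."""
    return request.replace(b"\r\nHost: ", b"\r\nHost: " + pad, 1)


if __name__ == "__main__":
    q = with_count(dns_query("example.com"), "arcount", 25)
    print(struct.unpack("!6H", q[:12]))
    req = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print(split_http_version(req))
    print(pad_host_header(req))
```

Raising ARCOUNT leaves the question section untouched, and joining the two HTTP pieces reproduces the original request byte for byte. A server typically still accepts the padded header because HTTP/1.1 field syntax allows optional whitespace before a field value (RFC 9112).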
Thanks to Sadia Nourin and Nguyen Phong Hoang for comments on a draft of this summary.