Measuring and Evading Turkmenistan’s Internet Censorship: A Case Study in Large-Scale Measurements of a Low-Penetration Country

Abstract

Since 2006, Turkmenistan has been listed as one of the few Internet enemies by Reporters without Borders due to its extensively censored Internet and strictly regulated information control policies. Existing reports of filtering in Turkmenistan rely on a handful of vantage points or test a small number of websites. Yet, the country’s poor Internet adoption rates and small population can make more comprehensive measurement challenging. With a population of only six million people and an Internet penetration rate of only 38%, it is challenging to either recruit in-country volunteers or obtain vantage points to conduct remote network measurements at scale.

We present the largest measurement study to date of Turkmenistan’s Web censorship. To do so, we developed TMC, which tests the blocking status of millions of domains across the three foundational protocols of the Web (DNS, HTTP, and HTTPS). Importantly, TMC does not require access to vantage points in the country. We apply TMC to 15.5M domains. Our results reveal that Turkmenistan censors more than 122K domains, using different blocklists for each protocol. We also reverse-engineer these censored domains, identifying 6K over-blocking rules causing incidental filtering of more than 5.4M domains. Finally, we use Geneva, an open-source censorship evasion tool, to discover five new censorship evasion strategies that can defeat Turkmenistan’s censorship at both transport and application layers. We will publicly release both the data collected by TMC and the code for censorship evasion.

Sadia Nourin, Van Tran, Xi Jiang, Kevin Bock, Nick Feamster, Nguyen Phong Hoang, Dave Levin
https://censorbib.nymity.ch/#Nourin2023a
https://github.com/breakerspace/turkmenistan-censorship
Measurement dashboard
Lightning talk (3 min)
Presentation slides

This is a study of DNS, HTTP, and TLS censorship in Turkmenistan, notably encompassing every IP address in the country. Turkmenistan poses a challenge for censorship measurement because of its low population and low availability of Internet access. It is difficult to take direct measurements from inside the country. This study uses remote measurement techniques, taking advantage of the bidirectionality of the firewall to do experiments without controlling a vantage point in Turkmenistan. The paper covers data collected in September and October 2022. The team has continued to do tests and made the results available in a dashboard at https://tmc.np-tokumei.net/.

Bidirectionality means the firewall filters incoming packets as well as outgoing ones. Sending a DNS query for a filtered domain name into the country results in an injected DNS response with a false IP address being sent back to the sender, just as if the query had been sent out of the country. Similarly, an HTTP request with a filtered Host header, or a TLS Client Hello with a filtered SNI, elicits an injected TCP RST packet, regardless of direction. In the case of HTTP and TLS, censorship persists for 30 seconds: any packet with the same source–destination 4-tuple within that interval gets another RST. Injected packets are easy to identify because they have a distinctive IP ID and initial TTL. In a change from what was reported in "Bidirectional DNS, HTTPS, HTTP injection in Turkmenistan" (net4people/bbs issue #80, August 2021), injection now happens on all port numbers.
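The 30-second residual behavior can be pictured with a toy model. This is only a sketch of the behavior described above, not the censor's actual logic: the class name, the example flow, and the assumption that follow-up packets do not restart the timer are all hypothetical.

```python
from dataclasses import dataclass, field

RESIDUAL_SECONDS = 30.0  # interval reported in the summary above


@dataclass
class ResidualCensorModel:
    """Toy model of a censor that keeps resetting a flow after a trigger."""

    blocked_names: set
    # Maps a 4-tuple to the time at which residual censorship expires.
    _expiry: dict = field(default_factory=dict)

    def observe(self, now, four_tuple, name=None):
        """Return True if the model would inject a RST for this packet.

        `name` is the Host header or SNI carried by the packet, if any.
        """
        if name is not None and name in self.blocked_names:
            # A filtered name triggers injection and starts the 30 s window.
            self._expiry[four_tuple] = now + RESIDUAL_SECONDS
            return True
        # Assumption: later packets do not extend the window.
        return now < self._expiry.get(four_tuple, float("-inf"))
```

For example, with a hypothetical flow `("198.51.100.7", 40000, "203.0.113.9", 80)`, a packet naming a blocked domain at time 0 is reset, a bare packet on the same 4-tuple at time 10 is also reset, and one at time 31 is not.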

Two complications make a straightforward application of the bidirectionality property insufficient for large-scale measurement. The first is that, in what seems to be a first, source IP addresses that send many probes into the country may eventually stop getting injected responses, as if the censor were deliberately trying to frustrate analysis. To deal with this, the measurement system uses a diverse and changing set of source IP addresses from commercial VPSes. The second complication is that not all IP addresses in Turkmenistan are equal in terms of whether they cause injection when they appear as the destination address of a probe. Different networks, and even neighboring addresses, differ in whether they trigger censorship responses. For this reason, the authors undertook to test every IP address in the country, some 22,700 addresses across 6 ASes.

This gives rise to another challenge: while DNS probes do not require the probed IP address to be live, HTTP and TLS tests normally occur in the context of a TCP connection, which requires a live, responsive host at the destination. To work around this, the authors found a new sequence of probes that can detect TCP-based censorship injection without an established TCP connection: send a PSH+ACK packet containing the probe text (i.e. an HTTP request or TLS Client Hello), wait 5 to 29 seconds, then send another packet. If the second packet gets a RST, it means the probe was recognized as one to censor. By combining these techniques, they were able to scan every IP address in Turkmenistan for DNS, HTTP, and TLS censorship.
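The connectionless probe sequence can be sketched roughly as follows. This is an illustration under assumptions, not the authors' tool: `send_packet` is a hypothetical callback that transmits one TCP segment and returns the flags of any reply (or `None`), and the second packet is assumed to be an empty PSH+ACK, since the summary only says "another packet".

```python
import time
from typing import Callable, Optional

# Hypothetical sender type: (tcp_flags, payload) -> flags of the reply, or None.
Sender = Callable[[str, bytes], Optional[str]]


def http_probe_payload(domain: str) -> bytes:
    """Minimal HTTP request whose Host header carries the test domain."""
    return f"GET / HTTP/1.1\r\nHost: {domain}\r\n\r\n".encode("ascii")


def probe_is_censored(
    send_packet: Sender,
    payload: bytes,
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run the two-packet sequence and report whether the follow-up drew a RST."""
    if not 5 <= wait_seconds <= 29:
        raise ValueError("the summary gives a wait of 5 to 29 seconds")
    # Step 1: PSH+ACK carrying the probe text, with no prior handshake.
    send_packet("PA", payload)
    # Step 2: wait inside the window given in the summary.
    sleep(wait_seconds)
    # Step 3: follow-up packet (assumed here to be an empty PSH+ACK).
    reply = send_packet("PA", b"")
    return reply == "RST"
```

Passing `sleep` and `send_packet` as parameters keeps the logic testable without raw sockets; a real implementation would have to craft the segments itself and check the reply's IP ID and TTL to confirm it was injected.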

The measurement process began with a pre-scan of all the IP addresses using a small number of domains, to find which ones were susceptible to censorship at all. They filtered out hosts that were found to be responsive during the pre-scan, in order to avoid sending them a lot of traffic in later phases. There were about 7,500 addresses (33%) that could trigger injection. Using the addresses in this smaller set, they probed 15.5 million domain names on DNS, HTTP, and TLS. They found 122,000 blocked domains in total. Blocklists differed by protocol, with HTTP having the most censored domains and DNS having the fewest. From the list of blocked domains and further probing they inferred regular expression blocking rules. Over-broad expressions like .*\.cyou.* and doh\..* cause a high degree of overblocking.
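To see why such expressions overblock, the two rules quoted above can be applied to a few made-up domain names. This is a minimal sketch: the helper `matching_rules` and the example domains are hypothetical, and it assumes the rule must match the whole name, which the summary does not specify.

```python
import re

# The two over-broad rules quoted in the summary.
RULES = [r".*\.cyou.*", r"doh\..*"]


def matching_rules(domain, rules=RULES):
    """Return the rules that match the entire domain name."""
    return [rule for rule in rules if re.fullmatch(rule, domain)]


for name in ["example.cyou", "shop.cyou.example.org", "doh.example.net", "example.com"]:
    print(name, matching_rules(name))
```

Under these assumptions the first rule catches any name containing `.cyou`, including `shop.cyou.example.org`, which is not under the `.cyou` TLD at all, and the second catches every name whose first label is `doh`.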

Finally, the authors use Geneva to find new circumvention strategies at the TCP/IP and application layers. These include setting one of the COUNT fields in a DNS query to 25 or greater, breaking the HTTP-version in an HTTP request across TCP segments, and inserting whitespace into the HTTP Host header.
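Two of these strategies are easy to picture at the byte level. The sketch below only builds the bytes and sends nothing; it is not Geneva's output. The function names are hypothetical, the choice of ARCOUNT as the inflated field is an assumption (the summary says only "one of the COUNT fields"), and the exact split point inside the HTTP version is likewise assumed.

```python
import struct


def dns_query(name: str, txid: int = 0x1234, arcount: int = 0) -> bytes:
    """Build a minimal DNS A query; `arcount` overrides the ARCOUNT header field."""
    flags = 0x0100  # standard query with recursion desired
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT.
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, arcount)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def split_inside_http_version(request: bytes):
    """Split a request into two chunks so that "HTTP/1.1" straddles the cut."""
    cut = request.index(b"HTTP/") + 2  # between "HT" and "TP/..."
    return request[:cut], request[cut:]


query = dns_query("example.com", arcount=25)
first, second = split_inside_http_version(
    b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
)
```

Each chunk returned by `split_inside_http_version` would go out in its own TCP segment. Whether a given server or resolver still accepts the modified messages is a separate question that this sketch does not address.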

Thanks to Sadia Nourin and Nguyen Phong Hoang for comments on a draft of this summary.

Hi everyone, I’m Sadia, one of the authors of this paper. In order to measure Turkmenistan’s censorship, we had to take advantage of bidirectional censorship: a client outside of Turkmenistan sends censored requests to non-responsive IP addresses inside of Turkmenistan to trigger the censor. However, one question we frequently asked ourselves is whether our measurements in the outside→inside direction agree with measurements in the inside→outside direction.

It would be great if there were some volunteers within Turkmenistan who could spot-check some of our measurements for us from the inside→outside direction. Please ensure your safety and understand the risks of doing so before proceeding.

You can check whether TMC considers you to be censored by searching for your own IP address here. If you are deemed to be censored, you could test some of the domains that TMC believes to be censored. These domains can be found here and here. In order to test these domains, you could try to use the packet sequence we use for our measurements mentioned in the paper, or just send a simple DNS and HTTP(S) request.

If you determine that your IP address is NOT considered to be censored by TMC, you could still test some domains to determine whether the IP address is uncensored from the inside→outside direction as well.

Thank you.

What do you mean by “You can check whether TMC considers you to be censored”? Home, work, and mobile internet are always censored. My VPS, home, and work IP addresses are there, but not my mobile one. Which IP is not censored? The TMC dashboard says that wikipedia.org, yandex.net, and github.com are censored. What does that mean? They load fine (some IPs for yandex and github are blocked, but the sites load from others).

Apologies for my vagueness. TMC considers an IP address to be censored when a request for a censored domain (such as twitter.com, wikipedia.org, yandex.net, github.com), sent from a client outside of Turkmenistan to a non-responsive IP address inside Turkmenistan, elicits an injected response from the censor.

If the censor does not respond with an injection when we send a request for a confirmed censored domain to a non-responsive IP address in Turkmenistan, then we deem that IP address not to be subject to censorship/filtering by Turkmenistan. As such, TMC does not consider your mobile IP address to be subject to censorship/filtering.

While some filters may be applied to both inbound and outbound traffic (bi-directional), outbound traffic gets filtered much more heavily (if all filtering were bi-directional, most of the internet wouldn’t be able to access websites hosted in Turkmenistan). In addition, there are destination-related filters, like the DNS check for Fastly destinations.