Paper summary: Geneva: Evolving Censorship Evasion Strategies (CCS 19)

Geneva: Evolving Censorship Evasion Strategies
Kevin Bock, George Hughey, Xiao Qiang, Dave Levin
https://censorbib.nymity.ch/#Bock2019a

Geneva is a genetic algorithm that automatically discovers censorship evasion strategies by combining primitive operations in various ways and evaluating the combinations against a network censor (real or simulated). The strategies it discovers are packet-level manipulations like those of Khattak et al. 2013, lib·erate, and INTANG—things like sending overlapping segments or dropping certain packets. In fact, Geneva automatically rediscovers most of the evasions that prior work had found manually, as well as new and updated ones that manual analysis probably would not have found. They train and evaluate Geneva in the lab aginst simulated censors, and in the wild against real censors in China, India, and Kazakhstan.

An evasion strategy consists of paired triggers and actions. A trigger is a predicate over packets; for example [TCP:flags:R] matches TCP RST packets. An action is an operation on a single packet: like duplicate, which makes a copy of a packet and allows you to modify the original and the copy separately; fragment, which breaks a packet or segment into two parts and likewise permits further processing on both parts; and tamper, which sets a field in the packet to a static or a random value, while updating dependent fields such as the checksum. Whenever a trigger is true, it causes its associated action to happen. Actions may recursively invoke other actions, forming a tree structure: for example duplicate has two action subtrees that say what to do with the each of the two copies. At the leaves of the tree appear the special terminal actions send and drop. There are separate lists of triggers and actions in the incoming and outgoing directions.

A sample strategy is:

[TCP:flags:A]-duplicate(
    send,
    tamper{TCP:flags:replace:R}(
        tamper{TCP:chksum:corrupt}(
            send
        )
    )
)-|
\/

The trigger [TCP:flags:A] matches outgoing ACK packets. duplicate makes two copies of the packet. The first one is sent unmodified, but the second is changed into a RST packet with a bad checksum before being sent. The \/ separates the outgoing and incoming trigger–action lists; in this case the incoming section is empty. This strategy, currently effective against the GFW, tricks the middlebox into thinking the connection has been terminated because of the RST packet it sends, but the RST actually has no effect on the remote server because of its incorrect checksum. There’s a catalog of strategies in Table 1 and at https://github.com/Kkevsterrr/geneva/blob/master/strategies.md.

Geneva starts with a population of strategies that may be initialized randomly or seeded with known strategies. Individuals in the population undergo mutation (random changes to their actions) and crossover (swapping subtrees of actions with another individual). The fitness of an individual is primarily determined by its effectiveness against the censor—Geneva tries it out to see if it works—with penalties for large action trees or high network overhead. High-fitness individuals are more likely to be selected and survive into the next generation. The authors report that computing each new generation takes 5–10 minutes, and full training 4–8 hours.

The bulk of the authors’ validation was in China against the Great Firewall. Geneva finds a number of strategies that confuse the firewall’s notion of what the correct TCP sequence number is or whether the connection has been closed. It also finds a few weird and unexpected strategies that seem to expose previously unknown and subtle characteristics of the GFW’s classification algorithms. Take for example Strategy 7 in Section 5.2: splitting a TCP segment at offset 8 doesn’t work, but splitting it at offsets 8 and 12 does—even when the censored keyword is not split across segments. They additionally tested in India (on the Airtel ISP) and Kazakhstan (during the time when the TLS MITM was still happening), where Geneva found effective strategies that were comparatively simpler than the China ones.

There’s a project home page:

As of this writing, the genetic training algorithm is not yet available to download, but there is source code for the client-side software that implements pre-trained strategies.

There’s an existing thread about Geneva (the tool):