Running Refraction Networking for Real
Benjamin VanderSloot, Sergey Frolov, Jack Wampler, Sze Chuen Tan, Irv Simpson, Michalis Kallitsis, J. Alex Halderman, Nikita Borisov, Eric Wustrow
The paper reflects on one year of running a refraction networking system with real users at a real ISP. Following a one-month pilot deployment in 2017 (“An ISP-Scale Deployment of TapDance”), the team began operation in earnest in October 2018. The deployment used TapDance as the refraction networking system, with the client base being a select subset of Psiphon users. The cooperating ISP was Merit Network.
The original TapDance paper considered a simplified ISP model, where a single refraction networking station could cover all of an ISP’s traffic. In reality, an ISP has multiple uplinks, and therefore a single station does not suffice; multiple stations must work together as a distributed system. To make this work, the authors divided the stations’ responsibilities between multiple independent detectors and a single centralized proxy. The detectors search TLS traffic for TapDance tags, and forward only the matching flows to the proxy. (This division of responsibilities is similar to the split between controllers/switches and the secret proxy in SiegeBreaker.) A client prefixes each of its decoy flows with a randomly generated session identifier. The centralized proxy stitches together flows that share the same session identifier to form one long-term session. This kind of identifier-based multiplexing is necessary because (for technical reasons) a single TapDance flow can only be used for a limited time before it must be discarded, and because packets in a flow may pass by different detectors, which do not share state with each other.
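The stitching step can be illustrated with a minimal sketch. It is not the authors' implementation: the 16-byte identifier length, the framing (identifier followed directly by payload), and the names `new_session_id`, `frame_flow`, and `SessionStitcher` are all assumptions made for illustration, and the sketch ignores ordering, encryption, and flow expiry.

```python
import os
from collections import defaultdict

# Assumed identifier length; the summary does not state the real one.
SESSION_ID_LEN = 16


def new_session_id() -> bytes:
    """Client side: pick a random identifier for a long-term session."""
    return os.urandom(SESSION_ID_LEN)


def frame_flow(session_id: bytes, payload: bytes) -> bytes:
    """Client side: prefix one short-lived decoy flow with the session identifier."""
    return session_id + payload


class SessionStitcher:
    """Proxy side: group flows forwarded by any detector by their identifier."""

    def __init__(self) -> None:
        # session identifier -> bytes received so far, in arrival order
        self._sessions = defaultdict(bytearray)

    def on_flow(self, flow_bytes: bytes) -> bytes:
        """Attach a forwarded flow to its session and return the identifier."""
        if len(flow_bytes) < SESSION_ID_LEN:
            raise ValueError("flow too short to carry a session identifier")
        session_id = flow_bytes[:SESSION_ID_LEN]
        self._sessions[session_id] += flow_bytes[SESSION_ID_LEN:]
        return session_id

    def session_data(self, session_id: bytes) -> bytes:
        """Return everything received so far for one session."""
        return bytes(self._sessions.get(session_id, b""))
```

Because the proxy keys only on the identifier, it does not matter which detector forwarded a given flow, which is why the detectors need not share state with each other.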
The deployment supported about 33,000 unique users per month, with a peak goodput of 500 Mbps. There were four detectors installed at the ISP, which collectively processed 5,000–20,000 TLS flows per second. The cost of hardware for the detectors and proxy was about $30,000. The estimated cost of rack space and bandwidth, if not donated, was about $35,000 per year. About 559,000 Psiphon clients were TapDance-capable, but because Psiphon is a multi-modal circumvention system that automatically selects the transport with the lowest latency, not all those clients used TapDance all the time. Psiphon’s adaptive transport selection caused an interesting effect during a censorship event: under normal conditions, TapDance-capable clients used TapDance about 10% of the time, but when some of Psiphon’s other transports became temporarily blocked, the fraction rose to 40%. Backend infrastructure used port scanning to find eligible decoy sites: there were about 3,000 hosts behind the ISP that returned a TLS certificate on port 443, but the number of usable decoys dropped to about 1,500 after filtering for TCP window sizes, TLS ciphers, and other TapDance requirements. On average, a client would make one failed decoy connection before finding one that worked.
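The decoy filtering step might look something like the following sketch. The `ScanResult` record, the `MIN_TCP_WINDOW` threshold, and the cipher list are hypothetical placeholders, not values taken from the paper; the real pipeline also applies other TapDance requirements that are not modeled here.

```python
from dataclasses import dataclass

# Hypothetical threshold and cipher set, for illustration only.
MIN_TCP_WINDOW = 15_000
SUPPORTED_CIPHERS = {
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
}


@dataclass(frozen=True)
class ScanResult:
    """One row of (assumed) port-scan output for a host behind the ISP."""
    host: str
    returned_certificate: bool  # answered on port 443 with a TLS certificate
    tcp_window: int             # advertised TCP window size, in bytes
    cipher: str                 # cipher suite the server selected


def is_usable_decoy(result: ScanResult) -> bool:
    """Apply the three checks modeled here: certificate, window, cipher."""
    return (
        result.returned_certificate
        and result.tcp_window >= MIN_TCP_WINDOW
        and result.cipher in SUPPORTED_CIPHERS
    )


def usable_decoys(results: list[ScanResult]) -> list[str]:
    """Return the hosts that pass every check, preserving scan order."""
    return [r.host for r in results if is_usable_decoy(r)]
```

A filter like this only narrows the candidate list; a client can still hit a decoy that fails at connection time, which is consistent with the one failed attempt per client reported above.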
Section 5.3 has a nice discussion of ISP concerns related to deploying refraction networking. Each station was permitted 1U of rack space. The installation could not interfere with any of the ISP’s normal operations—and in particular, the failure of a station could not disrupt any of the ISP’s other duties. (TapDance, of course, was developed for the purpose of working without blocking normal packet flows.) The authors observe that the success of refraction networking “depends on close interactions between the Internet operator and Internet freedom communities.”
Thanks to Eric Wustrow for commenting on a draft of this summary.