Balboa: Bobbing and Weaving around Network Censorship
Marc B. Rosen, James Parker, Alex J. Malozemoff
https://censorbib.nymity.ch/#Rosen2021a
Presentation video and slides
Balboa is a framework for link obfuscation that is in the same vein as Slitheen and Protozoa. The goal of all these systems is to embed a hidden communications channel inside some other network flow, without changing any externally observable features of the flow, particularly the traffic analysis features of packet sizes and packet timings. Balboa, like the others, works by traffic replacement: it removes some encrypted portions of the carrier flow and replaces them with identically sized encryptions of covert data. Balboa assumes TLS for the carrier flow (Slitheen also used TLS, Protozoa used WebRTC video). It has some unique advantages: by hooking into TLS libraries and intercepting networking system calls, it can use unmodified application binaries at both ends; and it undoes its traffic replacement before passing decrypted TLS payloads to upper network layers, which means the application programs behave identically to how they would behave in the absence of traffic replacement. The authors provide two instantiations of the system, one that uses an Icecast audio stream, and one that works over HTTPS web browsing.
The most significant difference in Balboa is its use of a preshared traffic model. Client and server are assumed not only to share a symmetric key, but also to know in advance some portion of what the application program on the other side will send through the tunnel—this is the portion that is eligible for traffic replacement. For example, in an audio streaming setting, the Balboa client may already have a copy of some of the audio files that the server will later stream. The Balboa server also knows in advance what audio files the client has. When the Balboa server would stream one of the files the client already has, it instead replaces (under TLS encryption) the file’s contents with a pointer into the traffic model (“for the next file, substitute ‘Metallica - Fuel.ogg’”), followed by covert data for the remainder of the file size. The Balboa client sees and interprets the traffic model pointer, and re-substitutes its local copy of the file (i.e., the very same bytes the client would have received from the server if there had been no traffic replacement) into the data stream that it passes up the network stack, meanwhile saving the covert data somewhere else. Covert data is sent only when an application program would be sending data anyway, and only when what is being sent is part of the shared traffic model. The traffic model is what enables both sides to “fill in the gaps” that traffic replacement creates in the data stream. In the authors’ two instantiations of Balboa, the traffic model is a full copy of files to be sent later (in order to send N bytes of covert data, the peers need to have preshared N bytes’ worth of files), but one can imagine a procedural, or other more concise representation of a traffic model (Section 2.1).
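The replacement and re-substitution steps described above can be sketched as follows. This is a toy illustration, not Balboa's actual wire format: the 6-byte header (model index plus covert length), the zero padding, and all names (TRAFFIC_MODEL, server_replace, client_restore) are assumptions made for the example, and the TLS encryption layer is omitted entirely.

```python
import struct
from typing import Tuple

# Hypothetical preshared traffic model: both peers hold identical copies
# of the files that the server may later send.
TRAFFIC_MODEL = [b"A" * 64, b"B" * 100]

# Assumed header layout: 4-byte model index, 2-byte covert data length.
HEADER = struct.Struct(">IH")


def server_replace(index: int, covert: bytes) -> bytes:
    """Swap a model file's plaintext for pointer + covert data of equal size."""
    original = TRAFFIC_MODEL[index]
    capacity = len(original) - HEADER.size
    if len(covert) > capacity:
        raise ValueError("covert data exceeds the capacity of this file")
    padding = bytes(capacity - len(covert))  # zero-fill the unused space
    replaced = HEADER.pack(index, len(covert)) + covert + padding
    assert len(replaced) == len(original)  # size must be unchanged
    return replaced


def client_restore(replaced: bytes) -> Tuple[bytes, bytes]:
    """Recover (bytes to hand to the application, covert data)."""
    index, covert_len = HEADER.unpack_from(replaced)
    covert = replaced[HEADER.size:HEADER.size + covert_len]
    # Re-substitute the local copy, so the application sees exactly what
    # it would have received without traffic replacement.
    return TRAFFIC_MODEL[index], covert
```

In this sketch the covert capacity of a file is its length minus the header, which mirrors the summary's point that sending N bytes of covert data requires roughly N bytes of preshared files.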
A considerable part of the paper is devoted to showing how various engineering challenges were overcome. Balboa sets the SSLKEYLOGFILE environment variable, or uses other means, to recover the keys necessary for decrypting and re-encrypting TLS application data records on the fly. But in order to work with unmodified applications while not disturbing packet boundaries, Balboa needs to hook into the network stack even below the TLS library, at the level of C library functions like read and write. This occasions considerable complexity, as Balboa needs to cope with TLS records that are not aligned with the buffers used by the low-level calls (Section 2.5). Balboa also needs to run its decryption and re-encryption quickly, because any small processing delays are potential distinguishers. But because of the way Balboa works, such delays are really the only leverage a censor has to distinguish flows.
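To give a sense of the alignment problem, here is a minimal sketch of reassembling TLS records from chunks that arrive at arbitrary boundaries, as they would from successive read calls. It relies only on the standard 5-byte TLS record header (content type, protocol version, length); the RecordReassembler class is hypothetical and is not taken from Balboa. It shows the framing problem only, and makes no attempt to preserve buffer boundaries or timing, which the paper's design must also handle.

```python
import struct
from typing import List, Tuple

TLS_HEADER_LEN = 5  # content type (1) + version (2) + length (2)


class RecordReassembler:
    """Collects byte chunks and yields complete TLS records."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[Tuple[int, bytes]]:
        """Add one chunk; return any (content_type, payload) now complete."""
        self._buf += chunk
        records = []
        while len(self._buf) >= TLS_HEADER_LEN:
            content_type, _version, length = struct.unpack_from(">BHH", self._buf)
            end = TLS_HEADER_LEN + length
            if len(self._buf) < end:
                break  # record body not fully received yet
            records.append((content_type, bytes(self._buf[TLS_HEADER_LEN:end])))
            del self._buf[:end]  # keep any bytes of the next record
        return records
```

A chunk may end in the middle of a header or a payload, so the reassembler has to carry leftover bytes from one call to the next; this is the kind of bookkeeping that hooking below the TLS library forces on the implementation.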
Thanks to the authors for reviewing a draft of this summary.