Crossposted from https://github.com/net4people/bbs/issues/13.
Improving Meek With Adversarial Techniques
Steven R. Sheffey, Ferrol Aderholdt
This paper is concerned with meek’s susceptibility to classification based on traffic flow analysis; i.e., packet sizes and packet timing. The authors collect their own traffic traces of browsing home pages with and without meek-with-Tor. They identify feature differences and demonstrate three classifiers that can distinguish ordinary HTTPS from meek HTTPS. They then show how minimal perturbation of the meek-derived feature vectors can hinder the classifiers.
To build a corpus of training and test data, they built a parallel data collection framework using Docker containers and a centralized work queue. They browsed 10,000 home pages both with a headless Firefox, and with Tor Browser configured to use its meek-azure bridge. They performed the test from three different networks—residential, university, and datacenter—yielding a total of 60,000 traffic traces. From these, they extract binned features: TCP payload length, and interarrival times tagged with direction (upstream or downstream). Their packet length distribution differs from the one reported in the 2015 domain fronting paper; the authors speculate that could be because of differences in source data, or changes to meek that have happened in the meantime.
They then use a GAN (generative adversarial network), specifically the StarGAN implementation, to iteratively transform a meek feature vector so that it looks more like a ordinary HTTPS feature vector. The transformation process tries to minimize the size of changes required, by including a perturbation loss term that increases as more changes are required. Minimizing perturbation is to make it easier to implement the resulting distribution, while still fooling the classifiers.
The data collection framework and analysis scripts are published at https://github.com/starfys/packet_captor_sakura.