Paper summary: A First Look at Private Communications in Video Games using Visual Features (PETS 2021)

A First Look at Private Communications in Video Games using Visual Features
Abdul Wajid, Nasir Kamal, Muhammad Sharjeel, Raaez Muhammad Sheikh, Huzaifah Bin Wasim, Muhammad Hashir Ali, Wajahat Hussain, Syed Taha Ali, Latif Anjum

This paper explores the potential for sending covert messages in video games by writing text with in-game objects such as bullet holes or blocks. The intuition is that while messages written in the visual world of the game are easily read by humans, they exist at a level that is relatively inaccessible to automated detection and analysis. Unlike with in-game text chat, for example, even the game server cannot easily tell when a player is arranging objects to spell a message. Off-the-shelf text spotting software is poor at detecting messages written in games, and even when trained specifically on game screenshots, it does not generalize well to other games or situations.

A large part of the paper is devoted to the construction and evaluation of the GameText data set, which consists of screenshots of in-game text in three games: Grand Theft Auto V, Call of Duty 4, and Minecraft. Modeled after the Street View Text data set, it contains screenshots of 350 words, created in each of the three games, and seen from three angles. The authors test the accuracy of four text spotting engines on this data set: DeepTextSpotter, ABCNet, Google Cloud Vision, and Azure Cloud Vision. By default, none of the engines does well on GameText images, getting less than 1% accuracy in most cases (Table 2). After a few hours of specialized retraining, DeepTextSpotter and ABCNet improve in accuracy, up to a maximum of 65% in the case of ABCNet on GTA V (Table 3), though cross-game accuracy is worse. The paper also considers other ways an adversary might try to detect in-game writing, such as by analyzing network traffic patterns.
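To make the accuracy figures concrete, here is a minimal sketch of one common way to score a text spotter: case-insensitive exact-match word accuracy. This is an assumption for illustration and may not match the paper's exact evaluation protocol; the function name word_accuracy and the sample words are hypothetical and are not GameText data.

```python
def word_accuracy(predictions, ground_truth):
    """Fraction of images whose recognized word exactly matches the true word.

    Matching is case-insensitive and ignores surrounding whitespace.
    This scoring rule is an illustrative assumption, not the paper's protocol.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, ground_truth)
    )
    return correct / len(ground_truth)


# Hypothetical example: four screenshots, two read correctly.
truth = ["hello", "world", "proxy", "meet"]
predicted = ["Hello", "w0rld", "proxy", ""]  # "" means nothing was detected
print(f"accuracy = {word_accuracy(predicted, truth):.0%}")
```

Under this rule a single misread character counts the whole word as a miss, which is one reason scores on unfamiliar in-game renderings can be so low.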

The manual process of writing text in a game results in an ultra-low bitrate, on the order of 1 bit/s, which means it is only usable for quite short texts, or for distributing bootstrapping information like proxy addresses. This paper's approach is qualitatively different from past work that also builds on video games, such as Rook (which overwrites certain fields in network packets), Castle (which encodes information as in-game actions), and FPSCC (which encodes information in slight perturbations to user inputs).
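As a rough back-of-the-envelope illustration of what a bitrate of about 1 bit/s means in practice, the sketch below estimates how long a short bootstrapping string would take to write. The 8 bits per character figure, the function name transmit_seconds, and the example address (taken from a reserved documentation range) are assumptions for illustration, not values from the paper.

```python
def transmit_seconds(message, bits_per_second=1.0, bits_per_char=8):
    """Estimate the seconds needed to convey `message` at a given bitrate.

    Assumes a fixed number of bits per character (8 by default), which is
    an illustrative simplification rather than the paper's encoding.
    """
    if bits_per_second <= 0:
        raise ValueError("bits_per_second must be positive")
    return len(message) * bits_per_char / bits_per_second


# Hypothetical proxy address (203.0.113.0/24 is reserved for documentation).
proxy = "203.0.113.7:443"
seconds = transmit_seconds(proxy)
print(f"{len(proxy)} characters take about {seconds / 60:.0f} minutes at 1 bit/s")
```

Under these assumptions a 15-character address takes about two minutes, while a message of a few hundred characters would take the better part of an hour, which is consistent with the channel being practical only for short texts.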

Thanks to the authors for reviewing a draft of this summary.