Paper summary: Meteor: Cryptographically Secure Steganography for Realistic Distributions (CCS 2021)

Meteor: Cryptographically Secure Steganography for Realistic Distributions
Gabriel Kaptchuk, Tushar M. Jois, Matthew Green, Aviel D. Rubin

Meteor is a symmetric-key steganography protocol. A sender and receiver, who share a secret key, can exchange messages that conform to a certain text generation model, and that also encode another, hidden message. An observer that does not know the secret key cannot distinguish the steganographically encoded messages from any other output of the text generation model, even if the observer knows the model. Meteor naturally adapts its encoding rate according to the local entropy of the text generation model, which is an advantage over past schemes that can fail when starved of entropy. The authors show how to use Meteor with sophisticated generative models of natural language text, like GPT-2, to provide realistic covertexts.

The text generation models that are compatible with Meteor produce text one word at a time. Given a context of words that have been output so far, the model finds a probability distribution of candidates for the next word; i.e., a set of words with associated weights, where the weights sum to 1.0. The model then makes a weighted random selection from among the candidates. Suppose, at a certain point in text generation, the probability distribution for the next word is 40% little, 20% gray, 20% happy, and 20% nimble. Stack up those probabilities, in some defined order, to assign each word an interval of the numbers between 0.0 and 1.0. Sample a random number r between 0.0 and 1.0, and output the word whose interval contains r. In this case, for example, if the random number generator produced r = 0.626, the model would output the word happy; if r = 0.187, it would output little.

word interval
little 0.0 ≤ r < 0.4
gray 0.4 ≤ r < 0.6
happy 0.6 ≤ r < 0.8
nimble 0.8 ≤ r < 1.0

Meteor replaces the text generation model’s usual random number generator with pseudorandom bits from a ciphertext of the message. Instead of an interval of real numbers, assign each candidate word a range of consecutive bit strings, in proportion to its probability:

0000 little
0001 little
0010 little
0011 little
0100 little
0101 little
0110 gray
0111 gray
1000 gray
1001 happy
1010 happy
1011 happy
1100 nimble
1101 nimble
1110 nimble
1111 nimble

The length of the bit strings is a parameter, β. In this example, β = 4; the authors report using β = 32 in practice. Suppose the sender wants to steganographically encode the message 01011000. The sender and receiver share a symmetric key that they use to produce a pseudorandom bitstream, for example 0110011011..., which we will call the mask. The sender takes β bits from the message (0101) and β bits from the mask (0110), and XORs them together (in the manner of a stream cipher) to produce a bit string r = 0011. The sender consults the probability table, sees that 0011 maps to little, and therefore appends little to the encoded message.

When the receiver observes the word little, it does a reverse lookup in the probability table, to find out what values of r could have mapped to that word. In this case, r might have been any of the six values from 0000 to 0101. The receiver cannot know exactly which of these values the sender used, but what it does know is that the first bit of r must have been 0. This is because 0 is the longest common prefix of all the possible r values for little. The receiver XORs that known prefix of r (0) with the first bit of its locally computed mask (0) to get one bit of the decoded message, 0.

The common prefix is not always one bit long. If the word had been happy, for example, the common prefix would have been 10, and the receiver would have been able to decode two bits of the message. On the other hand, the word gray would not encode any bits: some of its r values begin with 0 and some begin with 1, so their common prefix is empty. This is an example of Meteor’s natural adaptation to varying entropy: low-probability words map to few bit strings, and a long common prefix; high-probability words map to many bit strings, and a short (or empty) common prefix.

Because the sender and the receiver have the same text model, the sender also knows the range of possible r values for the word it has just output, and therefore the sender knows how many bits the receiver will have decoded from little. In this case it’s one bit. So the sender strips one bit from its message buffer, leaving 1011000 as the remaining message still to be encoded. The sender also discards the β used bits of the mask, leaving 011011.... The value of r for the next word to be output (which will be chosen from a different set of words, with a different probability distribution) will be 1011 XOR 0110 = 1101. The process repeats until all message bits have been encoded.

English text is not the only possible source of covertexts. Anything that satisfies the token-at-a-time generation model can be used—conceivably even things like network protocols. Besides GPT-2, which is word-oriented, the authors applied Meteor to character-oriented models trained on Wikipedia articles and HTTP headers. There is a comparison of the space and time costs of encoding in Table 3. You can see examples of Meteor stegotexts in Appendix C. The GPT-2 model encodes 160 message bytes into about 300–350 words, not counting the initial context. The following is an example of Meteor encoding using GPT-2:

The picture in The Pale I HCR scientists’ discussion now spans three dimensions. The first importance of the Yucatan Peninsula is demonstrated with the following conclusion: the Pliocene Earth has lost about seven times as much vegetation as the Jurassic in regular parts of the globe…