Meteor: Cryptographically Secure Steganography for Realistic Distributions
Gabriel Kaptchuk, Tushar M. Jois, Matthew Green, Aviel D. Rubin
Meteor is a symmetric-key steganography protocol. A sender and receiver, who share a secret key, can exchange messages that conform to a certain text generation model, and that also encode another, hidden message. An observer that does not know the secret key cannot distinguish the steganographically encoded messages from any other output of the text generation model, even if the observer knows the model. Meteor naturally adapts its encoding rate according to the local entropy of the text generation model, which is an advantage over past schemes that can fail when starved of entropy. The authors show how to use Meteor with sophisticated generative models of natural language text, like GPT-2, to provide realistic covertexts.
The text generation models that are compatible with Meteor produce text one word at a time. Given a context of words that have been output so far, the model finds a probability distribution of candidates for the next word; i.e., a set of words with associated weights, where the weights sum to 1.0. The model then makes a weighted random selection from among the candidates. Suppose, at a certain point in text generation, the probability distribution for the next word is 40% little, 20% gray, 20% happy, and 20% nimble. Stack up those probabilities, in some defined order, to assign each word an interval of the numbers between 0.0 and 1.0. Sample a random number r between 0.0 and 1.0, and output the word whose interval contains r. In this case, for example, if the random number generator produced r = 0.626, the model would output the word happy; if r = 0.187, it would output little.
| little | 0.0 ≤ r < 0.4 |
| gray   | 0.4 ≤ r < 0.6 |
| happy  | 0.6 ≤ r < 0.8 |
| nimble | 0.8 ≤ r < 1.0 |
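The weighted selection just described can be sketched in a few lines. This is a minimal illustration using the running example's words and probabilities, not code from the paper:

```python
# Weighted random selection of the next word via stacked
# cumulative-probability intervals, as described above.
candidates = [("little", 0.4), ("gray", 0.2), ("happy", 0.2), ("nimble", 0.2)]

def sample_word(candidates, r):
    """Return the word whose cumulative-probability interval contains r."""
    cumulative = 0.0
    for word, prob in candidates:
        cumulative += prob
        if r < cumulative:
            return word
    return candidates[-1][0]  # guard against floating-point round-off

print(sample_word(candidates, 0.626))  # -> happy
print(sample_word(candidates, 0.187))  # -> little
```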
Meteor replaces the text generation model’s usual random number generator with pseudorandom bits from a ciphertext of the message. Instead of an interval of real numbers, assign each candidate word a range of consecutive bit strings, in proportion to its probability:
| 0000–0101 | little |
| 0110–1000 | gray   |
| 1001–1011 | happy  |
| 1100–1111 | nimble |
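One plausible way to derive such a table is to scale the cumulative probabilities up to 2^β and take integer boundaries. This is a sketch; the exact rounding rule is an assumption, not necessarily the paper's:

```python
BETA = 4  # bit-string length; the authors report beta = 32 in practice

candidates = [("little", 0.4), ("gray", 0.2), ("happy", 0.2), ("nimble", 0.2)]

def build_ranges(candidates, beta):
    """Assign each word a contiguous range of beta-bit strings,
    proportional to its probability (floor rounding is an assumption)."""
    total = 2 ** beta
    ranges = {}
    cumulative = 0.0
    start = 0
    for word, prob in candidates:
        cumulative += prob
        end = int(cumulative * total)    # floor of the scaled boundary
        ranges[word] = (start, end - 1)  # inclusive integer endpoints
        start = end
    return ranges

ranges = build_ranges(candidates, BETA)
# ranges["little"] == (0, 5), i.e. bit strings 0000 through 0101
```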
The length of the bit strings is a parameter, β. In this example, β = 4; the authors report using β = 32 in practice. Suppose the sender wants to steganographically encode the message 01011000. The sender and receiver share a symmetric key that they use to produce a pseudorandom bitstream, for example 0110011011..., which we will call the mask. The sender takes β bits from the message (0101) and β bits from the mask (0110), and XORs them together (in the manner of a stream cipher) to produce a bit string r = 0011. The sender consults the probability table, sees that 0011 maps to little, and therefore appends little to the encoded message.
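This single encoding step can be sketched as follows, using the bit-string ranges from the table above (a toy illustration, not the paper's implementation):

```python
BETA = 4
# Inclusive integer endpoints of each word's bit-string range.
ranges = {"little": (0, 5), "gray": (6, 8), "happy": (9, 11), "nimble": (12, 15)}

def encode_step(message_bits, mask_bits, beta=BETA):
    """XOR beta message bits with beta mask bits to get r, then output
    the word whose bit-string range contains r."""
    r = int(message_bits[:beta], 2) ^ int(mask_bits[:beta], 2)
    word = next(w for w, (lo, hi) in ranges.items() if lo <= r <= hi)
    return word, format(r, f"0{beta}b")

# With the example values from the text:
word, r = encode_step("01011000", "0110011011")
# word == "little", r == "0011"
```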
When the receiver observes the word little, it does a reverse lookup in the probability table, to find out what values of r could have mapped to that word. In this case, r might have been any of the six values from 0000 to 0101. The receiver cannot know exactly which of these values the sender used, but what it does know is that the first bit of r must have been 0. This is because 0 is the longest common prefix of all the possible r values for little. The receiver XORs that known prefix of r (0) with the first bit of its locally computed mask (0) to get one bit of the decoded message, 0.
The common prefix is not always one bit long. If the word had been happy, for example, the common prefix would have been 10, and the receiver would have been able to decode two bits of the message. On the other hand, the word gray would not encode any bits: some of its r values begin with 0 and some begin with 1, so their common prefix is empty. This is an example of Meteor’s natural adaptation to varying entropy: low-probability words map to few bit strings, and a long common prefix; high-probability words map to many bit strings, and a short (or empty) common prefix.
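The receiver's side of one step can be sketched like this: find the longest common prefix of all bit strings in the observed word's range, then XOR it with the same-length prefix of the mask. Again a toy, with the running example's table assumed:

```python
BETA = 4
ranges = {"little": (0, 5), "gray": (6, 8), "happy": (9, 11), "nimble": (12, 15)}

def common_prefix_bits(word, ranges, beta):
    """Longest common prefix of all beta-bit strings in the word's range."""
    lo, hi = ranges[word]
    strings = [format(v, f"0{beta}b") for v in range(lo, hi + 1)]
    prefix = ""
    for column in zip(*strings):
        if len(set(column)) != 1:   # bits disagree at this position
            break
        prefix += column[0]
    return prefix

def decode_step(word, ranges, beta, mask_bits):
    """Recover the message bits a word encodes: XOR the common prefix
    of its possible r values with the matching prefix of the mask."""
    prefix = common_prefix_bits(word, ranges, beta)
    return "".join(str(int(p) ^ int(m)) for p, m in zip(prefix, mask_bits))

print(common_prefix_bits("little", ranges, BETA))  # -> "0"  (one bit)
print(common_prefix_bits("happy", ranges, BETA))   # -> "10" (two bits)
print(common_prefix_bits("gray", ranges, BETA))    # -> ""   (no bits)
```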
Because the sender and the receiver have the same text model, the sender also knows the range of possible r values for the word it has just output, and therefore the sender knows how many bits the receiver will have decoded from little. In this case it’s one bit. So the sender strips one bit from its message buffer, leaving 1011000 as the remaining message still to be encoded. The sender also discards the β used bits of the mask, leaving 011011.... The value of r for the next word to be output (which will be chosen from a different set of words, with a different probability distribution) will be 1101. The process repeats until all message bits have been encoded.
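Putting the steps together, the whole sender loop can be sketched as below. This toy reuses one fixed probability table at every step, whereas real Meteor re-derives the distribution from the model's context after each output word; the zero-padding of a short message tail is also an assumption of this sketch:

```python
BETA = 4
ranges = {"little": (0, 5), "gray": (6, 8), "happy": (9, 11), "nimble": (12, 15)}

def common_prefix_len(lo, hi, beta):
    """Number of leading bits shared by all beta-bit values in [lo, hi]."""
    n = 0
    while n < beta and (lo >> (beta - n - 1)) == (hi >> (beta - n - 1)):
        n += 1
    return n

def encode(message_bits, mask_bits, beta=BETA):
    """Encode as many message bits as the mask allows; return the output
    words and whatever message bits remain unencoded."""
    words = []
    while message_bits and len(mask_bits) >= beta:
        msg = message_bits[:beta].ljust(beta, "0")  # pad a short tail (assumption)
        r = int(msg, 2) ^ int(mask_bits[:beta], 2)
        word = next(w for w, (lo, hi) in ranges.items() if lo <= r <= hi)
        words.append(word)
        n = common_prefix_len(*ranges[word], beta)
        message_bits = message_bits[n:]  # strip only the bits actually encoded
        mask_bits = mask_bits[beta:]     # but always consume beta mask bits
    return words, message_bits

words, rest = encode("01011000", "0110011011")
# words == ["little", "nimble"]; "11000" remains (the example mask runs out)
```

Note how the loop advances: the message buffer shrinks only by the number of bits the receiver can recover, while the mask advances by a full β bits per word, exactly as in the walkthrough above.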
English text is not the only possible source of covertexts. Anything that satisfies the token-at-a-time generation model can be used—conceivably even things like network protocols. Besides GPT-2, which is word-oriented, the authors applied Meteor to character-oriented models trained on Wikipedia articles and HTTP headers. There is a comparison of the space and time costs of encoding in Table 3. You can see examples of Meteor stegotexts in Appendix C. The GPT-2 model encodes 160 message bytes into about 300–350 words, not counting the initial context. The following is an example of Meteor encoding using GPT-2:
The picture in The Pale I HCR scientists’ discussion now spans three dimensions. The first importance of the Yucatan Peninsula is demonstrated with the following conclusion: the Pliocene Earth has lost about seven times as much vegetation as the Jurassic in regular parts of the globe…