The History and Background of RC4

RC4, also known as ARC4 (Alledged RC4), was developed by Ron Rivest at RSA security (he's the R ;) in 1987, and it was kept as a trade secret until 1994, when it was posted anonymously to the cypherpunks mailinglist. Two key weaknesses were discovered within a year.

RC4 is a stream cipher. This means it essentially functions as a random number generator whose output is XOR'ed against the plaintext stream.

The difference between using a well-designed stream cipher as opposed to rand() is that rand() is predictable in that rand() reveals its internal state directly in the output stream. This makes rand() vulnerable to known plaintext attacks. Well designed stream ciphers go through great pains to not give away internal state when they generate output. RC4 is resonably good at this. While there are attacks that can reconstruct the internal RC4 state from known-output, they are still do not have runtimes that make them feasable in practice. (Something on the order of 2^800).

RC4 is still in production use, most notably in SSL and 802.11 WEP.

Implementation

The RC4 Algorithm
KSA(K)
  for i = 0..N-1
    S[i] = i;
	
  j = 0;
  for i = 0..N-1
     j = j + S[i] + K[i % l];
     Swap(S[i], S[j]);
  
i = j = 0;
PRGA(S)
  i = i + 1;
  j = j + S[i];
  Swap(S[i], S[j]);
  return S[S[i] + S[j]];

As you can see, RC4 has two stages - a KSA, that initializes the state table to be a "random" permutation based on the key, and the PRGA, which actually returns a random byte. Notice how the PRGA does not return an easily determined index of the state table.

RC4 has two parameters: Key size and word size. The word size is almost always 8 bits, and the key size varies from 40 bits to 2word size.

The internal state table S is always 2word size long. S is a permutation table, which means that it contains exactly one of each number from 0...2word size-1.

The Attacks

Arthur Roos

The first weakness in RC4 was found by Arthur Roos, who noted that the key scheduling algorithm of RC4 didn't exactly yield a perfectly random distribution over the state table.

His analysis hinged on two key observations:

  1. It is highly likely that in the initial iterations of the KSA, S[i] = i
  2. It is relatively likely that once an index has been swapped by i, it will not be swapped again. The probability that an index is chosen "at random" by j is 1/256, so the probability that it won't be chosen is 255/256. So thus, the the probability that it won't be chosen every again during the KSA is (255/256)^255 (since if i = j, nothing happens), or 37%.

Given these two observations, it is then evident that immediately after the KSA runs, the most likely element in S[i] is Sum(K[i], O, i) + (i+1)/2, with probability (determined empirically and by computation):

 0-7       37.0  36.8  36.2  35.8  34.9  34.0  33.0  32.2
 8-15      30.9  29.8  28.5  27.5  26.0  24.5  22.9  21.6
 16-23     20.3  18.9  17.3  16.1  14.7  13.5  12.4  11.2
 24-31     10.1   9.0   8.2   7.4   6.4   5.7   5.1   4.4
 32-39      3.9   3.5   3.0   2.6   2.3   2.0   1.7   1.4             
 40-47      1.3   1.2   1.0   0.9   0.8   0.7   0.6   0.6     

He goes on to show that this yeilds a class of weak keys:

C. Given an RC4 key K[0]..K[N] with K[0] + K[1] == 0 (mod 256), there is a significant probability that the first byte generated by RC4 will be K[2] + 3 (mod 256).
  • Example
  • As a result, if you use RC4, it is highly recommended you discard the first several thousand outputs from the PRGA to get the state table into a more even distribution.

    Attacks against WEP

    Unfortunately (or fortunately, depending on your point of view), WEP does not discard the first outputs. Worse, the 802.11 protocol has several known bytes of plaintext in its LLC headers. Worse *still*, WEP resets RC4 for *each packet*. Worse *STILL* WEP prepends the RC4 key with an "Initialization Vector that is transmitted in plain text.

    As you might imagine, this is bad.

    Fluhrer, Mantin, and Shamir were the first to take advantage of this badness. Essentially, what they did was realize that if you have knowledge of the IV, and the IV is of a certain class (the most common is [B+3, 255, X]), then the most likely output is SB+2[jB+2 + K[B] + SB+2[B+3]] with 0.05 probability. Since this is > the expected 1/256, after you accumulate a sufficient number of weak IV's, you are able to employ a simple voting mechanism to pick the most likely value you have, and then perform some manipulations to find the Bth key byte. (K[B] = SB+2-1[output] - jB+2 - SB+2[B+3];

    This process begins with key byte 0, and is repeated, using the IV + key bytes 0..B-1 to find key byte B.

  • Example