Posted by on 30 Jul 2010 in Reversing, Security, Software | 1 comment

I was pointed to the Cyber Security Challenge earlier this week, and eventually stumbled upon the cipher they offer as an “immediate opportunity to test your skills”. Despite not really understanding the point of the exercise or how it related to the other competitions mentioned, I was pleased to see that the “ciphertext” was right there on the site with no tedious registration procedure required. I decided to give it a go; this is the procedure I followed to work it out.

Stage 1

I quickly recognised the starting text as probably being Base64 encoded. 1 If I hadn’t recognised it, counting the number of unique symbols used would probably have led to the same conclusion. The GNU base64 program decoded it, and a quick look at the resulting file left me slightly surprised to see the familiar “JFIF” text indicating I now had a JPEG image. 2

Decoded comic

Decoded Comic

Smiling at the usage of that particular comic, I thought “Oh, was that it?” … then I noticed the strange looking border. Including the URL of the original comic was a nice touch for those who might not otherwise notice the alteration.

Stage 2

The border of the image looked very much like a sequence of bits to me, pixels set to black or white to indicate a 1 or 0. I noted that the lower pixel in the row was always the inverse of the top one, so really there’s only one bit per column. The pixels – presumably due to the JPEG compression – aren’t actually always pure black or white but they seemed close enough 3 for that to be the intention. I wasn’t about to copy them down by hand, so I wrote a quick Python (and PIL) script to do so for me.

import Image

# Read the bits encoded as pixels around the border of an image.
# They are ordered in a clockwise fashion, in the order shown below:
#
#   1
#   --->
#   ^  | 2
# 4 |__|
#     3
#
# Note that the order changes from left-to-right to right-to left from
# 2 to 3, effectively reading the outermost bits in a clockwise rotation.
# For this reason, the bit strings returned for 3 and 4 are reversed
# before decoding to ASCII.

def get_border_bits(filename):
    im = Image.open(filename)
    pels = im.load()
    (w, h) = im.size

    bits = ""
    bits += get_block_h(pels, 0,     0,     w)
    bits += get_block_v(pels, w - 1, 3,     h - 6)
    bits += get_block_h(pels, 0,     h - 1, w)[::-1]
    bits += get_block_v(pels, 0,     3,     h - 6)[::-1]

    out = bins_to_str(bits)
    print out
    print out.encode("rot13")

# Given a string of binary digits ("010111....") return a string of
# ASCII characters corresponding to each eight bits, little endian.
def bins_to_str(bits):
    bstr = ""

    blen = len(bits)
    for i in xrange(0, blen, 8):
        bin = bits[i:i + 8]
        val = int(bin, 2)

        #print "%-8s 0x%02X %c" % (bin, val, val)
        bstr += chr(val)

    return bstr

# get_block for a horizontal group of bits
def get_block_h(pels, x, y, count):
    return get_block(pels, x, y, count, True)

# get_block for a vertical group of bits
def get_block_v(pels, x, y, count):
    return get_block(pels, x, y, count, False)

# Get string of binary digits from the pixels in the given range
def get_block(pixels, x, y, count, horiz):
    bits = ""
    for i in xrange(0, count):
        if horiz:
            b = pixels[i + x, 0 + y]
        else:
            b = pixels[0 + x, i + y]

        bits += '1' if is_black(b) else '0'
    return bits

# Returns if the given pixel (tuple) is black (set)
def is_black(p):
    if p[0] < 50:
        return True
    return False

As the comment at the top of the code explains, I read the bits clockwise, reading the outermost set of them. Reading the top row of bits (01000011011110010111001001101110…) and treating them as 8-bit ASCII characters produced the following:

Cyrnfr sbyybj guvf yvax:      uggcf://plore

After double checking the output to make sure it wasn’t producing complete gibberish, I noticed that “uggcf://” looked suspiciously like a URL scheme. The obvious one the same length as “uggcf” is “https” and the two Ts matching up with the two Gs seemed like too much of a coincidence. I decided it was likely to be a simple substitution cipher, “uggcf://plore” was probably “https://cyber” (the start of the challenge’s web address).

Something was ringing an embarrassingly faint bell at this point, but it wasn’t until I’d noted that ‘u’ comes 13 characters after ‘h’ in the alphabet and that ‘g’ is 13 characters before ‘t’ that I realised that it was Caesar cipher – more specifically ROT13. The alphabet is simply rotated by 13 characters, leaving non-letter characters untouched.

It’s quite a common cipher, so it was easy to use an existing tool to rotate the alphabet back again:

Please follow this link:      https://cyber

That certainly looked promising!

Earlier I mentioned that I read the outermost bits from the image in a clockwise fashion. That wasn’t a decision I arrived at by chance, I’d like to think that it was a clever bit of puzzle design. After decoding that first row, I made an educated guess that the next few characters would continue the URL – “cybersecuritychallenge”. That meant that I knew what bits to expect, and hence where to look on the image.

Between the four groups of bits in the image are white gaps. The fact that no inverse colour was present indicated that these were to be skipped rather than read as a 0.

As I worked my way around the borders of the image, the same property held. “Knowing” what bits were needed to continue the sequence helped when working out that the bits were arranged in a clockwise fashion. The following lines show the message (after rotating it) continuing round to the right hand side, along the bottom and then up towards the starting position in the top left.

Please follow this link:      https://cybersecuritychallenge.org
Please follow this link:      https://cybersecuritychallenge.org.uk/834jtp.html https://cybersecuritychallen
Please follow this link:      https://cybersecuritychallenge.org.uk/834jtp.html https://cybersecuritychallenge.org.uk/834jtp.html

Of course, I followed the link…

Stage 3

The last stage starts with a long sequence of hexadecimal digits. I copied and pasted the lot through a tiny Python script (via binascii.unhexlify) to convert it into a binary file and examined the output with a hex editor. The only thing immediately apparent was that there appeared to be a pattern to the data – lots of 0xEC and 0xED and so on.

I’d be interested to know how other people solved this bit – perhaps by counting the frequencies of different characters – but I tried a very simple version of a “known plaintext” attack. I had a strong suspicion that “cybersecuritychallenge” would appear somewhere in the encoded content – either as part of a web address or email address.

I wrote another Python script to quickly iterate through the new file and look for a pattern that might correspond to that plain text. Essentially I looked for a pattern like this:

cybersecuritychallenge
1  2  21     1  332  2

That is, read one character and if the character seven characters later is the same one, and the sixth after that is the same value, then we’ve probably found what ‘c’ corresponds to. Likewise for the ‘e’ and ‘l’ as shown above. When I tried it, that pattern matched once in the file. Success!

I now knew it was probably another substitution cipher, and could map every character in “cybersecuritychallenge” to their equivalents:

a b c d e f g h i j k l m
2C 4C 6C __ AC __ EC 0D 2D __ __ 8D __
n o p q r s t u v w x y z
CD __ __ __ 4E 6E 8E AE __ __ __ 2F __

It was easy enough to fill in the missing values for the alphabet, but doing so for the rest of the character set would be tedious. It wasn’t until I looked at the binary values for the characters I noticed the pattern:

For 'a' (ascii 0x61, encoded as 0x2C):

0x2C 00101100
0x61    01100001

For 'b' (ascii 0x61, encoded as 0x4C):

0x4C 01001100
0x62    01100010

My ASCII-art diagram probably isn’t very helpful – but the bits in each individual byte of the message have been rotated by three positions, wrapping back around to the left. I wrote a very quick and ugly Python script to complete the rest of the alphabet, resulting in the following message:

Congratulations ? you?ve found and completed the REAL challenge. Your win code is xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.

Please email this code to our team at [email protected] If you?re the first person to do so, and can prove you meet the eligibility criteria (British citizen currently resident in the UK) we will be in touch to advise how to claim your prize. Well done and good luck in the Cyber Security Challenge competitions taking place throughout the rest of the year.

In the decoded output, the question marks show where the decoded output wasn’t a printable character. From the context I can guess what they’re supposed to be, but I didn’t (and still don’t) see how they fit the pattern. Luckily they weren’t necessary to decode the “win code” from the message, which I have omitted from the plaintext above – though I’m sure it’s all over the web by now.

I’m going to write a separate post on my impressions of the challenge, as this one is already comfortably the longest I’ve ever written!

Notes:

  1. There are a number of spaces present in the text file before the content begins. They don’t appear to have a purpose, other than perhaps drawing attention to the space character (ASCII 0×20) that’s relevant later.
  2. The image also has an Exif header which has been zeroed out, at a guess, either by hand or with a simple tool. Why not just remove it altogether?
  3. I picked an arbitrary value of “50″ for or less in the red component to as meaning “black”, with every intention of improving it – turns out what wasn’t necessary.