The post "Hiding Zipped File Under Jpg Image" showed the steps to hide data in a jpg. Since Ego was questioning the theory behind this, I decided to get my hands dirty and find the answer. To understand it, we need to understand the data structures of jpg images and zip files.
Let's dissect the jpg image first.
Jpg Header Format:
Start of Image (SOI) marker -- two bytes (FFD8)
JFIF marker (FFE0)
* length -- two bytes
* identifier -- five bytes: 4A, 46, 49, 46, 00 (the ASCII code equivalent of a zero terminated "JFIF" string)
* version -- two bytes: often 01, 02
o the most significant byte is used for major revisions
o the least significant byte for minor revisions
* units -- one byte: Units for the X and Y densities
o 0 => no units, X and Y specify the pixel aspect ratio
o 1 => X and Y are dots per inch
o 2 => X and Y are dots per cm
* Xdensity -- two bytes
* Ydensity -- two bytes
* Xthumbnail -- one byte: 0 = no thumbnail
* Ythumbnail -- one byte: 0 = no thumbnail
* (RGB)n -- 3n bytes: packed (24-bit) RGB values for the thumbnail pixels, n = Xthumbnail * Ythumbnail
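To make the layout above concrete, here is a minimal Python sketch that unpacks these fields. The function name parse_jfif_header and the dictionary keys are my own choices for illustration, and the sample bytes are the first 20 bytes of the image dumped later in this post, written in file order (hexdump shows them as swapped 16-bit words).

```python
import struct


def parse_jfif_header(data: bytes) -> dict:
    """Parse the SOI marker and the JFIF APP0 segment at the start of a jpg."""
    # Multi-byte fields in a jpg are big-endian, hence the ">" prefix.
    soi, app0, length = struct.unpack_from(">HHH", data, 0)
    if soi != 0xFFD8 or app0 != 0xFFE0:
        raise ValueError("missing FFD8 FFE0 at offset 0")
    (ident, major, minor, units,
     xdensity, ydensity, xthumb, ythumb) = struct.unpack_from(">5sBBBHHBB", data, 6)
    if ident != b"JFIF\x00":
        raise ValueError("APP0 segment does not carry the JFIF identifier")
    return {
        "length": length,
        "version": (major, minor),
        "units": units,
        "xdensity": xdensity,
        "ydensity": ydensity,
        "xthumbnail": xthumb,
        "ythumbnail": ythumb,
    }


# First 20 bytes of the example image, in on-disk order.
sample = bytes.fromhex("ffd8ffe000104a46494600010100000100010000")
header = parse_jfif_header(sample)
print(header)
```

For the example image this gives a segment length of 16, version 1.1, units 0 (aspect ratio only) and no thumbnail.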
The part of the header that matters to us is the 4-byte value formed by the SOI and JFIF markers (FFD8 FFE0), which signifies the start of the jpg image. hexdump prints the file as 16-bit little-endian words, so these bytes show up as "d8ff e0ff" in its output. A standard image viewer looks for this pattern and, once it is found, treats it as the start of the jpg image. The end of the image is marked by the End of Image (EOI) marker FFD9, shown as "d9ff" in little-endian mode. Using cat to append a file to the image only writes data after the EOI marker, and an image viewer has no need to look at anything past it, so the picture still displays normally.
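The byte swapping is easy to trip over, so here is a small sketch that mimics how hexdump's default output groups bytes. The helper name hexdump_words is hypothetical, and it only formats the words, not the offset column.

```python
import struct


def hexdump_words(data: bytes) -> str:
    """Format bytes as 16-bit little-endian words, like hexdump's default output."""
    if len(data) % 2:
        data += b"\x00"  # an odd trailing byte is padded with zero
    words = struct.unpack("<%dH" % (len(data) // 2), data)
    return " ".join("%04x" % w for w in words)


print(hexdump_words(bytes.fromhex("ffd8ffe0")))  # d8ff e0ff
print(hexdump_words(bytes.fromhex("ffd9")))      # d9ff
```

Running hexdump with the -C option prints the bytes in file order instead, which avoids the confusion.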
Let's look at the zip header format.
Overall .ZIP file format:
[local file header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[file data n]
[data descriptor n]
[archive decryption header]
[archive extra data record]
[central directory]
[zip64 end of central directory record]
[zip64 end of central directory locator]
[end of central directory record]
The one that concerns us is the local file header.
Local file header:
local file header signature 4 bytes (0x04034b50)
version needed to extract 2 bytes
general purpose bit flag 2 bytes
compression method 2 bytes
last mod file time 2 bytes
last mod file date 2 bytes
crc-32 4 bytes
compressed size 4 bytes
uncompressed size 4 bytes
file name length 2 bytes
extra field length 2 bytes
file name (variable size)
extra field (variable size)
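As a rough sketch of how the fixed 30-byte part of this header can be read, the Python below unpacks it with the struct module. The function and key names are mine, not part of the zip specification, and the sample bytes are the first 30 bytes of the example zip file dumped later in this post, written in file order.

```python
import struct

# Fixed part of the local file header: 30 bytes, all fields little-endian.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
FIELD_NAMES = (
    "signature", "version_needed", "flags", "compression_method",
    "mod_time", "mod_date", "crc32", "compressed_size",
    "uncompressed_size", "file_name_length", "extra_field_length",
)


def parse_local_file_header(data: bytes, offset: int = 0) -> dict:
    """Unpack a zip local file header found at the given offset."""
    fields = LOCAL_HEADER.unpack_from(data, offset)
    if fields[0] != 0x04034B50:
        raise ValueError("no local file header signature at this offset")
    return dict(zip(FIELD_NAMES, fields))


# First 30 bytes of the example zip, in on-disk order.
sample = bytes.fromhex(
    "50 4b 03 04 14 00 00 00 08 00 6b 77 41 3a d9 d8"
    "d8 00 09 11 0c 00 00 2c 0d 00 09 00 15 00"
)
header = parse_local_file_header(sample)
print(header)
```

For the example archive this reports compression method 8 (deflate), a 9-byte file name and a 21-byte extra field; the variable-size file name and extra field follow immediately after these 30 bytes.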
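The local file header signature (0x04034b50, stored on disk as the bytes "PK" 03 04) marks the start of the zip data. The unzip program does not insist that this signature sits at the very beginning of the file: it searches the file for the zip records and treats everything from there up to the "end of central directory record" as the archive. (In practice, most implementations begin by locating the end of central directory record near the end of the file and work backwards from it, which is why extra data in front of the archive is tolerated.) This explains why tar.gz or tar.bz2 files don't work while zip does. In other words, the gz/bz2 formats expect their identifying magic bytes at the very start of the file and, if these are not found there, the tools quit immediately.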
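The difference between the two families of formats can be checked with a short experiment. This sketch uses Python's zipfile and gzip modules as stand-ins for the unzip and gunzip programs (an assumption: the command-line tools are separate implementations), and a few made-up bytes as a stand-in for a real jpg.

```python
import gzip
import io
import zipfile

PAYLOAD = b"hidden payload"
# Stand-in for a real image: SOI + JFIF marker, filler, EOI.
FAKE_JPG = b"\xff\xd8\xff\xe0" + b"fake image data" + b"\xff\xd9"


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a one-entry zip archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def zip_readable(blob: bytes) -> bool:
    """True if the blob opens as a zip and the entry extracts intact."""
    try:
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            return zf.read("secret.txt") == PAYLOAD
    except zipfile.BadZipFile:
        return False


def gzip_readable(blob: bytes) -> bool:
    """True if the blob decompresses as gzip from its first byte."""
    try:
        return gzip.decompress(blob) == PAYLOAD
    except OSError:  # raised when the gzip magic bytes are not at offset 0
        return False


print(zip_readable(FAKE_JPG + make_zip("secret.txt", PAYLOAD)))
print(gzip_readable(FAKE_JPG + gzip.compress(PAYLOAD)))
```

The zip archive still extracts with image bytes in front of it, while the gzip stream is rejected as soon as its first bytes fail to match.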
The following example will illustrate the file layout of the various file formats.
Example (generated using hexdump):
Image file (jpg):
0000000 d8ff e0ff 1000 464a 4649 0100 0001 0100
0000010 0100 0000 dbff 8400 1000 0c0b 0c0e 100a
.
.
0005b50 4792 d9ff
0005b54
As discussed, the first four bytes ("d8ff e0ff") indicate the start of the jpg file, and the last two ("d9ff") mark its end. Now let's look at the zip file.
Zip file (.zip):
0000000 4b50 0403 0014 0000 0008 776b 3a41 d8d9
0000010 00d8 1109 000c 2c00 000d 0009 0015 6f77
.
.
00c1190 0100 0100 4400 0000 4500 0c11 0000 0000
00c119f
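After the concatenation, the file consists of both the jpg and the zip content, as shown below. Note how the zip signature "4b50 0403" follows directly after the jpg end marker "d9ff" at offset 0005b54.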
Embedded Image File (jpg):
0000000 d8ff e0ff 1000 464a 4649 0100 0001 0100
0000010 0100 0000 dbff 8400 1000 0c0b 0c0e 100a
.
.
.
0005b50 4792 d9ff 4b50 0403 0014 0000 0008 776b
0005b60 3a41 d8d9 00d8 1109 000c 2c00 000d 0009
.
.
00c6ce0 0006 0000 0100 0100 4400 0000 4500 0c11
00c6cf0 0000 0000
00c6cf3
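The offsets in these dumps can be cross-checked with a few lines of Python. This is only a sketch: embed and find_zip_start are hypothetical helper names, the image and archive bytes are short stand-ins rather than the real files, and a plain search for the signature could in principle hit the same four bytes inside real image data.

```python
ZIP_LOCAL_SIG = b"PK\x03\x04"  # 0x04034b50 as stored on disk (little-endian)


def embed(image: bytes, archive: bytes) -> bytes:
    """Same effect as: cat image.jpg archive.zip > embedded.jpg"""
    return image + archive


def find_zip_start(blob: bytes) -> int:
    """Offset of the first local file header signature, or -1 if absent."""
    return blob.find(ZIP_LOCAL_SIG)


# File sizes read off the final offsets of the two dumps above.
JPG_SIZE, ZIP_SIZE = 0x5B54, 0xC119F
print(hex(JPG_SIZE + ZIP_SIZE))  # 0xc6cf3

# Stand-in data: a tiny fake image and the start of a fake archive.
image = b"\xff\xd8\xff\xe0" + b"\x00" * 16 + b"\xff\xd9"
archive = ZIP_LOCAL_SIG + b"\x14\x00" + b"\x00" * 24
blob = embed(image, archive)
print(find_zip_start(blob) == len(image))  # True
```

The sum of the two sizes matches the final offset of the embedded file (00c6cf3), and the zip data starts exactly where the image ends (0005b54 in the dump).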
This little example should clear up any doubts about how this works. The next step would be to manipulate the hex file to make the zip program believe that the jpg data is the zipped data. Stay tuned for more on this.