rjpeg: Exact JPEG recompression

As part of my PhD research, I developed a tool which inverts the computational steps of the Independent JPEG Group's JPEG decompressor version 6b. The tool maps an input image onto the set of bitstreams that produce it on decompression. If the set is empty, it indicates regions that are inconsistent with JPEG decompression.

This page contains information about my JPEG exact recompressor implementation. If you would like to get the source code, please read the instructions below then click here to download rjpeg-0.8.tar.gz.

rjpeg: Exact JPEG recompressor (version 0.8)

Introduction

In our paper 'Exact JPEG recompression' (Andrew B. Lewis and Markus G. Kuhn) we presented a technique for calculating the JPEG bitstream(s) which produce a particular uncompressed image given as input.

For full details of the algorithm, see our paper: http://www.cl.cam.ac.uk/~abl26/spie10-full.pdf

For an overview of the algorithm, a poster is also available: http://www.cl.cam.ac.uk/~abl26/spie10-poster.pdf

This archive contains the source code for the recompressor implementation described and evaluated in the paper.

Please note that this software is experimental and should not be used in production software. Error checking is missing, some debugging code is included in the source and the code has not been tested/optimized thoroughly.

See LICENSE for licensing information.

If you find this software useful, I would be grateful to receive an email describing how you have used it (andrew.lewis at cl.cam.ac.uk). If you would like to refer to it in an academic publication, please cite our paper:

@conference{lewis:75430V,
  author = {Andrew B. Lewis and Markus G. Kuhn},
  title = {Exact JPEG recompression},
  publisher = {SPIE},
  year = {2010},
  journal = {Visual Information Processing and Communication},
  volume = {7543},
  number = {1},
  eid = {75430V},
  numpages = {9},
  pages = {75430V},
  location = {San Jose, California, USA},
  url = {http://link.aip.org/link/?PSI/7543/75430V/1},
  doi = {10.1117/12.838878}
}

Please send any bug reports, queries, suggestions or patches to andrew.lewis at cl.cam.ac.uk.

Archive contents

README, LICENSE, Makefile: Makefile has targets for the main application (rjpeg) and a version for use with the Condor distributed computing system (www.cs.wisc.edu/condor)
rjpeg.h, rjpeg.c: main(...) function
data.h, data.c: Data types for pixel data, sets, intervals and expression trees
computations.h, computations.c: Rearranges expression trees for chroma smoothing
cspace.h, cspace.c: Inverts the colour space conversion
diagnosticinformation.h, diagnosticinformation.c: Functions to output infeasible block information
fdctislow.h, fdctislow.c: IJG forward DCT (`slow', integer)
forwardoperations.h, forwardoperations.c: Searching and filtering of blocks of quantized coefficient intervals
jpegout.c, jpegout.h: Use libjpeg to output JPEG bitstreams
quantize.c, quantize.h: Calculate possible quantization matrices and apply quantization
reverseidct.c, reverseidct.h: Reverse the decompressor IDCT using libgmp arbitrary precision arithmetic
solver.c, solver.h: Apply the chroma unsmoothing algorithm
unsmooth.c, unsmooth.h: Generate the expression tress for chroma smoothing

Input requirements

rjpeg takes a file in PPM P6 (binary 24 bits/pixel) format. (Multiple files can be specified and are processed separately.)

rjpeg will run to completion if the input image was output by a process equivalent to applying the IJG djpeg algorithm on an JPEG bitstream with the following characeristics:

the image was encoded with chroma sub-sampling (4:2:0);
the stored colour space is YCbCr; and
the image's width and height are both multiples of sixteen (twice the DCT block size).

Note that these are cjpeg defaults. Also, the decompressor IDCT must be equivalent to the IJG integer 'slow' transform (the default).

If these conditions are not met, rjpeg will output an error message when it encouters an inconsistency. I plan to add more helpful diagnostic information in a later version, to make rjpeg more useful in forensic situations.

Performance

rjpeg stores a 128 MB look-up table for colour space conversion on the disk. By default this is kept in /tmp/ycc_rgb_table but its location can be altered in cspace.h. This is generated whenever that files does not exist, so the program will take longer to execute the first time you run it.

Approximate time/space requirements: 512 by 512 images at qualities 90 and below typically take a few minutes to recompress on my machine, proportional to the quality factor and number of saturated pixels. The maximum memory usage was around 500 MB.

I have not yet tried to optimize speed and memory usage, and there are many opportunities to do so.

How to use

You will need these headers and libraries:

The IJG library, used to create output bitstreams (libjpeg).
The GMP arbitrary precision arithmetic library (libgmp).

You may wish to update the following constants:

The filename used to store the colour space conversion table, in cspace.h: #define INVERSE_YCC_RGB_TABLE_FILE_NAME "/tmp/ycc_rgb_table"

The default will produce -rw-rw-r-- 128M /tmp/ycc_rgb_table (An alternative string constant is used in the Condor target.)

The number of quantized DCT coefficient block candidates beyond which the exhaustive search step is considered infeasible, in forwardoperations.hstatic const UINT64 possibilities_limit = (1L << 20);

The upper and lower limits for results of the IDCT, used when inverting the range clipping operation, in reverseidct.h #define RANGE_LIMITING_UPPER_LIMIT 288, #define RANGE_LIMITING_LOWER_LIMIT -35. Making these values further from 0 or 255 will recompress more images with black or white areas correctly at the expense of increased search sizes.

Run 'make' to produce the executable.

I have included a compressed test image from the UCID http://www-staff.lboro.ac.uk/~cogs/datasets/UCID/ucid.html.

Example usage:

$ tar -zxvf rjpeg-0.8.tar.gz
$ make
$ djpeg -outfile example1.ppm example1.jpg
$ ./rjpeg example1.ppm
QF: 40
Y: 3072 exact 0 ambiguous 0 infeasible 0 impossible
Cb: 768 exact 0 ambiguous 0 infeasible 0 impossible
Cr: 768 exact 0 ambiguous 0 infeasible 0 impossible
$ diff example1.ppm.result.jpg example1.jpg

The last command should output nothing, indicating that the files are binary identical.