rjpeg: Exact JPEG recompressor

This page contains information about my JPEG exact recompressor implementation. If you would like to get the source code, please read the instructions below then click here to download rjpeg-0.8.tar.gz.
README

rjpeg: Exact JPEG recompressor (version 0.8)
============================================

Contents
--------

- Introduction
- Archive contents
- Performance
- Input requirements
- How to use

Introduction
------------

In our paper 'Exact JPEG recompression' (Andrew B. Lewis and Markus G. Kuhn) we 
presented a technique for calculating the JPEG bitstream(s) which produce a 
particular uncompressed image given as input.

For full details of the algorithm, see our paper:

  http://www.cl.cam.ac.uk/~abl26/spie10-full.pdf
  
For an overview of the algorithm, a poster is also available:

  http://www.cl.cam.ac.uk/~abl26/spie10-poster.pdf
  
This archive contains the source code for the recompressor implementation 
described and evaluated in the paper.

Please note that this software is experimental and should not be used in 
production software. Error checking is missing, some debugging code is included 
in the source and the code has not been tested/optimized thoroughly.

See LICENSE for licensing information.

If you find this software useful, I would be grateful to receive an email 
describing how you have used it (andrew.lewis at cl.cam.ac.uk). If you would 
like to refer to it in an academic publication, please cite our paper:

  @conference{lewis:75430V,
    author = {Andrew B. Lewis and Markus G. Kuhn},
    title = {Exact JPEG recompression},
    publisher = {SPIE},
    year = {2010},
    journal = {Visual Information Processing and Communication},
    volume = {7543},
    number = {1},
    eid = {75430V},
    numpages = {9},
    pages = {75430V},
    location = {San Jose, California, USA},
    url = {http://link.aip.org/link/?PSI/7543/75430V/1},
    doi = {10.1117/12.838878}
  }

Please send any bug reports, queries, suggestions or patches to 

  andrew.lewis at cl.cam.ac.uk.

Archive contents
----------------

- README, LICENSE, Makefile
    Makefile has targets for the main application (rjpeg) and a version for use 
    with the Condor distributed computing system (www.cs.wisc.edu/condor)
- rjpeg.h, rjpeg.c
    main(...) function
- data.h, data.c
    Data types for pixel data, sets, intervals and expression trees
- computations.h, computations.c
    Rearranges expression trees for chroma smoothing
- cspace.h, cspace.c
    Inverts the colour space conversion
- diagnosticinformation.h, diagnosticinformation.c
    Functions to output infeasible block information
- fdctislow.h, fdctislow.c
    IJG forward DCT (`slow', integer)
- forwardoperations.h, forwardoperations.c
    Searching and filtering of blocks of quantized coefficient intervals
- jpegout.c, jpegout.h
    Use libjpeg to output JPEG bitstreams
- quantize.c, quantize.h
    Calculate possible quantization matrices and apply quantization
- reverseidct.c, reverseidct.h
    Reverse the decompressor IDCT using libgmp arbitrary precision arithmetic
- solver.c, solver.h
    Apply the chroma unsmoothing algorithm
- unsmooth.c, unsmooth.h
    Generate the expression tress for chroma smoothing

Input requirements
------------------

rjpeg takes a file in PPM P6 (binary 24 bits/pixel) format. (Multiple files
can be specified and are processed separately.)

rjpeg will run to completion if the input image was output by a process 
equivalent to applying the IJG djpeg algorithm on an JPEG bitstream with the 
following characeristics:

- the image was encoded with chroma sub-sampling (4:2:0);
- the stored colour space is YCbCr; and
- the image's width and height are both multiples of sixteen (twice the DCT 
  block size).
  
Note that these are cjpeg defaults. Also, the decompressor IDCT must be 
equivalent to the IJG integer 'slow' transform (the default).

If these conditions are not met, rjpeg will output an error message when it
encouters an inconsistency. I plan to add more helpful diagnostic information
in a later version, to make rjpeg more useful in forensic situations.

Performance
-----------

rjpeg stores a 128 MB look-up table for colour space conversion on the disk. By
default this is kept in /tmp/ycc_rgb_table but its location can be altered in
cspace.h. This is generated whenever that files does not exist, so the program
will take longer to execute the first time you run it.

Approximate time/space requirements: 512 by 512 images at qualities 90 and below 
typically take a few minutes to recompress on my machine, proportional to the 
quality factor and number of saturated pixels. The maximum memory usage was 
around 500 MB.

I have not yet tried to optimize speed and memory usage, and there are many
opportunities to do so.

How to use
----------

You will need these headers and libraries for

- The IJG library, used to create output bitstreams (libjpeg).
- The GMP arbitrary precision arithmetic library (libgmp).

You may wish to update the following constants:

- The filename used to store the colour space conversion table, in cspace.h

    #define INVERSE_YCC_RGB_TABLE_FILE_NAME "/tmp/ycc_rgb_table"

  The default will produce

    -rw-rw-r-- 128M /tmp/ycc_rgb_table
  
  (An alternative string constant is used in the Condor target.)
  
- The number of quantized DCT coefficient block candidates beyond which the
  exhaustive search step is considered infeasible, in forwardoperations.h

    static const UINT64 possibilities_limit = (1L << 20);

- The upper and lower limits for results of the IDCT, used when inverting the
  range clipping operation, in reverseidct.h

    #define RANGE_LIMITING_UPPER_LIMIT 288
    #define RANGE_LIMITING_LOWER_LIMIT -35

  Making these values further from 0 or 255 will recompress more images with
  black or white areas correctly at the expense of increased search sizes.

Run 'make' to produce the executable.

I have included a compressed test image from the UCID 

  http://www-staff.lboro.ac.uk/~cogs/datasets/UCID/ucid.html

Example usage:

  $ tar -zxvf rjpeg-0.8.tar.gz
  $ make
  $ djpeg -outfile example1.ppm example1.jpg
  $ ./rjpeg example1.ppm
  QF: 40
  Y: 3072 exact 0 ambiguous 0 infeasible 0 impossible
  Cb: 768 exact 0 ambiguous 0 infeasible 0 impossible
  Cr: 768 exact 0 ambiguous 0 infeasible 0 impossible
  $ diff example1.ppm.result.jpg example1.jpg
  
The last command should output nothing, indicating that the files are binary 
identical.