<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="http://localhost:4000/feed.xml" rel="self" type="application/atom+xml" /><link href="http://localhost:4000/" rel="alternate" type="text/html" /><updated>2025-12-09T19:57:17+00:00</updated><id>http://localhost:4000/feed.xml</id><title type="html">Micrological</title><entry><title type="html">Camera identification using photo-response non-uniformity</title><link href="http://localhost:4000/2015/04/25/camera-identification.html" rel="alternate" type="text/html" title="Camera identification using photo-response non-uniformity" /><published>2015-04-25T00:00:00+01:00</published><updated>2015-04-25T00:00:00+01:00</updated><id>http://localhost:4000/2015/04/25/camera-identification</id><content type="html" xml:base="http://localhost:4000/2015/04/25/camera-identification.html"><![CDATA[<p>Digital camera sensors exhibit characteristic, systematic, per-pixel multiplicative noise. In limited circumstances, it is possible to identify whether a candidate sensor's noise signal is present in a test image.</p>

<p>The sensor noise signal can be estimated by <a href="https://github.com/andrewlewis/camera-id/blob/master/make_characteristic.py">averaging imperfect noise estimates from many images</a>. <a href="https://github.com/andrewlewis/camera-id/blob/master/test_characteristic.py">Correlation indicates how strongly the signal appears in a test image</a>.</p>
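<p>The two steps can be sketched in pure Python on a simulated 1-D sensor (a toy model with made-up noise levels, not the linked scripts, which operate on full images):</p>

```python
import random

random.seed(0)

N = 64  # pixels in the toy "sensor"
K = [random.gauss(0, 0.02) for _ in range(N)]  # true PRNU pattern

def shoot(scene):
    """Simulate a capture: multiplicative PRNU plus additive readout noise."""
    return [s * (1 + k) + random.gauss(0, 1) for s, k in zip(scene, K)]

def residual(img, scene):
    """Imperfect per-image noise estimate: relative deviation from the scene."""
    return [(p - s) / s for p, s in zip(img, scene)]

# Estimate K by averaging residuals over many flat-field exposures.
scene = [100.0] * N
M = 200
est = [0.0] * N
for _ in range(M):
    for i, r in enumerate(residual(shoot(scene), scene)):
        est[i] += r / M

def corr(a, b):
    """Normalized correlation between two signals."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) *
           sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

same = corr(est, residual(shoot(scene), scene))       # same sensor: high
other_img = [s + random.gauss(0, 1) for s in scene]   # no PRNU at all
other = corr(est, residual(other_img, scene))         # different: near zero
```

<p>With these parameters the same-sensor correlation comes out far above the cross-sensor one, which is the basis of the identification test.</p>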

<p>There is more information on my <a href="https://github.com/andrewlewis/camera-id">GitHub page</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Digital camera sensors exhibit characteristic, systematic, per-pixel multiplicative noise. In limited circumstances, it is possible to identify whether a candidate sensor's noise signal is present in a test image.]]></summary></entry><entry><title type="html">CFAs in image forensics</title><link href="http://localhost:4000/2015/04/25/cfa.html" rel="alternate" type="text/html" title="CFAs in image forensics" /><published>2015-04-25T00:00:00+01:00</published><updated>2015-04-25T00:00:00+01:00</updated><id>http://localhost:4000/2015/04/25/cfa</id><content type="html" xml:base="http://localhost:4000/2015/04/25/cfa.html"><![CDATA[<p>Most digital cameras use a Bayer colour filter array to capture colour images. Each pixel's sensor captures only one
  colour of filtered light, and the colour filters are arranged in a periodic pattern over the sensor. As a
  post-processing step (in the camera firmware, or in a raw file converter, for example) the missing (filtered) colour
  components are interpolated from each pixel's neighbours.</p>

<p>An investigator can try to work out which interpolation method was used via a statistical analysis of the image's
  pixels. This can give an indication of which camera make/model or post-processing software was used to produce an
  untampered test image.</p>

<p>Areas of the image that are inconsistent with the expected interpolation pattern may have been tampered with.</p>
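<p>The idea can be illustrated with a toy 1-D model (the parity lattice and exact bilinear interpolation are simplifying assumptions; real detectors work statistically on 2-D Bayer patterns):</p>

```python
import random

def interpolate(row, parity):
    """Replace samples at the given parity with the mean of their neighbours."""
    out = list(row)
    for i in range(1, len(row) - 1):
        if i % 2 == parity:
            out[i] = (row[i - 1] + row[i + 1]) / 2
    return out

def lattice_residual(row, parity):
    """Mean squared error of predicting each lattice sample from its neighbours."""
    errs = [(row[i] - (row[i - 1] + row[i + 1]) / 2) ** 2
            for i in range(1, len(row) - 1) if i % 2 == parity]
    return sum(errs) / len(errs)

random.seed(1)
raw = [random.uniform(0, 255) for _ in range(100)]
img = interpolate(raw, parity=1)   # odd samples were interpolated

r_odd = lattice_residual(img, 1)   # zero: consistent with interpolation
r_even = lattice_residual(img, 0)  # large: genuinely captured samples
```

<p>Tampering that disturbs the lattice shows up as a locally raised residual on the lattice that is elsewhere perfectly predicted.</p>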

<p>There is more information in my <a target="_self" href="/cl-web/pdf/acs10-cfa.pdf">slides on colour filter array
    interpolation detection</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Most digital cameras use a Bayer colour filter array to capture colour images. Each pixel's sensor captures only one colour of filtered light, and the colour filters are arranged in a periodic pattern over the sensor. As a post-processing step (in the camera firmware, or in a raw file converter, for example) the missing (filtered) colour components are interpolated from each pixel's neighbours.]]></summary></entry><entry><title type="html">JPEG compression history</title><link href="http://localhost:4000/2015/04/25/jpeg-history.html" rel="alternate" type="text/html" title="JPEG compression history" /><published>2015-04-25T00:00:00+01:00</published><updated>2015-04-25T00:00:00+01:00</updated><id>http://localhost:4000/2015/04/25/jpeg-history</id><content type="html" xml:base="http://localhost:4000/2015/04/25/jpeg-history.html"><![CDATA[<p>JPEG history analysis techniques aim to characterise the processing steps that might have led to a given test image
  being produced.</p>

<p>There is more information in my <a target="_self" href="/cl-web/pdf/acs10-jpeg.pdf">JPEG tutorial</a> and <a
    target="_self" href="/cl-web/pdf/acs09-ch.pdf">slides on JPEG compression history analysis</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[JPEG history analysis techniques aim to characterise the processing steps that might have led to a given test image being produced.]]></summary></entry><entry><title type="html">H.264 CABAC overview</title><link href="http://localhost:4000/2013/06/13/h264-cabac.html" rel="alternate" type="text/html" title="H.264 CABAC overview" /><published>2013-06-13T00:00:00+01:00</published><updated>2013-06-13T00:00:00+01:00</updated><id>http://localhost:4000/2013/06/13/h264-cabac</id><content type="html" xml:base="http://localhost:4000/2013/06/13/h264-cabac.html"><![CDATA[<p>For further information please see</p>
<ul>
<li><a href="http://ieeexplore.ieee.org/iel5/76/27384/01218195.pdf">Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard (Detlev Marpe, Heiko Schwarz and Thomas Wiegand)</a></li>
</ul>

<p>H.264/MPEG-4 AVC defines several different profiles, which specify which coding methods and parameters are allowed in a stream. The <i>Main</i> and <i>High</i> profiles allow the use of context adaptive binary arithmetic coding (CABAC), which offers improved compression performance (around 10% bit saving) compared to the context adaptive variable length coding (CAVLC) method which is available as an alternative, though it is more expensive to implement.</p>

<h2>Overview of the entropy coding process</h2>

<p>The entropy encoder takes as input a sequence of symbols representing samples and control information and maps this onto a binary bitstream which is output into the container. In contrast with earlier compression stages, the entropy coding is lossless; the decoder will reproduce the exact sequence of symbols which was input to the entropy encoder during compression.</p>

<p>H.264's implementation of CABAC creates the bitstream in three stages.</p>

<ol>
  <li>In <i>binarization</i>, each symbol to be output is uniquely mapped onto a binary string, called a <b>bin string</b>. Each bit position in the bin string is called a <b>bin</b>. Each bin is then passed to one of two coding modes: in <i>regular coding</i> mode, the next step, context modelling, is applied and the resulting context model and bin value are passed to the binary arithmetic coding engine; in <i>bypass</i> mode, context modelling is skipped and the bin is passed directly to a simplified bypass coding engine.</li>
  <li>In <i>context modelling</i> (only used for regular coding mode) a bin is categorised for coding under a particular probability model. Each probability model has its state represented by a context variable which is a pair (most probable symbol in {0, 1}, probability of less probable symbol). Arithmetic coding is applied using the chosen context model and updates its context variable.</li>
  <li>In <i>binary arithmetic coding</i> the value of the bin is used to update the context variable if applicable, and bits are output into the bitstream.</li>
</ol>

<h2>Binarization</h2>

<p>The input to this process is a symbol (syntax element) to be coded, such as a quantized transform coefficient, macroblock type specifier or a motion vector component. The mapping onto bin strings should be close to a minimum redundancy code.</p>

<h3>Main types of binarization</h3>
<p>Four main types of binarization are defined:</p>

<ul>
  <li><b>Unary code</b> &ndash; the value <i>x</i> &ge; 0 is mapped onto <i>x</i> <code>1</code> bits followed by a <code>0</code> bit.</li>
  <li><b>Truncated unary (TU) code</b> &ndash; the value 0 &le; <i>x</i> &le; <i>S</i> is coded with a unary code if <i>x</i> &lt; <i>S</i>, or with <i>S</i> <code>1</code> bits and no terminating <code>0</code> if <i>x</i> = <i>S</i>. The truncated unary encoding of value <i>x</i> is given by
<pre><code>def tu(s, x):
  for i in range(0, min(s, x)):
    put(1)
  if x &lt; s:
    put(0)</code></pre></li>
  <li><b><i>k</i>th order Exp-Golomb (EGk) code</b> &ndash; the value <i>x</i> &ge; 0 is mapped onto two sequential bit strings: a unary prefix and a variable-length suffix (the sign bit, where one is needed, is appended by the concatenated binarizations described below). The construction of a <i>k</i>th order Exp-Golomb code for value <i>x</i> is given by
<pre><code>def egk(k, x):
  while True:
    if x &gt;= (1 &lt;&lt; k):
      put(1) # bit of the prefix
      x = x - (1 &lt;&lt; k)
      k = k + 1
    else:
      put(0) # end of the prefix
      while k &gt; 0:
        k = k - 1
        put((x &gt;&gt; k) &amp; 0x01) # bit of the suffix
      break</code></pre>
  </li>
  <li><b>Fixed-length (FL) code</b> &ndash; the value <i>x</i> &lt; <i>S</i> is mapped onto its binary representation, using ceil(log<sub>2</sub><i>S</i>) bits.</li>
</ul>
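<p>The pseudocode above becomes runnable by letting <code>put</code> collect bits into a list; the sketch below implements all four binarizations this way (the <code>code</code> helper is my own, for inspecting the output):</p>

```python
import math

bits = []

def put(b):
    bits.append(b)

def unary(x):          # x >= 0: x ones followed by a zero
    for _ in range(x):
        put(1)
    put(0)

def tu(s, x):          # truncated unary, 0 <= x <= s
    for _ in range(min(s, x)):
        put(1)
    if x < s:
        put(0)

def egk(k, x):         # k-th order Exp-Golomb
    while True:
        if x >= (1 << k):
            put(1)                 # bit of the prefix
            x -= (1 << k)
            k += 1
        else:
            put(0)                 # end of the prefix
            while k > 0:
                k -= 1
                put((x >> k) & 1)  # bit of the suffix
            break

def fl(s, x):          # fixed length, x < s, ceil(log2(s)) bits
    for i in reversed(range(math.ceil(math.log2(s)))):
        put((x >> i) & 1)

def code(f, *args):    # run a binarization and return its bin string
    del bits[:]
    f(*args)
    return ''.join(str(b) for b in bits)
```

<p>For example, <code>code(egk, 0, 1)</code> returns the EG0 bin string <code>100</code> and <code>code(fl, 8, 5)</code> returns <code>101</code>.</p>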

<p>There are also five unstructured binary trees, defined manually, for coding macroblock and submacroblock types.</p>

<h3>Concatenated binarizations</h3>

<p>The codes can also be concatenated. There are three situations where concatenations of the four basic types are used:</p>

<ul>
  <li><code>coded_block_pattern</code> is encoded using a 4-bit FL prefix (for luma) and a TU suffix with <i>S</i> = 2 for chroma.</li>
  <li>Motion vector differences are encoded with a concatenation of a truncated unary prefix and a 3rd order Exp-Golomb code suffix: for a value <i>mvd</i>, the prefix is a TU coding with <i>S</i> = 9 of the value min(|<i>mvd</i>|, 9) (for <i>mvd</i> = 0 this is just the bit <code>0</code>). If |<i>mvd</i>| &ge; 9, a suffix is output with the value |<i>mvd</i>| - 9 using the EG3 code. A sign bit is then output if |<i>mvd</i>| > 0: <code>0</code> if <i>mvd</i> is positive and <code>1</code> otherwise. The following code performs this coding, referencing the coding procedures for the main types of binarization above.
<pre><code>def uegk(s, k, x):
  absx = abs(x)
  tu(s, absx)
  absx = absx - s
  if absx &gt;= 0:
    egk(k, absx)
  sgn(x)

def sgn(x):
  if x == 0:
    return
  if x &lt; 0:
    put(1)
  else:
    put(0)</code></pre>
  </li>
  <li>Absolute values of transform coefficient levels (<code>coeff_abs_value_minus1</code> = <code>abs_level</code> - 1 is coded, as the positions of zero-valued coefficients are specified in a map) are coded using a TU prefix with <i>S</i> = 14 and an EG0 suffix.</li>
</ul>
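<p>The concatenated motion-vector binarization can also be written as a single self-contained function returning the bin string (a sketch; the helper name <code>uegk_bits</code> is mine):</p>

```python
def uegk_bits(s, k, x):
    """TU prefix (cutoff s), EGk suffix, then sign bit, per the scheme above."""
    out = []
    absx = abs(x)
    for _ in range(min(s, absx)):  # truncated unary prefix of min(|x|, s)
        out.append('1')
    if absx < s:
        out.append('0')
    else:                          # EGk suffix for |x| - s
        rem, kk = absx - s, k
        while rem >= (1 << kk):
            out.append('1')
            rem -= (1 << kk)
            kk += 1
        out.append('0')
        while kk > 0:
            kk -= 1
            out.append(str((rem >> kk) & 1))
    if x != 0:                     # sign: 0 for positive, 1 for negative
        out.append('0' if x > 0 else '1')
    return ''.join(out)
```

<p>With <i>s</i> = 9 and <i>k</i> = 3 this gives <code>0</code> for <i>mvd</i> = 0 and <code>11100</code> for <i>mvd</i> = 3.</p>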

<h2>Context modelling</h2>

<p>The context modelling stage associates a context model with each bin output by the binarization stage.</p>

<p>There are four basic types of context modelling, which associate a probability with each bin based on previously coded values or other symbols in the neighbourhood:</p>

<ul>
  <li>Up to two neighbouring syntax elements are chosen based on the syntax element being coded, and the context model for the current bin is selected according to the values of related bins in those neighbours &ndash; typically the syntax elements of the blocks above and to the left of the current one.</li>
  <li>For the <code>mb_type</code> and <code>sub_mb_type</code> syntax elements, the model for a bin b<sub>i</sub> with prior coded bins (b<sub>0</sub>,b<sub>1</sub>, &hellip;, b<sub>i-1</sub>) is chosen based on those prior bin values.</li>
  <li>On residual data only, based on the position in the scanning path.</li>
  <li>On residual data only, based on the number of encoded levels with a particular value prior to the current level bin being coded.</li>
</ul>

<p>The context modelling process only ever references past values within the same slice.</p>

<p>Each syntax element may use one of a range of models, each of which is denoted by a context index. The possible models for each syntax element are given in Table 9-11 of <a href="http://www.itu.int/rec/T-REC-H.264">the standard</a>, which specifies the allowable values for the context index &gamma; for each element. The range of allowed values of context index for <code>mb_type</code>, <code>sub_mb_type</code> and <code>mb_skip_flag</code> depends on the slice type being coded (SI/I, SP/P or B).</p>

<p>Each probability model (uniquely associated with a context index) consists of a pair of values: a 6-bit probability state index &sigma;<sub>&gamma;</sub> and a single bit which is the most probable symbol (MPS). Each model is therefore represented by a 7-bit value.</p>

<p>Macroblock type, submacroblock type, spatial and temporal prediction modes, slice- and macroblock-based control information syntax elements all use context indices between 0 and 72. The context index is calculated as &gamma; = &Gamma;<sub><i>S</i></sub> + &chi;<sub><i>S</i></sub> where &Gamma;<sub><i>S</i></sub> is the context index offset, which is the lowest value in the allowable range for the syntax element's context index, and &chi;<sub><i>S</i></sub> is a context index increment, which specifies the offset within the range. &chi;<sub><i>S</i></sub> may either depend only on the bin index (giving a fixed assignment of probability model to each bin), or it may specify one of the first two context modelling types above.</p>

<p>Context indices in the range 73 to 398 are used for coding residual data (except for &gamma; = 276 which is associated with the end of slice flag).</p>

<p><code>significant_coeff_flag</code> and <code>last_significant_coeff_flag</code> use different models depending on whether they are in frame or field mode. Not all context models are used in frame-only/field-only pictures.</p>

<p>The model for <code>coded_block_pattern</code> is specified using &gamma; = &Gamma;<sub><i>S</i></sub> + &chi;<sub><i>S</i></sub>. All other syntax elements of residual data use the relation &gamma; = &Gamma;<sub><i>S</i></sub> + &Delta;<sub><i>S</i></sub>(<i>ctx_cat</i>) + &chi;<sub><i>S</i></sub>, where &Delta;<sub><i>S</i></sub> is a context-category-dependent offset. Table 9-40 in the standard specifies the value of this offset in terms of the context category, which is given for each block type in Table 9-42.</p>

<h2>Binary arithmetic coding</h2>

<p>Arithmetic coding works by representing an interval within [0, 1] by two values: a lower bound <i>L</i> and a range <i>R</i>, and recursively subdividing this interval using the probability and value of each input bit: on reception of a more probable symbol (MPS), with probability <i>p</i><sub>MPS</sub>, the interval is updated to have width <i>R</i><sub>MPS</sub> := <i>R</i> &sdot; <i>p</i><sub>MPS</sub> (the corresponding operation for a less probable symbol would update the interval to have width <i>R</i><sub>LPS</sub> := <i>R</i> &sdot; <i>p</i><sub>LPS</sub> and then update the lower bound <i>L</i> := <i>L</i> + <i>R</i> - <i>R</i><sub>LPS</sub>).</p>
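<p>The exact (pre-quantization) recursion can be sketched directly; here a fixed p<sub>MPS</sub> stands in for the adaptive context state:</p>

```python
def encode(bins, p_mps=0.8):
    """Toy exact arithmetic coder: subdivide [L, L + R) per the recursion above.
    Each element of bins is (bit, mps): the bin value and the current MPS."""
    L, R = 0.0, 1.0
    for bit, mps in bins:
        r_mps = R * p_mps
        if bit == mps:
            R = r_mps        # MPS: keep the lower bound, shrink the range
        else:
            L += r_mps       # LPS: take the upper subinterval
            R -= r_mps
    return L, R

L, R = encode([(1, 1), (1, 1), (0, 1)])
# Any number in [L, L + R) identifies the bin sequence;
# about -log2(R) bits are needed to name one.
```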

<p>H.264/MPEG-4 AVC CABAC uses a modulo-coder (M coder) as its binary arithmetic coding implementation. It avoids the multiplication above by quantizing the interval width <i>R</i> onto a small set of values <i>Q</i> = {<i>Q</i><sub>0</sub>, <i>Q</i><sub>1</sub>, &hellip;, <i>Q</i><sub><i>K</i> - 1</sub>}, and the probability <i>p</i><sub>LPS</sub> of the less probable symbol, which lies in (0, 0.5], onto another set of values <i>P</i> = {<i>p</i><sub>0</sub>, <i>p</i><sub>1</sub>, &hellip;, <i>p</i><sub><i>N</i> - 1</sub>}. The tradeoff chosen for H.264 was <i>K</i> = 4 quantized range values and <i>N</i> = 64 probability values.</p>
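<p>The multiplication-free update can be sketched as a table lookup: with a 9-bit range register kept in [256, 512), two bits of <i>R</i> select one of the <i>K</i> = 4 quantization cells, and the product <i>R</i> &sdot; <i>p</i><sub>LPS</sub> is fetched from a precomputed table. The 4-entry row below is what I believe the least-skewed probability state's row of the standard's rangeTabLPS to be; treat it as illustrative.</p>

```python
LPS_ROW = [128, 176, 208, 240]   # R_LPS candidates for one probability state

def update(R, is_lps):
    """One M-coder range update; R is kept renormalized in [256, 512)."""
    q = (R >> 6) & 3             # which of the K = 4 quantization cells R is in
    r_lps = LPS_ROW[q]
    if is_lps:
        R = r_lps                # LPS: range becomes the precomputed R_LPS
    else:
        R = R - r_lps            # MPS: remainder of the interval
    while R < 256:               # renormalize (output bits omitted here)
        R <<= 1
    return R
```

<p>Quantizing <i>R</i> to two bits means all 4 &times; 64 products <i>R</i> &sdot; <i>p</i><sub>LPS</sub> fit in a small table, replacing the multiplication entirely.</p>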

<p>For the bypass-mode coding engine, the probability estimation stage is omitted.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[For further information please see Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard (Detlev Marpe, Heiko Schwarz and Thomas Wiegand)]]></summary></entry><entry><title type="html">CFA detection</title><link href="http://localhost:4000/2013/05/05/cfa-detection.html" rel="alternate" type="text/html" title="CFA detection" /><published>2013-05-05T00:00:00+01:00</published><updated>2013-05-05T00:00:00+01:00</updated><id>http://localhost:4000/2013/05/05/cfa-detection</id><content type="html" xml:base="http://localhost:4000/2013/05/05/cfa-detection.html"><![CDATA[<p>Light reaching digital camera sensors (CCDs) is filtered by a colour filter array (CFA), or Bayer array, which allows each sensor element to measure the intensity of either red, green or blue light. The monochrome image captured by the CCD is converted into a full-colour image by interpolating the missing pair of RGB values at each sample position.</p>

<p>My <a href="http://www.cl.cam.ac.uk/~abl26/acs10-cfa.pdf">slides on CFA interpolation detection</a> include a more thorough introduction to the topic.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Light reaching digital camera sensors (CCDs) is filtered by a colour filter array (CFA), or Bayer array, which allows each sensor element to measure the intensity of either red, green or blue light. The monochrome image captured by the CCD is converted into a full-colour image by interpolating the missing pair of RGB values at each sample position.]]></summary></entry><entry><title type="html">Image forensics notes</title><link href="http://localhost:4000/2013/05/05/image-forensics.html" rel="alternate" type="text/html" title="Image forensics notes" /><published>2013-05-05T00:00:00+01:00</published><updated>2013-05-05T00:00:00+01:00</updated><id>http://localhost:4000/2013/05/05/image-forensics</id><content type="html" xml:base="http://localhost:4000/2013/05/05/image-forensics.html"><![CDATA[<p>This section contains sample code and information about several image forensics techniques, which analyse digital images to recover information about their origin and processing history.</p>

<p>The information here is based on <a href="http://www.cl.cam.ac.uk/~abl26/bibliography/">published work in the area</a>. I prepared some of the code for <a href="http://www.cl.cam.ac.uk/teaching/1011/R08/">Markus Kuhn's forensic signal analysis course</a> at the University of Cambridge Computer Laboratory, which I co-lectured in 2009 and 2010.</p>

<p>As part of a literature survey on multimedia forensics, I compiled a <a href="http://www.cl.cam.ac.uk/~abl26/bibliography/main.html">multimedia forensics bibliography</a>. The pages are generated using the Django web framework for python, using content from an SQLite database which I populate using a custom python script which parses a BibTeX file. If you would like to present your BibTeX bibliography in a similar way, contact me for source code.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[This section contains sample code and information about several image forensics techniques, which analyse digital images to recover information about their origin and processing history.]]></summary></entry><entry><title type="html">JavaScript language notes</title><link href="http://localhost:4000/2013/05/05/javascript-language.html" rel="alternate" type="text/html" title="JavaScript language notes" /><published>2013-05-05T00:00:00+01:00</published><updated>2013-05-05T00:00:00+01:00</updated><id>http://localhost:4000/2013/05/05/javascript-language</id><content type="html" xml:base="http://localhost:4000/2013/05/05/javascript-language.html"><![CDATA[<h2>Types</h2>

<p>JavaScript values are either Objects (including functions) or one of the primitive types: Boolean, Number, String, <code>null</code> or <code>undefined</code>.</p>

<p>Objects may have prototypes. Prototypes are objects which themselves may have prototypes, forming the finite-length <i>prototype chain</i>. An object with a <code>null</code> prototype ends the prototype chain (<code>Object.prototype</code> has a <code>null</code> prototype).</p>

<p>When a member is accessed via the dot operator, each prototype in the prototype chain is checked in turn, until the named member is found, or the <code>null</code> prototype is reached, in which case <code>undefined</code> is returned. The internal <code>[[prototype]]</code> member, referring to an object's prototype, is not publicly accessible.</p>

<p>Inheritance and shared members are implemented using prototypes in JavaScript. The non-standard, settable <code>__proto__</code> property can be assigned an object to cause that object to inherit from the assigned object.</p>

<p>While a particular method is being executed, the <code>this</code> keyword refers to the object on which the dot operator was applied. This means that in an inherited method it still refers to the subclass. <code>this</code> always refers either to (1) the global object (which is window in a browser) outside a function or inside a function invoked via a variable, (2) the owner of a property access, (3) the value of the first argument passed into <code>Function.prototype.call/apply</code>, (4) the newly created object in a constructor, (5) the calling context's <code>this</code> in <code>eval</code>ed code.</p>

<p><code>Function.prototype.bind</code> takes an object and returns a function which, when invoked, will have a <code>this</code> value equal to the object passed to <code>bind</code>.</p>

<p>To create objects with the same structure but different state, we use constructors.</p>

<p>During JavaScript execution, a stack of execution contexts is created. Specifically, global code gets an execution context, and each invocation has an associated execution context. <code>eval</code>ed code also has a distinct execution context. When a function returns, the current execution context is popped from the stack. When an execution context is created, the following takes place:</p>

<ol>
  <li>A special Activation object is created. This has no prototype, but does have accessible named properties.</li>
  <li>An <code>arguments</code> object is created. This maps integer indices onto the corresponding actual parameters of the function, and has <code>callee</code> and <code>length</code> properties.</li>
  <li>The context is assigned a <i>scope chain</i>. Each function object has an internal property, <code>[[scope]]</code>, containing a list of objects. The scope for the new execution context consists of the scope chain of the function object under execution with the newly-created Activation object prepended.</li>
  <li>The Activation object is also a Variable object. This contains properties for each of the function's formal parameters, assigned the values of the actual parameters (or <code>undefined</code> when not present). Any inner functions create function objects that are kept in this Variable object. Finally, local variables declared in the function are stored in the variable object in properties according to their names. The value of a local variable is only assigned during execution of the relevant line of the function body (taking into account hoisting), but is initially <code>undefined</code>.</li>
  <li>The <code>this</code> keyword is assigned. If the assigned value is <code>null</code>, property accesses refer to the global object.</li>
</ol>

<p>The global execution context does not have an arguments property, but its variables object is created in the normal way, including 'local' variables and function definitions, which appear as global variables and top-level functions.</p>

<h2>Closures</h2>

<p>Statements in inner function bodies may access local variables, parameters and declared inner functions within their outer functions. When it is made accessible outside the function where it is declared, a closure is formed, and it continues to have access to those variables.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Types]]></summary></entry><entry><title type="html">rjpeg: Exact JPEG recompression</title><link href="http://localhost:4000/2011/03/01/rjpeg.html" rel="alternate" type="text/html" title="rjpeg: Exact JPEG recompression" /><published>2011-03-01T00:00:00+00:00</published><updated>2011-03-01T00:00:00+00:00</updated><id>http://localhost:4000/2011/03/01/rjpeg</id><content type="html" xml:base="http://localhost:4000/2011/03/01/rjpeg.html"><![CDATA[<p>As part of my PhD research, I developed a tool which inverts the computational steps of the Independent JPEG Group's JPEG decompressor version 6b. The tool maps an input image onto the set of bitstreams that produce it on decompression. If the set is empty, it indicates regions that are inconsistent with JPEG decompression.</p>

<p>This page contains information about my JPEG exact recompressor implementation. If you would like to get the source code, please read the instructions below then <a target="_self" href="/cl-web/rjpeg-0.8.tar.gz">click here to download rjpeg-0.8.tar.gz</a>.</p>

<h2>rjpeg: Exact JPEG recompressor (version 0.8)</h2>

<h3>Introduction</h3>

<p>In our paper <a target="_self" href="/cl-web/pdf/spie10-full.pdf">'Exact JPEG recompression' (Andrew B. Lewis and Markus G. Kuhn)</a> we
presented a technique for calculating the JPEG bitstream(s) which produce a
particular uncompressed image given as input.</p>

<p>For full details of the algorithm, see our paper: <a href="http://www.cl.cam.ac.uk/~abl26/spie10-full.pdf">http://www.cl.cam.ac.uk/~abl26/spie10-full.pdf</a></p>

<p>For an overview of the algorithm, a poster is also available: <a href="http://www.cl.cam.ac.uk/~abl26/spie10-poster.pdf">http://www.cl.cam.ac.uk/~abl26/spie10-poster.pdf</a></p>

<p>This archive contains the source code for the recompressor implementation
described and evaluated in the paper.</p>

<p>Please note that this software is experimental and should not be used in
production software. Error checking is missing, some debugging code is included
in the source and the code has not been tested/optimized thoroughly.</p>

<p>See LICENSE for licensing information.</p>

<p>If you find this software useful, I would be grateful to receive an email
describing how you have used it (andrew.lewis at cl.cam.ac.uk). If you would
like to refer to it in an academic publication, please cite our paper:</p>

<pre>@conference{lewis:75430V,
  author = {Andrew B. Lewis and Markus G. Kuhn},
  title = {Exact JPEG recompression},
  publisher = {SPIE},
  year = {2010},
  journal = {Visual Information Processing and Communication},
  volume = {7543},
  number = {1},
  eid = {75430V},
  numpages = {9},
  pages = {75430V},
  location = {San Jose, California, USA},
  url = {http://link.aip.org/link/?PSI/7543/75430V/1},
  doi = {10.1117/12.838878}
}</pre>

<p>Please send any bug reports, queries, suggestions or patches to andrew.lewis at cl.cam.ac.uk.</p>

<h3>Archive contents</h3>

<ul>
<li>README, LICENSE, Makefile: Makefile has targets for the main application (rjpeg) and a version for use with the Condor distributed computing system (www.cs.wisc.edu/condor)</li>
<li>rjpeg.h, rjpeg.c: main(...) function</li>
<li>data.h, data.c: Data types for pixel data, sets, intervals and expression trees</li>
<li>computations.h, computations.c: Rearranges expression trees for chroma smoothing</li>
<li>cspace.h, cspace.c: Inverts the colour space conversion</li>
<li>diagnosticinformation.h, diagnosticinformation.c: Functions to output infeasible block information</li>
<li>fdctislow.h, fdctislow.c: IJG forward DCT (`slow', integer)</li>
<li>forwardoperations.h, forwardoperations.c: Searching and filtering of blocks of quantized coefficient intervals</li>
<li>jpegout.c, jpegout.h: Use libjpeg to output JPEG bitstreams</li>
<li>quantize.c, quantize.h: Calculate possible quantization matrices and apply quantization</li>
<li>reverseidct.c, reverseidct.h: Reverse the decompressor IDCT using libgmp arbitrary precision arithmetic</li>
<li>solver.c, solver.h: Apply the chroma unsmoothing algorithm</li>
<li>unsmooth.c, unsmooth.h: Generate the expression trees for chroma smoothing</li>
</ul>

<h3>Input requirements</h3>

<p>rjpeg takes a file in PPM P6 (binary 24 bits/pixel) format. (Multiple files
can be specified and are processed separately.)</p>

<p>rjpeg will run to completion if the input image was output by a process
equivalent to applying the IJG djpeg algorithm to a JPEG bitstream with the
following characteristics:</p>

<ul>
<li>the image was encoded with chroma sub-sampling (4:2:0);</li>
<li>the stored colour space is YCbCr; and</li>
<li>the image's width and height are both multiples of sixteen (twice the DCT
  block size).</li>
</ul>

<p>Note that these are cjpeg defaults. Also, the decompressor IDCT must be
equivalent to the IJG integer 'slow' transform (the default).</p>

<p>If these conditions are not met, rjpeg will output an error message when it
encounters an inconsistency. I plan to add more helpful diagnostic information
in a later version, to make rjpeg more useful in forensic situations.</p>

<h3>Performance</h3>

<p>rjpeg stores a 128 MB look-up table for colour space conversion on the disk. By
default this is kept in /tmp/ycc_rgb_table but its location can be altered in
cspace.h. The table is generated whenever that file does not exist, so the program
will take longer to execute the first time you run it.</p>

<p>Approximate time/space requirements: 512 by 512 images at qualities 90 and below
typically take a few minutes to recompress on my machine, proportional to the
quality factor and number of saturated pixels. The maximum memory usage was
around 500 MB.</p>

<p>I have not yet tried to optimize speed and memory usage, and there are many
opportunities to do so.</p>

<h3>How to use</h3>

<p>You will need these headers and libraries:</p>

<ul>
<li>The IJG library, used to create output bitstreams (libjpeg).</li>
<li>The GMP arbitrary precision arithmetic library (libgmp).</li>
</ul>

<p>You may wish to update the following constants:</p>

<p>The filename used to store the colour space conversion table, in cspace.h: <code>#define INVERSE_YCC_RGB_TABLE_FILE_NAME "/tmp/ycc_rgb_table"</code></p>

<p>The default will produce <code>-rw-rw-r-- 128M /tmp/ycc_rgb_table</code>. (An alternative string constant is used in the Condor target.)</p>

<p>The number of quantized DCT coefficient block candidates beyond which the exhaustive search step is considered infeasible, in forwardoperations.h: <code>static const UINT64 possibilities_limit = (1L &lt;&lt; 20);</code></p>

<p>The upper and lower limits for results of the IDCT, used when inverting the range clipping operation, in reverseidct.h: <code>#define RANGE_LIMITING_UPPER_LIMIT 288</code> and <code>#define RANGE_LIMITING_LOWER_LIMIT -35</code>. Making these values further from 0 or 255 will recompress more images with black or white areas correctly, at the expense of increased search sizes.</p>

<p>Run 'make' to produce the executable.</p>

<p>I have included a compressed test image from the UCID dataset: <a href="http://www-staff.lboro.ac.uk/~cogs/datasets/UCID/ucid.html">http://www-staff.lboro.ac.uk/~cogs/datasets/UCID/ucid.html</a>.</p>

<p>Example usage:</p>

<pre>$ tar -zxvf rjpeg-0.8.tar.gz
$ make
$ djpeg -outfile example1.ppm example1.jpg
$ ./rjpeg example1.ppm
QF: 40
Y: 3072 exact 0 ambiguous 0 infeasible 0 impossible
Cb: 768 exact 0 ambiguous 0 infeasible 0 impossible
Cr: 768 exact 0 ambiguous 0 infeasible 0 impossible
$ diff example1.ppm.result.jpg example1.jpg</pre>

<p>The last command should output nothing, indicating that the files are binary
identical.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[As part of my PhD research, I developed a tool which inverts the computational steps of the Independent JPEG Group's JPEG decompressor version 6b. The tool maps an input image onto the set of bitstreams that produce it on decompression. If the set is empty, it indicates regions that are inconsistent with JPEG decompression.]]></summary></entry></feed>