Abstract: Many digital images need to be redacted before they can be disseminated. A common way to remove sensitive information is to replace the pixels in the sensitive region with black or white values. Our goal is to study how effectively this simple method purges information. Since digital images are usually lossily compressed via quantization in the frequency domain, each pixel in the spatial domain is “spread” to its surroundings, similar to the Gibbs effect, before it is redacted. Hence, information about the original pixels might not be completely purged by replacing pixels in the compressed image. Although such residual information is insufficient to reconstruct the original, it can be exploited when the content has low entropy. We consider a scenario where the goal of the adversary is to identify the original among a few templates. We give two approaches and investigate their effectiveness when the image is compressed using JPEG or a wavelet-based compression scheme. We found that, if a redacted image is compressed at a higher bit rate than the original image, then the correct template can be identified with noticeable certainty. Although the requirements are stringent, it would not be surprising if redacted images matching these requirements could be found in the public domain. Hence, our findings highlight a subtle attack that must be considered when declassifying images.
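
The spreading effect described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. It assumes a single 8-sample 1-D block instead of an image, a uniform quantizer on an orthonormal DCT instead of real JPEG tables, and an adversary who knows the full templates and both quantization steps. All names (`publish`, `identify`, `TEMPLATES`, `COARSE`, `FINE`, `HIDDEN`) and all numeric values are hypothetical.

```python
import math

N = 8        # block length, loosely mirroring JPEG's 8x8 blocks (assumption)
HIDDEN = 4   # number of leading samples that get redacted (assumption)
COARSE = 40  # quantization step of the original compression (assumption)
FINE = 4     # finer step, i.e. higher bit rate, used after redaction (assumption)

def dct(x):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                               for n in range(N)))
    return out

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    out = []
    for n in range(N):
        s = c[0] * math.sqrt(1.0 / N)
        s += sum(math.sqrt(2.0 / N) * c[k] * math.cos(math.pi * (n + 0.5) * k / N)
                 for k in range(1, N))
        out.append(s)
    return out

def compress(x, step):
    """Lossy round trip: quantize the DCT coefficients, then decode to integers."""
    quantized = [step * round(c / step) for c in dct(x)]
    return [round(v) for v in idct(quantized)]

def redact(x):
    """Blacken the first HIDDEN samples and leave the rest untouched."""
    return [0] * HIDDEN + list(x[HIDDEN:])

def publish(x):
    """Compress coarsely, redact, then recompress at a higher bit rate."""
    return compress(redact(compress(x, COARSE)), FINE)

# Two candidate originals that differ only in the part that will be redacted.
TEMPLATES = [
    [255, 255, 255, 255, 128, 128, 128, 128],
    [0, 0, 0, 0, 128, 128, 128, 128],
]

def identify(observed):
    """Replay the pipeline on each template and pick the closest visible part."""
    def distance(template):
        simulated = publish(template)
        return sum((a - b) ** 2
                   for a, b in zip(simulated[HIDDEN:], observed[HIDDEN:]))
    return min(range(len(TEMPLATES)), key=lambda i: distance(TEMPLATES[i]))
```

In this toy setting the visible samples of the two templates are identical before compression, yet they differ after the coarse round trip, so `identify(publish(TEMPLATES[1]))` recovers index 1 from the unredacted samples alone. A real attack would have to cope with 2-D blocks, unknown quantization tables, and noise, none of which this sketch models.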

@article{ho2008rir,
  publisher    = {Springer},
  author       = {Nicholas Zhong-Yang Ho and Ee-Chien Chang},
  url          = {http://www.comp.nus.edu.sg/~changec/publications/2008_IH_Residual_Information_Redacted_Image.pdf},
  journal      = {Lecture Notes in Computer Science},
  pages        = {87--101},
  year         = {2008},
  title        = {Residual information of redacted images hidden in the compression artifacts},
}