Abstract: Despite the rapid adoption of Voice over IP (VoIP), its security implications are not yet fully understood. Since VoIP calls may traverse untrusted networks, packets should be encrypted to ensure confidentiality. However, we show that when the audio is encoded using variable bit rate codecs, the lengths of encrypted VoIP packets can be used to identify the phrases spoken within a call. Our results indicate that a passive observer can identify phrases from a standard speech corpus within encrypted calls with an average accuracy of 50%, and with accuracy greater than 90% for some phrases. Clearly, such an attack calls into question the efficacy of current VoIP encryption standards. In addition, we examine the impact of various features of the underlying audio on our performance and discuss methods for mitigation.

@inproceedings{wright2008smi,
  author       = {Charles V. Wright and Lucas Ballard and Scott E. Coull and Fabian Monrose and Gerald M. Masson},
  url          = {http://www.cs.jhu.edu/~cwright/oakland08.pdf},
  booktitle    = {2008 IEEE Symposium on Security and Privacy (SP 2008)},
  year         = {2008},
  title        = {Spot me if you can: Uncovering spoken phrases in encrypted {VoIP} conversations},
  pages        = {35--49},
}