Source code: https://github.com/BudgieKitten/steganography
Introduction #
Steganography is the practice of concealing messages or data inside another medium, such as an image, so that the existence of the hidden data goes undetected. It is not to be confused with watermarking, which usually carries supplemental information and is often advertised to deter illegal activity.
Steganography comes in a variety of forms. It was an effective way to communicate during wartime without being suspected or discovered. Example: https://www.atlasobscura.com/articles/knitting-spies-wwi-wwii. Information (or secrets) can also be hidden in a song. Example: https://cacm.acm.org/news/hiding-data-in-music/
Without the original image, it is often difficult to tell whether an image contains a secret message. This project implements the simplest and most common steganography algorithm, Least Significant Bit (LSB) embedding.
Some primers #
Image formats #
- .bmp: uncompressed image
- .jpg and .jpeg: lossy compression, so some image information is discarded
- .png: lossless compression
Typical file size, in decreasing order:
.bmp > .png > .jpg
Image organization #
Images are generally stored in raster formats, where the pixel data (RGBA) is stored row by row, like a matrix.
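As a rough illustration of row-by-row storage, the sketch below computes where one channel of a given pixel sits in a flat RGBA buffer. The helper name `channel_offset` and the tightly packed 4-bytes-per-pixel layout are assumptions made for this example. Real file formats may add row padding or use a different channel order.

```python
def channel_offset(row: int, col: int, channel: int, width: int) -> int:
    """Index of one channel in a flat, row-major RGBA buffer (hypothetical helper)."""
    bytes_per_pixel = 4  # assumed layout: R, G, B, A, one byte each
    return (row * width + col) * bytes_per_pixel + channel


# A 2x3 image (2 rows, 3 columns) stored row by row: 2 * 3 * 4 = 24 bytes.
width, height = 3, 2
buffer = bytes(range(width * height * 4))

# Blue channel (index 2) of the pixel at row 1, column 0.
blue = buffer[channel_offset(1, 0, 2, width)]
```

Every pixel in row 0 comes before any pixel in row 1, which is what "row by row" means here.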
Explanation #
The project uses the LSB embedding algorithm, which is the simplest and most common one. The changes it makes to an image are usually imperceptible to the eye. However, it is very insecure: it is vulnerable to statistical detection such as the histogram attack.
One character (a single-byte character in UTF-8) is 8 bits, and an RGBA pixel has 4 color channels (Red, Green, Blue, Alpha) of 8 bits each. Therefore, each color stores \(8 \text{ bits} \div 4 \text{ colors} = 2 \text{ bits}\) of the character. In total, the last 2 bits of each of the four colors are used to store the character, so one pixel holds one character.
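The split of one character into four 2-bit pieces can be sketched as follows. The function names are hypothetical, and the ordering (most significant pair first, so that it would land in the Red channel) is an assumption. The project may order the pairs differently.

```python
def split_into_pairs(byte: int) -> list[int]:
    """Split an 8-bit value into four 2-bit values, most significant pair first."""
    return [(byte >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def join_pairs(pairs: list[int]) -> int:
    """Inverse of split_into_pairs: rebuild the 8-bit value."""
    value = 0
    for pair in pairs:
        value = (value << 2) | pair
    return value


pairs = split_into_pairs(ord("A"))  # 'A' is 65, i.e. 0b01000001 -> [1, 0, 0, 1]
restored = chr(join_pairs(pairs))   # back to 'A'
```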
To keep a color value \(x\) inside the valid range when bits are added or subtracted (that is, to avoid \(x > 255\) or \(x < 0\)), the direction of the adjustment depends on the value: colors in the range 0 to 127 are adjusted upward, so the result stays at \(x + 4 \le 131\), and colors in the range 128 to 255 are adjusted downward, so the result stays at \(x - 4 \ge 124\).
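Limitations: the number of pixels in the image (\(rows \times columns\)) must be larger than the number of characters in the message, since each pixel holds one character. Otherwise, the message does not fit and embedding can overwrite previously embedded characters (by the pigeonhole principle).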
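A minimal capacity check for this limitation might look like the sketch below. The function name `fits` is hypothetical, and the check assumes one character per pixel with every character already reduced to a single byte.

```python
def fits(rows: int, columns: int, message: str) -> bool:
    """True if the image has strictly more pixels than the message has characters."""
    return rows * columns > len(message)


ok = fits(2, 2, "abc")         # 4 pixels, 3 characters
too_long = fits(2, 2, "abcd")  # 4 pixels, 4 characters
```

The comparison is strict, which leaves at least one pixel free after the last character of the message.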
The math #
During embedding: #
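Let \(c\) be a color value (8-bit), \(b\) the 2 bits of the character to embed (\(0 \le b \le 3\)), and \(d\) the adjustment added to the color value. The new color value is \(c' = c + d\).
\(\Longrightarrow d = b - (c \bmod 4)\), so that \(c' \bmod 4 = b\)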
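The embedding rule can be tried on a single pixel, as in the sketch below. The names `embed_pair` and `embed_char`, the sample pixel, and the choice to put the most significant pair of the character into the Red channel are all assumptions for illustration, not necessarily what the project does.

```python
def embed_pair(color: int, pair: int) -> int:
    """Return the color value whose last 2 bits equal `pair`."""
    adjustment = pair - (color % 4)  # always between -3 and 3
    return color + adjustment


def embed_char(pixel: tuple, char: str) -> tuple:
    """Embed one single-byte character into one RGBA pixel (assumed R, G, B, A order)."""
    code = ord(char)
    pairs = [(code >> shift) & 0b11 for shift in (6, 4, 2, 0)]
    return tuple(embed_pair(color, pair) for color, pair in zip(pixel, pairs))


# 'A' is 0b01000001, i.e. the pairs 1, 0, 0, 1.
stego_pixel = embed_char((200, 100, 50, 255), "A")  # (201, 100, 48, 253)
```

Each channel moves by at most 3 out of 255, which is why the change is hard to see. With this particular formula, the result is `color - color % 4 + pair`, which always lies between 0 and 255 for an 8-bit color.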
During extracting: #
\(b = c' \bmod 4\), where \(c'\) is a color value of the embedded image and \(b\) is the original 2 bits of the secret text
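Extraction for one pixel can be sketched as below. The name `extract_char` is hypothetical, and the sketch assumes the four channels hold the character's 2-bit pieces from most significant to least significant.

```python
def extract_char(pixel: tuple) -> str:
    """Rebuild one character from the last 2 bits of each RGBA channel."""
    code = 0
    for color in pixel:
        code = (code << 2) | (color % 4)  # append this channel's 2 bits
    return chr(code)


# Channel values mod 4 are 1, 0, 0, 1 -> 0b01000001 -> 'A'.
char = extract_char((201, 100, 48, 253))
```

No access to the original image is needed: only the embedded color values are read.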
Notes: #
- During embedding, when the message reaches End of File (-1), a null character (0,0,0,0) is embedded to mark the end of the message.
- Since the project currently supports only single-byte UTF-8 characters, any other character (one whose code point is larger than \(127_{10}\), that is, larger than 127 in decimal) is substituted with '?'.
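Both notes can be seen together in a small round trip over a list of pixels. This is only a sketch: the function names, the in-memory list of RGBA tuples used instead of a real image file, and the pair ordering are assumptions.

```python
def sanitize(message: str) -> str:
    """Replace every character above code point 127 with '?'."""
    return "".join(ch if ord(ch) <= 127 else "?" for ch in message)


def embed_message(pixels: list, message: str) -> list:
    """Embed the sanitized message plus a null terminator, one character per pixel."""
    text = sanitize(message) + "\0"
    if len(text) > len(pixels):
        raise ValueError("image is too small for this message")
    result = list(pixels)
    for i, ch in enumerate(text):
        code = ord(ch)
        pairs = [(code >> shift) & 0b11 for shift in (6, 4, 2, 0)]
        result[i] = tuple(c - c % 4 + p for c, p in zip(pixels[i], pairs))
    return result


def extract_message(pixels: list) -> str:
    """Read characters until the null terminator (all four channels mod 4 are 0)."""
    chars = []
    for pixel in pixels:
        code = 0
        for color in pixel:
            code = (code << 2) | (color % 4)
        if code == 0:
            break
        chars.append(chr(code))
    return "".join(chars)


cover = [(200, 100, 50, 255)] * 8
stego = embed_message(cover, "héllo")
recovered = extract_message(stego)  # 'h?llo'
```

The terminator stops extraction before the untouched pixels at the end of the list, which would otherwise decode to noise.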
References #
https://www.amazon.ca/Steganography-Digital-Media-Principles-Applications/dp/0521190193