Linux hard disk encryption settings
This page intends to educate the reader about the existing weaknesses
of the public-IV on-disk format commonly used with cryptoloop and
dm-crypt (used in IV-plain mode). This page aims to facilitate risk
calculation when utilising Linux hard disk encryption. The attacks
presented on this page may pose a thread to you, but at the same time
may be totally irrelevant for others. At the end of this document, the
reader should be able to make a good choice according to his security
needs.
A good quote with respect to this topic is ''All security involves
trade-offs'' from Beyond Fear (Bruce Schneier). You should keep in mind
that perfect security is unachievable and by all means shouldn't be
your goal. For instance, when using pass phrase based cryptography, you
have to trust in that the underlying system is secure, the computer
system has not been tampered with, and nobody is watching you. The most
obvious weakness is the last one, but even if you make sure nobody nor
any camera is around, how about the keyboard you're typing on? Has it
been manipulated while you have been getting your lunch?
So security comes for a price, and the price when designing
cryptography security algorithms is performance. You will be introduced
to the fastest of all setups available, the "public-IV", which
sacrifices security properties for speed. After that we will talk about
ESSIV, the newest of IV modes implemented. It comes for a small price,
but it can deal with watermarking for a relatively small price. Then
you'll be introduced to the draft specifications of the Security in
Storage Working Group ([18]SISWG). Currently SISWG is considering EME
and LRW for standardisation. EME along with it's cousin CMC seems to
provide the best security level, but imposes additional encryption
steps. Plumb-IV is discussed only for reference, because it has the
same performance penalty as CMC, but in constrast suffers from
weaknesses of CBC encryption.
As convention, this document will use the term "blocks", when it
referes to a single block of plain or cipher text (usually 16 byte),
and will use the term "sectors", when it refers to a 512-byte wide hard
disk block.
CBC Mode: The basic level
Most hard disk encryption systems utilise CBC to encrypt bulk data.
Good descriptions on CBC and other common cipher modes are available at
* [19]Wikipedia
* [20]Connected: An Internet Encyclopedia
* [21]NIST: Recommendation for Block Cipher Modes of Operation (CBC
is at PDF Page 17)
Please make sure you're familiar with CBC before proceeding.
Since CBC encryption is a recursive algorithm, the encryption of the
n-th block requires the encryption of all preceding blocks, 0 till n-1.
Thus, if we would run the whole hard disk encryption in CBC mode, one
would have to re-encrypt the whole hard disk, if the first computation
step changed, this is, when the first plain text block changed. Of
course, this is an undesired property, therefore the CBC chaining is
cut every sector and restarted with a new initialisation vector (IV),
so we can encrypt sectors individually. The choice of the sector as
smallest unit matches with the smallest unit of hard disks, where a
sector is also atomic in terms of access.
For reference, I will give a formal definition of CBC encryption and
decryption. Note, that decryption is not recursive, in contrast to
encryption, since it's a function only of C[n-1] and C[n].
Encryption:
C[-1] = IV
C[n] = E(P[n] ⊕ C[n-1])
Decryption:
C[-1] = IV
P[n] = C[n-1] ⊕ D(C[n])
The next sections will deal with how this IV is chosen.
The IV Modes
The "public-IV"
The IV for sector n is simply the 32-bit version of the number n
encoded in little-endian padded with zeros to the block-size of the
cipher used, if necessary. This is the most simple IV mode, but at the
same the most vulnerable.
ESSIV
E(Sector|Salt) IV, short ESSIV, derives the IV from key material via
encryption of the sector number with a hashed version of the key
material, the salt. ESSIV does not specify a particular hash algorithm,
but the digest size of the hash must be an accepted key size for the
block cipher in use. As the IV depends on a none public piece of
information, the key, the sequence of IV is not known, and the attacks
based on this can't be launched.
plumb IV
The IV is computed by hashing (or MAC-ing) the plain text from the
second block till the last. Additionally, the sector number and the key
are used as input as well. If a byte changes in the plain text of the
blocks 2 till n, the first block is influenced by the change of the IV.
As the first encryption effects all subsequent encryption steps due to
the nature of CBC, the whole sector is changed.
Decryption is possible because CBC is not recursive for decryption. The
prerequisites for a successful CBC decryption are two subsequent cipher
blocks. The former one is decrypted and the first one is XOR-ed into
the decryption result yielding the original plain text. Therefore
independent of the IV scheme, decryption is possible from the 2nd to
the last block. After the recovery of these plain text blocks, the IV
can be computed, and finally the first block can be decrypted as well.
The only weakness of this scheme is it's performance. It has to process
data twice: first for obtaining the IV, and then to produce the CBC
encryption with this IV. With the same performance penalty CMC is able
to achieve better security properties (CMC is discussed later), thus
plumb-IV will remain unimplemented.
The attack arsenal
Content leaks
This attack can be mounted against any system operating in CBC Mode. It
rests on the property, that in CBC decryption, the preceding cipher
block's influence is simple, that is, it's XORed into the plain text.
The preceding cipher block, C[n-1], is readily available on disk (for n
> 0) or may be deduced from the IV (for n = 0). If an attacker finds
two blocks with identical cipher text, he knows that both cipher texts
have been formed according to:
C[m] = E(P[m] ⊕ C[m-1] )
C[n] = E(P[n] ⊕ C[n-1] )
Since he found that C[m] = C[n], it holds
P[m] ⊕ C[m-1] = P[n] ⊕ C[n-1]
which can be rewritten as
C[m-1] ⊕ C[n-1] = P[n] ⊕ P[m]
The left hand side is known to the attacker by reading the preceding
cipher text from disk. If one of the blocks is the first block of a
sector, the IV must be examined instead (when it's available as it is
in public-IV). The attacker is now able to deduce the difference
between the plain texts by examining the difference of C[m-1] and
C[n-1]. If one of the plain text blocks happens to be zero, the
difference yields the original content of the other related plain text
block.
Another information is available to the attacker. Any succeeding
identical pair of cipher text, that follows the initial identical
cipher pair, is equal. No information about the content of those pairs
can be extracted, since the information is extracted from the
respective preceding cipher blocks, but those are all required to be
equal.
Let's have a look at the chance of succeeding with this attack.
Assuming the output of a cipher forms an uniform distribution, the
chance, p, of finding an identical block is 2^-blocksize. For instance,
p = 1/2^128 for a 128-bit cipher. Because the number of possible pairs
develops as an arithmetic series in n, the number of sectors, the
chance of not finding two identical blocks is given by
(1-p)^n(n-1)/2
As p is very small, but in contrast the power is very big, we apply the
logarithm to get meaningful answers, that is
n(n-1)/2 ln (1-p)
An example: The number of cipher blocks available on 200GB disk with
known C[n-1] is 200GB × 1024^2 KB/GB × 64/1KB ^1. Or in other words, a
128-bit block is 16 bytes, so the number of 16-byte blocks in a 200GB
hard disk is 13.4 billion. Therefore, n = 1.342e10. For a 128-bit
cipher, p = 2^-128. Hence,
ln(1-p) = -2.939e-39
n(n-1)/2 = 9.007e19
n(n-1)/2 ln (1-p) = -2.647e-19
1-e^-2.776e-13 = 2.647e-19
The last term is the chance of finding at least one pair of identical
cipher blocks. But how does this number grow in n? Obviously
exponentially. Plotting a few a decimal powers shows that the chance
for finding at least on identical cipher pair flips to 1 around n =
10^20 (n = 10^40 for a 256-bit cipher). This inflexion point is reached
for a 146 million TB storage (or a hundered thousand trillion trillions
TB storage for a 256-bit cipher).
^1The blocks with available preceding cipher blocks is 62/1KB for all
non-public IV schemes, i.e. ESSIV/plumb IV
Data existence leak: The Watermark
No IV format discussed on this page allows the user to deny the
existence of encrypted data. Neither cryptoloop nor dm-crypt is an
implementation of a deniable cryptography system. But the problem is
more serious with public-IV.
With public IV and the predicable difference it introduces in the first
blocks of a sequence of plain text, data can be watermarked, which
means, the watermarked data is detectable even when the key has not
been recovered. As shown in the paragraph above, the existence of two
blocks with identical cipher text is very unlikely and coincidence can
be excluded, which is relevant when somebody tries to demonstrate
before the law that certain data is in an encrypted partition.
As the IV progresses with a foreseeable pattern and is guaranteed to
change the least significant bit ever step, we can build identical pair
of cipher text by writing three consecutive sectors each with a flipped
LSB relative to the previous. (The reason it's three instead of two is,
that the second least significant bit might change as well.) This
"public-IV"-driven CBC encryption will output exactly the same cipher
text for two consecutive sectors. An attacker can search the disk for
identical consecutive blocks to find the watermark. This can be done in
a single pass, and is much more feasible than finding to identical
blocks, that are scattered on the disk, as in the previous attack. A
few bits of information can be encoded into the watermarks, which might
serve as tag to prove the existence copyright infringing material.
A complete description of watermarking can be found in [22]Encrypted
Watermarks and Linux Laptop Security. The attack can be defeated by
using ESSIV.
Data modification leak
CBC encryption is recursive, so the n-th block depends on all previous
blocks. But the other way round would also be nice. Why? The weakness
becomes visible, if storage on a remote computer is used, or more
likely, the hard disk exhibits good forensic properties. The point is,
the attacker has to have access to preceding (in time) cipher text of a
sector, either by recording it from the network, or by using forensic
methods.
An attacker can now guess data modification patterns by examining the
historic data. If a sector is overwritten with a partial changed plain
text, there is an amount of bytes at the beginning, which are
unchanged. This point of change^2 is directly reflected in the cipher
text. So an attacker can deduce the point of the change in plain text
by finding the point where the cipher text starts to differ.
This weakness is present in public-IV and ESSIV.
^2aligned to the cipher block size boundaries
Malleable plain text
The decryption structure of CBC is the source of this weakness.
Malleability (with respect to cryptography) is defined as a
modification of the cipher text that will resulting in a predictable
change in plain text. To put it formally, there is a function f(C),
that, if applied to the cipher text, C' = f(C), will result in a known
function f', which will predict the resulting plain text, P' = D(C'),
correctly assuming P is known, that is P' = f'(P).
As we can see in it's definition, CBC decryption depends on C[n-1]. An
attacker can flip arbitrary bits in the plain text by flipping bit in
C[n-1]. More formally^3, if
P = P[1] || P[2] || ... || P[i] || ... || P[n]
C = E[CBC](P)
C = C[1] || C[2] || ... || C[i-1] || ... || C[n]
the function
f(C[1] || ... || C[n]) = C[1] || ... || C[i-1] XOR M || ... || C[n]
follows the function f', which predicts the resulting plain text
correctly as,
f'(P[1] || ... || P[n]) = P[1] || ... || P[i] XOR M || ... || P[n]
The first block of the CBC cipher text stream is not malleable, because
it depends on the IV, which is not modifiable for an attacker.
^3The IV parameter for E[CBC] has been intentionally omitted.
Movable
On the expense of one block decrypting to garbage, an attacker can move
around plain text as he likes. CBC decryption depends on two variables,
C[n-1] and C[n]. Both can be modified at free will. To make meaningful
modifications, an attacker has to replace the pair C[n-1] and C[n] with
other cipher text pair from disk. The first block C[n-1] will decrypt
to garbage, but the second block C[n] will yield a copy of the plain
text of the copied cipher block. This attack is also known as
copy&paste attack. This attack is mountable against any CBC setup. The
only limitation is, the first block, C[0], can't be replaced with
something meaningful, as C[-1] can't be modified, because it's the IV.
CMC and EME: Tweakable wide block cipher modes
CMC is a new chaining mode. It stands for ''CBC-Mask-CBC''. It works by
processing the data in three steps, first CBC, then masking the cipher
text, and then another CBC step, but this time backwards. The last step
introduces a dependency from the last block to the first block. The
authors of the CMC paper provide a prove for the security of this mode,
making a secure 128-bit cipher a secure 4096-bit cipher (sector size).
As in normal CBC, this scheme also takes an IV, but the authors call it
tweak.
EME is CMC's cousin. EME has also been authored by Haveli and Rogaway
as well been authored for the same purpose. The difference to CMC is,
that EME is parallelizable, that is, all operations of the underlying
cipher can be evaluated in parallel. To introduce an interdependency
among the resulting cipher blocks, the encryption happens in two
stages. Between these stages a mask is computed from all intermediate
blocks and applied to each intermediate block. This step causes an
interdependency among the cipher blocks. After applying the mask,
another encryption step diffuses the mask.
The interdependency among the resulting blocks allow CMC and EME to be
nonmovable, nonmalleable, to prevent content leaks and in-sector data
modification patterns. The tweaks are encrypted by both cipher modes,
thus both are nonwatermarkable.
For simplicity, the EME description above omitted the pre- and post-
whitening steps as well as the multiplications in GF(2^128). An
in-depth specification can be found at the [23]Cryptology ePrint
Archive. An applicable draft specification for EME-32-AES can be found
at [24]SISWG. I have written an EME-32-AES test implementation for
Linux 2.6. It's available [25]here. The CMC paper is available from the
[26]Cryptology ePrint Archive as well.
LRW: A tweakable narrow block cipher mode
EME as well as CMC are comparatively secure cipher modes, but heavy in
terms of performance. LRW tries to cope with most of security
requirements, and at the same time provide a good performance. LRW is a
narrow block cipher mode, that is, it operates only on one block,
instead of a whole sector. To make a cipher block tied to a location on
disk (to make it unmovable), a logical index is included in the
computation. For LRW you have to provide two keys, one for the cipher
and one for the cipher mode. The second key is multiplied with a
logical index under GF(2^128) and used as pre- and post- whitening for
encryption. With those whitening steps the block is effectively tied to
a logical index. The logical index is usually the absolute position on
disk measured with the block size of the cipher algorithm. The
different choice of the measuring unit is the only different between
the logical index and the public-IV.
The LRW draft is available from the [27]SISWG mailing list archive.
Summarising
The following table shows a comparison between the security properties
of different encryption setups and their computational costs. The
number of cipher calls, XOR operations and additional operations are
stated in terms of encryption blocks, n.
IV mode cipher mode content leaks watermarkable malleable movable
modification detection^5 cipher calls XOR ops additional op.
public-IV CBC Yes Yes Yes Yes Yes n n None
ESSIV CBC Yes No Yes Yes Yes n+1 n None
Plumb-IV1^4 CBC Yes No Yes Yes No 2n-1 2n None
public-IV CMC No No No No No 2n+1 2n+1 1 LW GF ⊗
public-IV EME No No No No No 2n+1 5n 3n-1 LW GF ⊗
public-IV LRW No No No No Yes n 2n n HW GF ⊗
Legend:
* LW GF ⊗: light-weight Galois field multiplication, that is, a
multiplication with a constant x^2^i, which can be computed in
θ(1).
* HW GF ⊗: heavy-weight Galois field multiplication, that is, a
multiplication with an arbitrary constant, which can be computed in
θ(bits).
^4plumb-IV1 uses CBC-MAC instead of hashing, so we can make a good
comparison with other ciphers in terms of cipher/XOR calls.
^5detectable partial in-sector modification
__________________________________________________________________
Clemens Fruhwirth, , also author of LUKS and ESSIV, porter of
cryptoloop, aes-i586 for 2.6., twofish-i586, and implementor of
EME-32-AES. This text is an excerpt of my diploma thesis.
This page has been reviewed by
Dr. Ernst Molitor
Arno Wagner
James Hughes , "Security in Storage Working Group" chair
Additional thanks to Pascal Brisset, for pointing out an error in the
Bernoulli estimation in an earlier version of this document, further
Adam J. Richter for pointing out an error in the KB/GB ratio.
Content and design, Copyright © 2004-2008 Clemens Fruhwirth, unless
stated otherwise
Original design by [28]haran | Additional art by [29]LinuxArt | | Blog
by [30]NanoBlogger
References
1. http://clemens.endorphin.org/
2. http://clemens.endorphin.org/credits
3. http://clemens.endorphin.org/aboutme
4. http://clemens.endorphin.org/cryptography
5. http://blog.clemens.endorphin.org/
6. http://clemens.endorphin.org/patches
7. http://clemens.endorphin.org/archive
8. http://clemens.endorphin.org/Cryptoloop_Migration_Guide
9. http://clemens.endorphin.org/LUKS
10. http://clemens.endorphin.org/AFsplitter
11. http://clemens.endorphin.org/lo-tracker
12. http://blog.clemens.endorphin.org/2008/12/luks-on-disk-format-revision-111.html
13. http://blog.clemens.endorphin.org/2008/11/xmonad-gridselect.html
14. http://blog.clemens.endorphin.org/2008/11/workaround-for-bittorrent-traffic.html
15. http://blog.clemens.endorphin.org/2008/09/i-love-lolcat-meme.html
16. http://blog.clemens.endorphin.org/2008/09/counter-steganography-research.html
17. http://clemens.endorphin.org/cryptography
18. http://www.siswg.org/
19. http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
20. http://www.freesoft.org/CIE/Topics/143.htm
21. http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf
22. http://www.tcs.hut.fi/~mjos/doc/wisa2004.pdf
23. http://eprint.iacr.org/2003/147/
24. http://grouper.ieee.org/groups/1619/email/pdf00011.pdf
25. http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/544
26. http://eprint.iacr.org/2003/148/
27. http://grouper.ieee.org/groups/1619/email/msg00160.html
28. http://www.oswd.org/user/profile/id/3013
29. http://www.linuxart.com/
30. http://nanoblogger.sourceforge.net/