RNG

Random Number Generator of an accumulative type

In this article I am going to tell about the operating principles and
characteristic properties of Random Number Generator (RNG) used in the
DiskCryptor.

RNG - this is a critically important security element of a disk
encryption system, as it generates the keys with which data is
encrypted. And it is exactly the RNG that is often a weak spot or a
backdoor in a different sorts of proprietary and government certified
encryption systems, because precision of an encryption algorithm's
implementation and the data storage methods can be easily examined even
without having an access to the source code, but RNG, however, is much
more difficult to analyze.

It is possible to create a one-way strong RNG that will allow for a
person, knowing a certain secret, to recover all the keys that were
generated by it and to consequently compromise encrypted data. A good
example of such case - the
Dual_EC_DRBG standard,
which has backdoor and is used in Windows, since Vista SP1.

The creation purposes and the objectives performed

The DiskCryptor's RNG purpose is to perform the following two
objectives: generation of encryption keys and generation of a large
volume of random data for disk wiping algorithms. For the first
objective it is necessary to provide as big strength reserve as
possible, and below are the key generation requirements that RNG has to
meet:

  1. Issue of numbers, statistically indistinguishable from random ones.
  2. Impossibility of an algorithmic prediction of the sequences being
    generated or the ones generated earlier even with the availability
    of a large amount of random numbers made in the past.
  3. The requirement No.2 has to retain its maximum possible integrity
    state even when an attacker knows an internal state of the
    generator.
  4. It is imperative to provide a maximum possible strength reserve that
    exceeds attacker's capabilities by manifold.
  5. Simple and clear, even for a non-professional, architecture and
    operating principles.

For the second objective, when generating a large volume of random
numbers, requirements for RNG differ:

  1. Issue of numbers, statistically indistinguishable from random ones.
  2. Impossibility to predict sequences being generated without knowing
    an internal state of the generator.
  3. Maximum possible performance.

RNG for key generation

RNG for key generation that is used in the DiskCryptor, belongs to the
accumulative RNG type with multiple sources of entropy. RNG of this type
accumulates minor entropy values over a long period of time, following
that it is capable of issuing a truly random sequence having size of up
to 640 bytes, afterwards, pseudo-random sequences will follow up until a
new accumulation of entropy.

As a basis for the RNG architecture,
SHA-512 hash function and
AES-256
block cipher are taken. The choice of this function is made because of
its simplicity, good performance and high reliability, though less
stringent requirements are set for hash in this case than, for example,
when using it for the digital
signatures
. The hash
function used, has to have the following properties:

  1. Statistically random output on casually-random input data.
  2. Dependence of each output bit of hash on all the input bits. On a
    change of one bit on an entry, no less than a half of output bits
    has to alter.
  3. One-sidedness. Recovery of an entry data based on its hash value has
    to be impossible. This characteristic is very desirable, but a part
    of RNG properties are retained even without its presence.

Simplified block scheme of RNG looks the following way:

Simplified block scheme of DiskCryptor's RNG

Accumulation of random numbers

This is being done by hashing a seed together with a timestamp and a
reseed counter. Adding both timestamp and reseed counter prevents
generation of identical hash values on a same seed. The obtained hash is
added to the random number pool by means of addition with respect to
base 28 with existing content of the pool, and that creates a dependence
of content of the pool on all random number additions made earlier.

The pool has the size of 640 bytes, which allows for the accumulation of
the maximum of 5120 bits of entropy. On one addition of a seed, the size
of entropy increases by the maximum of 512 bits, therefore the SHA-512
hash size requires no less then 10 reseeds, for fully filling up the
pool. In addition to having the big pool for data accumulation, RNG also
has the second one, the keys pool. This pool has the size of 256 bits
and it is never being outputted, even indirectly, but is used as a key
for the AES-256, with which output data is encrypted.

Mixing up the pool

Mixing up the pool - this is the operation that is performed with the
objective to make all the bits in the pool to become dependent on each
other. The goal of mixing - is to create the reliance of each bit being
generated, on all the content of the pool, but at the same time not to
decrease the maximum entropy, as would have happened on hashing the
whole pool at the moment of issuing random numbers. The mixing operation
has a characteristic of being one-sided, meaning that, knowing a state
of the pool after a mixing, it is impossible to restore its previous
state. This characteristic is directly based on the corresponding
properties of the SHA-512 hash.

Mixing is being done with the help of mix function (the rnd_pool_mix
function in the source code). During the mixing process, the pool is
being divided into 10 parts, each having the size of 64 bytes, and to
each part, a value of SHA-512 hash of all the pool content is being
added. This process can be presented as follows:

pool0 = pool0 + SHA(pool0 || pool1 ... pooln)
pool1 = pool1 + SHA(pool0 + SHA(pool0 || pool1 ... pooln) || pool1 ... pooln)
...

Mixing is performed in either of the following three cases: before
generation of a random byte sequence, after it, and in the case when
entire content of the pool has been expended in the process of
generation, and issuing of data has gone to a second round. The first
mixing is required for the creation of dependency of output data on all
the pool content. The second mixing protects the generated keys from an
attacker who has an opportunity to read the contents of memory after
generation (for example with the Cold
Boot
attack). The third
mixing allows for the use of new entropy that is being added to the pool
in the process of random number generation, and it also creates an
additional remoteness between the already generated and the subsequent
random numbers.

Random numbers retrieval

The random numbers retrieval function receives generator's output data
from its internal state. Retrieval of random numbers is done by taking
the SHA-512 hash from a part of the pool, random generated blocks
counter (getrand counter), and a size of a remaining part of a requested
data (remaining length), and encryption of an obtained hash with AES-256
key, received from the keys pool. Getrand counter and remaining length
are needed to eliminate the possibility of generation of two identical
output blocks on a same content of the pool. A situation having the same
content in the pool is an unlikely one, due to its mixing, nevertheless,
I believe that an additional safety measure would only be of a benefit.
The generation process can be presented as follows:

rand0 = SHA(pool0 || counter || length)
rand1 = SHA(pool1 || counter + 1 || length - 64)
...
randn = SHA(pooln || counter + n || length - 64*n)

In the process of retrieval of every 512 bits of data, twice, a
gathering of additional entropy takes place. After reaching the maximum
possible n (having the 640 bytes pool size, it equals 10), pool mixing
takes place, and the subsequently issued data will not be truly random,
but partially pseudorandom. The strength of this construction to
withstand a risk of being compromised, having minor values of an
effective entropy (that is, pseudorandom numbers), is based on the total
strength of SHA-512 and AES-256, that gives a strength reserve even in a
case of fully breaking of one of these cryptoprimitives. The source code
for number retrieval from the pool, you can find in the function
rnd_get_bytes.

Sources of entropy

Sources of entropy, - is the RNG element, the importance of which is
hard to overvalue. It is precisely these sources that are responsible
for an effective entropy, which is accumulated by RNG, and consequently,
an effective entropy of generated keys. Howsoever good the entropy
retrieval algorithm might be, but a poor source of entropy can place the
RNG security in jeopardy. And vice versa, - a good source of randomness,
in some cases, may "stretch" the security of a weak retrieval algorithm.
Sources of entropy should be approached to with a much greater paranoia,
than the RNG algorithm, as the strength of these sources is practically
non-provable and can vary by many magnitudes on different assumptions
about an attacker's resources and potential. So therefore, it is prudent
to follow the rule of "You can't spoil a good thing. Plenty is no
plague." And thus, it is best to use as many different sources of
randomness, as possible. Adding another additional source, will not
lower the strength, but only might to increase it, in certain cases.
Consequently, what is needed here, is - redundancy, more redundancy, and
even more redundancy.

Having redundancy, - is a very good thing, but one should not rely on an
empirical value of its size, as it just may happen, that an attacker may
be able to read the majority of entropy sources used, and the remaining
inaccessible sources he may be capable to pick up or predict. Therefore,
there is the need to have at least a simplest numerical basis for an
effective size of entropy in relation to an attacker, who has various
capabilities. For the start, we should define what kind of leverage a
hypothetical attacker may have.

  • The simplest version, - is when an attacker has absolutely no
    access to a computer in a moment of random number generation
    , he
    also does not know the exact moment of generation, nor has an
    information about the state of a system on that moment either. This
    model corresponds to the most likely kind of an attacker, and this
    model has the maximum possible safety of all the sources of entropy.
    Later on, one of the versions of strength calculation will be given
    with regard to this model.
  • A more complicated version, - is when an attacker has a partial
    access to a computer and knows an approximate (within the
    milliseconds range) time of random number generation
    , and when he
    also has the access to a part of system information on that moment.
    Practically, this model corresponds to a potential that an attacker
    may have in a multi-user environment, when he has no administrative
    privileges in there. Later on, strength calculation for this model
    will also be presented, and exactly these kind of numbers should be
    taken as a basis for giving an assessment for having a minimum
    acceptable strength.
  • And finally the third version, - when an attacker has a full access
    to a system
    . It is obvious, that it is impossible to be protected
    in such a case, as an attacker can take control of and to replace
    any sources of entropy, and also can have a direct influence upon
    internal state of the RNG, or to replace the RNG completely with his
    own one. This version corresponds to a kind of an attacker, which
    has administrative privileges on a system that he is attacking, and
    there is no, nor there can be a defense from him. Therefore, under
    no circumstances you must generate keys or work with a confidential
    data on a system that can be controlled by someone other than you.
    Otherwise, your security is literally hairbreadth to none.

The sources of entropy that are used in the DiskCryptor's RNG are
separated into two types: fast sources, working in the kernel-mode, and
slow sources, working in the user-mode. The first ones a residing
together with the RNG inside the driver, and the second ones reside in
the dcapi.dll, and the data from them is periodically transmitted to
the driver. First, let us examine a set of functions of the kernel, used
as a kernel-mode sources of randomness. Their main property is
inaccessibility of their direct reading by an attacker that has a
limited access to a system, but a portion of their bits may be
predictable, and so in the table that is presented below there are two
values for each source: minimum and maximum. The values are true on a
condition that the use of these sources is no frequent than 10 times per
second, on a more numerous use an effective entropy of each call
decreases.

Function Dependency Minimum entropy Maximum entropy
PsGetCurrentProcess On the process invoking reseed 0 2
PsGetCurrentProcessId On the process invoking reseed 0 2
KeGetCurrentThread On the thread invoking reseed 1 4
PsGetCurrentThreadId On the thread invoking reseed 0 2
KeGetCurrentProcessorNumber On the number of current processor 0 2
KeQueryInterruptTime On all interrupts in the system 8 16
KeQueryPriorityThread On the thread priority 0 1
KeQueryPerformanceCounter RTC (Real Time Clock) 4 8
__rdtsc CPU timestamp counter 16 24
ExGetPreviousMode Previous mode of the thread execution 0 1
IoGetInitialStack Initial stack of the thread 0 1
IoGetTopLevelIrp Last IRP (Input/output Request Packet) in the queue 0 2
MmQuerySystemSize Size of the system memory 0 ?
ExUuidCreate PRNG of the Windows kernel 8 (?) ?
RtlRandom Predictable value, but may be altered by other drivers 0 ?
KeQueryTickCount Number of ticks of the counter from the moment of system's boot 2 4
IoGetStackLimits Stack limits for the current thread 0 ?
Total: 39 75

Thus, we can see that on an average we get 39-75 random bits on one
reseed operation. This is not enough if reseed is carried out once,
therefore DiskCryptor retrieves entropy from fast functions in the
moment of a first thousand disk I/O operations after the system's start.
Even if we are to assume that the time of I/O operations execution is
practically constant, then on the minimum we would have no less than 1
bit of entropy per reseed (because predicting the time of reading data
from a disk with the nanosecond precision looks like a fiction), that in
total would amount to no less than 1000 bits of effective entropy right
after the system's boot. Subsequently, the fast source of entropy is
being used less, and namely in the moment of some not very frequent
system events, and in the moment of random number generation.

Besides using fast kernel-mode entropy sources, RNG also uses slow
user-mode sources. Several API functions and also user actions (mouse
movement, pressing of keys) are used as user-mode sources. At the start
of the program, Windows's RNG CryptoAPI CryptGenRandom function is used
once, as the source of entropy. Recent studies show that this RNG is
weak and in a protracted period of time it issues pseudorandom numbers,
as it rarely replenishes entropy, and thus using it more than once is
not justified. It is difficult for me to name the value of its effective
entropy, but apparently it has to be no less than 32 bits, otherwise RNG
would have had a very short period, which can be discovered with a
simplest of tests. Mouse movements are used as the source of entropy the
whole time the program is working. This is a very good source of entropy
even on its own accord, as the time between the two mouse communications
is wide in comparison with CPU clocks, and even from this time alone it
is possible to confidently gather 16 random bits at a time. Extra 2-3
random bits can be taken off from mouse cursor coordinates and presses
of mouse keys. Keyboard keys presses in the program's window are also
used by RNG, and both the time of press and release, as well as the
key's code are sent to it. This guarantees, that even in the absence of
other sources of randomness, the effective entropy of the keys being
generated, cannot be less than the entropy of your password, as the
password is also used in key generation. It is also important, that
these sources of entropy are inaccessible for reading by a local
attacker with limited privileges, since if attacker's privileges are
high enough to intercept keyboard presses, then encryption loses all its
meaning.

In addition to one-time and persistent sources of entropy described
above, the program also uses recurrent entropy sources, information from
which is collected every 5 seconds as long as the program works. The
list of functions used, and approximate value of amount of random bits
receive from them, is presented in the table below.

Function Minimum entropy Maximum entropy Inaccessible to local attacker
GetDesktopWindow 0 1 0
GetForegroundWindow 2 4 2
GetShellWindow 0 1 0
GetCapture 0 ? ?
GetClipboardOwner 0 ? ?
GetOpenClipboardWindow 0 ? ?
GetCurrentProcessId 0 1 0
GetCurrentThreadId 0 1 0
GetFocus 0 1 ?
GetActiveWindow 0 ? ?
GetKBCodePage 0 ? ?
GetCursor 0 2 ?
GetLastActivePopup 0 ? ?
GetProcessHeap 0 2 0
GetQueueStatus 0 2 0
GetInputState 0 ? ?
GetMessageTime 0 ? ?
GetOEMCP 0 ? ?
GetCursorInfo 8 16 8
GetCaretPos 0 ? ?
GetThreadTimes 4 8 0
GetProcessTimes 4 8 0
GetProcessMemoryInfo 4 8 0
QueryPerformanceCounter 8 16 8
GlobalMemoryStatusEx 4 8 0
EnumWindows 128 256 128
Total: 162 335 146

In total we have 146 bits, that are guaranteed to be inaccessible even
to a local attacker. That is not too much, but nevertheless, already
lies beyond the capabilities of attackers having modern day technology.

Fast PRNG for generation of large volume of random numbers

The aforementioned RNG is designed to generate small amount of random
numbers with a large strength reserve. This scheme's efficiency,
however, is quite low and therefore it is unsuited for generation of
gigabytes of random numbers that are needed for the algorithms to
securely wipe disk sectors on encryption. For this purpose DiskCryptor
has the pseudorandom number generator that is built with the utmost
simplicity, but nonetheless its durability is based on the strength of
the AES.

DiskCryptor's pseudorandom number generator

Here in this scheme, the main RNG is used as the source of entropy, and
with its help a random AES key is generated on the first initialization
of the generator. Output of pseudorandom numbers is done by the process
of encryption, using this key, of sequentially incrementing 128 bit
counter. Statistical characteristics of output data are fully ensured by
the AES algorithm. On the completion of generator's work, the key is
destroyed, and it becomes impossible to distinguish an obtained data
from a random data. This scheme has the property of being simple and
sufficiently safe for its purpose.

List of publications used

  1. Peter Gutmann, "Software Generation of Practically Strong Random
    Numbers
    ",
    1998
  2. Elaine Barker, John Kelsey "NIST Special
    Publication 800-90
    ",
    National Institute of Standards and Technology, March 2007
  3. John Kelsey, Bruce Schneier, David Wagner, Chris Hall,
    "Cryptanalytic Attacks on Pseudorandom Number
    Generators
    ",
    Springer-Verlag London, UK, 1998