On the quality of random
Warning: This article is still an ongoing work (updated Jul 6th 2018)
How to measure random quality ?
I came accross a technical and theoretical challenge recently: how to evaluate the quality of a True Random Number Generator (TRNG) ? (Hopefully others had the same question, see stackoverflow and references at the end of this article).
A TRNG can have many implementation, in my case it was a small piece of a computer chip built around free running oscillators (FROs) which has been integrated to provide random numbers. As the source of entropy (namly quantum effects and what not, but do not ask me I am not an expert) is not deterministic such a generator is called "True" compared to DRBG / DRNG: deterministic random number generators which look like random but are in fact fully deterministic. Basically if you know their inner state (which is much more reduced that the quantom states of a few transistors) you know the next value that will come out, and the one after that and so on and so fourth.
I had not reason not to trust the IP vendor which provided the TRNG, nor the front end engineer which integrated it in our SoC nor the backend engineer which P&R the IP and check DRC enforcement. But still it would have been good to verify at the end that the number coming from the IP where "truly" random. But How do you do that ?
How to test random number generator "quality" ?
The NIST has an answer: a series of tests (aimed at DRBG: link & link) which ensure with a certain level of certainty that some bits are random, i.e. the outcome of the generation could not have been predicted. That the thing with RNG generator, you can only evaluate randomness afterward once the generation is done, for example by computing statistics on the generated number and check if they "look" random enough. Some tests are pretty simple some are more complex but they are not perfect.
By definition strong encryption / hash should be indistinguishable from pure random, also it is likely to be predicatble, as said before if ones know the internal state of the generator and the generation algorithm (see differences between PRNG and TRNG).
No test is absolute: the only thing a test can show is that a source is definitely not random but one can never be sure a source is truly random.
Testing /dev/urandom using dieharder:
cat /dev/urandom | dieharder -a -g 200
And the first lines of the output (on my laptop):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#=============================================================================#
# dieharder version 3.31.1 Copyright 2003 Robert G. Brown #
#=============================================================================#
rng_name |rands/second| Seed |
stdin_input_raw| 2.86e+07 |1906170125|
#=============================================================================#
test_name |ntup| tsamples |psamples| p-value |Assessment
#=============================================================================#
diehard_birthdays| 0| 100| 100|0.04063257| PASSED
diehard_operm5| 0| 1000000| 100|0.39003783| PASSED
diehard_rank_32x32| 0| 40000| 100|0.29253977| PASSED
diehard_rank_6x8| 0| 100000| 100|0.54463760| PASSED
diehard_bitstream| 0| 2097152| 100|0.08717072| PASSED
diehard_opso| 0| 2097152| 100|0.17956788| PASSED
diehard_oqso| 0| 2097152| 100|0.23600778| PASSED
diehard_dna| 0| 2097152| 100|0.56596361| PASSED
diehard_count_1s_str| 0| 256000| 100|0.11492636| PASSED
diehard_count_1s_byt| 0| 256000| 100|0.38479922| PASSED
diehard_parking_lot| 0| 12000| 100|0.12488862| PASSED
diehard_2dsphere| 2| 8000| 100|0.05409247| PASSED
diehard_3dsphere| 3| 4000| 100|0.48664732| PASSED
diehard_squeeze| 0| 100000| 100|0.99872432| WEAK
Notice the "Weak" line 22 ?
Post on reddit about interpreting diehard results: link
Differences between PRNG and TRNG ?
(unpredictability)
What is an entropy pool ?
How fast can you generate Random Numbers ?
Intel entropy source is around 3Gbs (to be refined by conditionner)
Can good entropy sources be expected to fail some ENT / Dieharder tests ?
Indeed, as indicated in this stackoverflow responses: Perform ENT / Dieharder and raw entropy or on whitened entropy ?
Certifications for Random Number Generators:
NIST SP800-90
A. Recommendation for RNG using DRBG (NIST webpage)
B. Recommendation for Entropy Source for RBG (NIST webpage)
C. Recommendation for RBG Construction (NIST Draft)
References:
Wikipedia's page https://en.wikipedia.org/wiki/Randomness_tests
Ensuring quality of numbers from TRNG (pdf)
Statistical testing of random number generators (NIST, pdf)
Quality measures for various pseudo random number generators (website)
Random number generators for parallel computers (pdf)
Wikipedia's page on Diehard tests (link)
https://stackoverflow.com/questions/43050347/how-to-test-pseudorandom-sequence-of-bits-with-dieharder
Dieharder's webpage (link)
Intel's Digital Random Number Generator Implementation Guide
French agency opinion on AIS-31 (in french, document)
Stackoverflow question: How does the kernel entropy pool work ?
LWN article: On entropy and randomness
Linux's kernel random.c source code
Wikipedia's definition of entropy (web)
Understanding and managing entropy (pdf)
Good practice in (Pseudo) Random Number Generation (pdf)
Stackoverflow's question: Perform ENT / Dieharder and raw entropy or on whitened entropy ?
An other stackoverflow question: Differences between methods of hardware random number generators
ent utility to "evaluate" entropy http://www.fourmilab.ch/random/