
This report is about the testing and application of hardware (true) random number generators (very old, but links updated, July 2013).

A hardware (true) random number generator is a piece of electronics that
plugs into a computer and produces genuine random numbers as opposed to the
*pseudo-random* numbers that are produced by a computer program such as
newran. The usual method is to amplify noise
generated by a resistor (Johnson noise) or a semiconductor diode and feed this
to a comparator or Schmitt trigger. If you sample the output (not too quickly)
you (hope to) get a series of bits which are statistically independent. These
can be assembled into bytes, integers or floating point numbers and then, if
necessary, into random numbers from other distributions using methods such as
those in newran.

I have experimented with one of these from a Canadian company and found it to produce reasonably satisfactory numbers (I now know how to make it produce very satisfactory numbers), subject to a problem I'll go over later. I have also tested generators from Colorado and Holland and found them to produce very satisfactory numbers, and I have tested a prototype device from Sweden which will produce satisfactory numbers with suitable software.

Pseudo-random number generators use a formula to generate numbers which behave very like genuine random numbers, and they are widely used for simulations of random processes and statistical methods. In most cases a good pseudo-random number generator seems to work as you would expect a genuine random generator to work. For a suite of programs for testing pseudo-random number generators, and details of some pseudo-random number generators, see George Marsaglia's Diehard tests. See also Taygeta Scientific's notes on random number generators and the numerical analysis FAQ list.

I have always been somewhat suspicious of computer generated
*pseudo-random* numbers. Modern statistical methods seem to be requiring
huge numbers of random numbers and are using them in more demanding ways. And,
of course, faster computers mean we are able to use vastly more than we could a
few years ago. I am thinking particularly of Markov Chain Monte Carlo and
bootstrap as methods that can use a very large number of random numbers.
Simulations of time-series that we are going to test for long-range dependence
or chaos are examples where one might expect pseudo-random number generators to
give misleading results. Perhaps neural network simulations should be added to this list.
In my 1987 paper on the Hurst effect in *Biometrika* I generate a random
process by generating its Fourier transform with the pseudo-random number
generator and then taking the inverse Fourier transform to get the actual
process. Can one really expect the pseudo-random number generator to look like
a real random number generator in these kinds of situations?

In fact, pseudo-random number generators do seem to work well. But I would still like an alternative, at least to do a cross-check.

Lotteries, such as lotto or raffles, used to be drawn using a physical device, such as a container of numbered balls from which balls are drawn (one hopes) at random. Now the New Zealand Lotteries Commission is moving towards using computer-based systems to simulate the container of numbered balls.

There seem to be four requirements for the computer-based draw:

- The probability of each set of numbers drawn by the computer should be the same as it would be using numbered balls drawn perfectly randomly from a container (what statistics books call an *urn*);
- No one (including the programmer or the person certifying the method) should be able to predict the numbers to be drawn;
- No one should be able to influence the outcome of the draw to their advantage;
- Some *qualified* person has to be able to *certify* that the first three conditions are satisfied.

Certification of the method is required by law (in New Zealand). Exactly what it means, or what is really required, is still unclear to me. I interpret it to mean being able to show that the first three conditions are satisfied to a reasonable accuracy and beyond reasonable doubt.

I think these conditions really rule out the use of a pseudo-random generator, even when one uses some random starting number. With a pseudo-random number generator with only 2**31 possible values, there aren't enough possibilities to generate every possible combination of balls to be drawn in the game I am currently working with. If I used a modern random number generator with vastly more possible values, it would still be difficult to prove that each possible combination of balls had probability close to the theoretical probability. And it would be very difficult (impossible?) to get a starting random number with a sufficient number of possibilities.
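As a rough illustration of the counting argument (a sketch only: the article does not name the game, so an ordered draw of 6 balls from 40 is an assumed example):

```python
import math

# Assumed game: an ordered draw of 6 balls from 40 (hypothetical numbers).
# A PRNG started from a 31-bit seed can only ever produce 2**31 distinct
# draw sequences, which is fewer than the number of ordered outcomes.
seed_states = 2**31
ordered_draws = math.perm(40, 6)        # 40*39*38*37*36*35

print(ordered_draws)                    # 2763633600
print(seed_states)                      # 2147483648
print(ordered_draws > seed_states)      # True: some outcomes can never be drawn
```

With these assumed numbers, more than a quarter of the possible ordered draws are simply unreachable, whichever seed is chosen.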

One could argue that the pseudo-random numbers are so like real random numbers that it doesn't matter that not all combinations of numbers are possible. However, it might be possible for someone with access to the source code to identify the more likely combinations.

Owing to the unsatisfactory nature of pseudo-random number generators for lottery draws I have been encouraging our Lotteries Commission to use a hardware random number generator.

The testing procedure can then concentrate on the raw hardware random numbers. The testing on the final combinations of numbers simulating the draw from the urn would be aimed at just checking that the program was correct. This is as opposed to checking directly that condition 1 was satisfied, which can't be done in a satisfactory way. Of course, conditions 2 and 3 are satisfied almost automatically.

Recently, George Marsaglia, the well-known random number guru, produced a
CD-ROM containing 600
megabytes of random numbers. These were produced using George's best
pseudo-random number generators, but were then combined with bytes from a variety of
random sources or *semi-random* sources (such as rap music).

Suppose X and Y are independent random bytes (integer values 0 to 255), and at least one of them is uniformly distributed over the values 0 to 255. Then both the bitwise exclusive-or of X and Y and X+Y mod 256 are uniformly distributed over 0 to 255. In addition if both X and Y are approximately uniformly distributed, then the combination will be more closely uniformly distributed. I think George uses exclusive-or.
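The claim can be checked exactly by convolving the two distributions (a small sketch; the biased distribution chosen for Y here is arbitrary):

```python
from collections import Counter

# X is exactly uniform on 0..255; Y has an arbitrary, heavily biased
# distribution. Their bitwise XOR is nevertheless exactly uniform.
p_x = {v: 1/256 for v in range(256)}
p_y = {0: 0.9, 7: 0.05, 255: 0.05}       # any biased distribution will do

p_xor = Counter()
for x, px in p_x.items():
    for y, py in p_y.items():
        p_xor[x ^ y] += px * py

# every value 0..255 comes out with probability 1/256
print(len(p_xor))                                            # 256
print(all(abs(p - 1/256) < 1e-12 for p in p_xor.values()))   # True
```

The same computation with `(x + y) % 256` in place of `x ^ y` gives the identical result, since for fixed y both maps are permutations of 0..255.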

In the Marsaglia CD-ROM the idea is to get the excellent properties of the pseudo-random number generator but to break up any remaining patterns with the random or semi-random generators.

George provides output from three hardware generators on the CD-ROM. He
identified his hardware generators as Canada, Germany and California according
to the place of manufacture. He says that none of these outputs pass his
*Diehard* tests. The failures were spectacular. I found this surprising as
my hardware generator, which is from the same manufacturer as George's Canadian
one, passed most of the Diehard tests when it was run at the recommended
sampling rate.

It seemed to me that there were four possibilities:

- George was using a faulty generator;
- The sampling rate was too fast (leading to correlation between successive numbers);
- George's card, while working correctly, was rather more biased than the ones I have seen;
- There was a programming error in the program used for extracting the numbers from the generator.

It turns out that the problem was the fourth: there was a programming error.

On a PC (at least on the compiler I use) when you write a program to write
bytes to disk, you must open the file you are writing to as *binary*.
Otherwise, whenever you attempt to write a byte containing the binary
representation of 10, instead you get a byte containing 13 followed by a byte
containing 10. The program thinks you are trying to write an end-of-line and
the convention on a PC is to represent this by a *carriage return* (13)
followed by a *line feed* (10).
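In Python the distinction looks like this (a minimal sketch; in C the equivalent fix is opening with `fopen(path, "wb")` rather than `fopen(path, "w")`):

```python
# Writing random bytes on a PC: the "b" in "wb" is essential. In text mode,
# every byte 10 would come out as the pair 13, 10 and the data would be
# corrupted, exactly as described above.
data = bytes([9, 10, 13, 10, 255])

with open("raw.bin", "wb") as f:     # binary mode: bytes go to disk verbatim
    f.write(data)

with open("raw.bin", "rb") as f:     # read back in binary mode too
    readback = f.read()

print(readback == data)              # True: nothing was translated
```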

It is clear when one looks at the data that this is what has happened with the files from the Canadian and German generators. Every byte containing a 10 is preceded by a byte containing a 13. After clearing out the surplus bytes containing the 13s, the numbers from the Canadian generator pass most of the Diehard tests. (An alternative explanation is that the files were copied from a Unix computer to a PC in ASCII mode, which leads to a similar effect.)
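The corruption and the repair can be sketched as follows. Note that a genuine 13 immediately followed by a genuine 10 in the original random data is indistinguishable from an inserted pair, so the repair also deletes roughly one legitimate byte in every 2**16:

```python
# Simulate the text-mode corruption (each 10 becomes the pair 13, 10) and
# undo it by deleting one 13 before every 10. A lone 13 not followed by a
# 10 is left alone.
original = bytes([7, 10, 42, 13, 200, 10])
corrupted = original.replace(b"\x0a", b"\x0d\x0a")
repaired = corrupted.replace(b"\x0d\x0a", b"\x0a")

print(list(corrupted))          # [7, 13, 10, 42, 13, 200, 13, 10]
print(repaired == original)     # True (no genuine 13,10 pair in this example)
```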

The following is my examination of the numbers from the three hardware generators using the outputs on the CD-ROM.

The Canadian generator consists of 8 units which generate a series of bits (0 or 1) using Johnson noise (thermal noise from a resistor) according to the documentation. When a byte is requested from the generator the 8 units are sampled and the bits from these units are combined to form the byte. To examine the generator it is best to look at the streams of individual bits, since this will enable one to pick out the most likely deviations from perfect randomness.

The first thing to check is the average fraction of bits that equal 1 for each of the generator units. This should be very close to 0.5. After clearing out the surplus bytes from George's data we have 9,960,687 bytes. Calculating the fraction of 1s from each unit:

    bit  fraction
     0    0.4980
     1    0.4977
     2    0.4989
     3    0.4978
     4    0.4977
     5    0.4983
     6    0.4984
     7    0.4981

These are close to 0.5 but significantly different from 0.5 since the standard error is 0.00016. It is probably to be expected that a physical generator would be slightly biased. This bias probably accounts for the occasional highly significant deviation in the Diehard tests.
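A sketch of the bit-fraction check and of the quoted standard error (assuming unit i supplies bit i of each byte; the actual bit ordering is not documented here):

```python
import math

# Fraction of 1s in each of the 8 bit positions of a byte stream
# (bit position i is assumed to correspond to generator unit i).
def bit_fractions(data: bytes):
    n = len(data)
    counts = [0] * 8
    for byte in data:
        for i in range(8):
            counts[i] += (byte >> i) & 1
    return [c / n for c in counts]

# Standard error of a fraction under the null hypothesis p = 0.5:
# sqrt(p*(1-p)/n) = sqrt(0.25/n).
n = 9_960_687
se = math.sqrt(0.25 / n)
print(round(se, 5))     # 0.00016, the figure quoted above
```

A deviation of 0.498 from 0.5 is then about 12 standard errors: small, but clearly significant.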

If one samples too fast, successive bits from a unit in the generator will tend to be correlated, since the amplifier and comparator in the generator have only limited bandwidth. So it is important to check the low order auto-correlations. For George's sample there did not appear to be any significant low order auto-correlation. (I haven't carried out a formal test on the set of auto-correlations: most are below two standard errors and, while there are occasional high figures, there did not appear to be more than you would expect from a set of series with uncorrelated values.)
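The auto-correlation check can be sketched as follows, normalising by the theoretical variance 0.25 of an unbiased bit (the same convention used for the tables later in this article):

```python
# Lag-k auto-correlation of a 0/1 bit stream, normalised by the theoretical
# variance 0.25 that an unbiased bit would have.
def autocorrelation(bits, k):
    n = len(bits) - k
    mean = sum(bits) / len(bits)
    cov = sum((bits[i] - mean) * (bits[i + k] - mean) for i in range(n)) / n
    return cov / 0.25

# A perfectly alternating stream is maximally anti-correlated at lag 1.
alternating = [0, 1] * 1000
print(round(autocorrelation(alternating, 1), 3))   # -1.0
```

For a genuinely random stream of this length the values should mostly lie within two standard errors of zero.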

One should also check cross-correlations between the bits in case the random noise from one generator unit contaminates the others or there is a common source of interference. For the Canadian generator there was no problem here.

One should also test for long term drift and periodicities. A spectral
analysis (see the spectrum for
one of the bits, this is a 19K GIF showing the spectrum moving more or less
randomly between the upper and lower bounds) on the full sequence didn't reveal
any problems. [*Spectral analysis* is in the sense that a time-series
analyst would understand the term as opposed to the spectral test used by Knuth
in the second volume of the *Art of Computer Programming*.]

So apart from the slight bias the Canadian generator seems fine. I suggest taking the exclusive-or of two successive bytes and using this combination instead of individual bytes. This will very substantially reduce the bias (and any cross-correlation) at the expense of taking twice as long to generate a byte.
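The effect of the xor on bias is easy to quantify: if each bit is 1 with probability p = 0.5 + e (bits independent), the xor of two such bits is 1 with probability 2p(1-p) = 0.5 - 2e², so the bias shrinks quadratically:

```python
# A unit with fraction 0.498 (bias e = -0.002, as in the table above)
# xored with an independent copy of itself:
p = 0.498
p_combined = 2 * p * (1 - p)     # = 0.5 - 2*e**2

print(round(p_combined, 6))      # 0.499992: bias reduced from 2e-3 to 8e-6
```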

The German generator appears to be similar to the Canadian one. But its performance was terrible. After purging the surplus 13s there were 9,953,865 bytes.

Here are the means:

    bit  fraction
     0    0.494
     1    0.623
     2    0.490
     3    0.493
     4    0.466
     5    0.493
     6    0.575
     7    0.490

Here are the first, second and 50th auto-correlations:

    bit:      0     1     2     3     4     5     6     7
    lag 1:  -.004  .048  .000 -.003 -.005 -.001  .007  .001
    lag 2:  -.001  .001  .000 -.002 -.004  .000 -.032  .000
    lag 50:  .000  .005  .000  .001  .002  .000  .013  .000

Here are the cross-correlations between the 8 bits that form each byte (I give just the upper triangle):

    bit:    0      1      2      3      4      5      6      7
    0    1.000   .003  -.016  -.005  -.001   .162   .003  -.027
    1            .939   .005  -.001   .016   .004   .238   .005
    2                  1.000  -.001   .009  -.004   .002   .141
    3                         1.000  -.002  -.017   .001  -.001
    4                                 .995  -.001   .011  -.004
    5                                       1.000   .002  -.016
    6                                              .977   .003
    7                                                     1.000

Strictly speaking these are the auto- and cross-covariances divided by the theoretical variance of 0.25 that would apply if the bits were exactly unbiased. This is why the diagonal terms are not exactly equal to 1 for bits 1, 4 and 6.

The standard error of these correlations is .00032, so we have been able to detect numerous significant correlations. Even if we assume generator units 1 and 6 are broken there are still many significant auto- and cross-correlations.

Further spectral analysis (see the spectrum for bit 1, this is a 6K GIF showing a large peak dominating everything else) has revealed a strong cyclic effect at approximately, but not exactly, four times the sampling rate. My guess is that the noise generators are insufficiently shielded and are picking up interference from the computer electronics.

The California generator appears to be differently organised, but I haven't managed to deduce its structure.

The means do not differ significantly from 0.5 and the auto-covariances do not differ significantly from zero (using all 10,000,000 bytes). So at first glance it seems excellent.

The problem arises when we look at the cross-correlations. I show just the diagonal elements and the highly significant cross-correlations in the upper triangle.

    bit:   0     1     2     3     4     5     6     7
    0      1   -.01
    1            1   -.01
    2                  1   -.01
    3                        1   -.01
    4                              1   -.01
    5                                    1   -.01
    6                                          1
    7                                                1

The cross-correlations at lag 1 are even more curious. Again I show just the diagonal elements and the highly significant cross-correlations.

.000 .001 -.03 -.03 -.005 -.003 .000 .002 -.03 -.03 -.003 .000 .001 -.03 -.03 .000 .002 -.03 -.03 .000 -.03 .001 .000 .001 .000 .002 -.03 -.03 -.001 -.003 -.001 -.006 .000

I have been unable to come up with a good explanation of what is happening. Probably something clever has been done to reduce the bias and the auto-correlations, but at the expense of producing unacceptable cross-correlations.

I think you need to go through the testing procedure I have described here. It is not sufficient just to run the Diehard tests. These are not optimised to find the kinds of problems which hardware generators are likely to be subject to. In particular one could easily miss seeing the kind of bias present in the Canadian card. And they don't provide any guidance on how a card needs to be improved.

On the other hand, you do need some general test such as the Diehard tests. My original analysis of the Canadian data, before I recognised the programming error, showed small but significant auto- and cross-correlations. I would have missed these with a smaller sample (say 100,000 bytes) and, as it was, I was almost prepared to put them down to minor limitations of the card. But the rejection is spectacular when you run the Diehard tests - and it is also spectacular if you just do a histogram of the values of the bytes.

Returning to the discussion of the Canadian generator: we have been trying random number generators from the same manufacturer and have a total of 6. One is working correctly. Four of the others have at least one failed generator unit. Another has one unit with a grossly unsatisfactory correlation structure (that unit has now failed). I have had one generator for a number of years; one unit failed after one year. We bought 4 more at the end of 1995 (or was it 1994?). Units in these were either dead when we first tried them or died shortly after. Our latest acquisition had 4 dead units out of the 8.

It seems like an assembly problem and should be fixable. The manufacturer is supposed to be investigating, and I hope they succeed, as their generator seems potentially to be a good product.

It may be that we have been just unlucky in the generators we received, but it is more likely that many of the generators produced by this company are faulty. I suspect that the methods usually used to check electronic equipment don't work for random number generators: if you just check that the device is producing a stream of numbers, you will find that it is, even if one or two units have failed. And the numbers look OK if the highest and lowest generator units are working.

We bought 4 more at the end of 1997. So far we have tested 3 of these and they seem fine.

Of course it depends what you want it for.

George Marsaglia recommends combining each byte from the hardware generator with a byte from a pseudo-random number generator to reduce bias and correlation. I agree with this. But I would still like my original real random numbers to be as good as possible. I don't know why pseudo-random number generators work as well as they do. I suspect that even the most reliable ones will give misleading answers for some applications. When one is dealing with a very large simulation, I don't know whether some undesirable characteristic of the pseudo-random number generator will be able to leak through if the hardware generator is not very good. So I would really like my hardware random number generator to be as good as possible, with the pseudo-random number generator used to clean up the very small amount of correlation or bias that remains in the output from the hardware random number generator.

Lottery draws were my immediate application. The game emulates drawing numbered
*balls* from an *urn*. We want each permutation of *balls* to
have the same probability to, say, within 10%. This is a very stringent
requirement since a minor bias in the generator can produce major differences
in the probabilities of the different permutations of the balls. Using the
Canadian generator as the base generator we get the required accuracy if we
exclusive-or two bytes. I also exclusive-or the resulting number with a number
from a pseudo-random number generator. Including the pseudo-random number
generator may not really be necessary, but it gives us some protection against
a generator unit failing during a game. It also makes it impossible for anyone to
take advantage of the little bit of variation left in the probabilities of the
different permutations.
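A hypothetical sketch of such a draw procedure. Python's `secrets` module stands in for the hardware generator, and the 6-from-40 game is an assumption; rejection sampling avoids the modulo bias that a plain remainder would introduce:

```python
import random
import secrets

# One combined byte: two "hardware" bytes xored together, then xored with a
# pseudo-random byte, as described in the text. secrets.token_bytes is only a
# stand-in for the real hardware source.
def random_byte():
    hw = secrets.token_bytes(2)
    prng = random.getrandbits(8)
    return hw[0] ^ hw[1] ^ prng

def draw(n_balls=6, urn_size=40):
    urn = list(range(1, urn_size + 1))
    drawn = []
    while len(drawn) < n_balls:
        # Rejection sampling: only accept bytes below the largest multiple of
        # len(urn), so b % len(urn) is exactly uniform over the remaining balls.
        limit = 256 - 256 % len(urn)
        b = random_byte()
        if b < limit:
            drawn.append(urn.pop(b % len(urn)))
    return drawn

print(sorted(draw()))
```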

I don't think it is satisfactory to use a lower quality hardware random number generator in this situation, even though one is taking an exclusive-or with a pseudo-random number. Even with the Canadian one I decided that it was necessary to take the exclusive-or of two of its bytes to reduce bias. Although the numbers would probably meet the requirements 1 to 3 with a less accurate generator, the problem is to prove this under requirement 4.

Taking an exclusive-or with a pseudo-random number must give a generator
which is very close to being an exact method of generating real random bytes
which are uniformly and independently distributed. I am assuming, of course, that
one is using a generator of the quality of a correctly working Canadian random
number generator. It is probably *overkill* to exclusive-or two of the
bytes from the hardware generator; the pseudo-random number generator will
remove the effects of the little bit of bias.

The main problem is speed. The Canadian manufacturer recommended a maximum sampling rate of 20,000 bytes per second and this seems to be about right. This is too slow for practical computer simulations. However, one could install the card in an older, pensioned off, PC, and just leave the generator running, writing the numbers to disk. This way one could generate 1 to 2 gigabytes a day which could be read from disk as required.
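The throughput arithmetic behind the 1 to 2 gigabytes figure:

```python
# At the recommended maximum sampling rate of 20,000 bytes per second,
# a generator left running for a day produces:
bytes_per_second = 20_000
per_day = bytes_per_second * 60 * 60 * 24

print(per_day)   # 1728000000, i.e. about 1.7 GB per day
```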

You want a key of, say, 512 bits. You won't want to use pseudo-random
numbers since then the key is really just your starting random seed (assuming
the person trying to break your code has access to your program). It might be
reasonable to use a real random number generator to generate the key. For 56
bit keys, maybe there are simpler ways. On the other hand, it is very important
that these keys really are chosen randomly so that someone searching through
keys can't just limit the search to *most likely values*, and a hardware
random number generator may provide an easy way to do this.

For other information related to encryption see David Wagner's page and RFC 1750.

I know of manufacturers in Colorado, Holland and Sweden. I report here on tests of their devices.

The Colorado generator plugs into a parallel printer port of a PC and draws its power from the data lines of the port. It has only one generator unit. It xors 4 successive bits from the unit and assembles the resulting bits into nibbles (4 bits), so 16 bits from the generator unit are used to build one nibble. The nibble is passed to the computer via the status bits in the parallel port. It comes with software which tests the device and can provide normal or uniform random numbers. For my tests I used my own software for accessing the numbers. I assembled the nibbles into bytes and then carried out the tests that I described above and also the Diehard tests. It passed all the tests and I was unable to detect any deviations from perfect randomness.
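The folding scheme described above might be sketched like this (a hypothetical reconstruction; the device's actual bit ordering is not documented here):

```python
# Fold each group of 4 successive raw bits into one output bit with xor,
# then pack 4 output bits into a nibble: 16 raw bits -> 1 nibble.
def bits_to_nibble(bits16):
    assert len(bits16) == 16
    folded = [bits16[i] ^ bits16[i + 1] ^ bits16[i + 2] ^ bits16[i + 3]
              for i in range(0, 16, 4)]
    nibble = 0
    for b in folded:               # assumed ordering: first folded bit is the
        nibble = (nibble << 1) | b # most significant bit of the nibble
    return nibble

print(bits_to_nibble([1,0,0,0,  0,0,0,0,  1,1,1,0,  0,0,0,0]))  # 0b1010 = 10
```

As with the byte-level xor discussed earlier, each xor of independent bits reduces any bias quadratically, at the cost of a fourfold reduction in output rate.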

The main problem is speed, 2,500 bytes per second. This is probably fine for lottery and encryption purposes, but is likely to be too slow when used directly for statistical simulations. For simulations one would want to have the generator writing random bytes to disk which could be read as required, as I suggested in the section on statistical simulation. The manufacturer says they are about to release a much faster version, which would make the device much more suitable for statistical simulations.

This seems to be the only generator specifically designed for statistical purposes and where the manufacturer has made a real effort to understand the effect of bias and correlation on the numbers that are finally produced.

The Dutch generator plugs into the serial port and can be read at 9600 baud, giving around 1000 bytes per second. The documentation says it contains two independent generators. I suspect the results are xored to produce the bits that we see. There may be other processing - one would have to talk to the manufacturers or take the device to pieces to figure this out (and one should do this before using it for critical applications). It passes all my tests and the Diehard tests. It was designed for randomising parapsychology experiments but would be suitable for lotteries or encryption. Again, speed is the problem for statistical simulations.

The Swedish generator also plugs into the serial port and is intended to be a
lower cost generator to be used for encryption. It is a fairly simple device with all
processing carried out on the host computer. It can be sampled at a variety of
baud rates. I prefer this arrangement since I can examine the characteristics
of the basic noise generator, such as bias and correlation structure. On my
prototype (a second beta version), one is just beginning to see correlation
between adjacent bits when sampling at 19200 baud, using my software. Look at
the manufacturer's web page to see details on how it works. With suitable
software I think it is suitable for key generation and statistical simulation
(although speed will be a problem again). The manufacturer supplies software
for Windows 95 and Windows NT. This software combines the results with those of
a pseudo-random number generator so it almost automatically passes all
statistical tests, but I couldn't do much analysis on the
*unpredictability* of the numbers. With my own software, sampling at
115200 baud and xoring 8 bytes, it passed all my tests and the Diehard tests.
Possibly xoring a lesser number of bytes would be OK. On my newest computer I
found I needed to sample at 57600 baud or faster or the achieved sampling rate
was unexpectedly slow. I think this was because at slower speeds the serial
port rejects much of the output of the generator as *noise* (which is
what it is, of course).

Here are the web sites of the manufacturers of the devices discussed in this article (links updated July, 2013):

- Canada http://www.tundra.com/ (No longer in production)
- Colorado http://www.comscire.com/
- Holland http://tams-www.informatik.uni-hamburg.de/paper/2001/SA_Witt_Hartmann/cdrom/Internetseiten/valley.interact.nl/RNG/home.html
- Sweden http://www.trng98.se/serial_trng_9803.html

See my links page for other manufacturers.