To random number generator page
This report is about the testing and application of hardware (true) random number generators.
A hardware (true) random number generator is a piece of electronics that plugs into a computer and produces genuine random numbers as opposed to the pseudo-random numbers that are produced by a computer program such as newran. The usual method is to amplify noise generated by a resistor (Johnson noise) or a semi-conductor diode and feed this to a comparator or Schmitt trigger. If you sample the output (not too quickly) you (hope to) get a series of bits which are statistically independent. These can be assembled into bytes, integers or floating point numbers and then, if necessary, into random numbers from other distributions using methods such as those in newran.
I had experimented with one of these from a Canadian company and found it to produce reasonably satisfactory numbers - I now know how to make it produce very satisfactory numbers - subject to a problem I'll go over later. I have also tested generators from Colorado and Holland and found them to produce very satisfactory numbers. I have tested a prototype device from Sweden which will produce satisfactory numbers with suitable software.
These use a formula to generate numbers which behave very like genuine random numbers and are widely used for simulations of random processes and statistical methods. In most cases a good pseudo-random number generator seems to work as you would expect a genuine random generator to work. For a suite of programs for testing pseudo-random number generators and details of some pseudo-random number generators see George Marsaglia's Diehard tests. See also Taygeta Scientific's notes on random number generators and the numerical analysis FAQ list.
I have always been somewhat suspicious of computer generated pseudo-random numbers. Modern statistical methods seem to be requiring huge numbers of random numbers and are using them in more demanding ways. And, of course, faster computers mean we are able to use vastly more than we could a few years ago. I am thinking particularly of Markov Chain Monte Carlo and bootstrap as methods that can use a very large number of random numbers. Simulation of time-series that we are going to test for long-range dependence or chaos are examples where one might expect pseudo-random number generators to give misleading results. Perhaps, add neural network simulations to this list. In my 1987 paper on Hurst effect in Biometrika I generate a random process by generating its Fourier transform with the pseudo-random number generator and then taking the inverse Fourier transform to get the actual process. Can one really expect the pseudo-random number generator to look like a real random number generator in these kinds of situations?
In fact, pseudo-random number generators do seem to work well. But I would still like an alternative, at least to do a cross-check.
Lotteries, such as lotto or raffles, used to be drawn using a physical device such as a container containing numbered balls from which balls are drawn (hopefully) at random. Now, the New Zealand Lotteries Commission is moving towards using computer based systems to simulate the container containing numbered balls.
There seem to be four requirements for the computer based draw:
Certification of the process of the method is required by law (in New Zealand). Exactly what it means, or what is really required is still unclear to me. I interpret it to mean being able to show that the first three conditions are satisfied to a reasonable accuracy and beyond reasonable doubt.
I think these conditions really rule out the use of a pseudo-random generator, even when one uses some random starting number. With a pseudo-random number generator with only 2**31 possible values, there aren't enough possibilities to generate every possible combination of balls to be drawn in the game I am currently working with. If I used a modern random number generator with vastly more possible values, it would still be difficult to prove that each possible combination of balls had probability close to the theoretical probability. And it would be very difficult (impossible?) to get a starting random number with a sufficient number of possibilities.
One could argue that the pseudo-random numbers are so like real random numbers that it doesn't matter that not all combinations of numbers are possible. However, it might be possible for someone with access to the source code to identify the more likely combinations.
Owing to the unsatisfactory nature of pseudo-random number generators for lottery draws I have been encouraging our Lotteries Commission to use a hardware random number generator.
The testing procedure can then concentrate on the raw hardware random numbers. The testing on the final combinations of numbers simulating the draw from the urn would be aimed at just checking that the program was correct. This is as opposed to checking directly that condition 1 was satisfied, which can't be done in a satisfactory way. Of course, conditions 2 and 3 are satisfied almost automatically.
Recently, George Marsaglia, the well-known random number guru, produced a CD-ROM containing 600 megabytes of random numbers. These were produced using George's best pseudo-random number generators, but were then combined bytes from a variety of random sources or semi-random sources (such as rap music).
Suppose X and Y are independent random bytes (integer values 0 to 255), and at least one of them is uniformly distributed over the values 0 to 255. Then both the bitwise exclusive-or of X and Y and X+Y mod 256 are uniformly distributed over 0 to 255. In addition if both X and Y are approximately uniformly distributed, then the combination will be more closely uniformly distributed. I think George uses exclusive-or.
In the Marsaglia CD-ROM the idea is to get the excellent properties of the pseudo-random number generator but to break up any remaining patterns with the random or semi-random generators.
George provides output from three hardware generators on the CD-ROM. He identified his hardware generators as Canada, Germany and California according to the place of manufacture. He says that none of these outputs pass his Diehard tests. The failures were spectacular. I found this surprising as my hardware generator, which is from the same manufacturer as George's Canadian one, passed most of the Diehard tests when it was run at the recommended sampling rate.
It seemed to me that there were four possibilities:
It turns out that the problem was 4: there is a programming error.
On a PC (at least on the compiler I use) when you write a program to write bytes to disk, you must open the file you are writing to as binary. Otherwise, whenever you attempt to write a byte containing the binary representation of 10, instead you get a byte containing 13 followed by a byte containing 10. The program thinks you are trying to write an end-of-line and the convention on a PC is to represent this by a carriage return (13) followed by a line feed (10).
It is clear when one looks at the data that this is what has happened with the files from the Canadian and German generators. Every byte containing a 10 is preceded by a byte containing a 13. After clearing out the surplus bytes containing the 13s, the numbers from the Canadian generator pass most of the Diehard tests. (An alternative explanation is that the files were copied from a Unix computer to a PC in ascii mode, which leads to a similar effect).
The following is my examination of the numbers from the three hardware generators using the outputs on the CD-ROM.
This consists of 8 units which generate a series of bits (0 or 1) using Johnson noise (thermal noise from a resistor) according to the documentation. When a byte is requested from the generator the 8 units are sampled and the bits from these units are combined to form the byte. To examine the generator it is best to look at the streams of individual bits, since this will enable one to pick out the most likely deviations from perfect randomness.
The first thing to check is the average fraction of bits that equal 1 for each of the generator units. This should be very close to 0.5. After clearing out the surplus bytes from George's data we have 9,960,687 bytes. Calculating the fraction of 1-s from each unit:
bit fraction 0 0.4980 1 0.4977 2 0.4989 3 0.4978 4 0.4977 5 0.4983 6 0.4984 7 0.4981
These are close to 0.5 but significantly different from 0.5 since the standard error is 0.00016. It is probably to be expected that a physical generator would be slightly biased. This bias probably accounts for the occasional highly significant deviation in the Diehard tests.
If one samples too fast, successive bits from a unit in the generator will tend to be correlated since the amplifier and comparator in the generator will have only limited bandwidth. So it is important to check the low order auto-correlations. For George's sample there did not appear to be any significant low order auto-correlation (I haven't carried out a formal test on the set of auto-correlations: most are below two standard errors and there are the occasional high figures but there did not appear to be more than you would expect from a set of series with uncorrelated values).
One should also check cross-correlations between the bits in case the random noise from one generator unit contaminates the others or there is a common source of interference. For the Canadian generator there was no problem here.
One should also test for long term drift and periodicities. A spectral analysis (see the spectrum for one of the bits, this is a 19K GIF showing the spectrum moving more or less randomly between the upper and lower bounds) on the full sequence didn't reveal any problems. [Spectral analysis is in the sense that a time-series analyst would understand the term as opposed to the spectral test used by Knuth in the second volume of the Art of Computer Programming.]
So apart from the slight bias the Canadian generator seems fine. I suggest taking the exclusive-or of two successive bytes and using this combination instead of individual bytes. This will very substantially reduce the bias (and any cross-correlation) at the expense of taking twice as long to generate a byte.
This appears to be similar to the Canadian one. But its performance was terrible. After purging the surplus 13s there were 9,953,865 bytes.
Here are the means:
bit fraction 0 0.494 1 0.623 2 0.490 3 0.493 4 0.466 5 0.493 6 0.575 7 0.490
Here are the first, second and 50th auto-correlations:
0 1 2 3 4 5 6 7 -.004 .048 .000 -.003 -.005 -.001 .007 .001 -.001 .001 .000 -.002 -.004 .000 -.032 .000 .000 .005 .000 .001 .002 .000 .013 .000
Here are the cross-correlations between the 8 bits that form each byte (I give just the upper triangle):
1.000 .003 -.016 -.005 -.001 .162 .003 -.027 .939 .005 -.001 .016 .004 .238 .005 1.000 -.001 .009 -.004 .002 .141 1.000 -.002 -.017 .001 -.001 .995 -.001 .011 -.004 1.000 .002 -.016 .977 .003 1.000
Strictly speaking these are the auto and cross-covariances divided the theoretical variance = 0.25 if the bits were exactly unbiased. This is why the diagonal terms are not exactly equal to 1 for bits 1, 4 and 6.
The standard error of these correlations is .00032, so we have been able to detect numerous significant correlations. Even if we assume generator units 1 and 6 are broken there are still many significant auto- and cross-correlations.
Further spectral analysis (see the spectrum for bit 1, this is a 6K GIF showing a large peak dominating everything else) has revealed a strong cyclic effect effect at approximately, but not exactly, four times the sampling rate. My guess is that the noise generators are insufficiently shielded and are picking up interference from the computer electronics.
This appears to be differently organised but I haven't managed to deduce its structure.
The means do not differ significantly from 0.5 and the auto-covariances do not differ significantly from zero (using all 10,000,000 bytes). So at the first glance it seems excellent.
The problem arises when we look at the cross-correlations. I show just the diagonal elements and the highly significant cross-correlations in the upper triangle.
1 -.01 1 -.01 1 -.01 1 -.01 1 -.01 1 -.01 1 1
The cross correlations with lag 1 are even more curious. Again I show just the diagonal elements and the highly significant cross-correlations.
.000 .001 -.03 -.03 -.005 -.003 .000 .002 -.03 -.03 -.003 .000 .001 -.03 -.03 .000 .002 -.03 -.03 .000 -.03 .001 .000 .001 .000 .002 -.03 -.03 -.001 -.003 -.001 -.006 .000
I have been unable to come up with a good explanation of what is happening. Probably something clever has been done to reduce the bias and the auto-correlations. But at the expense of producing unacceptable cross-correlations.
I think you need to go through the testing procedure I have described here. It is not sufficient just to run the Diehard tests. These are not optimised to find the kinds of problems which hardware generators are likely to be subject to. In particular one could easily miss seeing the kind of bias present in the Canadian card. And they don't provide any guidance on how a card needs to be improved.
On the other hand you do need some general test such as the Diehard tests. My original analysis of the Canadian data, before I recognised the programming error, showed up a small but significant auto and cross-correlations. I would have missed these with a smaller sample (say 100,000 bytes) and as it was, I was almost prepared to put these down to minor limitations of the card. But the rejection is spectacular when you run the Diehard tests - but it is also if you just do a histogram of the values of the bytes.
Returning to the discussion of the Canadian generator. We have been trying random number generators from the same manufacturer. We have a total of 6. One is working correctly. Four of the others have at least one failed generator unit. Another has one unit with a grossly unsatisfactory correlation structure (that unit has now failed). I have had one generator for a number of years. One unit failed after one year. We bought 4 more at the end of 1995 (or was it 1994?). Units in these were either dead when we first tried them or died shortly after. Our latest acquisition had 4 dead units out of the 8.
It seems like an assembly problem and should be able to be fixed. The manufacturer is supposed to be investigating the problem and I hope they can fix the problem as their generator seems potentially to be a good product.
It may be that we have been just unlucky in the generators we received. But it is more likely that many of the generators produced by this company are faulty. I wonder whether the methods you usually use to check electronic equipment don't work for random number generators. If you just check that it is producing a stream of numbers, you will find that it is, even if one or two units have failed. And the numbers look OK if the highest and lowest generator units are working.
We bought 4 more at the end of 1997. So far we have tested 3 of these and they seem fine.
Of course it depends what you want it for.
George Marsaglia recommends combining each byte from the hardware generator with a byte from a pseudo-random number generator to reduce bias and correlation. I agree with this. But I would still like my original real random numbers to be a good as possible. I don't know why pseudo-random number generators work as well as they do. I suspect that even the most reliable ones will give misleading answers for some applications. When one is dealing with a very large simulation I don't know whether some undesirable characteristic of the pseudo-random number generator will be able to leak through if the hardware generator is not very good. So I would really like my hardware random number generator to be as good as possible and the pseudo-random number generator to be used to clean up the very small amount of correlation or bias that remains in the output from the hardware random number generator.
This was my immediate application. The game emulates drawing numbered balls from an urn. We want each permutation of balls to have the same probability to, say, within 10%. This is a very stringent requirement since a minor bias is the generator can produce major differences in the probabilities of the different permutations of the balls. Using the Canadian generator as the base generator we get the required accuracy if we exclusive-or two bytes. I also exclusive-or the resulting number with a number from a pseudo-random number generator. Including the pseudo-random number generator may be not really necessary, but it gives us some protection against a generator unit failing during a game. It makes it impossible for anyone to take advantage of the little bit of variation left in the probabilities of the different permutations.
I don't think it is satisfactory to use a lower quality hardware random number generator in this situation, even though one is taking an exclusive-or with a pseudo-random number. Even with the Canadian one I decided that it was necessary to take the exclusive-or of two of its bytes to reduce bias. Although the numbers would probably meet the requirements 1 to 3 with a less accurate generator, the problem is to prove this under requirement 4.
Taking an exclusive-or with a pseudo-random number must give a generator which is very close to being an exact method of generating real random bytes which are uniformly independently distributed. I am assuming, or course, that one is using a generator of the quality of a correctly working Canadian random number generator. It is probably overkill to exclusive-or two of the bytes from the hardware generator; the pseudo-random number generator will remove the effects of the little bit of bias.
The main problem is speed. The Canadian manufacturer recommended a maximum sampling rate of 20,000 bytes per second and this seems to be about right. This is too slow for practical computer simulations. However, one could install the card in an older, pensioned off, PC, and just leave the generator running, writing the numbers to disk. This way one could generate 1 to 2 gigabytes a day which could be read from disk as required.
You want a key of, say, 512 bits. You won't want to use pseudo-random numbers since then the key is really just your starting random seed (assuming the person trying to break your code has access to your program). It might be reasonable to use a real random number generator to generate the key. For 56 bit keys, maybe there are simpler ways. On the other hand, it is very important that these keys really are chosen randomly so that someone searching through keys can't just limit the search to most likely values, and a hardware random number generator may provide an easy way to do this.
For other information related to encryption see David Wagner's page and RFC 1750.
I know of a manufacturers in Colorado, Holland and Sweden. I report here on tests on these devices.
This plugs into a parallel printer port of a PC. It draws its power from the data lines of the port. It has only one generator unit. It xors 4 successive bits from the unit, assembles the resulting bits into nibbles (4 bits) so 16 bits from the generator unit are used to build one nibble. The nibble is passed to the computer via the status bits in the parallel port. It comes with software which tests the device and can provide normal or uniform random numbers. For my tests I used my own software for accessing the numbers. I assembled the nibbles into bytes and then carried out the tests that I described above and also the Diehard tests. It passed all the tests and I was unable to detect in deviations from perfect randomness.
The main problem is speed, 2,500 bytes per second. This is probably fine for lottery and encryption purposes, but is likely to be too slow when used directly for statistical simulations. For simulations one would want to have the generator writing random bytes to disk which could be read as required, as I suggested in the section on statistical simulation. The manufacturer says they are about to release a much faster version, which would make the device much more suitable for statistical simulations.
This seems to be the only generator specifically designed for statistical purposes and where the manufacturer has made a real effort to understand the effect of bias and correlation on the numbers that are finally produced.
This plugs into the serial port and can be read at 9600 baud giving around 1000 bytes per second. The documentation says it contains two independent generators. I suspect the results are xored to produce the bits that we see. There may be other processing - one would have to talk to the manufacturers or take the device to pieces to figure this out (and one should do this before using it for critical applications). It passes all my tests and the Diehard tests. It was designed for randomising para-psychology experiments but would be suitable for lotteries or encryption. Again, speed is the problem for statistical simulations.
This also plugs into the serial port and is intended to be a lower cost generator to be used for encryption. It is a fairly simple device with all processing carried out on the host computer. It can be sampled at a variety of baud rates. I prefer this arrangement since I can examine the characteristics of basic noise generator such as bias and correlation structure. On my prototype (a second beta version), one is just beginning to see correlation between adjacent bits when sampling at 19200 baud, using my software. Look at the manufacturer's web page to see details on how it works. With suitable software I think it is suitable for key generation and statistical simulation (although speed will be a problem again). The manufacturer supplies software for Windows 95 and Windows NT. This software combines the results with those of a pseudo-random number generator so it almost automatically passes all statistical tests, but I couldn't do much analysis on the unpredictability of the numbers. With my own software, sampling at 115200 and xoring 8 bytes, it passed all my tests and the Diehard tests. Possibly xoring a lesser number of bytes would be OK. On my newest computer I found I needed to sample at 57600 baud or faster or the achieved sampling rate was unexpectedly slow. I think this was because at slower speeds the serial port rejects much of the output of the generator as noise (which is what it is, of course).
Here are the web sites of the manufacturers of the devices discussed in this article:
See my links page for other manufacturers.
Top of page
To random number generator page