Why is the RAND function better than RANUNI and RANNOR?
The traditional random number generating functions in SAS – RANUNI, RANNOR and the rest, as well as UNIFORM and NORMAL – are now deprecated, and the advice is to use the relatively new RAND function instead. Why is this?
Here first is an example of the use of the traditional functions.
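A minimal sketch of such a program follows. The data set name OLD_STYLE and the loop count of 10 are illustrative assumptions; only the two function calls and their seeds are taken from the description.

```sas
/* Sketch only: OLD_STYLE and the loop count are illustrative names/values */
data old_style;
   do i = 1 to 10;
      r1 = ranuni(3);   /* uniform on (0,1), seed argument 3 */
      r2 = rannor(7);   /* standard normal (mean 0, sd 1), seed argument 7 */
      output;
   end;
run;
```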
Two series of random numbers are being generated here, R1 being uniformly distributed and R2 normally distributed. The seed values (here 3 and 7, chosen arbitrarily) are positive, which means that the results are reproducible: exactly the same sequences of “random” values would be generated every time the program was run. A seed that was zero or negative would instead cause the generator to be initialised from the system clock, so that different sequences would be generated every time the program was run. Those sequences are still pseudo-random rather than truly random; they are simply not reproducible.
Random number sequences generated in this way are good but not perfect. They perform pretty well against standard tests of randomness, but there is a possibility of two sequences overlapping (so that the random variables would not be statistically independent). These functions all use excerpts from the same long pseudo-random sequence, starting at a point within it that depends on the seed value. According to the SAS documentation, the underlying generator has a period of about 2^31 values (roughly two billion), so for samples of modest size the probability of overlap is in practice small enough not to be a worry, although very large simulations can come close to exhausting the sequence.
In general, there is no need to rush out and alter existing code that uses the old random number functions. SAS Institute do warn that those functions are not suitable for use with parallel and distributed processing, but in most other cases old code can be safely left as it is.
For new code, RAND is better. Here is an example.
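A minimal sketch of the RAND equivalent follows. The data set name NEW_STYLE and the loop count are illustrative assumptions, and the seed of 3 is chosen only to mirror the earlier example.

```sas
/* Sketch only: NEW_STYLE and the loop count are illustrative names/values */
data new_style;
   call streaminit(3);         /* one seed for every RAND call in this step */
   do i = 1 to 10;
      r1 = rand('Uniform');    /* uniform on (0,1) */
      r2 = rand('Normal');     /* standard normal: mean 0, sd 1 */
      output;
   end;
run;
```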
The first parameter to the RAND function – and the only one specified here – is the name of the distribution to be followed by the random values. All of the distributions for which there were “old” random number functions are supported.
Notice that the RAND calls do not specify a seed. The equivalent is done by the single call to STREAMINIT at the beginning. Note that:
- Without any STREAMINIT call, the RAND function would be seeded from the system clock, so that the results would not be reproducible. (The values would still be pseudorandom rather than truly random.)
- A positive seed value specified to STREAMINIT (as here) will generate a single pseudorandom sequence, which will be used by all subsequent RAND calls. For example, in a step that calls RAND('Uniform') and then RAND('Normal') on each iteration, uniformly distributed and normally distributed values will be generated alternately from successive values of the one random sequence.
- That sequence will not be the same sequence as was produced by RANUNI(3) above.
The benefits of using the RAND function are:
- It performs better against tests of randomness, and the possibility of “overlap” is virtually eliminated.
- It is suitable for use with parallel and distributed processing (see CALL STREAM below).
- A greater variety of distributions is supported, including F, T, LogNormal, Pareto, Erlang etc.
- Parameters to the distribution can be specified: for “Normal”, for example, the mean and standard deviation. The old RANNOR function could only generate values with mean 0 and standard deviation 1, which then had to be rescaled by hand.
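The last two benefits can be illustrated with a short sketch. The data set name, variable names, seed and parameter values below are all illustrative assumptions; the distribution names and parameter orders are as I understand them from the RAND documentation.

```sas
/* Sketch only: names, seed and parameter values are illustrative */
data dist_demo;
   call streaminit(12345);
   do i = 1 to 5;
      score = rand('Normal', 100, 15);   /* mean 100, standard deviation 15 */
      t_val = rand('T', 5);              /* Student's t, 5 degrees of freedom */
      f_val = rand('F', 3, 20);          /* F, 3 and 20 degrees of freedom */
      output;
   end;
run;
```

With RANNOR, the first of these would have to be written as 100 + 15 * RANNOR(seed).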
CALL STREAMINIT permits a choice of random number generators. The default, suitable for most purposes, is called “MTHybrid” and described as “Hybrid 1998/2002 32-bit Mersenne twister”. Others available are described as 64-bit, “Threefry”, “Threefish” etc. (Perhaps with chips?)
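As a hedged sketch, a generator other than the default can be requested by passing its name ahead of the seed. This assumes a release that supports the generator-name argument (SAS 9.4M5 or later, as far as I am aware) and uses 'PCG' purely as an example; the data set name and seed are illustrative.

```sas
/* Sketch only: assumes a SAS release that accepts an RNG name in STREAMINIT */
data pcg_demo;
   call streaminit('PCG', 12345);   /* generator name first, then the seed */
   do i = 1 to 5;
      u = rand('Uniform');
      output;
   end;
run;
```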
There are two related CALL routines, neither of which will normally be required:
- CALL STREAM enables multiple independent streams to be generated. This would be appropriate for a program running in parallel on multiple nodes of a grid, or in a data step to be run multiple times by a macro.
- CALL STREAMREWIND restores a random number generator to its initial state. This amounts to, in effect, deliberately introducing “overlap” as described above. For examples of when this might be a useful thing to do, please refer to the SAS Institute documentation.
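The macro case mentioned for CALL STREAM might be sketched as follows. This assumes that CALL STREAM takes a single numeric key, which is my reading of the documentation and should be checked against your release; the macro name, data set names and seed are hypothetical.

```sas
/* Sketch only: macro name, data set names and seed are hypothetical */
%macro one_sample(key);
   data sample_&key;
      call streaminit(12345);   /* same seed in every run of the macro */
      call stream(&key);        /* assumed signature: one numeric key per stream */
      do i = 1 to 5;
         u = rand('Uniform');
         output;
      end;
   run;
%mend one_sample;

%one_sample(1)
%one_sample(2)
```

The intention is that each run draws from its own independent stream even though the seed is the same every time.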