Random123
|
The Random123 library is a collection of counter-based random number generators (CBRNGs) for CPUs (C and C++) and GPUs (CUDA and OpenCL), as described in Parallel Random Numbers: As Easy as 1, 2, 3, Salmon, Moraes, Dror & Shaw, SC11, Seattle, Washington, USA, 2011, ACM . They are intended for use in statistical applications and Monte Carlo simulation and have passed all of the rigorous SmallCrush, Crush and BigCrush tests in the extensive TestU01 suite of statistical tests for random number generators. They are not suitable for use in cryptography or security even though they are constructed using principles drawn from cryptography.
The Random123 library is implemented entirely in header files. See below, for how to install and use the library, and how to generate documentation with doxygen.
CBRNGs are as fast as, or faster than conventional RNGs, much easier to parallelize, use minimal memory/cache resources, and require very little code. On modern architectures, the Random123 CBRNGs require a few cycles per byte of random data returned and return random data in convenient sizes (arrays of two or four elements, each of which is an unsigned integer of 32 or 64 bits). The range of random numbers is the full representable range of the 32 or 64 bit unsigned integer) The <Random123/uniform.h>
header contains utility functions to convert 32- and 64-bit unsigned integers to open or closed ranges of single or double precision floating point numbers.
The Random123 library was written by John Salmon and Mark Moraes. It is available at https://github.com/DEShawResearch/random123 with documentation at https://deshawresearch.github.io/random123. Archived releases are also available from http://deshawresearch.com/resources_random123.html. Please see the LICENSE for terms and conditions.
Unlike conventional RNGs, counter-based RNGs are stateless functions (or function classes i.e. functors) whose arguments are a counter, and a key that return a result of the same type as the counter.
result = CBRNGname(counter, key)
The returned result is a deterministic function of the key and counter, i.e. a unique (counter, key) tuple will always produce the same result. The result is highly sensitive to small changes in the inputs, so that the sequence of values produced by simply incrementing the counter (or key) is effectively indistinguishable from a sequence of samples of a uniformly distributed random variable.
For all the CBRNGs in the Random123 library, the result and counter are the same type, specifically an array of N words, where words have a width of W bits, encapsulated in r123arrayNxW structs, or equivalently, for C++, in the ArrayNxW typedefs in the r123 namespace. Keys are usually also arrayMxW types, but sometimes M is a different size than the counter N (e.g. Philox keys have half the number of elements as the counter, Threefry and ARS have the same number, AES uses an opaque key type rather than an array) The N random numbers returned in result.v[]
are unsigned integers of width W (32 or 64), and the range of the random numbers is the full range of the unsigned integer of that width (i.e. 0 to 2^W-1)
In C++, all public names (classes, structs, typedefs, etc) are in the r123
namespace. In C, the public names (functions, enums, structs, typedefs) begin either with r123
or with one of the RNG family names, e.g., threefry
, philox
, ars
, aesni
. The RNG functions themselves have names like philox4x32
. C++ class names are capitalized, e.g., Threefry4x32
.
Several families of CBRNGs are available in this version of the library:
The Random123 library is implemented entirely in header files. Thus, there is nothing to compile before using it and nothing to link after you #include
it in your source files. Simply direct your C or C++ compiler to find the header files in the include/
directory of the cloned repo and use the Random123 header files, types, and functions in your application.
Users and packagers are STRONGLY ADVISED run make check
to compile and run the tests in tests/
before using Random123 in an application (see tests/README). Do not use the library if any tests fail. (It is not a failure for a test to report that it cannot run because of missing hardware capabilities like 64bit multiply, SSE, AES-NI or compiler capabilities)
The top-level GNUmakefile also has "install", "html", and "install-html" targets. The former will copy header files to $(DESTDIR)$(includedir) (default: /usr/local/include). The second will run doxygen, replacing anything in docs/html. The last will install the documentation in $(DESTDIR)$(docdir)/html (default: /usr/local/doc/Random123/html).
A typical C++ use case might look like:
#include <Random123/philox.h> typedef r123::Philox4x32 RNG; RNG rng; RNG::ctr_type c={{}}; RNG::ukey_type uk={{}}; uk[0] = ???; // some user_supplied_seed RNG::key_type k=uk; for(...){ c[0] = ???; // some loop-dependent application variable c[1] = ???; // another loop-dependent application variable RNG::ctr_type r = rng(c, k); // use the random values in r for some operation related to // this iteration on objectid }
On each iteration, r
contains an array of 4 32-bit random values that will not be repeated by any other call to rng
as long as c
and k
are not reused.
In the example above, we use the r123::Philox4x32, but any of the other CBRNGs would serve equally well. Also note that for most CBRNGs, the ukey_type
and the key_type
are identical; the code could just as well ignore the ukey_type
and directly construct the key_type
. However, for the AESNI CBRNGs, the key_type
is opaque, and must be constructed from a ukey_type
, as shown.
In C, the example above could be written as:
#include <Random123/philox.h> philox4x32_ctr_t c={{}}; philox4x32_ukey_t uk={{}}; uk.v[0] = user_supplied_seed; philox4x32_key_t k = philox4x32keyinit(uk); for(...){ c.v[0] = ???; /* some loop-dependent application variable */ c.v[1] = ???; /* another loop-dependent application variable */ philox4x32_ctr_t r = philox4x32(c, k); }
In C, access to the contents of the counter and key is through the fixed-size array member v
.
All relevant functions in the C and C++ APIs for Random123 are declared as CUDA device functions if they are included in a CUDA kernel source file and compiled with a CUDA compiler (nvcc). They can be used exactly as described/documented for regular C or C++ programs. It is now possible to use Random123 functions in both the host portion and the device portion of the same .cu source file. The Nx32 forms were faster than the Nx64 variants on 32-bit GPU architectures in 2011, but we haven't measured this recently.
It has been reported that Random123 uses 16 bytes of static memory per thread. This is undesirable and not intentional, but we do not have a workaround other than to suggest adjusting memory allocation accordingly.
The pi_cuda.cu and pi_cudapp.cu examples illustrate the use of CUDA.
In a machine with different GPUs, the R123EXAMPLE_ENVCONF_CUDA_DEVICE environment variable can be set to a unique substring of the CUDA GPU device name to select a specific GPU (else examples try to choose the GPU with the most cores)
The functions in the Random123 C API can all be used in OpenCL kernels, just as in regular C functions. As with CUDA, the Nx32 forms are faster than the Nx64 variants on current (2011) 32-bit GPU architectures.
The pi_opencl.c
and pi_opencl_kernel.ocl
examples illustrate the use of OpenCL.
In a machine with different OpenCL devices, the R123EXAMPLE_ENVCONF_OPENCL_DEVICE environment variable can be set to a unique substring of the OpenCL device name to select a specific OpenCL device (else examples try to choose the device with the most cores)
In addition to the stateless ("pure/functional") C++ API above, the Random123 package includes two C++ classes that leverage the C++11 <random> API.
In addition to the stateless ("pure/functional") C API above, the Random123 package includes two C adapter interfaces to the GNU Scientific Library (GSL).
The Random123 library provides generators for uniformly distributed random integers. Often, applications want random real values or samples from other distributions. The general problem of generating samples from arbitrary distributions is beyond the scope of the Random123 library. One can, of course, use GSL or MicroURNG and the distributions in the C++11 <random> library, but a few simple cases are common enough that all that extra machinery seems like overkill. We have included a few generic conversion utilities which developers may find useful.
The Box-Muller method of generating Gaussian random variables is particularly well suited to Random123 because it deterministically consumes exactly two uniform randoms to generate exactly two gaussian randoms. It uses math library functions: sincos, log and sqrt which may be slow on some platforms, but which are surprisingly fast on others. Notably, on GPUs, the lack of branching in the Box-Muller method and hardware support for math functions overcomes the transcendental function overhead, making it the fastest generator of Gaussians that we are aware of.
The examples/ directory, contains example code intended to illustrate use of the library.
Complete, short programs estimate pi by counting the number of random points that fall inside a circle inscribed in a square, demonstrating the C, C++, AES, GSL, OpenCL, CUDA and C++11 APIs. The environment variable R123EXAMPLE_ENVCONF_SEED can be set to any unsigned integer value to run the example with a different seed. Many of the pi_* examples run different numbers of iterations if that number is specified as the first argument on the command line.
The tests/ directory contains tests and benchmarks. This code is complicated due to the fact that it is largely "single source" for CUDA, OpenCL and CPU implmentations. Developers are strongly discouraged from emulating its style. It contains:
Although we have done our best to make Random123 portable and standards conforming, it is an unfortunate fact that there is no portable code. There is only code that has been ported.
Prior to release, we test Random123 on a variety of systems and with a variety of toolchains that are readily available to us. Our current test environment includes:
In the past, we have tested Random123 with additional toolchains and hardware. Although we no longer test on these platforms, we know of no reason that they should not work.
Others have reported success on
With some compilation options, the CUDA nvcc compiler warns about unreachable code in array.h. The compiler doesn't recognize that the code that is unreachable for some values of some macro parameters, is actually reachable for other values of the parameters. It is possible to disable that particular warning for a specific compilation unit by adding -Wcudafe –diag_suppress=111 to the compilation command line.
On our ARMv7 test platform, we suspect a compiler bug with -O3, which does not seem to affect Random123 code itself, but produces nondeterministic results from time_serial. The offending compiler version was "aarch64-fsl-linux-gcc (Linaro GCC 4.8-2014.04) 4.8.3 20140401 (prerelease)". We slightly reordered a couple of innocuous statements (a timer() and dprintf() call) in time_serial to avoid the bug, but we would avoid -O3 on ARMv7 with that particular version of the compiler at least.
We welcome bug reports, bug fixes, ports, general feedback and other enhancements to the issues and pull requests pages of our github repo.
We are grateful for contributed bug-fixes and portability enhancements from the following users: