Using dqrng as user-supplied RNG

R
RNG
package
Published

September 24, 2023

My dqrng package has some quite old issues, one is “More distribution functions” where I brought forward the idea to support additional distribution functions within dqrng, which currently only supports uniform, normal and exponential distributions. I still think this would be a good idea, but it would also be nice if one could simply plug into the large number of distribution functions that have been implemented for R already. Fortunately this is possible via the mechanism described in User-supplied Random Number Generation. In #67 I have implemented this. Let’s see how that works by comparing the performance of runif, rnorm, rexp, and sample.int1 with their counterparts from dqrng with different settings for user-supplied RNGs.

When comparing the default methods from R with those from dqrng we see a consistent performance advantage for the latter with a factor of about 5 for larger samples:

bp1 <- bench::press(
  n = 10^(0:5),
  dist = c("runif", "rnorm", "rexp", "sample.int"),
  {
    dqdist <- paste0("dq", dist)
    bench::mark(
                  base1 = eval(call(dist, n)),
                  dqrng1 = eval(call(dqdist, n)),
                  check = FALSE,
                  time_unit = "s"
    )
  }
) |>
  mutate(label = as.factor(attr(expression, "description")))
ggplot(bp1, aes(x = n, y = median, color = label)) + 
  geom_line() + scale_x_log10() + scale_y_log10() + facet_wrap(vars(dist))

When we enable the RNG from dqrng for the uniform distribution things change for all three distribution functions. For smaller samples of less than 100 draws, the base methods now have comparable performance. Unfortunately, there is not much change for sample.int. For larger samples dqrng still has the edge in all cases, though:

RNGkind("user")

When we also enable the Ziggurat algorithm for normal draws, one sees a nice speedup in rnorm:

RNGkind("user", "user")

We can also see this when computing the relative speedup of dqrng compared with base R. Enabling the RNG from dqrng for the uniform distribution (“relative2”) brings the base methods on par with their dqrng counterparts for small samples.Also enabling the Ziggurat method for rnorm brings some improvements for larger samples. Unfortunately, there is not much change for sample.int:

But does this help with making more distribution functions available for people using dqrng? Yes! Internally all the distribution functions in R make use the uniform and the normal distribution. And when we replace those with the variants from dqrng, this also influences these distributions. But first of all, we see that we can set the seed with the normal set.seed and get the same reproducible numbers from base or dqrng methods:

set.seed(42); rnorm(5)
[1] -1.3679777 -0.7638514 -1.6173858 -0.3507472  0.5683508
set.seed(42); dqrnorm(5)
[1] -1.3679777 -0.7638514 -1.6173858 -0.3507472  0.5683508

The same is true if you use dqset.seed. However, the even with the same input seed a different output is created, since R does some scrambling on the seed before using it. Maybe I should revert that:

dqset.seed(42); rnorm(5)
[1] -1.3679777 -0.7638514 -1.6173858 -0.3507472  0.5683508
dqset.seed(42); dqrnorm(5)
[1] -1.3679777 -0.7638514 -1.6173858 -0.3507472  0.5683508

But how do we know that this also works for other distributions? We can simply try it out. Do we get reproducible numbers from various distribution functions after using dqset.seed? Yes:

dqset.seed(42); rlnorm(5)
[1] 0.2546214 0.4658687 0.1984167 0.7041617 1.7653532
dqset.seed(42); rlnorm(5)
[1] 0.2546214 0.4658687 0.1984167 0.7041617 1.7653532
dqset.seed(42); rt(5, 10)
[1] -1.7585953 -0.3260428 -0.3052283  0.2594040 -1.9871566
dqset.seed(42); rt(5, 10)
[1] -1.7585953 -0.3260428 -0.3052283  0.2594040 -1.9871566

Overall, adding

library(dqrng)
RNGkind("user", "user")

at the top of a script before (potentially) setting the seed with set.seed will give high performance for any random draws without further changes to the code. By replacing runif, rnorm, rexp, and sample.int with their counterparts from dqrng one can gain even more, in particular for larger samples. And maybe in the future more distribution functions will be added to dqrng it sself.