
The main reason for this was an “undefined behaviour” error found on CRAN’s UBSAN checks:

`dqrng.cpp:222:18: runtime error: signed integer overflow: 4025630150 * 2783094533 cannot be represented in type 'long int'`

Looking at the relevant lines the error is quite apparent:

```
Int32 unscramble(Int32 u) {
  for (int j = 0; j < 50; ++j) {
    u = ((u - 1) * 2783094533);
  }
  return u;
}
```

While `u` is an `Int32`, i.e. `unsigned long int`, the literal integer `2783094533` is interpreted as a *signed* `long int`. As a consequence, the multiplication is done using signed integer logic, where overflow is undefined. As a fix, we can simply make sure that `2783094533` is also interpreted as `unsigned long int`:

```
Int32 unscramble(Int32 u) {
  for (int j = 0; j < 50; ++j) {
    u = ((u - 1) * 2783094533UL);
  }
  return u;
}
```

In case you are wondering why this code is there in the first place: as discussed on StackOverflow, R scrambles the user-supplied seed using

```
/* Initial scrambling */
for(j = 0; j < 50; j++)
  seed = (69069 * seed + 1);
```

which can be undone using the modular multiplicative inverse of `69069`, for which holds:

`(69069 * 2783094533) %% 2^32`

`[1] 1`
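The wrap-around arithmetic is easy to check in plain C++ (a standalone sketch, independent of the package's `Int32` typedef): with `uint32_t` everywhere, both the scrambling and its inverse are well defined modulo 2^32.

```cpp
#include <cassert>
#include <cstdint>

// R's initial scrambling: 50 rounds of the LCG step 69069 * seed + 1.
uint32_t scramble(uint32_t seed) {
  for (int j = 0; j < 50; ++j)
    seed = 69069u * seed + 1u;
  return seed;
}

// Undo it with the modular multiplicative inverse of 69069 (mod 2^32).
uint32_t unscramble(uint32_t u) {
  for (int j = 0; j < 50; ++j)
    u = (u - 1u) * 2783094533u;
  return u;
}
```

Since each `unscramble` round exactly inverts one `scramble` round, 50 rounds of one undo 50 rounds of the other.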

The purpose of undoing this scrambling is to make the results of `set.seed()` and `dqset.seed()` equivalent if one uses `dqrng` as user-defined RNG.

The second change was triggered by a bug report from Sergey Fedorov from the MacPorts project. When building `dqrng` on a 32-bit PowerPC architecture, certain parts of the included PCG code were now used that are not compatible with this architecture. For one of these issues a fix was possible, but in the end I had to disable PCG on macOS for PowerPC. It will be interesting to see if other architectures are affected by this as well. Probably the best way to find out is when the Debian package is updated.

I had already mentioned the changes in a previous post. There is one important change, though: I reverted the breaking change concerning the two-argument constructor and `seed` function from PCG. While it is still true that those have surprising properties, this is not really relevant for the current use case. Here it is important that the new method `dqrng::random_64bit_generator::clone(stream)` produces an identical RNG for `stream == 0`, which can be implemented without influencing the two-argument constructor and `seed` function.

What I like about this release is that different streams of work are coming together quite nicely. For example, Henrik Sloot implemented the ability to access the global RNG within C++ code. In addition, the abstract base class `random_64bit_generator` now supports methods `variate<dist>(param)`, `generate<dist>(container, param)` etc., using and inspired by `randutils`. These make it super easy to create new `dqr<dist>` functions if needed, e.g.:

```
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <boost/random/gamma_distribution.hpp>
#include <dqrng.h>

// [[Rcpp::export]]
Rcpp::NumericVector dqrgamma(std::size_t n, double shape, double scale) {
  auto rng = dqrng::random_64bit_accessor{};
  auto out = Rcpp::NumericVector(Rcpp::no_init(n));
  rng.generate<boost::random::gamma_distribution>(out, shape, scale);
  return out;
}
```

Since this new function makes use of the global RNG, it is controlled via the same `dqset.seed()` as the normal functions from the package:

```
dqset.seed(20240515)
dqrgamma(5, 2, 10)
```

`[1] 30.17319 44.74696 12.02194 16.64893 17.34264`

`dqrnorm(5)`

`[1] 0.28924215 -0.87322364 0.91917936 -0.83256856 0.01148021`


It should not come as a surprise that `boost::random::gamma_distribution()` produces similarly distributed random variates as the corresponding function from base R:

```
n <- 1e6
shape <- 2
scale <- 10
data.frame(
  gamma_R = rgamma(n, shape, scale = scale),
  gamma_dq = dqrgamma(n, shape, scale)) |>
  pivot_longer(cols = starts_with("gamma")) |>
  ggplot(aes(x = value, fill = name)) + geom_histogram(position = "dodge2")
```

```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

Unfortunately, the Gamma distribution implemented in `boost.random` is not very fast:

```
bm <- bench::mark(gamma_R = rgamma(n, shape, scale = scale),
                  gamma_dq = dqrgamma(n, shape, scale),
                  check = FALSE)
knitr::kable(bm[, 1:5])
```

expression | min | median | itr/sec | mem_alloc
---|---|---|---|---
gamma_R | 133ms | 139ms | 7.162025 | 7.63MB
gamma_dq | 159ms | 171ms | 5.814996 | 7.63MB

It might make sense to implement, for example, the method of Marsaglia and Tsang (2000).
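For reference, that method is compact. Below is a minimal standalone sketch (using `std::mt19937_64` rather than anything from dqrng) of the Marsaglia-Tsang rejection scheme for `shape >= 1`; smaller shapes need the usual `pow(u, 1/shape)` boosting step, which is omitted here.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Marsaglia & Tsang (2000): squeeze/rejection method for Gamma(shape, scale).
// Valid for shape >= 1; for shape < 1 one draws Gamma(shape + 1) and
// multiplies the result by pow(u, 1 / shape).
template <typename RNG>
double rgamma_mt(RNG& rng, double shape, double scale) {
  std::normal_distribution<double> norm(0.0, 1.0);
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  const double d = shape - 1.0 / 3.0;
  const double c = 1.0 / std::sqrt(9.0 * d);
  while (true) {
    double x = norm(rng);
    double v = 1.0 + c * x;
    if (v <= 0.0) continue;  // candidate outside the support, reject
    v = v * v * v;
    double u = unif(rng);
    // cheap squeeze check first, exact log condition as fallback
    if (u < 1.0 - 0.0331 * x * x * x * x ||
        std::log(u) < 0.5 * x * x + d * (1.0 - v + std::log(v)))
      return d * v * scale;
  }
}
```

The squeeze accepts the vast majority of candidates without evaluating any logarithm, which is where the speed comes from.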

Alternatively, one can also make use of a feature that I had discussed in another post. It is now possible to register dqrng as a user-supplied RNG within R. This way one gets the already fast uniform and normal distribution functions from dqrng together with all the distribution functions available for R. This can make for a very fast combination:

```
register_methods()
bm <- bench::mark(gamma_R = rgamma(n, shape, scale = scale),
                  gamma_dq = dqrgamma(n, shape, scale),
                  check = FALSE)
knitr::kable(bm[, 1:5])
```
```

expression | min | median | itr/sec | mem_alloc
---|---|---|---|---
gamma_R | 96.6ms | 97.7ms | 10.209660 | 7.63MB
gamma_dq | 156.8ms | 157.9ms | 6.337992 | 7.63MB

Marsaglia, George, and Wai Wan Tsang. 2000. “A Simple Method for Generating Gamma Variables.” *ACM Transactions on Mathematical Software* 26 (3): 363–72. https://doi.org/10.1145/358407.358414.

The default RNG has changed from Xoroshiro128+ to Xoroshiro128++. The older generators Xoroshiro128+ and Xoshiro256+ are still available but should only be used for backward compatibility or for generating floating point numbers, i.e. not for sampling etc. More details about this change plus some (inconclusive) benchmarks can be found in a previous post (#57 fixing #56).

One of the new features is the ability to access the global RNG directly. This requires passing a pointer to the calling program, which is done via an “external pointer” wrapped as `Rcpp::XPtr`. It therefore made sense to change the `dqrng::rng64_t` type used for storing the RNG internally to also use `Rcpp::XPtr` instead of `std::shared_ptr`. The functions from `dqrng_sample.h` now expect a reference to `dqrng::random_64bit_generator` instead of `dqrng::rng64_t` (#70 fixing #63).

The two-argument constructor and `seed` function from PCG have surprising properties: they are not identical to the one-argument version followed by `set_stream(stream)`. For consistency with the new `clone(stream)` method, the two-argument versions are no longer used. This influences code that uses multiple streams with PCG together with the tooling from this package, e.g. the example code in the vignette on parallel RNG usage. In addition, setting the stream on PCG64 via `dqset.seed(seed, stream)` or at the C++ level using the interface provided by dqrng will be relative to the current stream, i.e. setting `stream = 0` will not change the RNG. This is for consistency with the other provided RNGs. You still get the standard behavior if you are using the C++ classes for PCG directly.

Decoupled from the ‘sitmo’ package. It is now possible to use, e.g., the distribution functions from the header-only library without an explicit `LinkingTo: sitmo`.

Making the internal RNG accessible from the outside had been a plan for quite some time, since this should simplify the development of new functionality using the C++ interface. I am grateful to Henrik Sloot for implementing this feature together with the class `dqrng::random_64bit_accessor`. This class satisfies UniformRandomBitGenerator and can therefore be used together with any C++11 distribution function. In addition, the methods from the abstract parent class `random_64bit_generator` are inherited (fixing #41 in #58).

As discussed in a previous post, Xoroshiro128**/++ and Xoshiro256**/++ have been added to `xoshiro.h`.

In another post I had already discussed how uniform and normal distributions can be registered as user-supplied RNG within R. This happens automatically if the option `dqrng.register_methods` is set to `TRUE`. With this change one can make use of **all distribution functions** available for R together with the faster RNGs from this package. While the additional function calls do cost a bit of performance compared with native distribution functions, one can still see a nice performance boost from the change of RNG.

Add missing inline attributes and limit the included Rcpp headers in `dqrng_types.h` (#75 together with Paul Liétar).

Sometimes it is useful to record the internal state of the RNG. In base R, one can use `.Random.seed`, but so far this was not possible with dqrng. To circumvent this, I/O methods for the RNG’s internal state have been added. These use character vectors, since the internal states are unsigned 64-bit and 128-bit numbers, which cannot be safely represented in R (fixing #66 in #78).

The abstract class `random_64bit_generator` has been extended with additional convenience methods. Most examples in the vignettes now make use of these methods (fixing #64 in #79):

- A `clone(stream)` method to ease parallel computation, e.g. using the global RNG.
- New methods `variate<dist>(param)`, `generate<dist>(container, param)` etc., using and inspired by `randutils`.

The scalar functions `dqrng::runif`, `dqrng::rnorm`, and `dqrng::rexp` available from `dqrng.h` have been deprecated and will be removed in a future release. Please use the more flexible and faster `dqrng::random_64bit_accessor` together with `variate<Dist>()` instead. The same applies to `dqrng::uniform01` from `dqrng_distribution.h`, which can be replaced by the member function `dqrng::random_64bit_generator::uniform01`.

A good discussion with Philippe Grosjean led to a new template function `dqrng::extra::parallel_generate` in `dqrng_extra/parallel_generate.h` as an example for using the global RNG in a parallel context (fixing #77 in #82). In addition, this function also shows how one can use parallel random numbers and still get results that are independent of the amount of parallelism used.
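The underlying idea can be sketched without any dqrng specifics: split the output into fixed-size chunks and derive each chunk's RNG from the seed and the chunk index, so the result depends only on the seed and chunk size, never on which thread processes which chunk. (A sketch with `std::mt19937_64` and per-chunk reseeding; the actual `parallel_generate` works with the global RNG instead.)

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Fill `out` chunk by chunk; each chunk gets its own RNG seeded from
// (seed, chunk index), so any scheduling of chunks over threads would
// yield the same result.
std::vector<double> chunked_uniforms(std::size_t n, std::uint64_t seed,
                                     std::size_t chunk_size) {
  std::vector<double> out(n);
  std::size_t n_chunks = (n + chunk_size - 1) / chunk_size;
  // This loop body is independent per chunk and could run in parallel.
  for (std::size_t c = 0; c < n_chunks; ++c) {
    std::seed_seq seq{seed, static_cast<std::uint64_t>(c)};
    std::mt19937_64 rng(seq);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::size_t begin = c * chunk_size;
    std::size_t end = std::min(n, begin + chunk_size);
    for (std::size_t i = begin; i < end; ++i)
      out[i] = dist(rng);
  }
  return out;
}
```

Because chunk `c` depends only on `(seed, c)`, the first chunk of a long run is identical to a shorter run with the same seed and chunk size.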

To a large extent I have followed the procedure outlined by Andreas Handel, who in turn has built on previous work. Check the links provided there!

Styling is “work in progress” …

Besides migrating from Blogdown to Quarto, I have also changed hosting and deployment method. While I used to use GitHub + Netlify, I am now using Codeberg + Codeberg Pages.

Most of my projects are just redirects to appropriate source code repositories. This was directly supported in “Hugo Academic”. For Quarto, I am using a YAML header like:

```
---
title: swephR
abstract: High precision Swiss Ephemeris for R
categories:
- R
- package
- CRAN
format:
  html:
    include-in-header:
    - text: |
        <meta http-equiv="refresh" content="0; url=https://rstub.github.io/swephR/" />
---
```

`WARN` on CRAN. Recently the development version of R has started to take compiler warnings with respect to format errors more seriously, forcing a large number of CRAN maintainers to become active. In my case things were rather simple: for `dqrng` I only needed to recreate `RcppExports.cpp` with the latest `Rcpp` version, c.f. https://github.com/RcppCore/Rcpp/issues/1287#issuecomment-1829886024, while `tikzDevice` needed a minor fix in one error message.
`dqrng`, c.f. #72, for which I also have to decide on the algorithm(s) to use for weighted sampling without replacement. Before looking at that I wanted to verify my decisions for the unweighted case.

Using the new header file `dqrng_sample.h` from the currently released version v0.3.1 and the ability to access the global RNG from the current development version, it is easy to write functions that make use of the three provided algorithms: partial Fisher-Yates shuffle, rejection sampling using a hash set, and rejection sampling using a bit set:

```
#include <Rcpp.h>
// [[Rcpp::depends(dqrng)]]
// requires dqrng > v0.3.1
#include <dqrng.h>
#include <dqrng_sample.h>

// [[Rcpp::export]]
Rcpp::IntegerVector sample_shuffle(int n, int size) {
  dqrng::random_64bit_accessor rng;
  return dqrng::sample::no_replacement_shuffle<Rcpp::IntegerVector, uint32_t>
    (rng, uint32_t(n), uint32_t(size), 1);
}

// [[Rcpp::export]]
Rcpp::IntegerVector sample_hashset(int n, int size) {
  dqrng::random_64bit_accessor rng;
  using set_t = dqrng::minimal_hash_set<uint32_t>;
  return dqrng::sample::no_replacement_set<Rcpp::IntegerVector, uint32_t, set_t>
    (rng, uint32_t(n), uint32_t(size), 1);
}

// [[Rcpp::export]]
Rcpp::IntegerVector sample_bitset(int n, int size) {
  dqrng::random_64bit_accessor rng;
  using set_t = dqrng::minimal_bit_set;
  return dqrng::sample::no_replacement_set<Rcpp::IntegerVector, uint32_t, set_t>
    (rng, uint32_t(n), uint32_t(size), 1);
}
```

Next we can benchmark these algorithms against each other and against the implementation from R itself for different population sizes `n` and selection ratios `r`:

```
bp <- bench::press(
  n = 10^(1:8),
  r = c(0.7, 0.5, 10^-(1:4)),
  {
    size <- ceiling(r * n)
    bench::mark(
      sample.int(n, size),
      sample_shuffle(n, size),
      sample_hashset(n, size),
      sample_bitset(n, size),
      check = FALSE,
      time_unit = "s"
    )
  }
) |> mutate(label = as.factor(attr(expression, "description")))
```

```
Warning: Some expressions had a GC in every iteration; so filtering is
disabled.
```

```
ggplot(bp, aes(x = n, y = median, color = label)) +
  geom_line() + scale_x_log10() + scale_y_log10() + facet_wrap(vars(r))
```

We learn:

- The fastest method from `dqrng` is always faster than R itself.
- The increased performance for R at `n = 1e8` with a low selection ratio is triggered by switching to a hash table. R should do this much earlier.
- For the three methods from `dqrng` we see:
  - For `0.5 < r` the partial Fisher-Yates shuffle is optimal.
  - For `0.001 < r < 0.5` it is best to use rejection sampling using a bit set.
  - For `r < 0.001` one should switch to rejection sampling using a hash set.

This is exactly how it is implemented in `dqrng::sample<VEC, INT>`, which is quite reassuring.

The `dqrng` package has some quite old issues. One is “More distribution functions”, where I brought forward the idea to support additional distribution functions within `dqrng`, which currently only supports the uniform, normal and exponential distributions. I still think this would be a good idea, but it would also be nice if one could simply plug into the large number of distribution functions that have been implemented for R already. Fortunately, this is possible via the mechanism described in User-supplied Random Number Generation. In #67 I have implemented this. Let’s see how that works by comparing the performance of `runif`, `rnorm`, `rexp`, and `sample.int` with their `dqrng` counterparts under different settings for user-supplied RNGs.
When comparing the default methods from R with those from `dqrng`, we see a consistent performance advantage for the latter with a factor of about 5 for larger samples:

```
bp1 <- bench::press(
  n = 10^(0:5),
  dist = c("runif", "rnorm", "rexp", "sample.int"),
  {
    dqdist <- paste0("dq", dist)
    bench::mark(
      base1 = eval(call(dist, n)),
      dqrng1 = eval(call(dqdist, n)),
      check = FALSE,
      time_unit = "s"
    )
  }
) |>
  mutate(label = as.factor(attr(expression, "description")))
ggplot(bp1, aes(x = n, y = median, color = label)) +
  geom_line() + scale_x_log10() + scale_y_log10() + facet_wrap(vars(dist))
```

When we enable the RNG from `dqrng` for the uniform distribution, things change for all three distribution functions. For smaller samples of less than 100 draws, the base methods now have comparable performance. Unfortunately, there is not much change for `sample.int`. For larger samples `dqrng` still has the edge in all cases, though:

`RNGkind("user")`

When we also enable the Ziggurat algorithm for normal draws, one sees a nice speedup in `rnorm`:

`RNGkind("user", "user")`

We can also see this when computing the relative speedup of `dqrng` compared with base R. Enabling the RNG from `dqrng` for the uniform distribution (“relative2”) brings the base methods on par with their `dqrng` counterparts for small samples. Also enabling the Ziggurat method for `rnorm` brings some improvements for larger samples. Unfortunately, there is not much change for `sample.int`:

But does this help with making more distribution functions available for people using `dqrng`? Yes! Internally, all the distribution functions in R make use of the uniform and the normal distribution. And when we replace those with the variants from `dqrng`, this also influences these distributions. But first of all, we see that we can set the seed with the normal `set.seed` and get the same reproducible numbers from base or `dqrng` methods:

`set.seed(42); rnorm(5)`

`[1] -1.3679777 -0.7638514 -1.6173858 -0.3507472 0.5683508`

`set.seed(42); dqrnorm(5)`

`[1] -1.3679777 -0.7638514 -1.6173858 -0.3507472 0.5683508`

The same is true if you use `dqset.seed`. However, even with the same input seed a different output is created, since R does some scrambling on the seed before using it. Maybe I should revert that:

`dqset.seed(42); rnorm(5)`

`[1] -1.3679777 -0.7638514 -1.6173858 -0.3507472 0.5683508`

`dqset.seed(42); dqrnorm(5)`

`[1] -1.3679777 -0.7638514 -1.6173858 -0.3507472 0.5683508`

But how do we know that this also works for other distributions? We can simply try it out. Do we get reproducible numbers from various distribution functions after using `dqset.seed`? Yes:

`dqset.seed(42); rlnorm(5)`

`[1] 0.2546214 0.4658687 0.1984167 0.7041617 1.7653532`

`dqset.seed(42); rlnorm(5)`

`[1] 0.2546214 0.4658687 0.1984167 0.7041617 1.7653532`

`dqset.seed(42); rt(5, 10)`

`[1] -1.7585953 -0.3260428 -0.3052283 0.2594040 -1.9871566`

`dqset.seed(42); rt(5, 10)`

`[1] -1.7585953 -0.3260428 -0.3052283 0.2594040 -1.9871566`

Overall, adding

```
library(dqrng)
RNGkind("user", "user")
```

at the top of a script before (potentially) setting the seed with `set.seed` will give high performance for any random draws without further changes to the code. By replacing `runif`, `rnorm`, `rexp`, and `sample.int` with their counterparts from `dqrng`, one can gain even more, in particular for larger samples. And maybe in the future more distribution functions will be added to `dqrng` itself.

```
// [[Rcpp::depends(dqrng,BH,sitmo)]]
#include <Rcpp.h>
#include <dqrng_distribution.h>

auto rng = dqrng::generator<>(42);

// [[Rcpp::export]]
Rcpp::IntegerVector sample_prob(int size, Rcpp::NumericVector prob) {
  Rcpp::IntegerVector result(Rcpp::no_init(size));
  double max_prob = Rcpp::max(prob);
  uint32_t n(prob.length());
  std::generate(result.begin(), result.end(),
                [n, prob, max_prob] () {
                  while (true) {
                    int index = (*rng)(n);
                    if (dqrng::uniform01((*rng)()) < prob[index] / max_prob)
                      return index + 1;
                  }
                });
  return result;
}
```

For relatively even weight distributions, as created by `runif(n)` or `sample(n)`, performance is good, especially for larger populations:

```
sample_R <- function (size, prob) {
  sample.int(length(prob), size, replace = TRUE, prob)
}
size <- 1e4
prob10 <- sample(10)
prob100 <- sample(100)
prob1000 <- sample(1000)
bm <- bench::mark(
  sample_R(size, prob10),
  sample_prob(size, prob10),
  sample_R(size, prob100),
  sample_prob(size, prob100),
  sample_R(size, prob1000),
  sample_prob(size, prob1000),
  check = FALSE
)
knitr::kable(bm[, 1:6])
```

expression | min | median | itr/sec | mem_alloc | gc/sec
---|---|---|---|---|---
sample_R(size, prob10) | 229µs | 256µs | 3692.370 | 41.6KB | 2.016587
sample_prob(size, prob10) | 316µs | 333µs | 2799.726 | 41.6KB | 2.017094
sample_R(size, prob100) | 428µs | 446µs | 2112.141 | 52.9KB | 2.015402
sample_prob(size, prob100) | 353µs | 367µs | 2534.683 | 47.6KB | 2.014851
sample_R(size, prob1000) | 486µs | 513µs | 1858.395 | 53.4KB | 2.015613
sample_prob(size, prob1000) | 360µs | 374µs | 2520.599 | 49.5KB | 2.016479

The nice performance breaks down when an uneven weight distribution is used. Here the largest element `n` is replaced by `n * n`, severely deteriorating the performance of the stochastic acceptance method:

```
size <- 1e4
prob10 <- sample(10)
prob10[which.max(prob10)] <- 10 * 10
prob100 <- sample(100)
prob100[which.max(prob100)] <- 100 * 100
prob1000 <- sample(1000)
prob1000[which.max(prob1000)] <- 1000 * 1000
bm <- bench::mark(
  sample_R(size, prob10),
  sample_prob(size, prob10),
  sample_R(size, prob100),
  sample_prob(size, prob100),
  sample_R(size, prob1000),
  sample_prob(size, prob1000),
  check = FALSE
)
knitr::kable(bm[, 1:6])
```
```

expression | min | median | itr/sec | mem_alloc | gc/sec
---|---|---|---|---|---
sample_R(size, prob10) | 161.75µs | 169.13µs | 5501.152611 | 41.6KB | 4.073419
sample_prob(size, prob10) | 854.02µs | 906.83µs | 1060.873684 | 41.6KB | 0.000000
sample_R(size, prob100) | 238.76µs | 252.51µs | 3637.956539 | 42.9KB | 4.073860
sample_prob(size, prob100) | 7.08ms | 7.61ms | 130.633690 | 41.6KB | 0.000000
sample_R(size, prob1000) | 469.83µs | 484.27µs | 1919.669187 | 53.4KB | 2.016459
sample_prob(size, prob1000) | 71.98ms | 123.2ms | 9.646388 | 41.6KB | 0.000000

A good way to think about this was described by Keith Schwarz (2011). The stochastic acceptance method can be compared to randomly throwing a dart at a bar chart of the weight distribution. If the weight distribution is very uneven, there is a lot of empty space on the chart, i.e. one has to try very often to not hit the empty space. To quantify this, one can use `max_weight / average_weight`, which is a measure for how many tries one needs before a throw is successful:

- This is 1 for an unweighted distribution.
- This is (around) 2 for a random or a linear weight distribution.
- This would be the number of elements in the extreme case where all weight is on one element.
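This measure is cheap to compute. A standalone sketch (not part of dqrng) that reproduces the three cases above:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Expected number of dart throws per accepted draw in stochastic
// acceptance: max_weight / average_weight.
double expected_tries(const std::vector<double>& w) {
  double max_w = *std::max_element(w.begin(), w.end());
  double avg_w = std::accumulate(w.begin(), w.end(), 0.0) / w.size();
  return max_w / avg_w;
}
```

For the uneven benchmark above, where one of `n` weights is set to `n * n`, this ratio grows roughly linearly in `n`, which matches the observed slowdown.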

The above page also discusses an alternative: The alias method originally suggested by Walker (1974, 1977) in the efficient formulation of Vose (1991), which is also used by R in certain cases. The general idea is to redistribute the weight from high weight items to an alias table associated with low weight items. Let’s implement it in C++:

```
#include <queue>
// [[Rcpp::depends(dqrng,BH,sitmo)]]
#include <Rcpp.h>
#include <dqrng_distribution.h>

auto rng = dqrng::generator<>(42);

// [[Rcpp::export]]
Rcpp::IntegerVector sample_alias(int size, Rcpp::NumericVector prob) {
  uint32_t n(prob.size());
  std::vector<int> alias(n);
  Rcpp::NumericVector p = prob * n / Rcpp::sum(prob);
  std::queue<int> high;
  std::queue<int> low;
  for (int i = 0; i < n; ++i) {
    if (p[i] < 1.0)
      low.push(i);
    else
      high.push(i);
  }
  while (!low.empty() && !high.empty()) {
    int l = low.front();
    low.pop();
    int h = high.front();
    alias[l] = h;
    p[h] = (p[h] + p[l]) - 1.0;
    if (p[h] < 1.0) {
      low.push(h);
      high.pop();
    }
  }
  while (!low.empty()) {
    p[low.front()] = 1.0;
    low.pop();
  }
  while (!high.empty()) {
    p[high.front()] = 1.0;
    high.pop();
  }
  Rcpp::IntegerVector result(Rcpp::no_init(size));
  std::generate(result.begin(), result.end(),
                [n, p, alias] () {
                  int index = (*rng)(n);
                  if (dqrng::uniform01((*rng)()) < p[index])
                    return index + 1;
                  else
                    return alias[index] + 1;
                });
  return result;
}
```

First we need to make sure that all algorithms select the different possibilities with the same probabilities, which seems to be the case:

```
size <- 1e6
n <- 10
prob <- sample(n)
data.frame(
  sample_R = sample_R(size, prob),
  sample_prob = sample_prob(size, prob),
  sample_alias = sample_alias(size, prob)
) |> pivot_longer(cols = starts_with("sample")) |>
  ggplot(aes(x = value, fill = name)) + geom_bar(position = "dodge2")
```

Next we benchmark the three methods for a range of different population sizes `n` and returned samples `size`. First for a linear weight distribution:

```
bp1 <- bench::press(
  n = 10^(1:4),
  size = 10^(0:5),
  {
    prob <- sample(n)
    bench::mark(
      sample_R = sample_R(size, prob),
      sample_prob = sample_prob(size, prob = prob),
      sample_alias = sample_alias(size, prob = prob),
      check = FALSE,
      time_unit = "s"
    )
  }
) |>
  mutate(label = as.factor(attr(expression, "description")))
ggplot(bp1, aes(x = n, y = median, color = label)) +
  geom_line() + scale_x_log10() + scale_y_log10() + facet_wrap(vars(size))
```

For `n > size` stochastic sampling still seems to work very well. But when many samples are created, the work done to even out the weights does pay off. This seems to give a good way to decide which method to use. And how about an uneven weight distribution?

```
bp2 <- bench::press(
  n = 10^(1:4),
  size = 10^(0:5),
  {
    prob <- sample(n)
    prob[which.max(prob)] <- n * n
    bench::mark(
      sample_R = sample_R(size, prob),
      sample_prob = sample_prob(size, prob = prob),
      sample_alias = sample_alias(size, prob = prob),
      check = FALSE,
      time_unit = "s"
    )
  }
) |>
  mutate(label = as.factor(attr(expression, "description")))
ggplot(bp2, aes(x = n, y = median, color = label)) +
  geom_line() + scale_x_log10() + scale_y_log10() + facet_wrap(vars(size))
```

Here the alias method is the fastest as long as more than one element is generated. But when is the weight distribution so uneven that one should use the alias method (almost) everywhere? Further investigations are needed …

Schwarz, Keith. 2011. “Darts, Dice, and Coins.” https://www.keithschwarz.com/darts-dice-coins/.

Lipowski, Adam, and Dorota Lipowska. 2012. “Roulette-Wheel Selection via Stochastic Acceptance.” *Physica A: Statistical Mechanics and Its Applications* 391 (6): 2193–96. https://doi.org/10.1016/j.physa.2011.12.004.

Vose, M. D. 1991. “A Linear Algorithm for Generating Random Numbers with a Given Distribution.” *IEEE Transactions on Software Engineering* 17 (9): 972–75. https://doi.org/10.1109/32.92917.

Walker, Alastair J. 1974. “New Fast Method for Generating Discrete Random Numbers with Arbitrary Frequency Distributions.” *Electronics Letters* 10 (8): 127. https://doi.org/10.1049/el:19740097.

———. 1977. “An Efficient Method for Generating Discrete Random Variables with General Distributions.” *ACM Transactions on Mathematical Software* 3 (3): 253–56. https://doi.org/10.1145/355744.355749.

`dqsample` and `dqrrademacher` make use of the full bit pattern. So it would be better to support the `**` and/or `++` variants for both RNGs and make one of them the default. This would be a breaking change, of course. In #57 I have added these four additional RNGs to `xoshiro.h`, so now is the time to do some benchmarking, first by generating some random numbers:
```
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <dqrng_distribution.h>
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]

template<typename RNG>
double sum_rng(int n) {
  auto rng = dqrng::generator<RNG>(42);
  dqrng::uniform_distribution dist;
  double result = 0.0;
  for (int i = 0; i < n; ++i) {
    result += dist(*rng);
  }
  return result;
}

// [[Rcpp::export]]
double sum_128plus(int n) {
  return sum_rng<dqrng::xoroshiro128plus>(n);
}
// [[Rcpp::export]]
double sum_256plus(int n) {
  return sum_rng<dqrng::xoshiro256plus>(n);
}
// [[Rcpp::export]]
double sum_128starstar(int n) {
  return sum_rng<dqrng::xoroshiro128starstar>(n);
}
// [[Rcpp::export]]
double sum_256starstar(int n) {
  return sum_rng<dqrng::xoshiro256starstar>(n);
}
// [[Rcpp::export]]
double sum_128plusplus(int n) {
  return sum_rng<dqrng::xoroshiro128plusplus>(n);
}
// [[Rcpp::export]]
double sum_256plusplus(int n) {
  return sum_rng<dqrng::xoshiro256plusplus>(n);
}
```

```
N <- 1e5
bm <- bench::mark(
  sum_128plus(N),
  sum_128starstar(N),
  sum_128plusplus(N),
  sum_256plus(N),
  sum_256starstar(N),
  sum_256plusplus(N),
  check = FALSE,
  min_time = 1
)
knitr::kable(bm[, 1:6])
```

expression | min | median | itr/sec | mem_alloc | gc/sec
---|---|---|---|---|---
sum_128plus(N) | 486µs | 504µs | 1868.621 | 2.49KB | 0
sum_128starstar(N) | 486µs | 501µs | 1877.453 | 2.49KB | 0
sum_128plusplus(N) | 486µs | 500µs | 1876.092 | 2.49KB | 0
sum_256plus(N) | 487µs | 500µs | 1865.577 | 2.49KB | 0
sum_256starstar(N) | 486µs | 503µs | 1851.012 | 2.49KB | 0
sum_256plusplus(N) | 486µs | 504µs | 1852.690 | 2.49KB | 0

`plot(bm)`

The current default xoroshiro128+ is the fastest in this comparison, with the other generators being very similar. Let’s try a more realistic use case: generating many uniformly distributed random numbers:

```
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <dqrng_distribution.h>
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]

template<typename RNG>
Rcpp::NumericVector runif_rng(int n) {
  auto rng = dqrng::generator<RNG>(42);
  dqrng::uniform_distribution dist;
  Rcpp::NumericVector result(Rcpp::no_init(n));
  std::generate(result.begin(), result.end(), [rng, dist] () {return dist(*rng);});
  return result;
}

// [[Rcpp::export]]
Rcpp::NumericVector runif_128plus(int n) {
  return runif_rng<dqrng::xoroshiro128plus>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_256plus(int n) {
  return runif_rng<dqrng::xoshiro256plus>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_128starstar(int n) {
  return runif_rng<dqrng::xoroshiro128starstar>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_256starstar(int n) {
  return runif_rng<dqrng::xoshiro256starstar>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_128plusplus(int n) {
  return runif_rng<dqrng::xoroshiro128plusplus>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_256plusplus(int n) {
  return runif_rng<dqrng::xoshiro256plusplus>(n);
}
```

```
N <- 1e5
bm <- bench::mark(
  runif(N),
  runif_128plus(N),
  runif_128starstar(N),
  runif_128plusplus(N),
  runif_256plus(N),
  runif_256starstar(N),
  runif_256plusplus(N),
  check = FALSE,
  min_time = 1
)
knitr::kable(bm[, 1:6])
```

expression | min | median | itr/sec | mem_alloc | gc/sec
---|---|---|---|---|---
runif(N) | 2.97ms | 3.51ms | 278.908 | 786KB | 4.147331
runif_128plus(N) | 419.7µs | 777.16µs | 1284.184 | 784KB | 22.993440
runif_128starstar(N) | 420.05µs | 777.19µs | 1274.454 | 784KB | 21.034679
runif_128plusplus(N) | 419.83µs | 813.89µs | 1125.404 | 784KB | 18.206746
runif_256plus(N) | 489.21µs | 857.94µs | 1086.515 | 784KB | 18.089734
runif_256starstar(N) | 489.12µs | 851.94µs | 1168.742 | 784KB | 19.178194
runif_256plusplus(N) | 489.42µs | 862.5µs | 1102.899 | 784KB | 18.210929

`plot(bm)`

Here all six generators are very similar, with all of them clearly faster than R’s built-in `runif`. How about sampling with replacement, which is also mostly governed by the speed of generating random numbers:

```
#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <dqrng_sample.h>
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
Rcpp::IntegerVector sample_128plus(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoroshiro128plus>(42);
  return dqrng::sample::sample<Rcpp::IntegerVector, uint32_t>(*rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_128starstar(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoroshiro128starstar>(42);
  return dqrng::sample::sample<Rcpp::IntegerVector, uint32_t>(*rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_128plusplus(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoroshiro128plusplus>(42);
  return dqrng::sample::sample<Rcpp::IntegerVector, uint32_t>(*rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_256plus(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoshiro256plus>(42);
  return dqrng::sample::sample<Rcpp::IntegerVector, uint32_t>(*rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_256starstar(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoshiro256starstar>(42);
  return dqrng::sample::sample<Rcpp::IntegerVector, uint32_t>(*rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_256plusplus(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoshiro256plusplus>(42);
  return dqrng::sample::sample<Rcpp::IntegerVector, uint32_t>(*rng, uint32_t(m), uint32_t(n), true, 0);
}
```

```
N <- 1e5
M <- 1e3
bm <- bench::mark(
  sample.int(M, N, replace = TRUE),
  sample_128plus(M, N),
  sample_128starstar(M, N),
  sample_128plusplus(M, N),
  sample_256plus(M, N),
  sample_256starstar(M, N),
  sample_256plusplus(M, N),
  check = FALSE,
  min_time = 1
)
knitr::kable(bm[, 1:6])
```

expression | min | median | itr/sec | mem_alloc | gc/sec |
---|---|---|---|---|---|
sample.int(M, N, replace = TRUE) | 3.25ms | 3.61ms | 269.5642 | 401KB | 2.034447 |
sample_128plus(M, N) | 408.16µs | 633.07µs | 1532.6744 | 393KB | 14.303494 |
sample_128starstar(M, N) | 373.28µs | 556.72µs | 1733.5808 | 393KB | 14.147238 |
sample_128plusplus(M, N) | 424.63µs | 647.42µs | 1516.8105 | 393KB | 12.909025 |
sample_256plus(M, N) | 391.63µs | 575.64µs | 1686.6577 | 393KB | 15.283629 |
sample_256starstar(M, N) | 406.82µs | 595.84µs | 1622.1948 | 393KB | 12.809865 |
sample_256plusplus(M, N) | 372.11µs | 554.85µs | 1760.1333 | 393KB | 14.141986 |

`plot(bm)`

Again nothing really conclusive. All six RNGs are similar and much faster than R’s built-in `sample.int`.

The speed comparisons between these generators are inconclusive to me. The xoroshiro128 variants seem to be slightly faster than the xoshiro256 variants. So I am leaning towards one of those as the new default, while still making the corresponding xoshiro256 variant available in case a longer period or a higher degree of parallelisation is needed. Comparing the `++` and `**` variants, I am leaning towards `++`, but that is not set in stone.

Blackman, David, and Sebastiano Vigna. 2021. “Scrambled Linear Pseudorandom Number Generators.” *ACM Transactions on Mathematical Software* 47 (4): 1–32. https://doi.org/10.1145/3460772.

Kyle Butts provided the implementation for the new `dqrrademacher` method for drawing Rademacher weights. The Rademacher distribution is equivalent to flipping a fair coin, which can be implemented efficiently by using the raw bit pattern from the RNG directly. See also #50 and #49.
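To illustrate the bit-pattern idea, here is a minimal standalone sketch (this is not the actual {dqrng} implementation; `std::mt19937_64` merely stands in for its 64-bit generator): one draw from the RNG yields 64 independent fair bits, and each bit `b` maps to `2 * b - 1`, i.e. 0 → -1 and 1 → +1.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw n Rademacher variates (+1/-1 with equal probability) by consuming
// the raw bits of 64-bit RNG draws one bit at a time.
// std::mt19937_64 is a stand-in for dqrng's generator here.
std::vector<int> rademacher(std::size_t n, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::vector<int> out;
    out.reserve(n);
    std::uint64_t bits = 0;
    int available = 0;  // unused bits left in the current draw
    for (std::size_t i = 0; i < n; ++i) {
        if (available == 0) {
            bits = rng();      // one RNG call provides 64 coin flips
            available = 64;
        }
        out.push_back(2 * static_cast<int>(bits & 1u) - 1);  // 0 -> -1, 1 -> +1
        bits >>= 1;
        --available;
    }
    return out;
}
```

Compared with drawing one uniform number per weight, this needs only one RNG call per 64 weights.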

Kyle also suggested a way to support random draws from a multivariate normal distribution by using code from the `mvtnorm` package, c.f. #46. I didn’t like the idea of mostly duplicating that code within `dqrng`. Fortunately, Torsten Hothorn (`mvtnorm`’s author) provided a hook in his code. So now it is possible to supply the source of normally distributed numbers from the outside, i.e. `dqrmvnorm` is just calling `mvtnorm::rmvnorm` but requests the usage of `dqrnorm`. See also #51.

Finally, I have moved the C++ templates that are used for the fast sampling methods to their own header file `dqrng_sample.h`. This allows using them in parallel computations, fixing #26. An example is shown in the parallel vignette.

Originally I had planned to also include support for weighted sampling in this release. This has been requested in #18 and #45, and I previously had some success with early experiments. Unfortunately the implementation based on these tests had some issues. The performance of the probabilistic sampling used gets really bad if one (or few) possibilities have much higher weights than the others. To quantify this, one can use `max_weight / average_weight`, which is a measure for how many tries one needs before a draw is successful:

- This is 1 for un-weighted sampling, i.e. equal weights.
- This is (around) 2 for the random distributions used so far.
- This would be the number of elements in the extreme case where all weight is on one element.

I am not sure yet what a good cut-off point is to switch to a different algorithm, or whether it would be better to use the alias method right away, c.f. https://www.keithschwarz.com/darts-dice-coins/.
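To make the measure concrete, here is a small standalone sketch (the function name is mine, not from {dqrng}) that computes `max_weight / average_weight`, i.e. the expected number of acceptance-rejection tries per draw for the probabilistic sampling described above:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Expected number of tries per draw for acceptance-rejection style
// weighted sampling: max_weight / average_weight.
double expected_tries(const std::vector<double>& w) {
    double max_w = *std::max_element(w.begin(), w.end());
    double avg_w = std::accumulate(w.begin(), w.end(), 0.0) / w.size();
    return max_w / avg_w;
}
```

For equal weights `{1, 1, 1, 1}` this gives 1, while in the extreme case `{1, 0, 0, 0}` it gives 4, the number of elements, matching the three cases listed above.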

In addition, tikzDevice version 0.12.5 made it onto CRAN some time ago, but I forgot to blog about it. This was a rather minor update triggered by a new `WARN` on CRAN: a recent `memoir.cls`, used for formatting the vignette, has acquired an incompatibility with the by now ancient `float.sty`. I decided to just remove the latter and rely more on LaTeX’s standard methods for placing floating environments.

The goal of swephR is to provide an R interface to the Swiss Ephemeris (SE), a high precision ephemeris based upon the DE431 ephemeris from NASA’s JPL. It covers the time range 13201 BCE to 17191 CE.

This version of swephR fixes various `function declaration isn’t a prototype` warnings that CRAN now counts as important. Basically this means that function declarations like `int foo()` are illegal in C and need to be written as `int foo(void)`. A new SE version v2.10.03 is also used, including the following upstream changes:

Version | Date | Comment |
---|---|---|
2.09 | 22-jul-2020 | Improved Placidus houses, sidereal ephemerides, planetary magnitudes; minor bug fixes |
2.10 | 10-dec-2020 | NEW: planetary moons |
2.10.03 | 27-aug-2022 | Update Moon magnitude |

So far planetary moons are **not** supported in swephR. Please let me know if you need this feature.

During the upgrade process I was introduced to the “joys” of having a CRAN package with reverse dependencies. One of my reverse dependencies has tests that broke when the new SE version returned slightly different values for some computations. And somehow my tests had not uncovered that. Thanks to the CRAN team and the package maintainers for their patience and support!

This site uses blogdown with the Hugo Academic theme. I took the following steps to enable share on Mastodon:

1. Download https://raw.githubusercontent.com/Juerd/tootpick/main/index.html and save it as `static/tootpick.html`.
2. Copy `themes/hugo-academic/layouts/partials/share.html` to `layouts/partials/share.html` and add:

```
<li>
  <a class="mastodon"
     href="/tootpick.html#text={{ .Title | html }}%0A{{ .Permalink | html }}"
     target="_blank" rel="noopener">
    <i class="fab fa-mastodon"></i>
  </a>
</li>
```

The tikzDevice package provides a graphics output device for R that records plots in a LaTeX-friendly format. The device transforms plotting commands issued by R functions into LaTeX code blocks. When included in a paper typeset by LaTeX, these blocks are interpreted with the help of TikZ—a graphics package for TeX and friends written by Till Tantau.

In this release I finally merged PR #206 from Paul Murrell to make `tikzDevice` compatible with the graphics engine in R >= 4.1. And Dean Scarff made sure that `tikzInfo->outColorFileName` is always initialized (#200 fixing #199). In addition I have added the current working directory to `TEXINPUTS` (#203 fixing #197 and #198).

The main motivation for this release was a new WARNING on CRAN that could have triggered the removal of the package (#219 fixing #218). The WARNING was triggered by the last remaining call to the standard C library function `sprintf()`. This function is insecure due to the possibility of buffer overflows, so it is a good idea to replace it with `snprintf()`, where one needs to specify the maximum number of characters to be printed. Normally this is straightforward, since one only has to call `strlen()` on all string arguments and add the results. But in this case there is an integer number among the arguments, so I opted for taking the logarithm to determine its length:

```
snprintf(tikzInfo->outFileName,
         strlen(tikzInfo->originalColorFileName) + floor(log10(tikzInfo->pageNum)) + 1,
         tikzInfo->originalFileName,
         tikzInfo->pageNum);
```
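As a quick sanity check of the logarithm idea, here is a standalone sketch (illustrative, not tikzDevice code): for a positive integer `n`, `floor(log10(n)) + 1` is exactly its number of decimal digits.

```cpp
#include <cmath>

// Number of decimal digits of a positive integer n, computed the same way
// as in the snprintf length calculation above: floor(log10(n)) + 1.
// Note: only valid for n >= 1, since log10(0) is -infinity.
int decimal_digits(int n) {
    return static_cast<int>(std::floor(std::log10(static_cast<double>(n)))) + 1;
}
```

For example, `decimal_digits(9)` is 1 and `decimal_digits(999)` is 3, so the page number contributes exactly as many characters as this formula predicts.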

The issue contains a reference to a blog post that is by now only available via the wayback machine. This blog post shows a stochastic acceptance method suggested by Lipowski and Lipowska (2012) (also at https://arxiv.org/abs/1109.3627), which appears very promising. Let’s try some simple tests before incorporating it into the package properly. The stochastic acceptance algorithm is very simple:

- Select randomly one of the individuals (say, `i`). The selection is done with uniform probability `1/N`, which does not depend on the individual’s fitness `w_i`.
- With probability `w_i / w_max`, where `w_max` is the maximal fitness in the population, the selection is accepted. Otherwise, the procedure is repeated from step 1 (i.e., in the case of rejection, another selection attempt is made).

This can be implemented as:

```
// [[Rcpp::depends(dqrng,BH,sitmo)]]
#include <Rcpp.h>
#include <dqrng_distribution.h>

auto rng = dqrng::generator<>(42);

// [[Rcpp::export]]
Rcpp::IntegerVector sample_prob(int n, Rcpp::NumericVector prob) {
  Rcpp::IntegerVector result(Rcpp::no_init(n));
  double max_prob = Rcpp::max(prob);
  uint32_t m(prob.length());
  std::generate(result.begin(), result.end(),
                [m, prob, max_prob] () {
                  while (true) {
                    int index = (*rng)(m);
                    if (dqrng::uniform01((*rng)()) < prob[index] / max_prob)
                      return index + 1;
                  }
                });
  return result;
}
```

First, let’s check that sampling still works as expected:

```
M <- 1e4
N <- 10
prob <- runif(N)
hist(sample.int(N, M, replace = TRUE, prob = prob), breaks = N)
```

`hist(sample_prob(M, prob = prob), breaks = N)`

Eyeballing these histograms shows that they are very similar, i.e. the stochastic acceptance algorithm selects the ten possibilities with the same probabilities as R’s built-in method.

Second, let’s look at performance:

```
bm <- bench::mark(
  sample.int = sample.int(N, M, replace = TRUE, prob = prob),
  sample_prob = sample_prob(M, prob = prob),
  check = FALSE
)
knitr::kable(bm[, 1:6])
```

expression | min | median | itr/sec | mem_alloc | gc/sec |
---|---|---|---|---|---|
sample.int | 226µs | 258µs | 3418.650 | 41.6KB | 2.014526 |
sample_prob | 392µs | 410µs | 2298.579 | 46.8KB | 2.014530 |

`plot(bm)`

`Loading required namespace: tidyr`

There is only little difference between R’s built-in method and this algorithm for only ten different possibilities. However, any performance advantage of stochastic acceptance should come from scaling, i.e. larger numbers of possibilities are more interesting:

```
N <- 1e5
prob <- runif(N)
bm <- bench::mark(
  sample.int = sample.int(N, M, replace = TRUE, prob = prob),
  sample_prob = sample_prob(M, prob = prob),
  check = FALSE
)
knitr::kable(bm[, 1:6])
```

expression | min | median | itr/sec | mem_alloc | gc/sec |
---|---|---|---|---|---|
sample.int | 2.75ms | 3.6ms | 273.1216 | 1.19MB | 6.302807 |
sample_prob | 739.42µs | 791.6µs | 1146.1369 | 41.6KB | 0.000000 |

`plot(bm)`

This is very promising! It looks worthwhile including this algorithm into {dqrng}, especially since it can also be used for weighted sampling without replacement.
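For reference, the alias method mentioned above trades O(n) table setup for guaranteed O(1) draws, independent of how skewed the weights are. Here is a minimal sketch in the spirit of Vose’s variant as described at keithschwarz.com (illustrative code, not part of {dqrng}; `std::mt19937_64` stands in for its generator):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Minimal sketch of Vose's alias method: O(n) setup, O(1) per draw.
class AliasTable {
    std::vector<double> prob_;        // acceptance probability per slot
    std::vector<std::size_t> alias_;  // fallback index per slot
public:
    explicit AliasTable(std::vector<double> w) {
        const std::size_t n = w.size();
        prob_.resize(n);
        alias_.resize(n);
        double total = 0.0;
        for (double x : w) total += x;
        for (double& x : w) x *= n / total;  // scale so the average weight is 1
        std::vector<std::size_t> small, large;
        for (std::size_t i = 0; i < n; ++i)
            (w[i] < 1.0 ? small : large).push_back(i);
        while (!small.empty() && !large.empty()) {
            std::size_t s = small.back(); small.pop_back();
            std::size_t l = large.back(); large.pop_back();
            prob_[s] = w[s];
            alias_[s] = l;
            w[l] = (w[l] + w[s]) - 1.0;  // the large entry donates to fill slot s
            (w[l] < 1.0 ? small : large).push_back(l);
        }
        for (std::size_t i : large) prob_[i] = 1.0;
        for (std::size_t i : small) prob_[i] = 1.0;  // numerical leftovers
    }
    template <class RNG>
    std::size_t draw(RNG& rng) const {
        std::uniform_real_distribution<double> unif(0.0, 1.0);
        std::size_t i = static_cast<std::size_t>(unif(rng) * prob_.size());
        if (i == prob_.size()) --i;  // guard the (rare) boundary case
        return unif(rng) < prob_[i] ? i : alias_[i];
    }
};
```

Unlike stochastic acceptance, each draw costs exactly one slot lookup and one accept/alias decision, even when all weight sits on one element.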

Lipowski, Adam, and Dorota Lipowska. 2012. “Roulette-Wheel Selection via Stochastic Acceptance.” *Physica A: Statistical Mechanics and Its Applications* 391 (6): 2193–96. https://doi.org/10.1016/j.physa.2011.12.004.

This release contains a breaking change: The initial state of `dqrng`’s RNG is based on R’s RNG, which used to advance R’s RNG state. The implementation has been changed to preserve R’s RNG state, which is less surprising but can change the outcome of current scripts (#44 fixing #43).

In addition, the generation of uniform random numbers now takes a short-cut for `min == max` and throws an error for `min > max` (#34 fixing #33).

Behind the scenes I have switched from Travis and Appveyor to GitHub Actions for continuous integration. The switch was pretty smooth using `usethis::use_github_actions()`.

The goal of swephR is to provide an R interface to the Swiss Ephemeris (SE), a high precision ephemeris based upon the DE431 ephemeris from NASA’s JPL. It covers the time range 13201 BCE to 17191 CE.

This new version comes with two important changes. First, Victor has finished the laborious task of making all functions from SE’s C API available to R. Second, I have added a docker image that is automatically built on each push to `master` and checks for UBSAN errors using the wch1/r-debug image.

The latter change was triggered by the UBSAN errors present in v0.2.0 and my fear that more problems like this might be contained within the SE code base. This is particularly important once we add more test cases to the package. Currently only about 50% of the code is exposed to automatic testing. Increasing the coverage ratio might reveal more UBSAN issues. Hopefully, they will be caught before submission to CRAN this time.

The tikzDevice package provides a graphics output device for R that records plots in a LaTeX-friendly format. The device transforms plotting commands issued by R functions into LaTeX code blocks. When included in a paper typeset by LaTeX, these blocks are interpreted with the help of TikZ—a graphics package for TeX and friends written by Till Tantau.

This version contains a series of minor updates. My former colleague Nico Bellack contributed two fixes:

- tikzDevice now correctly translates the `lmitre = n` parameter of the `plot()` function (#178)
- `tikz()` now accepts both `file` and `filename` as named arguments to fix a `ggsave` issue that occurred with ggplot2 v3.0.0 (#181)

Hugo Gruson added syntax highlighting to the README (#194), and Duncan Murdoch spotted and corrected missing double escapes in the help page (#193). I updated the maintainer address and switched to using a temporary working directory due to problems with longer user names on Windows (#192).

Thanks a lot to all contributors!

The question asker wanted to speed up the evaluation of `pmax(x, 0)` for an integer vector `x` by using the identity `pmax(x, 0) = (x + abs(x)) / 2` and going to C++ with the help of Rcpp:

```
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector do_pmax0_abs_int(IntegerVector x) {
  R_xlen_t n = x.length();
  IntegerVector out(clone(x));
  for (R_xlen_t i = 0; i < n; ++i) {
    int oi = out[i];
    out[i] += abs(oi); // integer overflow possible!
    out[i] /= 2;
  }
  return out;
}
```

The issue with this initial solution is the potential for an integer overflow for values larger than half the maximum size of a 32bit integer, i.e. 2^30 = 1,073,741,824, resulting in negative results:

```
set.seed(42)
ints <- as.integer(runif(1e6, -.Machine$integer.max, .Machine$integer.max))
head(ints)
```

`[1] 1781578390 1877224605 -918523703 1419261746 608792367 82016476`

`do_pmax0_abs_int(head(ints))`

`[1] -365905258 -270259043 0 -728221902 608792367 82016476`
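Since signed overflow is undefined behaviour in C++, a standalone sketch (hypothetical code, not the answer’s) can emulate the two’s-complement wrap-around through `uint32_t` to show where those negative values come from, and contrast it with a 64-bit intermediate:

```cpp
#include <cstdint>
#include <cstdlib>

// For x above 2^30, x + |x| = 2x exceeds INT32_MAX. We emulate the usual
// two's-complement wrap-around explicitly via uint32_t (the uint -> int
// conversion is well-defined modular arithmetic since C++20, and behaves
// that way on common platforms before that).
std::int32_t pmax0_wrapping(std::int32_t x) {
    std::uint32_t u = static_cast<std::uint32_t>(x);
    u += static_cast<std::uint32_t>(std::abs(x));  // may wrap past 2^31 - 1
    return static_cast<std::int32_t>(u) / 2;
}

// Widening the intermediate to 64 bits avoids the overflow entirely.
std::int32_t pmax0_safe(std::int32_t x) {
    std::int64_t v = x;
    v += std::llabs(v);
    return static_cast<std::int32_t>(v / 2);
}
```

Feeding the first value from above, 1781578390, through `pmax0_wrapping` reproduces the negative result shown, while `pmax0_safe` returns the value unchanged.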

So the original question was how to “quickly determine the approximate maximum of an integer vector”, in particular whether or not the maximum of the integers is not larger than 1,073,741,824.
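The asked-for check might look like the following sketch (hypothetical code, not from the question; the threshold 1,073,741,824 is 2^30): scan the vector and exit as soon as any element exceeds the threshold.

```cpp
#include <cstdint>
#include <vector>

// Early-exit scan: true as soon as any element exceeds the threshold.
// In the worst case -- no element exceeds it -- this still touches every
// element, i.e. exactly the full pass one wanted to avoid.
bool any_exceeds(const std::vector<std::int32_t>& x, std::int32_t threshold) {
    for (std::int32_t v : x)
        if (v > threshold) return true;
    return false;
}
```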

Reading this question I realized that finding this approximate maximum would require scanning the vector linearly with the potential for an early exit. In the worst case we would compare every vector element with some fixed number, which is exactly what we wanted to avoid in the first place! In addition, it was unclear to me how one could make use of this approximate maximum. On the other hand, it is quite easy to fix the potential integer overflow by using a larger integer type for intermediate results:

```
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector do_pmax0_abs_int64(IntegerVector x) {
  R_xlen_t n = x.length();
  IntegerVector out = no_init(n);
  for (R_xlen_t i = 0; i < n; ++i) {
    int64_t oi = x[i];
    oi += std::abs(oi);
    out[i] = static_cast<int>(oi / 2);
  }
  return out;
}
```

Since this version skips the initialization of the output vector, it is faster than the original solution and works correctly for large integers:

expression | min | median | itr/sec | mem_alloc |
---|---|---|---|---|
pmax(ints, 0) | 10.16ms | 16.07ms | 65.68012 | 15.34MB |
do_pmax0_abs_int(ints) | 1.8ms | 3.54ms | 269.19683 | 3.82MB |
do_pmax0_abs_int64(ints) | 1.23ms | 2.98ms | 344.47693 | 3.82MB |

At this point I stopped and posted my answer, even though I should have looked at at least two other things. First, why does `pmax` use so much more memory? The reason is a classical error when working with R: `0` is not an integer but a numeric. To compare with integers one should use `0L`. This reduces memory consumption and run time, but the latter is still larger than for the C++ solutions.

Second and more importantly, is the mathematical trick from the beginning really needed? Can one not simply traverse the vector and take the maximum for each element compared to zero?

```
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector do_pmax0_int(IntegerVector x) {
  IntegerVector out = no_init(x.length());
  std::transform(x.cbegin(), x.cend(), out.begin(),
                 [](int y){return std::max(y, 0);});
  return out;
}
```

It turns out that this is actually the fastest method:

expression | min | median | itr/sec | mem_alloc |
---|---|---|---|---|
do_pmax0_int(ints) | 846.55µs | 2.39ms | 480.4942 | 3.82MB |
do_pmax0_abs_int64(ints) | 1.23ms | 2.92ms | 395.5972 | 3.82MB |

While avoiding the first level XY problem in the question, I initially did not notice the second level XY problem.

```
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>

namespace rstub {
// [...]
}

class exp4: public Numer::Func {
private:
  double mean;
public:
  exp4(double mean_) : mean(mean_) {}

  double operator()(const double& x) const {
    return exp(-pow(x - mean, 4) / 2);
  }
};

// [[Rcpp::export]]
Rcpp::NumericVector integrate_exp4(const double &mean, double lower, double upper) {
  exp4 function(mean);
  double err_est;
  int err_code;
  double result = rstub::integrate(function, lower, upper, err_est, err_code);
  return Rcpp::NumericVector::create(Rcpp::Named("result") = result,
                                     Rcpp::Named("error") = err_est);
}
```

and have it correctly handle different input:

```
rbind(
  integrate_exp4(4, 0, 4),
  integrate_exp4(4, -Inf, Inf),
  integrate_exp4(4, 3, Inf),
  integrate_exp4(4, -Inf, 3)
)
```

```
        result        error
[1,] 1.0779003 9.252237e-08
[2,] 2.1558005 1.439771e-06
[3,] 1.9903282 4.250105e-11
[4,] 0.1654723 6.251315e-14
```

The only differences in the above code compared to the sample code from the previous post are the usage of `rstub::integrate` instead of `Numer::integrate` and the as yet unspecified `rstub` namespace. What is needed in that namespace? First, we will need a template class that does the necessary variable substitutions. Writing `x = (1 - t) / t`, the case where both limits are infinite is handled as before, resulting in the integrand `(f(x) + f(-x)) / t^2` over `t` in (0, 1]. If only one of the limits is infinite, we substitute `lower + x` or `upper - x` for the integration variable, resulting in the integrands `f(lower + x) / t^2` and `f(upper - x) / t^2`, respectively.

For the C++ template class, aggregation is used instead of inheritance, which allows us to easily specify the limits:

```
template<class T>
class transform_infinite: public Numer::Func {
private:
  T func;
  double lower;
  double upper;
public:
  transform_infinite(T _func, double _lower, double _upper) :
    func(_func), lower(_lower), upper(_upper) {}

  double operator() (const double& t) const {
    double x = (1 - t) / t;
    bool upper_finite = (upper < std::numeric_limits<double>::infinity());
    bool lower_finite = (lower > -std::numeric_limits<double>::infinity());

    if (upper_finite && lower_finite) {
      Rcpp::stop("At least one limit must be infinite.");
    } else if (lower_finite) {
      return func(lower + x) / pow(t, 2);
    } else if (upper_finite) {
      return func(upper - x) / pow(t, 2);
    } else {
      return (func(x) + func(-x)) / pow(t, 2);
    }
  }
};
```
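To convince ourselves that the substitution is correct, here is a small standalone check that is independent of RcppNumerical, using a crude midpoint rule in place of Gauss–Kronrod quadrature (illustrative code under those assumptions, not part of the package):

```cpp
#include <cmath>

// Integrate f over [lower, infinity) via the substitution x = (1 - t) / t:
// the transformed integrand f(lower + x) / t^2 is integrated over (0, 1]
// with a plain midpoint rule.
template <class F>
double integrate_lower_to_inf(F f, double lower, int n = 100000) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double t = (i + 0.5) / n;      // midpoint of the i-th subinterval
        double x = (1.0 - t) / t;
        sum += f(lower + x) / (t * t);
    }
    return sum / n;
}
```

For instance, integrating `exp(-x)` from 0 to infinity this way gives a value very close to 1, and from 1 to infinity a value close to `exp(-1)`, as expected.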

Finally, we need a wrapper function for `Numer::integrate`, which checks whether both limits are finite or not:

```
using Numer::Integrator;

template<class T>
double integrate(const T& f, double lower, double upper,
                 double& err_est, int& err_code,
                 const int subdiv = 100, const double& eps_abs = 1e-8, const double& eps_rel = 1e-6,
                 const Integrator<double>::QuadratureRule rule = Integrator<double>::GaussKronrod41) {
  if (upper == lower) {
    err_est = 0.0;
    err_code = 0;
    return 0.0;
  }
  if (std::abs(upper) < std::numeric_limits<double>::infinity() &&
      std::abs(lower) < std::numeric_limits<double>::infinity()) {
    return Numer::integrate(f, lower, upper, err_est, err_code, subdiv, eps_abs, eps_rel, rule);
  } else {
    double sign = 1.0;
    if (upper < lower) {
      std::swap(upper, lower);
      sign = -1.0;
    }
    transform_infinite<T> g(f, lower, upper);
    return sign * Numer::integrate(g, 0.0, 1.0, err_est, err_code, subdiv, eps_abs, eps_rel, rule);
  }
}
```

If both limits are finite, `Numer::integrate` is used directly. Otherwise the function is transformed and `Numer::integrate` is used with the adjusted range. In addition, it is first checked that the upper limit is actually larger than the lower limit. If this is not the case, one of the properties of integration is used to swap the limits and change the sign: the integral from `upper` to `lower` is equal to minus the integral from `lower` to `upper`.

Thereby we get the correct result even when the limits have been exchanged:

```
rbind(
  integrate_exp4(4, 3, Inf),
  integrate_exp4(4, Inf, 3)
)
```

```
        result        error
[1,]  1.990328 4.250105e-11
[2,] -1.990328 4.250105e-11
```

In the end we needed one template class and one template function, which could be put into a separate header file, to generalize `Numer::integrate` for integration over an infinite interval.

The goal of swephR is to provide an R interface to the Swiss Ephemeris, a high precision ephemeris based upon the DE431 ephemeris from NASA’s JPL. It covers the time range 13201 BCE to 17191 CE.

This new version comes closely after last week’s release and contains only a single, albeit important, fix to a stack overflow write found by the UBSAN tests done on CRAN. Sadly I did not find this error using `rhub::check_with_sanitizers()` before uploading to CRAN. I will analyze this further before the next upload, since I fear that other issues like this might surface as we expose more of the Swiss Ephemeris to R.