## Saturday, November 23, 2013

### Simulation speed: GNU R vs Julia

Recently there is a lot of noise about Julia. I have decided to test its speed in simulation tasks on my toy Cont model. I thought I had vectorized my GNU R code pretty well, but Julia is much faster.

The model was described in my earlier posts so let us go down to a comparison:
Here is my GNU R code:

library(e1071)

cont.run <- function(burn.in, reps, n, d, l, s) {
tr <- rep(0, n)
r <- rep(0, reps)
for (i in 1:reps) {
sig <- rnorm(1, 0, d)
mul <- 1
if (sig < 0) {
sig <- -sig
mul <- -1
}
r[i] <- mul * sum(sig > tr) / (l * n)
tr[runif(n) < s] <- abs(r[i])
}
return(kurtosis(r[burn.in:reps]))
}

system.time(replicate(10,
cont.run(1000, 10000, 1000, 0.005, 10.0, 0.01)))

It's execution time is a bit below 10 seconds on my laptop.

An equivalent Julia code is the following:

using Distributions

function cont_run(burn_in, reps, n, d, l, s)
tr = Array(Float64, n)
r = Array(Float64, reps)
for i in 1:reps
aris = 0
sig = randn() * d
mul = 1
if sig < 0
sig = -sig
mul = -1
end
for k in 1:n
if sig > tr[k]
aris += 1
end
end
ari = aris / (l * n)
r[i] = mul * ari
for j in 1:n
if rand() < s
tr[j] = ari
end
end
end
kurtosis(r[burn_in:reps])
end

n = 10
t_start = time()
k = Array(Float64, n)
for i in 1:n
k[i] = cont_run(1000, 10000, 1000, 0.005, 10.0, 0.01)
end
println(time() - t_start)

And on my machine it takes a bit less than 0.7 seconds to run.

So we get over tenfold speedup. This is a significant difference for simulation experiments.
I will have to dig more into Julia in the future.

1. Few other things to watch:

https://news.ycombinator.com/item?id=6745135

http://www.renjin.org/

https://github.com/jtalbot/riposte

Plus Revolution, TIBCO, and Tableau

some of which look like giving a similar speedup without the more verbose Julia code, and the somewhat lonely Julia community.

2. I don't think we need to use a different version R here. Fragmentation doesn't help and most of the improvement achieved through pqR will be ported to R very soon.

Now, many R users have Julia installed in their machine and some are even core developper of the language, I think the Julia community is great with very smart guys and the day I'll want to switch I'll probably use Julia.

Now comming back to this problem, one reason why many useRs stick with R despite this kind of performance is because of the rich library composed of more than 5000 packages and one of them is Rcpp. So learning Julia is good but learning Cpp and Rcpp is better investment for a useRs.

Translating your Julia code to Cpp is straightforward

Cpp side :

#include

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector cont_runcpp(int reps, int n, double d, double l, double s) {
NumericVector tr(n);
NumericVector r(reps);
for (int i = 0; i < reps; i++) {
double aris = 0.0;
double sig = R::rnorm(0, d);
int mul = 1;
if (sig < 0) {
sig = -sig;
mul = -1;
}
for (int k=0; k < n; k++) {
if (sig > tr(k)) {
aris += 1;
}
}
double ari = aris / (l * n);
r(i) = mul * ari;
for (int j=0; j < n; j++) {
if (R::runif(0, 1) < s) {
tr(j) = ari;
}
}
}
return r;
}

Let's put this code into a file names "cont_runcpp.cpp"

R side :

require(Rcpp)
require(e1071)
require(rbenchmark)
sourceCpp("cont_run.cpp")

cont_runRcpp <- function(burn_in = 1000, reps, n, d, l, s)
kurtosis(cont_runcpp(reps, n, d, l, s)[burn_in:reps])

R <- function() cont.run(1000, 10000, 1000, 0.005, 10.0, 0.01)
Rcpp <- function() cont_runRcpp(1000, 10000, 1000, 0.005, 10.0, 0.01)

benchmark(R(),
Rcpp(),
columns = c("test", "elapsed", "relative", "replications"),
replications = 100)

test elapsed relative replications
1 R() 81.950 3.944 100
2 Rcpp() 20.778 1.000 100

This cpp can be improved I just took you julia code as is.

3. There may be a way to improvement performance by eliminating R's for loop with vectorization

1. I was thinking about it - but found no way.
Do you have any idea how to proceed?

2. The line:

tr[runif(n) < s] <- abs(r[i])

makes it difficult.

4. I found this port on r-bloggers and all I can say is...blasphemer !

5. That is a pretty awesome improvement in performance! Sorry to be a pedant (this does not affect the performance comparison), but:

the R code:

r[i] <- mul * sum(sig > tr) / (l * n)
tr[runif(n) < s] <- abs(r[i])

is not equivalent to the Julia code:

ari = aris / (l * n)
r[i] = mul * ari
for j in 1:n
if rand() < s
tr[j] = ari
end
end

as you don't multiply tr[j] by 'mul' or take the absolute value.

1. It is equivalent.
abs(r[i]) = abs(mul * ari) = abs(mul) * abs(ari) = ari
as ari is always positive ant mul is either 1 or -1.

6. Please note that on modern multicore-CPU machines it is possible to boost R performance with “multicore” package. You may have noticed that R runs single-core, even on such modern machines. This in big part explains the dramatic difference in performance against Julia which has been designed for parallel processing and (I assume) makes use of all available cores on a host.
Package “multicore” enables R to utilize multiple cores in a straightforward manner. On my 8-core CPU machine I got over 5x speedup for the above cont model:
> REPS<-24
> system.time(lapply(rep(1000,REPS),cont.run, 10000, 1000, 0.005, 10.0, 0.01))
user system elapsed
19.054 0.000 19.127
> library(multicore)
> system.time(mclapply(rep(1000,REPS),cont.run, 10000, 1000, 0.005, 10.0, 0.01))
user system elapsed
21.535 0.172 3.669

As you can see the trick was to replace a call to “lapply” with a call to “mclapply” - its parallelized version provided by “multicore” package. Going further you may parallelize large chunks of existing code in a transparent, nearly effortless way by redefining basic functions such as “sapply” and “replicate” to use “mclapply” instead of “lapply”.
You can also gain some additional seconds by enabling implicit (background) JIT compilation in R:

> library(compiler)
> enableJIT(3)
> system.time(lapply(rep(1000,REPS),cont.run, 10000, 1000, 0.005, 10.0, 0.01))
user system elapsed
17.417 0.009 17.512
> system.time(mclapply(rep(1000,REPS),cont.run, 10000, 1000, 0.005, 10.0, 0.01))
user system elapsed
26.099 0.308 3.364