Saturday, November 23, 2013

Simulation speed: GNU R vs Julia

Recently there is a lot of noise about Julia. I have decided to test its speed in simulation tasks on my toy Cont model. I thought I had vectorized my GNU R code pretty well, but Julia is much faster.

The model was described in my earlier posts so let us go down to a comparison:
Here is my GNU R code:

library(e1071)

cont.run <- function(burn.in, reps, n, d, l, s) {
  tr <- rep(0, n)
  r <- rep(0, reps)
  for (i in 1:reps) {
    sig <- rnorm(1, 0, d)
    mul <- 1
    if (sig < 0) {
        sig <- -sig
        mul <- -1
    }
    r[i] <- mul * sum(sig > tr) / (l * n)
    tr[runif(n) < s] <- abs(r[i])
  }
  return(kurtosis(r[burn.in:reps]))
}

system.time(replicate(10,
    cont.run(1000, 10000, 1000, 0.005, 10.0, 0.01)))

It's execution time is a bit below 10 seconds on my laptop.

An equivalent Julia code is the following:

using Distributions

function cont_run(burn_in, reps, n, d, l, s)
    tr = Array(Float64, n)
    r = Array(Float64, reps)
    for i in 1:reps
        aris = 0
        sig = randn() * d
        mul = 1
        if sig < 0
            sig = -sig
            mul = -1
        end
        for k in 1:n
            if sig > tr[k]
                aris += 1
            end
        end
        ari = aris / (l * n)
        r[i] = mul * ari
        for j in 1:n
            if rand() < s
                tr[j] = ari
            end
        end
    end
    kurtosis(r[burn_in:reps])
end

n = 10
t_start = time()
k = Array(Float64, n)
for i in 1:n
    k[i] = cont_run(1000, 10000, 1000, 0.005, 10.0, 0.01)
end
println(time() - t_start)

And on my machine it takes a bit less than 0.7 seconds to run.

So we get over tenfold speedup. This is a significant difference for simulation experiments.
I will have to dig more into Julia in the future.

9 comments:

  1. Few other things to watch:

    https://news.ycombinator.com/item?id=6745135

    http://radfordneal.github.io/pqR/

    http://www.renjin.org/

    https://github.com/jtalbot/riposte

    Plus Revolution, TIBCO, and Tableau

    some of which look like giving a similar speedup without the more verbose Julia code, and the somewhat lonely Julia community.

    ReplyDelete
  2. I don't think we need to use a different version R here. Fragmentation doesn't help and most of the improvement achieved through pqR will be ported to R very soon.

    Now, many R users have Julia installed in their machine and some are even core developper of the language, I think the Julia community is great with very smart guys and the day I'll want to switch I'll probably use Julia.

    Now comming back to this problem, one reason why many useRs stick with R despite this kind of performance is because of the rich library composed of more than 5000 packages and one of them is Rcpp. So learning Julia is good but learning Cpp and Rcpp is better investment for a useRs.

    Translating your Julia code to Cpp is straightforward

    Cpp side :

    #include

    using namespace Rcpp;

    // [[Rcpp::export]]
    NumericVector cont_runcpp(int reps, int n, double d, double l, double s) {
    NumericVector tr(n);
    NumericVector r(reps);
    for (int i = 0; i < reps; i++) {
    double aris = 0.0;
    double sig = R::rnorm(0, d);
    int mul = 1;
    if (sig < 0) {
    sig = -sig;
    mul = -1;
    }
    for (int k=0; k < n; k++) {
    if (sig > tr(k)) {
    aris += 1;
    }
    }
    double ari = aris / (l * n);
    r(i) = mul * ari;
    for (int j=0; j < n; j++) {
    if (R::runif(0, 1) < s) {
    tr(j) = ari;
    }
    }
    }
    return r;
    }

    Let's put this code into a file names "cont_runcpp.cpp"

    R side :

    require(Rcpp)
    require(e1071)
    require(rbenchmark)
    sourceCpp("cont_run.cpp")

    cont_runRcpp <- function(burn_in = 1000, reps, n, d, l, s)
    kurtosis(cont_runcpp(reps, n, d, l, s)[burn_in:reps])

    R <- function() cont.run(1000, 10000, 1000, 0.005, 10.0, 0.01)
    Rcpp <- function() cont_runRcpp(1000, 10000, 1000, 0.005, 10.0, 0.01)

    benchmark(R(),
    Rcpp(),
    columns = c("test", "elapsed", "relative", "replications"),
    replications = 100)

    test elapsed relative replications
    1 R() 81.950 3.944 100
    2 Rcpp() 20.778 1.000 100

    This cpp can be improved I just took you julia code as is.


    ReplyDelete
  3. There may be a way to improvement performance by eliminating R's for loop with vectorization

    ReplyDelete
    Replies
    1. I was thinking about it - but found no way.
      Do you have any idea how to proceed?

      Delete
    2. The line:

      tr[runif(n) < s] <- abs(r[i])

      makes it difficult.

      Delete
  4. I found this port on r-bloggers and all I can say is...blasphemer !

    ReplyDelete
  5. That is a pretty awesome improvement in performance! Sorry to be a pedant (this does not affect the performance comparison), but:

    the R code:

    r[i] <- mul * sum(sig > tr) / (l * n)
    tr[runif(n) < s] <- abs(r[i])


    is not equivalent to the Julia code:

    ari = aris / (l * n)
    r[i] = mul * ari
    for j in 1:n
    if rand() < s
    tr[j] = ari
    end
    end


    as you don't multiply tr[j] by 'mul' or take the absolute value.

    ReplyDelete
    Replies
    1. It is equivalent.
      abs(r[i]) = abs(mul * ari) = abs(mul) * abs(ari) = ari
      as ari is always positive ant mul is either 1 or -1.

      Delete
  6. Please note that on modern multicore-CPU machines it is possible to boost R performance with “multicore” package. You may have noticed that R runs single-core, even on such modern machines. This in big part explains the dramatic difference in performance against Julia which has been designed for parallel processing and (I assume) makes use of all available cores on a host.
    Package “multicore” enables R to utilize multiple cores in a straightforward manner. On my 8-core CPU machine I got over 5x speedup for the above cont model:
    > REPS<-24
    > system.time(lapply(rep(1000,REPS),cont.run, 10000, 1000, 0.005, 10.0, 0.01))
    user system elapsed
    19.054 0.000 19.127
    > library(multicore)
    > system.time(mclapply(rep(1000,REPS),cont.run, 10000, 1000, 0.005, 10.0, 0.01))
    user system elapsed
    21.535 0.172 3.669

    As you can see the trick was to replace a call to “lapply” with a call to “mclapply” - its parallelized version provided by “multicore” package. Going further you may parallelize large chunks of existing code in a transparent, nearly effortless way by redefining basic functions such as “sapply” and “replicate” to use “mclapply” instead of “lapply”.
    You can also gain some additional seconds by enabling implicit (background) JIT compilation in R:

    > library(compiler)
    > enableJIT(3)
    > system.time(lapply(rep(1000,REPS),cont.run, 10000, 1000, 0.005, 10.0, 0.01))
    user system elapsed
    17.417 0.009 17.512
    > system.time(mclapply(rep(1000,REPS),cont.run, 10000, 1000, 0.005, 10.0, 0.01))
    user system elapsed
    26.099 0.308 3.364

    ReplyDelete