## Saturday, March 16, 2013

### GNU R loop speed comparison

Recently I had several discussions about using for loops in GNU R and how they compare to *apply family in terms of speed. I have not seen a direct benchmark comparing them so I decided to execute one (warning: some of the code presented today takes long time to execute).

First I have started by comparing the speed of assignment operator for lists vs. numeric vectors in standard for loops. Here is the code:
```speed.test <- function(n) {
gc()
x1 <- numeric(n)
x2 <- vector(n, mode = "list")
c(system.time(for (i in 1:n) { x1[i] <- i })[3],
system.time(for (i in 1:n) { x2[[i]] <- i })[3])
}

n <- seq(10 ^ 4, 10 ^ 6, len = 5)
result <- t(sapply(n, speed.test))
par(mar=c(4.5, 4.5, 1, 1))
matplot(n / 1000, result, type = "l" , col = 1:2, lty = 1,
xlab = "n ('000)", ylab = "time")
legend("topleft", legend = c("numeric", "list"),
col = 1:2, lty = 1)```
The picture showing the result of the comparison is the following:
As we can see - operation numeric vectors are significantly faster than list, especially for large vector sizes.

But how does this relate to *apply family of functions? The issue is that the workhorse function there is lapply and it works on lists. Other functions from this family call lapply internally.

So I have run the second test comparing: (a) lapply, (b) for loop working on lists and (c) for loop working on numeric vectors. Here is the code:
```aworker <- function(n) {
r <- lapply(1:n, identity)
return(NULL)
}

lworker <- function(n) {
r <- vector(n, mode = "list")
for (i in 1:n) {
r[[i]] <- identity(i)
}
return(NULL)
}

nworker <- function(n) {
r <- numeric(n)
for (i in 1:n) {
r[i] <- identity(i)
}
return(NULL)
}

run <- function(n, worker) {
gc()
unname(system.time(worker(n))[3])
}

compare <- function(n) {
c(lapply = run(n, aworker),
list = run(n, lworker),
numeric = run(n, nworker))
}

n <- rep(c(10 ^ 6, 10 ^ 7), 10)
result <- t(sapply(n, compare))
par(mfrow = c(1,2), mar = c(3,3,3,1))
for (i in n[1:2]) {
boxplot(result[n == i,],
main = format(i, scientific = F, big.mark = ","))
}```
On the picture below we can see the result. For 1,000,000 elements in a vector lapply is the fastest. The reason it that it executes looping in compiled C code. However for 10,000,000 elements for loop using numeric vector is faster as it avoids conversion to list.
Of course probably on other machines than my notebook the difference in speed would manifest itself for other number of elements in a vector.
However one can draw a general conclusion: if you have large AND numeric vectors and need to do a lot of number crunching for loop will be faster than lapply.

1. You might want take a look at http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar/ , where there are a variety of useful benchmarks/discussion ...

2. I would also mention http://stackoverflow.com/questions/2908822/speed-up-the-loop-operation-in-r and http://stackoverflow.com/questions/5533246/why-is-apply-method-slower-than-a-for-loop-in-r

Package rbenchmark has some nice things for comparing benchmarks.

3. The switch in relative performance between 1M and 10M elements is very suspicious.

Perhaps the different sizes caused a change in memory subsystem behavior, e.g., the garbage collector is busier with 10M, or more disk swapping occurred (unlikely unless lots of other stuff was already in memory), or something caused the cache to start thrashing (always difficult to predict).

4. You should also run these tests using the compiler package. I suspect that nworker will run a lot faster.

5. Thanks for the comments. I have thought of Rcpp, but discarded it because I wanted to use pure R code. However, I have forgotten of compiler package.

I have run the test compiling all three worker routines and here are the results (average time from 10 runs):

1M:
lapply list numeric
0.879 0.862 0.795

10M:
lapply list numeric
39.721 22.875 7.614

So - as predicted by Bernard - for loops get a big performance boost. If you would get different results - it would be interesting to see them.

6. The *apply-family are just wrappers around for-loops, so they will not be faster than regular loops; they are however very convenient when looping over certain data structures (just type 'apply' in the console to see the code). The lapply does call an internal function to process the loop but that hardly makes it faster since the R-code in the FUN argument needs to be reinterpreted at every step.

1. This is true, but in lapply for loop is implemented directly in C not in R.

7. Intel i7-860 with 8GB RAM.

1M:
lapply list numeric
0.887 2.549 2.218

10M:
lapply list numeric
26.958 36.977 22.269