Monday, November 25, 2013

Efficiency in R

One common criticism of R concerns its performance. It is true that, compared with some other programming languages, R was not designed with performance as its primary goal. But there are still plenty of tricks (and pitfalls) worth paying attention to when trying to make R code run faster.

When an object is passed into a function, R does not copy it right away. A copy is made only when the function modifies the object or builds a new object derived from it. So avoid unnecessary modifications of a large object inside a function, and avoid constructing another large object from it, because a seemingly minor operation can trigger a great deal of copying. See the example:

a <- rnorm(1e+09) # make a large object (1e9 doubles, roughly 8 GB)
fun1 <- function(vec) {
  mean(vec) + 1
}

fun2 <- function(vec) {
  mean(vec + 1)
}

> system.time(fun1(a))
   user  system elapsed 
  9.594  32.137 123.823 
> system.time(fun2(a))
   user  system elapsed 
 12.039  44.128 203.475 

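The first function scans vec once to compute the mean and then adds 1 to a single number. The second function first constructs a new object, vec + 1, which is as long as vec itself, and then passes that into the mean() function. When vec is large, allocating and filling that temporary vector is expensive, which is why fun2 takes noticeably longer in the timings above.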
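For readers who do not have enough memory to run the example at full size, here is a scaled-down sketch of the same comparison. The vector length of 1e6, the names a_small and bump_first, and the checks are my own choices for illustration and are not part of the original timings. It only shows that the two functions agree on the result, that fun2 has to build a temporary as large as its input, and that modifying an argument inside a function leaves the caller's object untouched.

```r
# Scaled-down input so this runs quickly (1e6 is an arbitrary size for this sketch).
set.seed(1)
a_small <- rnorm(1e6)

fun1 <- function(vec) mean(vec) + 1   # adds 1 to a single number
fun2 <- function(vec) mean(vec + 1)   # vec + 1 allocates a full-length temporary

# Both functions give the same answer, up to floating-point rounding.
same_result <- isTRUE(all.equal(fun1(a_small), fun2(a_small)))
print(same_result)

# The temporary that fun2 builds occupies as much memory as the input.
same_size <- object.size(a_small + 1) == object.size(a_small)
print(same_size)

# Hypothetical helper: modifying the argument works on a copy,
# so the caller's vector is not changed.
bump_first <- function(vec) {
  vec[1] <- vec[1] + 1
  vec
}
b <- c(1, 2, 3)
b_new <- bump_first(b)
print(b)
print(b_new)
```

Wrapping the two calls in system.time() at this size shows a much smaller gap than the one reported above, since the extra cost grows with the length of the vector.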
