For most people who are new to R, the first thing they do (unless they do a "Hello, World!" exercise) is to assign some value to an object, such as
a <- 1; b <- a
It is obvious here that we assign the value 1 to a, and then assign b the same value as a, which is also 1. But it is interesting to think about how exactly R executes b <- a. Actually, R doesn't make a copy of a and assign it to b. Instead, R simply lets b point to the same object as a, without making a physical copy. In other words, both a and b point to the same physical allocation in memory. But what happens when we change the value of one of them? As we would probably guess, at that point R makes a new copy of the original.
To illustrate this, we make a (relatively) large object and check R's memory usage to confirm whether multiple copies of that large object are made.
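A quick way to watch this happen on a small vector is base R's tracemem(), which reports when a marked object gets duplicated. The following is only a minimal sketch: the names x and y are made up for illustration, and it assumes an R build with memory profiling enabled (checked here via capabilities("profmem")).

```r
# Sketch: observe copy-on-modify with tracemem().
# x and y are illustrative names, not objects from the text above.
x <- c(1.5, 2.5, 3.5)
y <- x                        # no copy yet: y is bound to the same vector as x

if (capabilities("profmem")) {
  tracemem(x)                 # mark the shared vector so duplications are reported
  y[1] <- 10                  # first modification through y: R reports a copy here
  untracemem(x)               # stop tracing the original
  untracemem(y)               # stop tracing the copy as well
} else {
  y[1] <- 10                  # same semantics, just without the trace message
}

print(x)                      # x is untouched by the change made through y
print(y)
```

The "tracemem[...]" line printed at the modification step is the copy being made; the plain assignment y <- x prints nothing.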
> rm(list = ls()) # remove all existing objects to clean memory
> a <- rnorm(1e+08); gc() # this is a large object
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 282199 15.1 531268 28.4 467875 25.0
Vcells 100603178 767.6 210878317 1608.9 200764441 1531.8
> b1 <- a; gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 282203 15.1 531268 28.4 467875 25.0
Vcells 100603179 767.6 210878317 1608.9 200764441 1531.8
We see that memory usage has not increased.
> b2 <- b1; gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 282208 15.1 531268 28.4 467875 25.0
Vcells 100603180 767.6 210878317 1608.9 200764441 1531.8
> b3 <- b2; gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 282213 15.1 531268 28.4 467875 25.0
Vcells 100603181 767.6 210878317 1608.9 200764441 1531.8
Now a, b1, b2 and b3 all point to the same object, with only one real memory allocation.
> a[1] <- 1
> a[1] <- 1; gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 282217 15.1 531268 28.4 467875 25.0
Vcells 200603182 1530.5 315878475 2410.0 300763181 2294.7
We see that when we changed a, a new copy of comparable size was created, but b1, b2 and b3 are still sharing one physical allocation.
> b1[1] <- 1; gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 282218 15.1 531268 28.4 467875 25.0
Vcells 300603182 2293.5 331752398 2531.1 300923553 2295.9
> b2[1] <- 1; gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 282219 15.1 531268 28.4 467875 25.0
Vcells 400603182 3056.4 442002427 3372.3 400763207 3057.6
Finally, a, b1, b2 and b3 now point to four separate physical memory allocations, one for each of four different objects.
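The same experiment can be repeated on a smaller scale without reading the gc() tables by eye. This is a rough sketch, not a precise measurement: vcells_mb() is a hypothetical helper written for this example, and it assumes that the second column of the matrix returned by gc() is the "used (Mb)" figure, as in the output shown above.

```r
# Hypothetical helper (not part of base R): Mb currently used by vector cells.
vcells_mb <- function() {
  g <- gc()                # gc() returns a matrix with rows "Ncells" and "Vcells"
  g["Vcells", 2]           # assumption: column 2 is "used (Mb)"
}

x <- rnorm(1e6)            # one million doubles, roughly 7.6 Mb
m0 <- vcells_mb()
y <- x                     # plain assignment: no new allocation expected
m1 <- vcells_mb()
y[1] <- 0                  # first modification of y: a full copy is made
m2 <- vcells_mb()

cat("change after y <- x:    ", m1 - m0, "Mb\n")
cat("change after y[1] <- 0: ", m2 - m1, "Mb\n")
```

The first difference should be close to zero, while the second should be about the size of x.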