How to convert a list of data frames into one data frame

The problem to discuss is how to convert the following list of data frames into one data frame.

x <- list(data.frame(math=90, science=85),
          data.frame(math=98, science=82))

One commonly known approach is to convert the list to a vector using unlist(), then convert that vector to a matrix, and finally turn the matrix into a data frame.

> x <- list(data.frame(math=90, science=85),
+           data.frame(math=98, science=82))
> 
> as.data.frame(matrix(ncol=2, byrow=TRUE, unlist(x)))
  V1 V2
1 90 85
2 98 82

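However, this approach has a drawback: unlist() coerces every value to a single type, because a vector can hold only one data type. In the example below, the name column comes back as numbers. These are the integer codes of the factor that data.frame() built from the strings, which was its default behavior before R 4.0.0.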

> x <- list(data.frame(name="foo", value=1),
+           data.frame(name="bar", value=2))
> 
> as.data.frame(matrix(ncol=2, byrow=TRUE, unlist(x)))
  V1 V2
1  1  1
2  1  2
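To make the coercion visible, here is a minimal sketch that calls unlist() on its own and inspects the result. The variable names (x_chr, x_fct, flat_chr, flat_fct) are only for illustration, and stringsAsFactors is set explicitly so that the behavior does not depend on the R version in use.

```r
# Strings kept as character: everything, including the numbers, becomes character.
x_chr <- list(data.frame(name = "foo", value = 1, stringsAsFactors = FALSE),
              data.frame(name = "bar", value = 2, stringsAsFactors = FALSE))
flat_chr <- unlist(x_chr)
print(class(flat_chr))   # "character"

# Strings stored as factors (the default before R 4.0.0): the factor is
# reduced to its integer codes, so the original names are lost.
x_fct <- list(data.frame(name = "foo", value = 1, stringsAsFactors = TRUE),
              data.frame(name = "bar", value = 2, stringsAsFactors = TRUE))
flat_fct <- unlist(x_fct)
print(class(flat_fct))   # "numeric"
```

Either way, one of the two columns loses its original type, which is why the matrix-based approach is only safe when all columns already share a type.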

The solution is to use do.call() with rbind(). This calls rbind() once, passing every element of the list as a separate argument.

> x <- list(data.frame(name="foo", value=1),
+           data.frame(name="bar", value=2))
> 
> do.call(rbind, x)
  name value
1  foo     1
2  bar     2
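As a small sketch of what do.call() is doing here, the call can be compared with writing the rbind() arguments out by hand. The names combined and by_hand are illustrative only.

```r
x <- list(data.frame(name = "foo", value = 1),
          data.frame(name = "bar", value = 2))

# do.call() builds a single call: rbind(x[[1]], x[[2]])
combined <- do.call(rbind, x)

# The same call written out by hand for a two-element list
by_hand <- rbind(x[[1]], x[[2]])

print(identical(combined, by_hand))
```

The advantage of do.call() is that it works for a list of any length without spelling out each element.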

The problem is that rbind() is slow when the list is long.

> x <- lapply(1:10000, function(x) {
+   data.frame(name=paste(x, "foo"), value=x)
+ })
> head(x)
[[1]]
   name value
1 1 foo     1

[[2]]
   name value
1 2 foo     2

[[3]]
   name value
1 3 foo     3

[[4]]
   name value
1 4 foo     4

[[5]]
   name value
1 5 foo     5

[[6]]
   name value
1 6 foo     6

> system.time(do.call(rbind, x))
   user  system elapsed 
 10.100   0.014  10.114 

Ten seconds may not sound too bad, but it can grow to several minutes as the data frames get more columns.

rbindlist() in the data.table package solves this problem, as shown below. Note that the time it takes is almost negligible.

> library(data.table)
> x <- lapply(1:10000, function(x) {
+   data.frame(name=paste(x, "foo"), value=x)
+ })
> system.time(rbindlist(x))
   user  system elapsed 
  0.003   0.000   0.003 

A data.table is a subclass of data.frame, so it should work wherever a data.frame is expected. If needed, a data.table can easily be converted to a data.frame using as.data.frame().
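The following is a minimal sketch of that relationship and of the conversion, assuming the data.table package is installed. The names dt and df are illustrative.

```r
library(data.table)

x <- list(data.frame(name = "foo", value = 1),
          data.frame(name = "bar", value = 2))

dt <- rbindlist(x)
print(class(dt))                   # "data.table" "data.frame"
print(inherits(dt, "data.frame"))  # TRUE

# Convert to a plain data.frame when a function insists on one
df <- as.data.frame(dt)
print(class(df))                   # "data.frame"
```

as.data.frame() returns a new object and leaves dt unchanged, so both forms remain available afterwards.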