
# Reading .xlsx files into R, quickly

The gdata library is fantastic for reading small .xls or .xlsx files into R, but it can be very slow for large files. Today I had to read data from the second sheet of a 64 MB .xlsx file. It should have gone something like

```
library(gdata)
mydata <- read.xls("myfile.xlsx", sheet = 2)
```

but R just hung there, unresponsive, for minutes.

Instead, I followed advice on converting .xls to .csv on the Linux command line, and used gnumeric. I installed it via my package manager

```
sudo apt-get install gnumeric
```

then, back in R, did

```
convert.xls <- function(f, out.dir = dirname(f)) {
  # build the output path: same basename, with a .csv extension
  f2 <- paste(out.dir, gsub("\\.xlsx$|\\.xls$", ".csv", basename(f)), sep = "/")
  # -S splits a multi-sheet workbook into one numbered .csv file per sheet
  system(paste("ssconvert -S --export-type=Gnumeric_stf:stf_csv",
               shQuote(f), shQuote(f2)))
  out.files <- list.files(dirname(f2), pattern = basename(f2), full.names = TRUE)
  cat("infile:", f, "\noutfile(s):", paste(out.files, collapse = "\n\t"), "\n")
}
convert.xls("myfile.xlsx")
```

which reported

```
infile: myfile.xlsx
outfile(s): myfile.csv.0
	myfile.csv.1
	myfile.csv.2
	myfile.csv.3
```

so that I could read my file with

```
# read sheet 2; ssconvert's sheet numbering is 0-based
mydata <- read.table("myfile.csv.1", header = TRUE, as.is = TRUE, sep = ",")
```
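Since `read.csv` is just `read.table` with `header = TRUE` and `sep = ","` preset, the same read can be written a little more compactly. A self-contained sketch, using a small hypothetical stand-in file so it runs without the conversion step:

```
# write a tiny stand-in for one exported sheet (hypothetical file name)
tmp <- file.path(tempdir(), "demo.csv.1")
writeLines(c("id,value", "1,3.5", "2,4.2"), tmp)

# read.csv presets header = TRUE and sep = ","
mydata <- read.csv(tmp, as.is = TRUE)
str(mydata)  # inspect the result
```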

It was quick, and it worked. I love ssconvert! The persistence of journals in distributing large supplementary data files in Excel format remains annoying, but perhaps ever so slightly less so.

In the function, I assumed by default that the converted files should be placed in the same directory as the input file. If that’s not the case, just supply the intended directory using the argument `out.dir`.
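For example, here is the path construction the function performs when `out.dir` is supplied, shown in isolation (both paths below are hypothetical):

```
f <- "data/myfile.xlsx"      # hypothetical input path
out.dir <- "/tmp/converted"  # hypothetical output directory

# same basename, .csv extension, placed under out.dir
f2 <- paste(out.dir, gsub("\\.xlsx$|\\.xls$", ".csv", basename(f)), sep = "/")
f2  # "/tmp/converted/myfile.csv"
```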