[Solved] Memory issues with an in-memory dataframe. Best approach to write output?


I have been using the following snippet:

library(data.table)

# dir holds the input directory; full.names = TRUE returns usable file paths
filepaths <- list.files(dir, full.names = TRUE)
resultFilename <- "/path/to/resultFile.txt"
for (i in seq_along(filepaths)) {
  content <- fread(filepaths[i], header = FALSE, sep = ",")
  ### some manipulation for the content
  results <- content[1]
  # append each file's result so only one file's data is in memory at a time
  fwrite(results, resultFilename, col.names = FALSE, quote = FALSE, append = TRUE)
}
# read the accumulated results back in a single pass
finalData <- fread(resultFilename, header = FALSE, sep = ",")

In my use case, with ~2000 files and tens of millions of rows, processing time dropped by over 95 % compared with read.csv and incrementally growing a data.frame inside the loop. As shown in section 4.3.1 of https://csgillespie.github.io/efficientR/importing-data.html and in https://www.r-bloggers.com/fast-csv-writing-for-r/, fread and fwrite are very fast data I/O functions.
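For comparison, if the per-file results are small enough to hold in memory at once, a minimal alternative sketch (assuming the same placeholder manipulation as above, and a hypothetical input directory) collects the pieces with lapply() and rbindlist() and writes a single file at the end:

library(data.table)

# hypothetical input directory; replace with your own
filepaths <- list.files("/path/to/input/dir", full.names = TRUE)

# read each file, keep only the manipulated result, and bind all pieces at once;
# rbindlist() allocates the combined table in one step, avoiding the quadratic
# cost of growing a data.frame row by row inside a loop
resultList <- lapply(filepaths, function(f) {
  content <- fread(f, header = FALSE, sep = ",")
  content[1]  # placeholder for the real manipulation
})
finalData <- rbindlist(resultList)

# one write at the end instead of appending inside the loop
fwrite(finalData, "/path/to/resultFile.txt", col.names = FALSE, quote = FALSE)

The append-in-the-loop version above remains the safer choice when the combined results might not fit in memory, since it only ever holds one file's data at a time.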
