Fastest way to manipulate big data in R

Issue

This content is from Stack Overflow. The question was asked by Cindy Burker.

I have 40+ CSV files, each around 400 MB. What I need to do is read these 40+ big CSV files, do some manipulation and formatting on them (such as standardizing date formats, splitting dates into month, day, etc.), and combine them into a single data frame. From previous posts I found the quickest way to read these CSV files to be “fread”, but even when I used fread it took approx. 14 seconds to read each file, which leaves me with a pretty significant runtime. I tried using SQLite through RSQLite for a single CSV file:

setwd("raw_data/sqldatabase")
db <- dbConnect(SQLite(), dbname="test_db.sqlite") ## will make, if not present
dbWriteTable(conn=db, name="your_table", value="testdata.csv", row.names=FALSE, header=TRUE)

However, even using SQLite it took a considerable amount of time. What can be used to quickly read 40+ big CSV files into a “space” where manipulating them is very fast?

If I were to upload the data to a database once, and if that made the manipulation very fast from then on, I would still be fine, but the final data set (once the merge is complete) is expected to be 25+ GB. So I am trying to find the most efficient way to manipulate the data.
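
For comparison, the fread route mentioned at the start amounts to something like the following sketch, reading every file and stacking the results with data.table (the directory, file pattern, and the “date” column name are assumptions):

library(data.table)

files <- list.files("raw_data", pattern = "\\.csv$", full.names = TRUE)

# read every CSV with fread and stack the results into one data.table
combined <- rbindlist(lapply(files, fread), use.names = TRUE, fill = TRUE)

# example formatting step: standardize the date column and split out month/day
combined[, date  := as.IDate(date)]   # column name "date" is a hypothetical example
combined[, month := month(date)]
combined[, day   := mday(date)]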



Solution

This question has not been answered yet; be the first to answer in the comments. The confirmed answer will later be published here as the solution.

This question is collected from Stack Overflow by the JTuto community and is licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
