data.table Archives

[Solved] Naming columns in a data table in R

January 31, 2023 by Kirat

I don’t work with data tables, but this solution works for data frames and should hopefully generalize. The strategy is to use the fact that you can fill one vector with another vector, without ever having to use a loop.

    # make the example data sets
    D1 <- as.data.frame(matrix(data=(1:(20*181)), nrow=20, ncol=181)) … Read more
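The excerpt above is cut off, so the following is only a minimal sketch of the general idea (assigning one vector into a slice of another), not the answer's actual code. The small data frame and the names in `new_names` are hypothetical.

```r
# Hypothetical example data: a data frame with six generically named columns (V1..V6)
D1 <- as.data.frame(matrix(1:12, nrow = 2, ncol = 6))

# Hypothetical replacement names for columns 2 to 4
new_names <- c("height", "weight", "age")

# Fill a slice of the names vector with another vector; no loop is needed
names(D1)[2:4] <- new_names

print(names(D1))
```

For a data.table, `setnames(DT, old, new)` renames columns by reference and takes vectors of old and new names in the same loop-free way.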

[Solved] Combine data table row elements into a new column as a vector [closed]

December 25, 2022 by Kirat

I am not sure if this is what you want, but something like the following may work:

    DT[, result := asplit(DT, 1)]

such that

    > DT
       col1 col2 col3 result
    1:    1    a    x  1,a,x
    2:    1    b    y  1,b,y
    3:    1    c    z  1,c,z

[Solved] subsetting columns in a datatable [duplicate]

November 16, 2022 by Kirat

We need to use with = FALSE:

    dt[, 1:2, with = FALSE]

This is explained in ?data.table:

with: By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names, a numeric vector of column positions to … Read more
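As a small illustration of the `with = FALSE` behaviour described above, here is a sketch on a hypothetical three-column table (`dt`, `cols` and the column names are made up for the example). It also shows the `..` prefix, which data.table offers as another way to refer to a variable from the calling scope in `j`.

```r
library(data.table)

# Hypothetical example table
dt <- data.table(a = 1:3, b = letters[1:3], c = c(TRUE, FALSE, TRUE))

# Select by column position: j is treated as positions, not as an expression
by_position <- dt[, 1:2, with = FALSE]

# Select by column names stored in a character vector
cols <- c("a", "b")
by_name <- dt[, cols, with = FALSE]

# The ".." prefix looks up `cols` in the calling scope instead of in dt
by_prefix <- dt[, ..cols]

print(by_position)
```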

[Solved] How to get quick summary in data.table with a look-back window?

November 2, 2022 by Kirat

I think I understand your request. You seem to care about the order of the observations regardless of whether, for instance, the second observation’s Time is prior to the first observation’s Time. That doesn’t make much sense, but here is a quite efficient data.table solution to achieve this. This basically does a non-equi … Read more
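The answer's own code is cut off above, so the following is only a rough sketch of how a non-equi self-join can summarise a look-back window in data.table. The table, the column names and the window length `lookback` are all hypothetical, and the sketch windows purely on time, ignoring the ordering concern raised in the excerpt.

```r
library(data.table)

# Hypothetical observations
dt <- data.table(id = 1:4, time = c(1, 3, 4, 10), value = c(5, 6, 7, 8))

# Hypothetical look-back length, in the same units as `time`
lookback <- 3
dt[, start := time - lookback]

# For each row of i, match every row of x whose time lies in [start, time],
# then sum the matched values once per row of i (by = .EACHI)
res <- dt[dt,
          on = .(time >= start, time <= time),
          .(id = i.id, total = sum(x.value)),
          by = .EACHI]

print(res[, .(id, total)])
```

In the `on` conditions the left-hand names refer to the outer table and the right-hand names to the table passed as `i`, which is why `time >= start` compares each candidate row's time with the current row's window start.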

[Solved] What is the substitute idiom for sapply in data.table? [duplicate]

October 30, 2022 by Kirat

Warning, a very large table will be created!

    dt <- as.data.table(matrix(runif(1000*1000000),ncol=1000))
    dt[,lapply(.SD,max)]
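The same `lapply(.SD, ...)` idiom can be tried on a table small enough to inspect by eye. This is a minimal sketch; the table and its column names are hypothetical.

```r
library(data.table)

# Hypothetical small table instead of the very large one above
dt <- data.table(x = c(1, 5, 3), y = c(10, 2, 8), z = c(0, 7, 4))

# .SD holds the columns of dt; lapply applies max to each, giving a one-row data.table
col_max <- dt[, lapply(.SD, max)]

# .SDcols restricts .SD to a subset of columns
xy_max <- dt[, lapply(.SD, max), .SDcols = c("x", "y")]

print(col_max)
```

Unlike `sapply`, which would return a named vector here, this keeps the result as a data.table with one column per input column.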

[Solved] Matching Data Tables by five columns to change a value in another column

October 7, 2022 by Kirat

In R it is usually preferable to avoid loops where possible, as they are often much slower than vectorized alternatives. This operation can be done with a data.table join. Basically, when you run dt1[dt2], you are performing a right-join between the two data.tables. The preset key columns of dt1 determine which columns to join … Read more
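To make the join concrete, here is a minimal sketch that matches on two columns rather than five and names them with `on =` instead of relying on preset keys. The tables and column names are hypothetical; the update step uses data.table's `i.` prefix to refer to columns of the table passed as `i`.

```r
library(data.table)

# Hypothetical tables; the original question matched on five columns
dt1 <- data.table(id = c(1, 2, 3), grp = c("a", "b", "c"), value = c(10, 20, 30))
dt2 <- data.table(id = c(2, 3), grp = c("b", "c"), new_value = c(200, 300))

# dt1[dt2] keeps one row per row of dt2 (a right join), matched on id and grp
joined <- dt1[dt2, on = .(id, grp)]

# Update join: overwrite `value` in dt1, by reference, for the matching rows only
dt1[dt2, on = .(id, grp), value := i.new_value]

print(dt1)
```

Rows of `dt1` without a match in `dt2` keep their old `value`, which is usually what a "change a value where these columns match" task needs.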

[Solved] How adjust code functionality to specifications using data.table function

October 2, 2022 by Kirat

With data.table, we can specify .SDcols to select the ‘DR’ columns or ‘date_cols’ and assign the output back to those columns. Then, instead of rowwise matching, we use row/column indexing to extract the values that create the ‘Result’.

    library(data.table)
    # get the column names that start with DR
    dr_names <- grep("^DR", names(df1), value = … Read more
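Since the answer's code is truncated, the following is only a sketch of the two techniques it names: assigning back to columns chosen with `.SDcols`, and row/column matrix indexing. The data, the `pick` column and the conversion to numeric are hypothetical stand-ins, not the original logic.

```r
library(data.table)

# Hypothetical data: DR columns stored as text, and `pick` saying which DR column to use per row
df1 <- data.table(id = 1:3,
                  DR01 = c("1", "2", "3"),
                  DR02 = c("4", "5", "6"),
                  pick = c(1L, 2L, 1L))

# Column names that start with DR
dr_names <- grep("^DR", names(df1), value = TRUE)

# Apply a function to just those columns and assign the output back to them
df1[, (dr_names) := lapply(.SD, as.numeric), .SDcols = dr_names]

# Row/column indexing: a two-column matrix of (row, column) pairs extracts one value per row
m <- as.matrix(df1[, ..dr_names])
df1[, Result := m[cbind(seq_len(.N), pick)]]

print(df1)
```

Indexing a matrix with `cbind(row, col)` pulls all the values in one vectorized step, which is what replaces the rowwise matching.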

[Solved] I want to summarize by a column and then have it take the sum of 1 column and the mean of another column

August 28, 2022 by Kirat

The crucial point in OP’s approach is the staggered aggregation (see the related question row not consolidating duplicates in R when using multiple months in Date Filter). The OP wants to aggregate data across a number of files which apparently are too large to be loaded all together and combined into one large data.table. Instead, each … Read more
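One common way to stage such an aggregation, offered here as a sketch rather than as the answer's actual code, is to keep per-chunk sums and row counts and combine them at the end. A mean of per-chunk means would be wrong when chunks differ in size, so the mean is rebuilt from a sum and a count. The in-memory `chunks` list stands in for the separate files, and all column names are hypothetical.

```r
library(data.table)

# Hypothetical chunks standing in for files that are read one at a time
chunks <- list(
  data.table(grp = c("a", "a", "b"), amount = c(1, 2, 3), score = c(10, 20, 30)),
  data.table(grp = c("a", "b", "b"), amount = c(4, 5, 6), score = c(40, 50, 60))
)

# Stage 1: per chunk, keep only the pieces that can be combined later
partial <- rbindlist(lapply(chunks, function(d) {
  d[, .(amount_sum = sum(amount), score_sum = sum(score), n = .N), by = grp]
}))

# Stage 2: combine the partial results; the mean is total sum over total count
result <- partial[, .(amount_sum = sum(amount_sum),
                      score_mean = sum(score_sum) / sum(n)),
                  by = grp]

print(result)
```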