In R it is always preferable to avoid loops wherever possible, as they are usually much slower than alternative vectorized solutions.
This operation can be done with a data.table join. Basically, when you run
dt1[dt2];
you are performing a right-join between the two data.tables. The preset key columns of dt1
determine which columns to join on. If dt1
has no preset key, the operation fails. But you can specify the on
argument to manually select the key columns on-the-fly:
key <- paste0('V',1:5);
dt1[dt2,on=key];
(The alternative of course is to preset a key, using either setkey()
or setkeyv()
.)
The above operation will actually just return a merged table containing data from both dt1
and dt2
, which is not what you want. But we can make use of the j
argument of the data.table indexing function and the :=
in-place assignment syntax to assign the id
column of dt2
to the id
column of dt1
. Because we have a name conflict, we must use i.id
to reference the id
column of dt2
, while the unmodified name id
still refers to the id
column of dt1
. This is simply the mechanism provided by data.table for disambiguating conflicting names. Hence, you’re looking for:
dt1[dt2,on=key,id:=i.id];
Here’s an example that uses only two key columns and just a few rows of data (for simplicity). I also generated the keys to include some non-matching rows, just to demonstrate that the non-matching rows will have their ids left untouched by the operation.
set.seed(1L);
dt1 <- data.table(id=1:12,expand.grid(V1=1:3,V2=1:4),blah1=rnorm(12L));
dt2 <- data.table(id=13:18,expand.grid(V1=1:2,V2=1:3),blah2=rnorm(6L));
dt1;
## id V1 V2 blah1
## 1: 1 1 1 -0.6264538
## 2: 2 2 1 0.1836433
## 3: 3 3 1 -0.8356286
## 4: 4 1 2 1.5952808
## 5: 5 2 2 0.3295078
## 6: 6 3 2 -0.8204684
## 7: 7 1 3 0.4874291
## 8: 8 2 3 0.7383247
## 9: 9 3 3 0.5757814
## 10: 10 1 4 -0.3053884
## 11: 11 2 4 1.5117812
## 12: 12 3 4 0.3898432
dt2;
## id V1 V2 blah2
## 1: 13 1 1 -0.62124058
## 2: 14 2 1 -2.21469989
## 3: 15 1 2 1.12493092
## 4: 16 2 2 -0.04493361
## 5: 17 1 3 -0.01619026
## 6: 18 2 3 0.94383621
key <- paste0('V',1:2);
dt1[dt2,on=key,id:=i.id];
dt1;
## id V1 V2 blah1
## 1: 13 1 1 -0.6264538
## 2: 14 2 1 0.1836433
## 3: 3 3 1 -0.8356286
## 4: 15 1 2 1.5952808
## 5: 16 2 2 0.3295078
## 6: 6 3 2 -0.8204684
## 7: 17 1 3 0.4874291
## 8: 18 2 3 0.7383247
## 9: 9 3 3 0.5757814
## 10: 10 1 4 -0.3053884
## 11: 11 2 4 1.5117812
## 12: 12 3 4 0.3898432
5
solved Matching Data Tables by five columns to change a value in another column