The duplicated
function traverses its argument(s) sequentially and returns TRUE if there has been a prior value identical to the current value. It is a generic function, so it has a default definition (for vectors) but also a definition for other classes, such as objects of the data.frame class. The subset function treats expressions passed as a second or third argument as though column names are first class objects. This is called “non-standard evaluation”. (Notice the negation operator.) So this call to subset
will return the rows of a data.frame where only the first instance of the column named “x” is not duplicated. It would probably return a dataframe with only the number of rows that equal the number of unique items in the x column.
> dat <- data.frame( x =sample(1:5, 20, repl=TRUE), y=1:5, z=1:4)
> dat
x y z
1 2 1 1
2 2 2 2
3 2 3 3
4 5 4 4
5 4 5 1
6 1 1 2
7 2 2 3
8 2 3 4
9 5 4 1
10 1 5 2
11 2 1 3
12 4 2 4
13 5 3 1
14 4 4 2
15 3 5 3
16 3 1 4
17 4 2 1
18 4 3 2
19 1 4 3
20 1 5 4
> subset(dat, !duplicated(x))
x y z
1 2 1 1
4 5 4 4
5 4 5 1
6 1 1 2
15 3 5 3
2
solved What does subset(df, !duplicated(x)) do?