[Solved] Add a column for counting unique tuples in the data frame [duplicate]

Accepted Answer

1) aggregate

ag <- aggregate(count ~ ., cbind(count = 1, df), length)
ag[do.call("order", ag), ]  # sort the rows

giving:

  userID A B count
3      1 2 2     1
4      1 3 3     1
2      3 2 1     2
1      5 1 0     2

The last line, which sorts the rows, can be omitted if the row order does not matter.
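To try the aggregate solution end to end, here is a minimal sketch. The original df is not shown, so the data frame below is a hypothetical input, chosen only to be consistent with the counts printed above.

```r
# Hypothetical input: the original df is not shown, so these rows are an
# assumption chosen to be consistent with the counts printed above.
df <- data.frame(
  userID = c(1, 1, 3, 3, 5, 5),
  A      = c(2, 3, 2, 2, 1, 1),
  B      = c(2, 3, 1, 1, 0, 0)
)

# Add a constant column, then count its entries within each distinct
# (userID, A, B) combination.
ag <- aggregate(count ~ ., cbind(count = 1, df), length)

# Sort by every column, left to right.
ag <- ag[do.call("order", ag), ]
print(ag)
```

The formula count ~ . groups by every column other than count, so the same call should work unchanged for a data frame with different column names.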

The remaining solutions use the indicated packages:

2) sqldf

library(sqldf)
Names <- toString(names(df))
fn$sqldf("select *, count(*) count from df group by $Names order by $Names")

giving:

  userID A B count
1      1 2 2     1
2      1 3 3     1
3      3 2 1     2
4      5 1 0     2

The order by clause could be omitted if the order is unimportant.

3) dplyr

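library(dplyr)
# regroup() was removed from later dplyr releases; with dplyr >= 1.0.0,
# across(everything()) groups by every column instead.
df %>% group_by(across(everything())) %>% summarise(count = n())

giving (header lines as printed by the older dplyr the answer was written for):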

Source: local data frame [4 x 4]
Groups: userID, A
  userID A B count
1      1 2 2     1
2      1 3 3     1
3      3 2 1     2
4      5 1 0     2

4) data.table

library(data.table)
data.table(df)[, list(count = .N), by = names(df)]

giving:

   userID A B count
1:      1 2 2     1
2:      1 3 3     1
3:      3 2 1     2
4:      5 1 0     2
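All four solutions return one row per distinct combination. If the count should instead appear as a new column on every row of the original data frame, one possible base R sketch is to merge the aggregated counts back in. The df below is a hypothetical input (the original is not shown), and df_counted is a name introduced here for illustration.

```r
# Hypothetical input, assumed for illustration only.
df <- data.frame(
  userID = c(1, 1, 3, 3, 5, 5),
  A      = c(2, 3, 2, 2, 1, 1),
  B      = c(2, 3, 1, 1, 0, 0)
)

# One row per distinct (userID, A, B) combination, with its count.
ag <- aggregate(count ~ ., cbind(count = 1, df), length)

# merge() joins on the columns both data frames share (userID, A, B),
# so every original row picks up the count of its combination.
df_counted <- merge(df, ag)
print(df_counted)
```

Note that merge() sorts the result by the join columns by default, so the original row order is not necessarily preserved.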