[Solved] Create table with Distinct Account Number, first Date in R


A quick heads-up: I will answer this using the data.table package. I will also assume the data is in a data.table called LGD_data_update, as mentioned in your comment.

So, first load the package and convert your data to a data.table.

 library(data.table)
 LGD_data_update <- data.table(LGD_data_update)

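In this case, you first need to sort the rows by date. However, the dates are stored as text with only a two-digit year, so they need to be expanded to a four-digit year and converted to a date-time type before they can be sorted correctly.

You can do that as follows (this assumes the strings look like dd-mm-yy and that all years fall in the 2000s):

 LGD_data_update[, NPL_DATE := paste0(substr(NPL_DATE, 1, 6), "20", substr(NPL_DATE, 7, 8))]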
 LGD_data_update[, NPL_DATE := as.POSIXct(NPL_DATE, format = "%d-%m-%Y")]
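As a quick check of the conversion, here is a minimal sketch on made-up strings, assuming the raw values look like dd-mm-yy (the sample dates are hypothetical, and I use as.Date here rather than as.POSIXct just to keep the output short). It also shows that `%y` can parse a two-digit year directly; R maps 00 to 68 to 20xx and 69 to 99 to 19xx, so the two approaches only agree for years in the 2000s.

```r
# Hypothetical sample values, assuming a dd-mm-yy layout
raw_dates <- c("15-03-14", "01-12-09")

# Approach from the answer: splice "20" in front of the two-digit year
padded <- paste0(substr(raw_dates, 1, 6), "20", substr(raw_dates, 7, 8))
parsed_padded <- as.Date(padded, format = "%d-%m-%Y")

# Alternative: let %y interpret the two-digit year directly
parsed_short <- as.Date(raw_dates, format = "%d-%m-%y")

print(parsed_padded)
print(all(parsed_padded == parsed_short))
```

If some of your dates are from before 2000, the `%y` version is probably the safer of the two, as hard-coding "20" would move them a century forward.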

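Then, you can sort by the dates. Note that this needs order(), which returns row positions, rather than sort(), which returns the sorted values themselves.

 LGD_data_update <- LGD_data_update[order(NPL_DATE), ]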
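Reordering rows needs row positions rather than sorted values. A small sketch of the difference between the two base R functions, using made-up numbers:

```r
# Hypothetical values, used only to contrast the two functions
x <- c(30, 10, 20)

sorted_values <- sort(x)   # the values themselves, in ascending order
row_positions <- order(x)  # the positions that would put x in ascending order

print(sorted_values)
print(row_positions)
print(x[row_positions])    # indexing by the positions reproduces the sorted values
```

Only the output of order() is suitable as a row index, which is why it is the one to use inside the square brackets of a data.table.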

From here, I would add a placeholder column that holds a running count of the records within each account number, so that only the first (earliest) record of each account gets the value 1.

 LGD_data_update[, Foo := 1]
 LGD_data_update[, Foo := cumsum(Foo), by = "ACCOUNT_NUMBER"]

Then, we keep only the rows where the placeholder (Foo) has a value of 1, as those hold the earliest NPL date for each account.

 LGD_data_update <- LGD_data_update[Foo == 1, ]

If necessary, remove the placeholder afterwards.

 LGD_data_update[, Foo := NULL]
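For reference, the same result can probably be reached without a placeholder column. The following is a minimal sketch on made-up data, not your actual table: LGD_sample and first_npl are hypothetical names, the values are invented, and I assume the dates are dd-mm-yy strings. It sorts in place with setorder() and then keeps the first row per account with unique().

```r
library(data.table)

# Hypothetical sample data; column names follow the answer, values are made up
LGD_sample <- data.table(
  ACCOUNT_NUMBER = c("A1", "A2", "A1", "A2", "A3"),
  NPL_DATE       = c("15-03-14", "01-12-09", "02-01-12", "20-06-11", "05-05-13")
)

# Parse the two-digit-year strings into Date values
LGD_sample[, NPL_DATE := as.Date(NPL_DATE, format = "%d-%m-%y")]

# Sort in place by account, then by date
setorder(LGD_sample, ACCOUNT_NUMBER, NPL_DATE)

# Keep the first (earliest) row for each account
first_npl <- unique(LGD_sample, by = "ACCOUNT_NUMBER")

print(first_npl)
```

Because the table is sorted by date within each account, the first row that unique() keeps per ACCOUNT_NUMBER is the one with the earliest NPL_DATE, and any other columns in that row are carried along.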
