A quick heads-up: I will address this question using the data.table package. I will also assume the data is in a data.table called LGD_data_update, as pointed out in your comment.
So you will need the following:
library(data.table)
LGD_data_update <- data.table(LGD_data_update)
In this case, you first need to sort the rows by date. However, the dates are stored as text with only a two-digit year, so they have to be expanded to a four-digit year and converted to a proper date-time type before sorting.
You can do that with:
LGD_data_update[, NPL_DATE := paste0(substr(NPL_DATE, 1, 6), "20", substr(NPL_DATE, 7, 8))]
LGD_data_update[, NPL_DATE := as.POSIXct(NPL_DATE, format = "%d-%m-%Y")]
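As an aside, the two-step conversion above can probably be shortened, since the `%y` format code reads a two-digit year directly. The sketch below is an alternative, not the method used in this answer: it uses `as.Date` instead of `as.POSIXct`, and the `toy` table and its dd-mm-yy values are made up for illustration.

```r
library(data.table)

# Hypothetical sample; the dd-mm-yy layout is an assumption about the real data
toy <- data.table(NPL_DATE = c("15-03-14", "01-12-09"))

# %y parses a two-digit year; R maps 00-68 to 20xx and 69-99 to 19xx
toy[, NPL_DATE := as.Date(NPL_DATE, format = "%d-%m-%y")]
```

Note that `%y` would read a year such as 95 as 1995, whereas pasting "20" in front always yields 20xx, so the two approaches differ if the data contains such years.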
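Then you can sort by the dates. Use order() here rather than sort(): order() returns the row positions in sorted order, whereas sort() returns the sorted values themselves, which cannot be used as row indices.
LGD_data_update <- LGD_data_update[order(NPL_DATE), ]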
From here, I would create a placeholder column holding a running count of the records within each account number, so that only the first (earliest) record of each account gets the value 1.
LGD_data_update[, Foo := 1]
LGD_data_update[, Foo := cumsum(Foo), by = "ACCOUNT_NUMBER"]
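To make the effect of the placeholder concrete, here is a minimal sketch on made-up data. The `toy` table, its values, and the `Bar` column are hypothetical; `Bar` only shows that `seq_len(.N)` gives the same per-account counter in one step.

```r
library(data.table)

# Hypothetical sample, already sorted by date
toy <- data.table(
  ACCOUNT_NUMBER = c("A", "B", "A", "B", "A"),
  NPL_DATE = as.Date(c("2010-01-05", "2011-02-01", "2012-03-09",
                       "2013-07-20", "2014-11-30"))
)

# Running count of records within each account
toy[, Foo := 1]
toy[, Foo := cumsum(Foo), by = "ACCOUNT_NUMBER"]

# Same counter without the intermediate column of ones
toy[, Bar := seq_len(.N), by = "ACCOUNT_NUMBER"]
```

In this sketch the first row of account A and the first row of account B both get a counter of 1, and later rows of the same account count upward.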
Then, we will only select the rows where the placeholder (Foo) has a value of 1, as those will be the earliest NPL dates.
LGD_data_update <- LGD_data_update[Foo == 1, ]
If necessary, remove the placeholder afterwards:
LGD_data_update[, Foo := NULL]
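For comparison, the whole procedure can be sketched end to end without a placeholder column. This is an alternative idiom rather than the approach described above: it sorts with `setorder` and keeps the first row per account with `unique(..., by = )`. The `toy` data and the `first_npl` name are made up, and the dd-mm-yy layout is an assumption.

```r
library(data.table)

# Hypothetical sample with two-digit years (dd-mm-yy assumed)
toy <- data.table(
  ACCOUNT_NUMBER = c("A", "B", "A", "B"),
  NPL_DATE = c("15-03-14", "01-12-09", "02-01-12", "20-07-13")
)

# Parse the dates, then sort the table in place by date
toy[, NPL_DATE := as.Date(NPL_DATE, format = "%d-%m-%y")]
setorder(toy, NPL_DATE)

# unique() keeps the first row per account, i.e. the earliest date after sorting
first_npl <- unique(toy, by = "ACCOUNT_NUMBER")
```

Unlike the filter on Foo, this leaves the original table intact and stores the result in a separate object, which may be preferable if the full history is still needed.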
solved Create table with Distinct Account Number, first Date in R