You could group by name, count the rows in each group, and keep only the names whose count is 3
(because you have 3 years):
groups = df.groupby('name').count()
result = groups[ groups['date'] == 3 ].index.to_list()
print(result)
Or you could count the names directly:
counts = df['name'].value_counts()
result = counts[ counts == 3 ].index.to_list()
print('result:', result)
Minimal working example:
I use io.StringIO only to simulate a file.
text = '''   date name
0 2019 a
1 2019 b
2 2019 c
3 2020 b
4 2020 c
5 2021 b
6 2021 c
'''
import io
import pandas as pd
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
counts = df['name'].value_counts()
result = counts[ counts == 3 ].index.to_list()
print('result:', result)
groups = df.groupby('name').count()
result = groups[ groups['date'] == 3 ].index.to_list()
print('result:', result)
BTW:
Instead of the hardcoded value 3,
you could count the unique dates:
years = df['date'].unique()
print(years, len(years))
Result:
[2019 2020 2021] 3
This way you could use len(years)
in place of 3.
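A minimal sketch of that idea, assuming every (date, name) pair appears at most once. The variable name n_years is only for illustration, and the sample data is the same as above without the index column:

```python
import io
import pandas as pd

# Same sample data as above, without the index column.
text = '''date name
2019 a
2019 b
2019 c
2020 b
2020 c
2021 b
2021 c
'''

df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Number of distinct years, used instead of the hardcoded 3.
n_years = df['date'].nunique()

# Rows per name; this equals years per name only if no (date, name) pair repeats.
counts = df['name'].value_counts()

# Keep the names that appear once for every year; sorted for a stable order.
result = sorted(counts[counts == n_years].index.to_list())
print('result:', result)
```
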
EDIT:
If values can repeat, then you can use unique() (or nunique())
on each group to ignore the repeated values.
text = '''   date name
0 2019 a
1 2019 b
2 2019 c
3 2020 b
4 2020 c
5 2021 b
6 2021 c
7 2019 a
8 2019 a
'''
import io
import pandas as pd
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
groups = df.groupby('name')
#counts = groups['date'].unique().apply(len)
counts = groups['date'].nunique()
result = counts[ counts == 3 ].index.to_list()
print('result:', result)
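Both ideas (no hardcoded 3, and repeated rows ignored) could be combined in one small helper. This is only a sketch: the function name names_in_all_years is made up for illustration, and it assumes the columns are called date and name as in the example.

```python
import io
import pandas as pd

def names_in_all_years(df):
    """Return the names that occur in every distinct year in df['date'].

    Hypothetical helper; assumes columns 'date' and 'name'.
    """
    # Total number of distinct years in the whole frame.
    n_years = df['date'].nunique()
    # Distinct years per name, so repeated rows are counted only once.
    per_name = df.groupby('name')['date'].nunique()
    # groupby sorts the group keys, so the result is in alphabetical order.
    return per_name[per_name == n_years].index.to_list()

# Same data as above: 'a' has three rows, but all of them are in 2019.
text = '''date name
2019 a
2019 b
2019 c
2020 b
2020 c
2021 b
2021 c
2019 a
2019 a
'''

df = pd.read_csv(io.StringIO(text), sep=r'\s+')
result = names_in_all_years(df)
print('result:', result)
```

Here a plain value_counts() would give 3 for a, so comparing it with the number of years would wrongly keep a; counting unique years per name avoids that.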