[Solved] Is there a way to replace ranged data (e.g. 18-25) with its mean in a dataframe?

Accepted Answer

There are several ways to transform this variable. Judging from the picture, the column contains not only bins but also the value '55+', which needs to be handled separately.

1) One-liner:

import numpy as np
df['age'].apply(lambda x: np.mean([int(x.split('-')[0]), int(x.split('-')[1])]) if '+' not in x else float(x[:-1]))

It checks whether the value contains '+' (like 55+); if so, the value without the '+' is returned. Otherwise the bin is split into two values, which are converted to ints, and their mean is calculated.
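
The same logic may be easier to read and test as a named function. The following is a minimal sketch, not the only way to do it: the function name `age_bin_to_number` and the sample labels are assumptions made for illustration, and an open-ended bin such as '55+' is mapped to its lower bound, as in the one-liner.

```python
def age_bin_to_number(value: str) -> float:
    """Convert an age bin such as '18-25' to its midpoint; '55+' becomes 55.0."""
    value = value.strip()
    if value.endswith('+'):
        # Open-ended bin: there is no upper bound, so use the lower bound.
        return float(value[:-1])
    low, high = value.split('-')
    return (int(low) + int(high)) / 2

# Hypothetical sample labels standing in for the values of df['age'].
sample = ['1-17', '18-25', '26-35', '55+']
converted = [age_bin_to_number(v) for v in sample]
```

With a dataframe, the function could be passed directly to the series, e.g. df['age'].apply(age_bin_to_number).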

2) Using a dictionary for the transformation:

mapping = {'1-17': 9, '18-25': 21.5, '55+': 55}
df['age'].apply(lambda x: mapping[x])

You need to add every distinct value to the mapping dictionary (computing the means manually or programmatically). Then you apply this transformation to the series.
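
If the dictionary should be built automatically rather than by hand, one possible approach is to compute a midpoint for every distinct label. This is a sketch under assumptions: the helper `bin_midpoint` and the `labels` list are hypothetical, and with a real dataframe the labels would come from df['age'].unique().

```python
def bin_midpoint(label: str) -> float:
    """Midpoint of a bin like '18-25'; an open-ended bin like '55+' gives 55.0."""
    if label.endswith('+'):
        return float(label[:-1])
    low, high = label.split('-')
    return (int(low) + int(high)) / 2

# Hypothetical labels; in practice these would be df['age'].unique().
labels = ['1-17', '18-25', '55+']

# Build the label -> number dictionary once, then reuse it for the whole column.
mapping = {label: bin_midpoint(label) for label in labels}
```

The resulting dictionary can be applied exactly as in the snippet above. Alternatively, df['age'].map(mapping) does the same lookup but yields NaN for labels missing from the dictionary instead of raising a KeyError.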