[Solved] Is there a way to replace ranged data (e.g. 18-25) by its mean in a dataframe?


There are several ways to transform this variable. The picture shows that the column contains not only bins but also the value '55+', which has to be handled separately.

1) One-liner:

import numpy as np
df['age'] = df['age'].apply(lambda x: float(x[:-1]) if '+' in x else np.mean([int(v) for v in x.split('-')]))

It checks whether the value contains '+' (like 55+); if so, the value without the '+' is returned. Otherwise the bin is split into two values, which are converted to ints and averaged.
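The same logic may be easier to read and test as a named function. The following is only a sketch: the helper name age_to_number, the new column age_numeric and the sample data are made up for illustration, and treating '55+' as 55 is an assumption carried over from the one-liner above.

```python
import pandas as pd

def age_to_number(value: str) -> float:
    """Convert an age label such as '18-25' or '55+' to a single number."""
    value = value.strip()
    if value.endswith('+'):
        # Open-ended bin: there is no upper bound, so use the lower bound (assumption).
        return float(value[:-1])
    # Closed bin: average the lower and upper bounds.
    low, high = value.split('-')
    return (int(low) + int(high)) / 2

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({'age': ['1-17', '18-25', '26-35', '55+']})
df['age_numeric'] = df['age'].apply(age_to_number)
print(df)
```

Writing the result to a separate column keeps the original labels available for checking; assign back to df['age'] instead if the labels are no longer needed.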

2) Using a dictionary for the transformation:

mapping = {'1-17': 9, '18-25': 21.5, '55+': 55}
df['age'] = df['age'].map(mapping)

You need to add every value that occurs in the column to the mapping dictionary (calculate the means manually or automatically). Then you apply this transformation to the series.
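If the column has many distinct bins, the dictionary can be built automatically from its unique values instead of being typed by hand. This is a rough sketch under assumptions: the helper label_mean, the column age_numeric and the sample data are hypothetical, and every label is assumed to contain one number ('55+') or two ('18-25').

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({'age': ['1-17', '18-25', '55+', '18-25']})

def label_mean(label: str) -> float:
    """Average all integers found in the label ('18-25' -> 21.5, '55+' -> 55.0)."""
    numbers = [int(n) for n in re.findall(r'\d+', label)]
    return sum(numbers) / len(numbers)

# One dictionary entry per distinct label in the column.
mapping = {label: label_mean(label) for label in df['age'].unique()}

# Series.map looks each label up in the dictionary.
df['age_numeric'] = df['age'].map(mapping)
print(mapping)
print(df)
```

Each distinct label is parsed only once this way, and printing the dictionary makes it easy to check the computed values before they are applied.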

