[Solved] Trying to keep the same type after saving a dataframe in a csv file


CSV files do not have a header that defines the data types, or anything similar.
So when you read a CSV file, pandas tries to guess the types, and the guess can differ from the original data types.
You have two possible solutions:

  1. Provide the data types when you call read_csv, using the dtype and parse_dates keywords (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
  2. Use a different file format that stores the data together with a schema (e.g. Parquet)

For example:

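import pandas as pd
date = pd.to_datetime('01-01-2020')

df=pd.DataFrame({'col1':[1,2,3,4],'col2':['a','b','b','d'],'col3':[date,date,date,date]})

print('original \n',df.dtypes)

# Plain round trip: the datetime column comes back as object (strings)
df.to_csv('testtype.csv',index=False)
df_csv = pd.read_csv('testtype.csv')
print('simple csv read \n',df_csv.dtypes)

# dtype sets the non-date columns; it does not turn col3 into a datetime
df_csv = pd.read_csv('testtype.csv',dtype={'col1':'int64','col2':'object'})
print('csv datatypes \n',df_csv.dtypes)

# parse_dates converts the column at position 2 (col3) back to datetime
df_csv = pd.read_csv('testtype.csv',parse_dates=[2])
print('csv with parse dates \n',df_csv.dtypes)

# Parquet stores the schema with the data (needs pyarrow or fastparquet installed)
df.to_parquet('testtype.pqt')
df_pqt=pd.read_parquet('testtype.pqt')

print('parquet  \n',df_pqt.dtypes)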

This outputs:

original 
 col1             int64
col2            object
col3    datetime64[ns]
dtype: object

simple csv read 
 col1     int64
col2    object
col3    object
dtype: object

csv datatypes 
 col1     int64
col2    object
col3    object
dtype: object

csv with parse dates 
 col1             int64
col2            object
col3    datetime64[ns]
dtype: object

parquet  
 col1             int64
col2            object
col3    datetime64[ns]
dtype: object
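
If you have to stay with CSV, one way to apply solution 1 without hard-coding the types is to save the dtypes next to the data and feed them back into read_csv. The sketch below is only an illustration: dump_schema and read_csv_with_schema are hypothetical helper names, not pandas functions, and it assumes the only special case is datetime columns, which are passed through parse_dates instead of dtype. It uses an in-memory buffer in place of a file.

```python
import io
import json
import pandas as pd

def dump_schema(df):
    """Serialize the column dtypes to a JSON string (hypothetical helper)."""
    return json.dumps({col: str(dtype) for col, dtype in df.dtypes.items()})

def read_csv_with_schema(source, schema_json):
    """Read a CSV and reapply the dtypes saved by dump_schema (hypothetical helper)."""
    schema = json.loads(schema_json)
    # Datetime columns are handed to parse_dates; everything else goes to dtype.
    date_cols = [c for c, t in schema.items() if t.startswith('datetime64')]
    dtypes = {c: t for c, t in schema.items() if c not in date_cols}
    return pd.read_csv(source, dtype=dtypes, parse_dates=date_cols)

date = pd.to_datetime('01-01-2020')
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': ['a', 'b', 'b', 'd'],
                   'col3': [date, date, date, date]})

schema_json = dump_schema(df)        # could be written to a small sidecar file
csv_text = df.to_csv(index=False)    # stands in for the CSV file on disk

restored = read_csv_with_schema(io.StringIO(csv_text), schema_json)
print(restored.dtypes)
```

In practice the JSON string would be written to a file stored alongside the CSV. If you control both the writer and the reader, Parquet is still the simpler choice, because it keeps the schema for you.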
