
Problem keeping the integer format when exporting to CSV

Solved!
scholaschl
Level 2

Hello,

I want to export a table to a CSV file in a folder using this method (Python recipe):

import dataiku
import pandas as pd

temp_folder = "1VZVmdhX"

path_upload_file = "data.csv"
input_dataset = dataiku.Dataset("data")
df = input_dataset.get_dataframe()

handle = dataiku.Folder(temp_folder)
with handle.get_writer(path_upload_file) as w:
    w.write(df.to_csv(sep=';').encode('utf-8'))

However, the variables in my input dataset are integers (25, for example), and the output CSV turns them into doubles: 25.0.
How can I keep the integer format?

ZachM
Dataiker

Hi @scholaschl ,

This is likely occurring because your input data contains rows where Pandas treats the value as a float instead of an integer. This happens if a column contains any missing values: Pandas represents them as NaN, which is a float, so the whole column is upcast to float64 and its integers are written as 25.0. For more information, see this post.
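The cause can be reproduced with plain pandas, without a Dataiku dataset. This is a minimal sketch using a hypothetical DataFrame with columns `id` and `age`, where one `age` value is missing; it shows `age` being stored as `float64` and written to CSV with a trailing `.0`:

```python
import io

import pandas as pd

# Hypothetical sample: "age" holds integers, but one value is missing
df = pd.DataFrame({"id": [1, 2, 3], "age": [25, None, 40]})

# The missing value becomes NaN, which is a float, so pandas
# upcasts the whole "age" column to float64 ("id" stays int64)
print(df.dtypes)

buf = io.StringIO()
df.to_csv(buf, sep=";", index=False)
csv_text = buf.getvalue()

# "age" is now written as 25.0 and 40.0; the missing value is an empty field
print(csv_text.splitlines())
```

Checking `df.dtypes` on the DataFrame returned by `get_dataframe()` is a quick way to confirm whether this is what happens in your dataset.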

You can fix it by telling Pandas to drop the decimal when formatting floats:

import dataiku
import pandas as pd

temp_folder = "1VZVmdhX"

path_upload_file = "data.csv"
input_dataset = dataiku.Dataset("data")
df = input_dataset.get_dataframe()

handle = dataiku.Folder(temp_folder)
with handle.get_writer(path_upload_file) as w:
    w.write(df.to_csv(sep=';', float_format="%.0f").encode('utf-8'))
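A sketch of what `float_format="%.0f"` does, using hypothetical columns `age` (integers with a missing value) and `score` (genuinely fractional). The format applies to every float column, so `score` is rounded too (1.75 becomes 2). As an alternative that is not part of the answer above, pandas' nullable `Int64` dtype can hold integers and missing values together, so only the affected column needs converting:

```python
import io

import pandas as pd

# Hypothetical sample: "age" should be integer but has a missing value;
# "score" is a genuinely fractional column
df = pd.DataFrame({"age": [25, None, 40], "score": [1.75, 2.0, 3.25]})


def to_csv_text(frame, **kwargs):
    """Render a DataFrame as semicolon-separated CSV text (no index)."""
    buf = io.StringIO()
    frame.to_csv(buf, sep=";", index=False, **kwargs)
    return buf.getvalue()


# Option 1 (from the answer): format every float with no decimals.
# Missing values are still written as empty fields, but "score"
# loses its decimals as well
rounded = to_csv_text(df, float_format="%.0f")
print(rounded.splitlines())

# Option 2 (alternative): convert only the integer column to the
# nullable Int64 dtype, which stores missing values as <NA> instead of NaN
fixed = df.astype({"age": "Int64"})
print(fixed.dtypes)
print(to_csv_text(fixed).splitlines())
```

In the recipe from the question, option 2 would mean converting the affected columns of `df` with `astype(...)` before calling `to_csv`; the column names to convert depend on your dataset.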


Thanks,

Zach

scholaschl
Level 2
Author

Thank you very much for your answer, it works well.
