
Problem keeping the integer format of variables when exporting to CSV

scholaschl
Level 2
Hello,

I want to export a table to a CSV file in a folder using this method (Python recipe):

import dataiku
import pandas as pd

temp_folder = "1VZVmdhX"

path_upload_file = "data.csv"
input_dataset = dataiku.Dataset("data")
df = input_dataset.get_dataframe()

handle = dataiku.Folder(temp_folder)
with handle.get_writer(path_upload_file) as w:
    w.write(df.to_csv(sep=';').encode('utf-8'))

However, the variables in my input dataset are integers (25, for example), but the output CSV turns them into doubles: 25.0.
How can I keep the integer format?

ZachM
Dataiker

Hi @scholaschl ,

This is likely occurring because your input data contains rows where Pandas thinks that the value is a float instead of an integer. This can happen if your data contains any missing values, because Pandas treats NaN as a float. For more information, see this post.
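
To illustrate the cause, here is a minimal sketch that reproduces the behavior outside of Dataiku. The sample data and the column names `id` and `count` are made up for the example, and `index=False` is only there to keep the output short:

```python
import io

import pandas as pd

# Hypothetical sample data: the second row has no value in "count".
raw = "id;count\n1;25\n2;\n3;40\n"
df = pd.read_csv(io.StringIO(raw), sep=";")

# "id" has no gaps and stays int64; "count" has a gap, so pandas
# stores it as float64 in order to represent the missing value as NaN.
dtypes = {col: str(dtype) for col, dtype in df.dtypes.items()}
print(dtypes)

# Writing the frame back out shows the unwanted decimal on "count".
csv_text = df.to_csv(sep=";", index=False)
print(csv_text)
```

Checking `df.dtypes` on your own dataframe in the same way should show which columns were read as floats.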

You can fix it by telling Pandas to write floats with no decimal places:

import dataiku
import pandas as pd

temp_folder = "1VZVmdhX"

path_upload_file = "data.csv"
input_dataset = dataiku.Dataset("data")
df = input_dataset.get_dataframe()

handle = dataiku.Folder(temp_folder)
with handle.get_writer(path_upload_file) as w:
    w.write(df.to_csv(sep=';', float_format="%.0f").encode('utf-8'))
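
Note that `float_format` applies to every float column, so a column with genuinely fractional values would be rounded as well. If that is a concern, one possible alternative is to convert only the whole-number float columns to the nullable `Int64` dtype before exporting. The following is a sketch under assumptions, not something tested in a Dataiku recipe: the sample data and the column names `id`, `count`, and `price` are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: "count" has a missing value,
# "price" contains real fractional values.
raw = "id;count;price\n1;25;9.5\n2;;3.25\n3;40;7.0\n"
df = pd.read_csv(io.StringIO(raw), sep=";")

# Convert only the float columns whose non-missing values are all
# whole numbers. "Int64" (capital I) is the nullable integer dtype,
# which can hold missing values without falling back to float.
for col in df.select_dtypes(include="float").columns:
    values = df[col].dropna()
    if (values == values.round()).all():
        df[col] = df[col].astype("Int64")

# "count" is now written as 25 and 40, the missing value stays empty,
# and "price" keeps its decimals.
csv_text = df.to_csv(sep=";", index=False)
print(csv_text)
```

In the recipe, the same loop would run on the dataframe returned by `get_dataframe()` before the call to `to_csv`.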

 

Thanks,

Zach

scholaschl
Level 2
Author

Thank you very much for your answer, it works well.
