Dataframe pivot table from python doesn't save properly as Dataiku dataset
Hi everyone,
I am creating a pivot table using pandas data frame in python.
df_pivot = pd.pivot_table(df, values=['Units'], index =['Name', 'City', 'State', 'Zip'], columns= ['Date'],
aggfunc={'Units': 'sum'}, margins = True, fill_value=0)
However, when I save it using the following:
dataset_new = dataiku.Dataset("Dataset Output")
dataset_new.write_with_schema(df_pivot)
The index columns (the ones you use to convert to rows in the pivot table, ['Name', 'City', 'State', 'Zip'] ) do not print out in the final Dataiku dataset. That is, I end up with a dataset that contains only the columns defined in the pivot table (columns= ['Date']). Why does this happen? How can I fix this?
Some workarounds I have tried and have not worked are:
- I tried to save the dataframe as Excel file, but I could not export it from the DSS Notebook to my computer.
- I thought I could save the dataframe as Exce and then convert it to a Dataiku dataset. However, l I could not write the Dataiku dataset with an Excel file. Is there a function that can do this?
I sincerely appreciate any adivce!
Operating system used: MACos
Operating system used: MACos
Operating system used: MACos
Answers
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
Hello @Festa,
Have you tried doing this?
dataset_new.write_with_schema(df_pivot.reset_index())
I think the Pandas DataFrame indexes are not saved to the dataframe, so you have to set them back as columns.
-
Hi @Ignacio_Toledo!
Thank you so much for your suggestion. It worked!