Hello-
I have some Python code that reads in a dataset and then writes out an updated version of that dataset. It is inferring several columns as integers when I want them to be text. Every time I run it, it overwrites my schema settings. Any recommendations?
thanks!
Mindy
Operating system used: unix
Hi @mbillingham,
The type is changing because pandas automatically infers the type of each column from its values when the data is loaded into a dataframe.
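To make the inference concrete, here is a small sketch in plain pandas (not the Dataiku API). The CSV text and column names are made up for illustration. It shows a digit-only text column being read as integers unless pandas is told to keep it as text.

```python
import io

import pandas as pd

# Hypothetical sample data: zip_code is meant to be text.
csv_text = "zip_code,city\n02134,Boston\n10001,New York\n"

# Default behaviour: pandas infers an integer dtype from the digit-only values.
inferred = pd.read_csv(io.StringIO(csv_text))

# Asking for strings keeps the values exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["zip_code"].dtype)     # an integer dtype
print(inferred["zip_code"].tolist())  # [2134, 10001]
print(as_text["zip_code"].tolist())   # ['02134', '10001']
```

Note that the inferred version also drops the leading zero, which is one reason to stop the inference at read time rather than fix the type afterwards.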
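There are a few ways to prevent a Python recipe from changing the types, depending on your code:
1. If you read the input with get_dataframe, turn off pandas type inference so the types from the dataset schema are used:
import dataiku
input_dataset = dataiku.Dataset("input_dataset")
input_df = input_dataset.get_dataframe(infer_with_pandas=False)
2. If you write the output with write_with_schema, use write_dataframe instead, which writes the data without replacing the output schema:
output_dataset = dataiku.Dataset("output_dataset")
output_dataset.write_dataframe(output_df)
3. Cast the columns to text explicitly before writing the dataframe:
output_df = output_df.astype({"col1": str, "col2": str})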
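The astype cast can be tried outside of a recipe on a small made-up dataframe. This is only a sketch: the data and the "amount" column are assumptions for illustration, and the Dataiku write call is left out so the snippet runs with pandas alone.

```python
import pandas as pd

# Hypothetical dataframe standing in for the recipe's output.
output_df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [10, 20, 30], "amount": [1.5, 2.5, 3.5]}
)

# Cast only the columns that should be text; other columns keep their dtype.
text_columns = ["col1", "col2"]
output_df = output_df.astype({name: str for name in text_columns})

print(output_df["col1"].tolist())  # ['1', '2', '3']
print(output_df["amount"].dtype)   # still a float dtype
```

Keep in mind that a cast like this works on the values pandas already holds, so it cannot bring back formatting such as leading zeros that was lost when the column was first read as integers.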
Thanks,
Zach
Using the infer_with_pandas=False flag worked exactly the way I needed it to. Thank you so much!