Python code is overwriting my schema settings!
mbillingham
Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 7 ✭
Hello-
I have some Python code that reads in a dataset and then writes out an updated version of that dataset. It is inferring several columns as integers when I want them to be text. Every time I run it, it overwrites my schema settings. Any recommendations?
thanks!
Mindy
Operating system used: unix
Answers
-
Hi @mbillingham,
The types are changing because pandas automatically infers the type of each column from the values it contains.
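To see the inference in isolation, here is a minimal sketch in plain pandas, outside DSS. It uses read_csv on made-up sample data (the column names and values are hypothetical, not taken from your dataset), so it only illustrates the general pandas behavior, not how DSS loads a dataset internally:

```python
import io

import pandas as pd

# Hypothetical sample data: zip codes written as text with a leading zero.
csv_text = "id,zip\n1,02134\n2,10001\n"

# Default behavior: pandas looks at the values, decides they are integers,
# and the leading zero is lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# Asking for text up front keeps every value exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["zip"].tolist())  # [2134, 10001]
print(as_text["zip"].tolist())   # ['02134', '10001']
```

Once a column has been inferred as an integer, converting it back to text later cannot restore what was dropped (the leading zero here), which is why it is usually safer to stop the inference when the data is read.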
There are a few ways to prevent a Python recipe from changing the types depending on your code:
- Prevent pandas from inferring the types when it reads the input dataset. The dataframe then takes its column types from the input dataset's schema:
import dataiku
input_dataset = dataiku.Dataset("input_dataset")
input_df = input_dataset.get_dataframe(infer_with_pandas=False)
- Write data to the output dataset using write_dataframe() instead of write_with_schema(). This leaves the output schema unchanged, but the write will fail if the dataframe isn't compatible with the existing schema:
output_dataset = dataiku.Dataset("output_dataset")
output_dataset.write_dataframe(output_df)
- Manually change the type of the columns that you want to change to string:
output_df = output_df.astype({"col1": str, "col2": str})
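If you go with the manual conversion, a small helper keeps the list of text columns in one place. This is only a sketch in plain pandas: force_text_columns and the sample dataframe are hypothetical names made up for illustration, not part of the Dataiku API.

```python
import pandas as pd


def force_text_columns(df, columns):
    """Return a copy of df with the listed columns converted to strings.

    Hypothetical helper for illustration; columns not listed are left alone.
    """
    return df.astype({name: str for name in columns})


# Hypothetical stand-in for the dataframe a recipe would build.
output_df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [10, 20, 30], "col3": [7, 8, 9]}
)

converted = force_text_columns(output_df, ["col1", "col2"])

print(converted["col1"].tolist())  # ['1', '2', '3']
```

astype returns a new dataframe, so output_df itself is not modified. Assign the result back if you want to keep using the same name.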
Thanks,
Zach
-
Using the infer_with_pandas=False flag worked exactly the way I needed it to. Thank you so much!