Python code is overwriting my schema settings!

mbillingham · December 2022

Hello-

I have some Python code that reads in a dataset and then writes out an updated version of that dataset. It is inferring several columns as integers when I want them to be text. Every time I run it, it overwrites my schema settings. Any recommendations?

thanks!

Mindy

Operating system used: unix

Zach · December 2022

Hi @mbillingham,

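The type is being changed because pandas DataFrames infer the type of each column from its values. When the data is read into a DataFrame, a column that contains only digits becomes an integer column, and if the recipe writes with write_with_schema(), that inferred type is copied into the output schema.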
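The inference can be reproduced with plain pandas outside Dataiku. This sketch reads a small in-memory CSV (the column names and values are made up for illustration) without any type information: both columns come back as integers, and identifier-style values lose their leading zeros in the process.

```python
import io

import pandas as pd

# Hypothetical data: identifier-like columns that contain only digits.
csv_text = "zip_code,customer_id\n02139,0042\n10001,0107\n"

# With no type information, pandas inspects the values and picks integer dtypes.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # both columns are integer columns

# The original text is gone once the values are parsed as numbers.
print(df["zip_code"].tolist())     # [2139, 10001]
print(df["customer_id"].tolist())  # [42, 107]
```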

Depending on your code, there are a few ways to prevent a Python recipe from changing the types:

Prevent pandas from inferring the types when reading the input dataset. The DataFrame's column types will then be taken from the input dataset's schema:
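```python
import dataiku

input_dataset = dataiku.Dataset("input_dataset")
# Use the dataset schema for column types instead of letting pandas infer them
input_df = input_dataset.get_dataframe(infer_with_pandas=False)
```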
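Conceptually, turning off pandas inference means the column types come from a schema rather than from the values. The plain-pandas sketch below imitates that idea; the schema list, the type names, and the DTYPE_FOR mapping are assumptions for illustration, not how Dataiku converts its schema internally.

```python
import io

import pandas as pd

# Hypothetical schema in a list-of-dicts shape: one entry per column.
schema = [
    {"name": "zip_code", "type": "string"},
    {"name": "quantity", "type": "bigint"},
]

# Hypothetical mapping from schema type names to pandas dtypes (illustrative only).
DTYPE_FOR = {"string": str, "bigint": "Int64", "double": "float64"}

# Build read_csv's dtype argument from the schema, so no column is inferred.
dtypes = {col["name"]: DTYPE_FOR[col["type"]] for col in schema}

csv_text = "zip_code,quantity\n02139,3\n10001,12\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(df["zip_code"].tolist())  # ['02139', '10001']
print(df["quantity"].dtype)     # Int64
```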
Write data to the output dataset using write_dataframe() instead of write_with_schema(). This keeps the output dataset's existing schema, but the write will fail if the DataFrame isn't compatible with that schema:
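```python
import dataiku

output_dataset = dataiku.Dataset("output_dataset")
# output_df is the DataFrame produced by your recipe; the existing schema is left unchanged
output_dataset.write_dataframe(output_df)
```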
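If you switch to write_dataframe(), it can help to check column names against the output schema before writing, so a mismatch is reported clearly instead of surfacing as a failed write. The helper below is hypothetical and only compares names; it doesn't replicate whatever compatibility checks write_dataframe() itself performs.

```python
def schema_mismatches(df_columns, schema):
    """Compare DataFrame column names against a schema's column names.

    Hypothetical helper: `schema` is assumed to be a list of dicts that each
    have a "name" key. Only names are compared, not types or column order.
    Returns (missing, unexpected): schema columns absent from the DataFrame,
    and DataFrame columns absent from the schema.
    """
    expected = [col["name"] for col in schema]
    present = set(df_columns)
    missing = [name for name in expected if name not in present]
    known = set(expected)
    unexpected = [name for name in df_columns if name not in known]
    return missing, unexpected


# Example with made-up names: the DataFrame lacks col2 and has an extra col3.
schema = [{"name": "col1", "type": "string"}, {"name": "col2", "type": "string"}]
missing, unexpected = schema_mismatches(["col1", "col3"], schema)
print(missing, unexpected)  # ['col2'] ['col3']
```

In a recipe, the arguments could be list(output_df.columns) and output_dataset.read_schema(), assuming that call returns the list-of-dicts shape in your DSS version.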
Manually convert the columns that you want to keep as text to strings before writing:
```python
# Cast the listed columns to str before writing (col1 and col2 are placeholders)
output_df = output_df.astype({"col1": str, "col2": str})
```
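One thing to watch with astype(str): it converts whatever pandas already inferred, so characters dropped during reading (a leading zero, for example) can't be recovered this way, and an integer-looking column with missing values has usually been read as float. The sketch below uses a made-up single-column DataFrame to show the float case; depending on your pandas version, the missing value may also come out as the literal text "nan". Routing through pandas' nullable "Int64" dtype is one workaround, though how write_with_schema() maps the resulting "string" dtype may depend on your DSS version.

```python
import pandas as pd

# Hypothetical column: a missing value makes pandas store it as float64.
output_df = pd.DataFrame({"col1": [7, None]})
print(output_df["col1"].dtype)  # float64

# astype(str) stringifies the float values, so 7 becomes "7.0".
naive = output_df["col1"].astype(str)
print(naive.iloc[0])  # 7.0

# Converting to pandas' nullable Int64 first keeps whole numbers as "7"
# and leaves the missing value as <NA> rather than turning it into text.
cleaner = output_df["col1"].astype("Int64").astype("string")
print(cleaner.tolist())  # ['7', <NA>]
```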

Thanks,

Zach

mbillingham · December 2022

Using the infer_with_pandas=False flag worked exactly the way I needed it to. Thank you so much!
