suppress datatype change

Hi,

I read certain columns in with dtype 'object', and I want to keep them as object.

In the next Python code recipe, Dataiku changes the dtype. How can I prevent this?

I want to keep the columns as I define them. 

Thank you in advance for your support and best regards,

Ezno

Hi @Ezno, could you give some more context on what you are doing? For example, how are you writing the dataset in the Python code recipe?

Hello Ignacio,

This is what happens.

I read the data in and specify that one particular column should be read as object:

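import pandas as pd
# f (presumably a dataiku.Folder) and F_FILE are defined elsewhere; 'abc' is read as object
with f.get_download_stream(F_FILE) as stream:
    df = pd.read_excel(stream, dtype={'abc': object})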
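A minimal pandas-only sketch of why a dtype chosen at read time may not survive to a later recipe. It assumes (this is not confirmed in the thread) that the data is written out as text and read back between recipes; an in-memory CSV stands in for both the Excel file and that storage step, and the column values are made up.

```python
import io

import pandas as pd

# Made-up data standing in for the Excel sheet; "abc" holds ID-like codes.
raw = "abc,xyz\n007,a\n042,b\n"

# With an explicit dtype, the codes stay as strings in an object column.
df = pd.read_csv(io.StringIO(raw), dtype={"abc": object})
print(df["abc"].dtype, df["abc"].tolist())  # object ['007', '042']

# Assumed storage step between recipes: write the frame out as text...
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# ...and read it back WITHOUT a dtype argument: pandas re-infers int64.
df_again = pd.read_csv(buf)
print(df_again["abc"].dtype, df_again["abc"].tolist())  # int64 [7, 42]
```

The dtype is a property of the in-memory DataFrame, not of the stored values, so every new read has to ask for object again; note that the leading zeros are already gone after the second read.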

Two recipes later in the workflow, I try to merge:

# right_on must name a column in the selected subset, so 'abc1' is selected here
f_k1_df = pd.merge(f_sorted_df, k_r_df[['abc1', 'xyz']],
                   left_on='abc',
                   right_on='abc1',
                   how='left', suffixes=('', '_k1'), sort=False)

Here I get an error saying that an object column cannot be merged with an int64 column.
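A minimal sketch reproducing this error with made-up data: the left key holds strings in an object column and the right key is int64. The frame contents are hypothetical; only the dtype combination mirrors the post.

```python
import pandas as pd

# Made-up frames: string codes in an object column on the left,
# the same codes as int64 on the right.
left = pd.DataFrame({"abc": pd.Series(["101", "102"], dtype=object), "val": [1, 2]})
right = pd.DataFrame({"abc1": [101, 103], "xyz": ["p", "q"]})

print(left["abc"].dtype, right["abc1"].dtype)  # object int64

try:
    pd.merge(left, right, left_on="abc", right_on="abc1", how="left")
    merge_error = None
except ValueError as exc:
    # pandas refuses to join string-valued object keys to numeric keys
    merge_error = str(exc)

print(merge_error)
```

Printing `.dtype` for both key columns right before the merge is a quick way to find which step changed the type.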

I made sure to read each of the columns above in as object, which means that Dataiku must be changing their dtype somewhere in between.

How can I prevent this change? I want to keep my columns as object.

Thank you in advance for your help.

Hi @Ezno! Is all of your data being read with the .get_download_stream and pd.read_excel methods? If so, Dataiku itself has nothing to do with the dtype change, since your reader is a pandas function. How did you create the f_sorted_df and k_r_df dataframes? If you did some manipulations on them, some columns may have changed their dtype along the way.
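As an example of a manipulation changing a dtype, here is a minimal sketch with made-up data: a left merge that leaves some rows unmatched fills them with NaN, and because int64 cannot hold NaN, pandas upcasts the column to float64.

```python
import pandas as pd

# Made-up frames: key 3 has no partner on the right.
left = pd.DataFrame({"key": [1, 2, 3]})
right = pd.DataFrame({"key": [1, 2], "val": [10, 20]})
print(right["val"].dtype)  # int64

# The unmatched row gets NaN in "val", so the column becomes float64.
merged = left.merge(right, on="key", how="left")
print(merged["val"].dtype)  # float64
print(merged["val"].tolist())  # [10.0, 20.0, nan]
```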

Otherwise, did you load any dataset as a dataframe using a Dataiku function such as Dataset.get_dataframe()? In that case, you should check the documentation here.

Finally, there is a quick solution: use some code to convert the column types so that they match (see here). Pandas usually tries to guess the dtype of the data being read, and unless you manually specify all the dtypes when reading, at some point you will need to do some type conversion during data wrangling.
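A minimal sketch of that quick solution with made-up data, reusing the column names from the question: convert the int64 key to strings before merging so both keys hold the same kind of value. Converting the right-hand key with astype(str) is one choice among several; converting the left key to int64 would also work if the codes are purely numeric.

```python
import pandas as pd

# Made-up frames: object (string) key on the left, int64 key on the right.
left = pd.DataFrame({"abc": pd.Series(["101", "102"], dtype=object), "val": [1, 2]})
right = pd.DataFrame({"abc1": [101, 103], "xyz": ["p", "q"]})

# Align the key types: turn the int64 codes into strings.
right["abc1"] = right["abc1"].astype(str)

merged = pd.merge(left, right[["abc1", "xyz"]],
                  left_on="abc", right_on="abc1",
                  how="left", suffixes=("", "_k1"), sort=False)
print(merged)
```

If the original codes had leading zeros (for example "007"), converting back from int64 cannot restore them, so in that case it is better to fix the dtype where the data is read than to patch it just before the merge.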

Cheers!
