Hi,
I read certain columns in with dtype 'object', and I want them to stay object.
In the next Python code recipe, Dataiku changes the datatype. How can I prevent this?
I want to keep the columns as I defined them.
Thank you in advance for your support and best regards,
Ezno
Hi @Ezno, could you give some more context to what you are doing? For example, in the python code recipe how are you writing the dataset?
Hello Ignacio,
this is what happens.
I read the data in and, for one specific column, tell pandas to read it as object:
with f.get_download_stream(F_FILE) as stream:
    df = pd.read_excel(stream, dtype={'abc': object})
Two recipes later in the workflow, I try to merge:
f_k1_df = pd.merge(f_sorted_df, k_r_df[['abc','xyz']],
                   left_on='abc',
                   right_on='abc1',
                   how='left', suffixes=('','_k1'), sort=False)
Here I get the message that it is not possible to merge datatype object with int64.
I made sure to read each of the columns above in as object, which means that Dataiku changes the datatype of the columns somewhere in between.
How can I suppress this change? I want to keep my columns as object.
Thank you for your help in advance
Hi @Ezno! Is all of your data being read with the get_download_stream and pd.read_excel methods? If that is the case, then Dataiku itself has nothing to do with it, as your reader is a pandas function. I wonder, how did you create the f_sorted_df and k_r_df dataframes? If you did some manipulations, some columns might have changed their dtype.
Otherwise, did you get some datasets as dataframes using a Dataiku function, like Dataset.get_dataframe()? In that case you should check the documentation here.
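To illustrate why a dtype can change between recipes, here is a small pandas-only sketch. It uses a CSV round trip as a stand-in for writing a dataset in one recipe and reading it back in the next; that analogy is an assumption on my side, and the column values are made up. The point is only that the dtype you set at the first read is not stored with the data, so the next reader guesses again unless you tell it otherwise.

```python
import io

import pandas as pd

# Hypothetical data: identifiers kept as strings (object dtype).
df = pd.DataFrame({"abc": ["1", "2", "3"]}, dtype=object)
original_dtype = df["abc"].dtype

# Write the frame out and read it back, as a stand-in for passing
# the data from one recipe to the next.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading without a dtype lets pandas guess: the digits become integers.
buffer.seek(0)
inferred_dtype = pd.read_csv(buffer)["abc"].dtype

# Reading with an explicit dtype keeps the column as object.
buffer.seek(0)
forced_dtype = pd.read_csv(buffer, dtype={"abc": object})["abc"].dtype

print(original_dtype, inferred_dtype, forced_dtype)
```

If the dataframe comes from Dataset.get_dataframe(), it is worth looking at its infer_with_pandas argument in the documentation for your version; as far as I know it controls whether pandas guesses the types or the dataset schema is used.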
Finally, there is a quick solution, which is to use some code to change the column types so they match (see here). Pandas usually tries to guess the dtype of the data being read, and unless you manually specify all the dtypes when reading, at some point or another you will need to do some type conversion in the data wrangling process.
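A minimal sketch of that quick fix, assuming the key columns hold identifiers that can safely be compared as text. The two frames below are made-up stand-ins for f_sorted_df and k_r_df (I assumed the right frame has a column named abc1, since that is what right_on refers to), so adapt the names to your data:

```python
import pandas as pd

# Hypothetical frames: the left key is object (strings), the right key is int64.
f_sorted_df = pd.DataFrame({"abc": ["1", "2", "3"], "val": [10, 20, 30]})
k_r_df = pd.DataFrame({"abc1": [1, 2, 4], "xyz": ["a", "b", "d"]})

# Cast both key columns to str right before the merge so they compare as text.
f_sorted_df["abc"] = f_sorted_df["abc"].astype(str)
k_r_df["abc1"] = k_r_df["abc1"].astype(str)

f_k1_df = pd.merge(
    f_sorted_df,
    k_r_df[["abc1", "xyz"]],
    left_on="abc",
    right_on="abc1",
    how="left",
    suffixes=("", "_k1"),
    sort=False,
)
print(f_k1_df)
```

One caveat: astype(str) also turns missing values into the text 'nan', so check for empty keys before casting if your columns can contain them.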
Cheers!