suppress datatype change

Ezno

Hi,

I read in certain columns with the datatype 'object', and I want those columns to stay object.

In the next Python code recipe, Dataiku changes the datatype. How can I suppress this?

I want the columns to keep the types I defined.

Thank you in advance for your support and best regards,

Ezno

Answers

  • Ignacio_Toledo

    Hi @Ezno, could you give some more context on what you are doing? For example, how are you writing the dataset in the Python code recipe?

  • Ezno

    Hello Ignacio,

    This is what happens.

    I read the data in and tell pandas to read one specific column as object:

    with f.get_download_stream(F_FILE) as stream:
        df = pd.read_excel(stream, dtype={'abc': object})

    Two recipes later in the workflow, I try to merge:

    f_k1_df = pd.merge(f_sorted_df, k_r_df[['abc', 'xyz']],
                       left_on='abc',
                       right_on='abc1',
                       how='left', suffixes=('', '_k1'), sort=False)

    Here I get a message saying that it is not possible to merge datatype object with int64.

    I made sure to read each of the columns above as object, which means that Dataiku changes the datatype of the columns somewhere in between.

    How can I suppress this change? I want to keep my columns as object.

    Thank you for your help in advance

  • Ignacio_Toledo

    Hi @Ezno! Is all of your data being read with the .get_download_stream and pd.read_excel methods? If that is the case, then Dataiku itself has nothing to do with the change, as your reader is a pandas function. I wonder, how did you create the f_sorted_df and k_r_df dataframes? If you did some manipulations, some columns may have changed their dtype along the way.

    Otherwise, did you load some dataset as a dataframe using a Dataiku function, like Dataset.get_dataframe()? In that case you should check the documentation here.
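To illustrate how a dtype can change between two steps, here is a minimal pandas-only sketch. It is an analogy, not Dataiku's actual mechanism: the data is hypothetical, and a CSV round trip stands in for "write a dataset in one recipe, read it back in the next". As far as I know, Dataset.get_dataframe() has an infer_with_pandas argument that controls this kind of inference, but please verify that against the documentation for your DSS version.

```python
import io

import pandas as pd

# Hypothetical data: an identifier column stored as strings (object dtype).
df = pd.DataFrame({"abc": ["1001", "1002", "1003"]}, dtype=object)

# Round trip through CSV text, standing in for writing a dataset in one
# recipe and reading it back in another.
csv_text = df.to_csv(index=False)

# Without a dtype hint, pandas infers the type from the values.
inferred = pd.read_csv(io.StringIO(csv_text))

# With an explicit dtype, the column is kept as object.
explicit = pd.read_csv(io.StringIO(csv_text), dtype={"abc": object})

print(inferred["abc"].dtype)
print(explicit["abc"].dtype)
```

The point is that the object dtype is not stored with the data in this round trip, so it has to be requested again at every read.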

    Finally, there is a quick solution, which is to use some code to change the column types so that they match (see here). Pandas usually tries to guess the dtype of the data being read, and unless you manually specify all the dtypes when reading, at some point you will need to do some type conversion during the data wrangling process.
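A minimal sketch of that quick fix, assuming the key columns hold simple identifiers. The frames and values below are made up for illustration; only the column names 'abc', 'abc1' and 'xyz' and the merge arguments come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for f_sorted_df and k_r_df: the left key holds
# strings, while the right key was inferred as int64.
left = pd.DataFrame({"abc": ["1", "2", "3"], "val": [10, 20, 30]})
right = pd.DataFrame({"abc1": [1, 2, 4], "xyz": ["a", "b", "d"]})

# Cast both key columns to the same type just before merging.
left["abc"] = left["abc"].astype(str)
right["abc1"] = right["abc1"].astype(str)

merged = pd.merge(
    left,
    right,
    left_on="abc",
    right_on="abc1",
    how="left",
    suffixes=("", "_k1"),
    sort=False,
)
print(merged)
```

One caveat: astype(str) turns missing values into the string "nan" and floats into strings like "1.0", so it is worth checking the key columns for those cases before relying on the merge.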

    Cheers!
