Type after Python recipe

RLD
RLD Registered Posts: 4

Hello all,

I have an issue in DSS. A dataset in which I forced the column types in the settings (schema) is the input of a Python recipe.

I tried several suggestions from the Developer Guide, but I'm lost because each time I get an error.

I forced the type of the column "MASTER_ID", which is the object's reference, to int. At this first level everything is OK.

I added a Python recipe to rename many columns in the dataset and to remove "\n" (line breaks). I used "infer_with_pandas = False" to keep the types, and I write the output with "write_from_dataframe".
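The renaming and line-break cleanup described above can be done with plain pandas calls that leave non-string columns untouched. A minimal sketch, assuming a hypothetical rename mapping `RENAMES` (the real column names are not given in the post) and that line breaks should become `_`, as in the accepted answer further down:

```python
import pandas as pd

# Hypothetical mapping: the actual old/new column names are not given in the post.
RENAMES = {"OLD_NAME": "NEW_NAME"}


def clean_columns(df: pd.DataFrame, renames: dict) -> pd.DataFrame:
    """Rename columns and replace line breaks inside string cells with '_'."""
    out = df.rename(columns=renames)
    # With regex=True the pattern is applied to string values only;
    # numeric columns are left as they are (an int64 column stays int64).
    # "\r?\n" covers both Unix ("\n") and Windows ("\r\n") line breaks.
    return out.replace(r"\r?\n", "_", regex=True)


df = pd.DataFrame({"OLD_NAME": ["a\nb", "c"], "MASTER_ID": [1, 2]})
cleaned = clean_columns(df, RENAMES)
print(cleaned.dtypes)
```

A side note on the design choice: `DataFrame.applymap`, used in the recipe code in this thread, was deprecated in pandas 2.1 in favour of `DataFrame.map`; the vectorised `replace` above avoids the per-cell Python lambda altogether.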

I'm lost now because it never works. Here is the Python code:

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs
tmp_LAYER0_PCDB_DLOBJECT = dataiku.Dataset("tmp_LAYER0_PCDB_DLOBJECT")
tmp_LAYER0_PCDB_DLOBJECT_df = tmp_LAYER0_PCDB_DLOBJECT.get_dataframe(infer_with_pandas=True, bool_as_str=True)
#infer_with_pandas = True

# Compute recipe outputs from inputs
# TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
# NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.
tmp_FORMAT_DLOBJECT_df = tmp_LAYER0_PCDB_DLOBJECT_df # For this sample code, simply copy input to output

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: MARKDOWN
tmp_LAYER0_PCDB_DLOBJECT_df['MASTER_ID'] = tmp_LAYER0_PCDB_DLOBJECT_df['MASTER_ID'].fillna(0)

#tmp_FORMAT_DLOBJECT_df = tmp_FORMAT_DLOBJECT_df.fillna('0')
#tmp_FORMAT_DLOBJECT_df['ITERATION'] = tmp_FORMAT_DLOBJECT_df['ITERATION'].fillna('0')
#Fix "ID" columns to integer format
#tmp_FORMAT_DLOBJECT_df["ID"] = tmp_FORMAT_DLOBJECT_df["ID"].astype(int)
#Fix "REFERENCE" columns to string format
#tmp_FORMAT_DLOBJECT_df["REFERENCE"] = tmp_FORMAT_DLOBJECT_df["REFERENCE"].astype(str)
#Fix "ITERATION" columns to integer format
#tmp_FORMAT_DLOBJECT_df["ITERATION"] = tmp_FORMAT_DLOBJECT_df["ITERATION"].astype(int)
#Fix "REVISION" columns to string format
#tmp_FORMAT_DLOBJECT_df["REVISION"] = tmp_FORMAT_DLOBJECT_df["REVISION"].astype(int)
#Replace carrier return by ""
tmp_FORMAT_DLOBJECT_df = tmp_FORMAT_DLOBJECT_df.applymap(lambda x: x.replace('\n', '') if isinstance(x, str) else x)

schema = [{'MASTER_ID': 'MASTER_ID', 'type': 'int'}]

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Write recipe outputs
tmp_FORMAT_DLOBJECT = dataiku.Dataset("tmp_FORMAT_DLOBJECT")
tmp_FORMAT_DLOBJECT.write_from_dataframe(tmp_FORMAT_DLOBJECT_df)
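In the code above, the `schema` list is built but never passed to the output dataset, and its entry uses the column name as a key. A DSS schema is a list of dicts with "name" and "type" keys (the shape returned by a dataset's `read_schema()`). The sketch below builds that shape and, assuming the `write_schema()` and `write_dataframe()` methods of `dataiku.Dataset`, sets the output schema before writing rows. The column-to-type mapping and both helper names are hypothetical:

```python
# Illustrative mapping only: list every output column with its intended DSS type.
COLUMN_TYPES = {"MASTER_ID": "int", "REFERENCE": "string"}


def build_schema(column_types: dict) -> list:
    """Return a DSS-style schema: one {"name": ..., "type": ...} dict per column."""
    return [{"name": name, "type": dss_type} for name, dss_type in column_types.items()]


def write_with_explicit_schema(dataset, df, column_types: dict) -> None:
    """Hypothetical helper: declare the schema first, then write matching rows.

    `dataset` is assumed to be a dataiku.Dataset; the dataframe's dtypes must
    already be compatible with the declared types (e.g. int64 for "int").
    """
    dataset.write_schema(build_schema(column_types))
    # Select the columns in schema order so rows line up with the declared schema.
    dataset.write_dataframe(df[list(column_types)])
```

Compared with `write_with_schema`, this keeps the exact DSS types you declare instead of the ones derived from pandas dtypes, at the cost of listing every column by hand.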

The type is not kept after the build.

Do you have a process or a solution? Thanks ;)

DSS V12
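One likely reason the type is not kept: in pandas, an integer column that contains missing values is held as float64, and `fillna(0)` keeps it float64, so a schema derived from the dataframe would describe a floating-point column, not an integer one. Only an explicit `astype` changes the dtype. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values: an ID column with one missing entry.
ids = pd.Series([101, np.nan, 103], name="MASTER_ID")

print(ids.dtype)                            # float64: NaN forces a float column
print(ids.fillna(0).dtype)                  # still float64 (values 101.0, 0.0, 103.0)
print(ids.fillna(0).astype("int64").dtype)  # int64 after an explicit cast

# Alternative: pandas' nullable "Int64" dtype keeps the gap as <NA> instead of 0.
print(ids.astype("Int64").tolist())
```

Whether to fill with 0 or keep a nullable integer depends on whether 0 is a meaningful value for the column; filling loses the distinction between "missing" and "zero".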

Best Answer

  • RLD
    RLD Registered Posts: 4
    Answer ✓
    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # Read recipe inputs
    tmp_LAYER0_PCDB_DLOBJECT = dataiku.Dataset("tmp_LAYER0_PCDB_DLOBJECT")
    tmp_LAYER0_PCDB_DLOBJECT_df = tmp_LAYER0_PCDB_DLOBJECT.get_dataframe(infer_with_pandas=True, bool_as_str=True)
    #infer_with_pandas = True


    # Compute recipe outputs from inputs
    # TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
    # NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.
    tmp_FORMAT_DLOBJECT_df = tmp_LAYER0_PCDB_DLOBJECT_df # For this sample code, simply copy input to output

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    tmp_FORMAT_DLOBJECT_df['MASTER_ID'] = tmp_FORMAT_DLOBJECT_df['MASTER_ID'].fillna(0)
    tmp_FORMAT_DLOBJECT_df['MASTER_ID'] = tmp_FORMAT_DLOBJECT_df['MASTER_ID'].astype(int)

    tmp_FORMAT_DLOBJECT_df['REFERENCE'] = tmp_FORMAT_DLOBJECT_df['REFERENCE'].astype(str)

    tmp_FORMAT_DLOBJECT_df['PREV_REV_ID'] = tmp_FORMAT_DLOBJECT_df['PREV_REV_ID'].fillna('0')
    tmp_FORMAT_DLOBJECT_df['PREV_REV_ID'] = tmp_FORMAT_DLOBJECT_df['PREV_REV_ID'].astype(int)

    tmp_FORMAT_DLOBJECT_df['ISLASTREV'] = tmp_FORMAT_DLOBJECT_df['ISLASTREV'].fillna('0')
    tmp_FORMAT_DLOBJECT_df['ISLASTREV'] = tmp_FORMAT_DLOBJECT_df['ISLASTREV'].astype(int)

    tmp_FORMAT_DLOBJECT_df['ITERATION'] = tmp_FORMAT_DLOBJECT_df['ITERATION'].fillna('0')
    tmp_FORMAT_DLOBJECT_df['ITERATION'] = tmp_FORMAT_DLOBJECT_df['ITERATION'].astype(int)

    tmp_FORMAT_DLOBJECT_df['REVISION'] = tmp_FORMAT_DLOBJECT_df['REVISION'].fillna('0')
    tmp_FORMAT_DLOBJECT_df['REVISION'] = tmp_FORMAT_DLOBJECT_df['REVISION'].astype(int)


    #Replace carrier return by "_"
    tmp_FORMAT_DLOBJECT_df = tmp_FORMAT_DLOBJECT_df.applymap(lambda x: x.replace('\n', '_') if isinstance(x, str) else x)

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # Write recipe outputs
    tmp_FORMAT_DLOBJECT = dataiku.Dataset("tmp_FORMAT_DLOBJECT")
    tmp_FORMAT_DLOBJECT.write_with_schema(tmp_FORMAT_DLOBJECT_df)

    I'm answering my own question. I found this solution and it works perfectly. With the False value it doesn't work.
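The accepted answer differs from the question's code in two ways: every column is cast explicitly with `astype` before writing, and the output is written with `write_with_schema`, which replaces the output dataset's schema with one derived from the dataframe's dtypes, whereas `write_from_dataframe` by default leaves the existing output schema as it is. The repeated `fillna`/`astype` pairs can be folded into a small helper; a sketch in plain pandas, where the column lists are copied from the answer and the helper name is hypothetical:

```python
import pandas as pd

# Column lists taken from the accepted answer above.
INT_COLUMNS = ["MASTER_ID", "PREV_REV_ID", "ISLASTREV", "ITERATION", "REVISION"]
STR_COLUMNS = ["REFERENCE"]


def enforce_types(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast columns so write_with_schema sees the intended dtypes."""
    out = df.copy()
    for col in INT_COLUMNS:
        # Missing values must be filled first: a plain int64 column cannot hold NaN.
        out[col] = out[col].fillna(0).astype("int64")
    for col in STR_COLUMNS:
        # Note: astype(str) turns NaN into the literal string "nan".
        out[col] = out[col].astype(str)
    return out


# In the recipe, the write step would then stay as in the answer:
# tmp_FORMAT_DLOBJECT.write_with_schema(enforce_types(tmp_FORMAT_DLOBJECT_df))
```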

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,925 Neuron

    The code above uses infer_with_pandas=True, not False as you said you did. Also, please use the code block (the </> icon) to post code, like this:

    print("Hello")
    

  • RLD
    RLD Registered Posts: 4
    edited September 11

    Sorry, thanks.

    I left the various commented-out lines in after my tests, but I did use False.
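One plausible reason the False variant fails, offered as an assumption rather than a description of DSS internals: with infer_with_pandas=False the declared schema types are used when building the dataframe, and pandas refuses to create a plain int64 column from data that contains empty values. The same behaviour can be reproduced with pandas alone; the CSV content and the `read_with` helper below are made up for illustration:

```python
import io

import pandas as pd

# Made-up CSV: MASTER_ID is empty on the second data row.
CSV = "MASTER_ID,REFERENCE\n101,A\n,B\n103,C\n"


def read_with(dtype_for_id: str) -> pd.DataFrame:
    """Read the sample CSV, forcing the dtype of MASTER_ID."""
    return pd.read_csv(io.StringIO(CSV), dtype={"MASTER_ID": dtype_for_id})


try:
    read_with("int64")  # plain int64 cannot represent the empty cell
except ValueError as exc:
    print("int64 failed:", exc)

# The nullable "Int64" dtype accepts the gap and stores it as <NA>.
print(read_with("Int64")["MASTER_ID"].dtype)
```

If this is the cause, the options are to fill the gaps upstream, to read with infer_with_pandas=True and cast afterwards (as in the accepted answer), or to work with a nullable integer dtype.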

  • RLD
    RLD Registered Posts: 4
    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # Read recipe inputs
    tmp_LAYER0_PCDB_DLOBJECT = dataiku.Dataset("tmp_LAYER0_PCDB_DLOBJECT")
    tmp_LAYER0_PCDB_DLOBJECT_df = tmp_LAYER0_PCDB_DLOBJECT.get_dataframe(infer_with_pandas=True, bool_as_str=True)
    #infer_with_pandas = True


    # Compute recipe outputs from inputs
    # TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
    # NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.
    tmp_FORMAT_DLOBJECT_df = tmp_LAYER0_PCDB_DLOBJECT_df.head(1000) # For this sample code, simply copy input to output

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: MARKDOWN
    tmp_LAYER0_PCDB_DLOBJECT_df['MASTER_ID'] = tmp_LAYER0_PCDB_DLOBJECT_df['MASTER_ID'].fillna(0)

    tmp_FORMAT_DLOBJECT_df['MASTER_ID'] = tmp_FORMAT_DLOBJECT_df['MASTER_ID'].astype(int)

    tmp_FORMAT_DLOBJECT_df['ITERATION'] = tmp_FORMAT_DLOBJECT_df['ITERATION'].fillna('0')

    tmp_FORMAT_DLOBJECT_df['REVISION'] = tmp_FORMAT_DLOBJECT_df['REVISION'].fillna('0')


    #Replace carrier return by "_"
    tmp_FORMAT_DLOBJECT_df = tmp_FORMAT_DLOBJECT_df.applymap(lambda x: x.replace('\n', '_') if isinstance(x, str) else x)

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # Write recipe outputs
    tmp_FORMAT_DLOBJECT = dataiku.Dataset("tmp_FORMAT_DLOBJECT")
    tmp_FORMAT_DLOBJECT.write_from_dataframe(tmp_FORMAT_DLOBJECT_df)

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,925 Neuron

    You are still using True in your code snippet.
