custom python function

EdBerth Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 15
edited July 16 in Using Dataiku

Hi,

In my input dataset, I have a string column named vars that holds array-like values such as [" 20547","21513 "].

I would like to check whether each element of this array is in another array defined in the global variables:

{
"varGamme": [
"21513",
"20547"
]
}

I'm trying this with a custom Python function:

import pandas as pd
def process(rows):
    gamme =dss_variables["varGamme"]
    
    var = [c.strip() for c in rows["vars"]]

    result = [c in gamme for c in var]

    
    return result

I don't understand why I get only one value (false) in the newly created cell.

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,728 Neuron
    edited July 17

    You get a single value because the function does not return the correct data structure. You probably changed the mode of the Python function and forgot to update the code snippet by clicking Edit Python Source Code. To return a new cell for each row, use a function such as this one:

    # Modify the process function to fit your needs
    import pandas as pd
    def process(rows):
        # In 'cell' mode, the process function must return
        # a single Pandas Series for each block of rows,
        # which will be assigned to a new column.
        # The 'rows' argument is a dictionary of columns in the
        # block of rows, with values in the dictionary being
        # Pandas Series, which additionally holds an 'index'
        # field.
        return pd.Series(len(rows), index=rows.index)
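Building on the template above, here is a minimal sketch of how the membership check itself might look in 'cell' mode. It assumes that each value in the vars column is a JSON-formatted string (as in the example in the question) and that dss_variables is available inside the function, as in the original code. The helper name check_membership and the choice to return the result as a JSON string of booleans are illustrative assumptions, not part of Dataiku's API.

```python
import json
import pandas as pd

def check_membership(vars_column, gamme):
    """Hypothetical helper: for each cell of a Series of JSON-array strings,
    return a JSON string of booleans telling whether each (stripped)
    element is present in `gamme`."""
    # Normalise the reference values once, so lookups are fast and
    # insensitive to surrounding whitespace.
    allowed = {str(g).strip() for g in gamme}

    def check_one(cell):
        # Cells that are empty or not valid JSON arrays yield no result.
        try:
            items = json.loads(cell)
        except (TypeError, ValueError):
            return None
        if not isinstance(items, list):
            return None
        return json.dumps([str(item).strip() in allowed for item in items])

    # Series.apply keeps the original index, so the result lines up
    # with the block of rows it was computed from.
    return vars_column.apply(check_one)

def process(rows):
    # Assumption: `dss_variables` is provided by the Dataiku processor,
    # as in the code from the question.
    return check_membership(rows["vars"], dss_variables["varGamme"])
```

Keeping the logic in a separate helper makes it possible to try it outside Dataiku on a small pandas Series before pasting it into the processor. If a single true/false per row is preferred, the list comprehension could be wrapped in all(...) instead of being serialised with json.dumps.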
