custom python function

EdBerth

Hi,

In my input dataset, I have a string column named vars whose values look like this: [" 20547","21513 "]. The string represents an array.

I would like to check whether each element of this array is in another array defined in the global variables:

{
"varGamme": [
"21513",
"20547"
]
}
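The vars value is text, not a Python list, so it has to be parsed before it can be compared with varGamme. A minimal sketch, assuming each cell holds valid JSON as in the example above (the names raw and var_gamme are illustrative):

```python
import json

raw = '[" 20547","21513 "]'     # one cell of the string column "vars"
var_gamme = ["21513", "20547"]  # same content as the global variable varGamme

# Iterating over the raw string walks over its characters, not its elements.
print(list(raw)[:3])  # ['[', '"', ' ']

# Parsing it as JSON gives the real elements, which can then be stripped.
items = [s.strip() for s in json.loads(raw)]
print(items)  # ['20547', '21513']
print([s in var_gamme for s in items])  # [True, True]
```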

I'm trying with a custom Python function:

import pandas as pd
def process(rows):
    gamme =dss_variables["varGamme"]
    
    var = [c.strip() for c in rows["vars"]]

    result = [c in gamme for c in var]

    
    return result

I don't understand why I only get a single value (false) in the newly created cell.

Turribeach

Because you are not returning the correct data structure. You probably changed the mode of the Python function and forgot to update the code snippet by clicking Edit Python Source Code. To return a new cell for each row, you should use a function such as this one:

# Modify the process function to fit your needs
import pandas as pd
def process(rows):
    # In 'cell' mode, the process function must return
    # a single Pandas Series for each block of rows,
    # which will be assigned to a new column.
    # The 'rows' argument is a dictionary of columns in the
    # block of rows, with values in the dictionary being
    # Pandas Series, which additionally holds an 'index'
    # field.
    return pd.Series(len(rows), index=rows.index)
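Combining that template with the logic from the question might look like the sketch below. It assumes the cell-mode, block-of-rows behaviour described in the template comments (rows["vars"] is a pandas Series of strings, and rows has an index), and that each vars value is valid JSON. VAR_GAMME and check_vars are hypothetical names; VAR_GAMME is hard-coded so the sketch runs on its own, whereas the question reads the list via dss_variables["varGamme"]. Each new cell holds a JSON array of booleans, one per element.

```python
import json
import pandas as pd

# Hypothetical stand-in for the global variable varGamme.
VAR_GAMME = ["21513", "20547"]

def check_vars(value, gamme):
    """Parse one 'vars' string such as '[" 20547","21513 "]' and test each element."""
    if not isinstance(value, str) or not value.strip():
        return []  # empty or missing cell: nothing to check
    items = json.loads(value)  # the column holds a JSON array as text
    return [str(item).strip() in gamme for item in items]

def process(rows):
    # One output value per input row, aligned on the block's index.
    gamme = set(VAR_GAMME)
    result = rows["vars"].apply(lambda v: json.dumps(check_vars(v, gamme)))
    return pd.Series(result.tolist(), index=rows.index)
```

Outside DSS, a pandas DataFrame can stand in for rows, since it supports both rows["vars"] and rows.index, e.g. process(pd.DataFrame({"vars": ['[" 20547","21513 "]']})).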
