custom python function
EdBerth
Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 15 ✭✭
Hi,
In my input dataset, I have a string column named vars that holds array-like values such as [" 20547","21513 "].
I would like to check whether each element of this array is in another array defined in the global variables:
{ "varGamme": [ "21513", "20547" ] }
I'm trying this with a custom Python function:
import pandas as pd

def process(rows):
    gamme = dss_variables["varGamme"]
    var = [c.strip() for c in rows["vars"]]
    result = [c in gamme for c in var]
    return result
I don't understand why I get only one value (false) in the newly created cell.
Answers
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,141 Neuron
Because you are not returning the correct data structure. You probably changed the mode of the Python function and forgot to update the code snippet by clicking Edit Python Source Code. To return a new cell for each row, use a function such as this one:
# Modify the process function to fit your needs
import pandas as pd

def process(rows):
    # In 'cell' mode, the process function must return
    # a single Pandas Series for each block of rows,
    # which will be affected to a new column.
    # The 'rows' argument is a dictionary of columns in the
    # block of rows, with values in the dictionary being
    # Pandas Series, which additionally holds an 'index'
    # field.
    return pd.Series(len(rows), index=rows.index)
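Applied to the data in the question, a minimal sketch could look like the one below. It assumes each cell of vars is a JSON-style string (as in the example value), that dss_variables is available to the function as in the question's code, and that a JSON array of booleans is an acceptable output format. The helper name check_membership is hypothetical and not part of DSS.

```python
import json


def check_membership(cell, allowed):
    """Return a JSON array of booleans, one per element of the array stored in `cell`.

    `check_membership` is a hypothetical helper, not a DSS function.
    """
    if not isinstance(cell, str):
        return None  # empty cell (None or NaN): nothing to check
    allowed_set = {str(a).strip() for a in allowed}
    # Parse the string into a real list: '[" 20547","21513 "]' -> [" 20547", "21513 "]
    values = json.loads(cell)
    # Strip each element (not the whole cell) before testing membership
    return json.dumps([str(v).strip() in allowed_set for v in values])


def process(rows):
    # Assumption: `dss_variables` is provided by DSS, as in the question's code.
    gamme = dss_variables["varGamme"]
    # rows["vars"] is a Pandas Series; Series.apply returns a new Series
    # with the same index, i.e. one output cell per input row.
    return rows["vars"].apply(lambda cell: check_membership(cell, gamme))
```

The main difference from the code in the question is that each cell is first parsed into a list, so strip() and the membership test run on the individual elements rather than on the whole string, and the function returns one value per row instead of a plain Python list.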