
regex columns in a custom Preparation processor

NN
Level 3

Hey everyone,

I want to create a custom Prepare processor and was reading the documentation section that explains how to edit multiple columns:
https://doc.dataiku.com/dss/latest/plugins/reference/preparation.html#output-multiple-columns

In the example shared there, the input column is a single column chosen from the dataset's column list.
Is there any way I can use a regex (entered by the user) to derive the list of columns to be edited?

2 Replies
CoreyS
Community Manager

Hi, @NN! Can you provide further details in the thread to help users find a solution (for example, your DSS version)? Also, can you let us know if you've already tried any fixes? This should lead to a quicker response from the community.

NN
Level 3
Author

Hi,

I was able to resolve my primary requirement using the examples shared by Dataiku. The question I ask below is not critical, but it would be good to learn if someone can guide me.

I am on Dataiku 8.01, trying to create a custom processor for the Prepare recipe.

The aim is that the user provides a regex for column names.
If we find a given value (for example, 1) in a matching column, we replace it with another value (for example, 10).

In my processor.json, "mode" is set to "ROW".

The processor.py looks something like this:

 

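import re

def process(row):
    # Names of all columns present in this row
    keylist = row.keys()
    # User-supplied pattern; re.match anchors at the start of the column name
    r = re.compile(params.get('user_regex'), re.IGNORECASE)
    # Keep only the columns whose name matches the pattern
    newlist = list(filter(r.match, keylist))
    for col in newlist:
        if row[col] == "1":
            row[col] = "10"
        elif row[col] == "30":
            row[col] = "300"
    return row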
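
To try the logic outside DSS, the same idea can be sketched as a plain Python function. This is only a sketch under assumptions: `make_process`, `replacements` and the sample row are hypothetical names used for illustration, not part of the Dataiku API, and the pattern is compiled once instead of on every row.

```python
import re


def make_process(user_regex, replacements):
    """Build a row-level function (hypothetical helper, not a Dataiku API).

    user_regex   -- pattern matched against column names; re.match anchors
                    at the start of the name, and matching is case-insensitive
    replacements -- dict mapping old cell values to new ones
    """
    pattern = re.compile(user_regex, re.IGNORECASE)  # compiled once

    def process(row):
        # Only look at columns whose name matches the pattern.
        for col in [c for c in row if pattern.match(c)]:
            value = row[col]
            if value in replacements:
                row[col] = replacements[value]
        return row

    return process


# Made-up pattern and row, purely for illustration.
process = make_process(r"score_", {"1": "10", "30": "300"})
sample = {"score_a": "1", "SCORE_b": "30", "other": "1"}
result = process(dict(sample))  # pass a copy so `sample` stays untouched
print(result)
```

Keeping the value mapping in a dict avoids a growing if/elif chain when more replacements are needed.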

 

I first ran a Find and replace processor in the Prepare recipe (using the multiple columns option). It highlights only the cells that were modified, as you can see in the second image below, and the note also shows 2 rows modified.

However, when I run my custom processor (the third image below), it shows all 5 rows as modified, though the value has changed in only 2 cells.

While this works almost perfectly for my needs, my question is: can I take it a step further and make it behave like the Find and replace processor, highlighting only the specific cells, rows, or even just the columns that were modified, instead of the entire dataset?
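
As a local sanity check of which cells the logic really touches, the rows before and after processing can be compared. This is a sketch only: `changed_cells` and the sample rows are hypothetical, and it does not affect what the Prepare recipe highlights.

```python
def changed_cells(before, after):
    """Return {column: (old, new)} for every cell whose value differs.

    Hypothetical local helper for checking processor logic; not a DSS API.
    """
    columns = set(before) | set(after)
    return {
        col: (before.get(col), after.get(col))
        for col in columns
        if before.get(col) != after.get(col)
    }


# Made-up rows standing in for the input and output of a ROW-mode process().
rows_before = [{"a": "1", "b": "x"}, {"a": "2", "b": "y"}, {"a": "30", "b": "1"}]
rows_after = [{"a": "10", "b": "x"}, {"a": "2", "b": "y"}, {"a": "300", "b": "1"}]

diffs = [changed_cells(b, a) for b, a in zip(rows_before, rows_after)]
modified_rows = sum(1 for d in diffs if d)  # rows with at least one changed cell
print(modified_rows, diffs)
```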

processor.JPG

 

 

 

 

 
