I want to create a custom Prepare processor and was reading one of the documents that explains how to edit multiple columns.
In the example shared, the input column is a single column selected from the dataset's column list.
Is there any way I can use a regex (supplied by the user) to derive the list of columns to be edited?
Hi, @NN! Can you provide any further details on the thread to help users find a solution (for example, your DSS version)? Also, can you let us know if you've tried any fixes already? This should lead to a quicker response from the community.
I was able to resolve my primary requirement using the examples shared by Dataiku. The question I ask below is not critical, but it would be good to learn if someone can guide me.
I am on Dataiku 8.01, trying to create a custom processor for the Prepare recipe.
The aim is that the user provides a regex for column names.
If we find a given value (example: 1) in one of the matching columns, we replace it with another value (example: 10).
In my processor.json, "mode" is set to "ROW".
The processor.py looks something like this:
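import re

def process(row):
    # Compile the user-supplied regex; matching is case-insensitive
    r = re.compile(params.get('user_regex'), re.IGNORECASE)
    # Keep only the column names that match the regex
    newlist = list(filter(r.match, row.keys()))
    for col in newlist:
        if row[col] == "1":
            row[col] = "10"
        elif row[col] == "30":
            row[col] = "300"
    return row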
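For anyone who wants to try the column-matching logic outside DSS, here is a minimal standalone sketch that uses a plain dict in place of the row. The function name `replace_in_matching_columns`, the `replacements` mapping, and the sample column names are hypothetical and only for illustration; they are not part of the Dataiku API. Note that `re.match` anchors at the start of the column name, so a pattern like `score_.*` will not match a column called `my_score_a`.

```python
import re

def replace_in_matching_columns(row, column_regex, replacements):
    """Replace values in every column whose name matches column_regex.

    row          -- dict of column name -> cell value (stand-in for a DSS row)
    column_regex -- regex applied to column names (case-insensitive)
    replacements -- hypothetical mapping of old value -> new value
    """
    pattern = re.compile(column_regex, re.IGNORECASE)
    # Collect matching column names first, then update the row
    for col in [c for c in row if pattern.match(c)]:
        value = row[col]
        if value in replacements:
            row[col] = replacements[value]
    return row

# Sample row: only the two "score_" columns should be touched
row = {"score_a": "1", "score_b": "30", "name": "1"}
result = replace_in_matching_columns(row, r"score_.*", {"1": "10", "30": "300"})
print(result)
```

Using a mapping instead of an if/elif chain keeps the list of replacements in one place, which may be easier to drive from a processor parameter later.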
I first ran a Find and replace processor in the Prepare recipe (using the multiple columns option). This highlights only the cells that are modified, as you can see in the second image below, and the note also shows 2 rows modified.
However, when I run my custom processor, shown in the third image below, it reports all 5 rows as modified,
even though the value has changed in only 2 cells.
While this works almost perfectly for my needs, my question is: can I take it a step further and make it behave like the Find and replace processor, highlighting only the specific cells, rows, or even just the columns that are modified instead of the entire data?