Rename columns with Regular Expressions in a Visual Prepare Recipe

User Story:

As a Data Analyst or Subject Matter Expert, I would like to be able to use a visual Prepare recipe to bulk clean up column names produced by the visual Group, Window, and Pivot recipes.  These recipes often add suffixes to column names, such as "_max", "_concat", and "_min".  I would like to use a visual Prepare recipe step to find a subset of columns with a regular expression like .*_count and replace the _count with nothing.
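
To make the request concrete, here is a minimal sketch in plain Python of the renaming behavior I have in mind. It is not Dataiku code: the function name `rename_matching` and the sample column names are made up for illustration only.

```python
import re

def rename_matching(columns, pattern, replacement=""):
    """Return an {old: new} mapping for every column name changed by the regex substitution."""
    regex = re.compile(pattern)
    mapping = {}
    for name in columns:
        new_name = regex.sub(replacement, name)
        if new_name != name:  # only keep columns the pattern actually changed
            mapping[name] = new_name
    return mapping

# Hypothetical column names, as a Group recipe might produce them.
columns = ["customer_id", "orders_count", "amount_max", "amount_min"]

# Strip a trailing "_count" and leave every other column alone.
renames = rename_matching(columns, r"_count$")
print(renames)
```

Anchoring the pattern with `$` keeps it from touching a column that merely contains "_count" somewhere in the middle of its name.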

COS

* Don't change the default behavior of the Group by, Window, and Pivot visual recipes.  If you want to address this issue there, dropping those column name suffixes should be an advanced, optional parameter.
* I would prefer to see this fixed by making the column rename step of the visual Prepare recipe smarter, allowing all of the usual column selection options: single, multiple, pattern, and all.
* I would prefer the renames to work like the one-time regular expression rename, where you specify the substring to find in the column name and the constant to replace it with.

Notes:

I recognize that this might be a big ask because of schema management.  A recipe step that dynamically renames columns on the fly using patterns, rather than simply replacing a constant old name with a constant new name as is currently offered, might be challenging.
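
The difference between the two approaches can be sketched in plain Python. This is only an illustration under my own assumptions, not how Dataiku handles schemas: `expand_pattern` and all of the column names are hypothetical. A mapping expanded once at design time stays fixed, so a column that appears later is missed, while a pattern applied at run time picks it up and changes the output schema.

```python
import re

def expand_pattern(columns, pattern, replacement=""):
    """Expand a regex rename into an explicit {old: new} mapping for the given columns."""
    regex = re.compile(pattern)
    return {c: regex.sub(replacement, c) for c in columns if regex.search(c)}

pattern = r"_(sum|count)$"

# Columns known when the step is configured (hypothetical names).
design_time_columns = ["region", "sales_sum", "visits_count"]
static_renames = expand_pattern(design_time_columns, pattern)

# A new aggregate shows up upstream after the step was configured.
run_time_columns = design_time_columns + ["returns_count"]

# Constant old -> constant new: the frozen mapping does not cover the new column.
static_result = [static_renames.get(c, c) for c in run_time_columns]

# Pattern evaluated at run time: the new column is renamed too.
dynamic_renames = expand_pattern(run_time_columns, pattern)
dynamic_result = [dynamic_renames.get(c, c) for c in run_time_columns]

print(static_result)
print(dynamic_result)
```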

--Tom
3 Comments
ktgross15
Dataiker

Thanks for the feedback @tgb417 , we've logged this internally and will let you know of any updates.

Status changed to: In the Backlog


AshleyW
Dataiker

Hi, 

Updating this thread to let you know that we've made it easier to mass rename columns in the Prepare recipe in Dataiku 12.3.2 and later. We have some limitations due to schema management with respect to dynamic schemas, as @tgb417 rightly pointed out, but here's what's a lot easier to do now:

  • Open the 'rename column' processor.
  • Notice the new 'mass renamings' button, which provides many options for mass renaming the columns in your dataset: find/replace, prefixes, suffixes, etc.
  • Apply your settings as needed.
  • The rename column processor will update with all the renamings.

Cheers, 

Ashley

Status changed to: Released


@AshleyW I’ve not found this yet.  I’ll keep an eye out for this.  

--Tom
