
Prepare recipe

Solved!
RohitRanga
Level 3

Hello, just wanted to know if this data transformation is possible out of the box.
text    label                 text    label
abc     ['A', 'B']     =>     abc     ['A', 'B']
def     C              =>     def     ['C']

4 Replies
AlexT
Dataiker

Hi @RohitRanga,

I'm not sure I fully understand the transformation you are looking for. Are you looking to convert C to an array?

We do have the following processor, which does what you are looking for:

https://doc.dataiku.com/dss/latest/preparation/processors/tokenizer.html

Let me know if that helps.

RohitRanga
Level 3
Author

@AlexT Thanks for the response! Let me clarify my question:
I have a classification dataset with a label/class column. This column has either a list of strings or a single string. I want to make it uniform by converting those single string rows into a list with one string. Is this clear now?

AlexT
Dataiker

Hi @RohitRanga ,

So you only need to handle the single-string case, e.g. convert C to ['C']. If the value already starts with [, do nothing.

The first thing that comes to mind is to use a formula:

if(startsWith(new_column, "['"),new_column, concat("['",new_column,"']"))
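
For anyone who would rather do this in code, the same check can be sketched in plain Python. This is only an illustration of the logic in the formula above, not a Dataiku API: the function name normalize_label and the sample values are made up, and it assumes the labels are stored as strings.

```python
def normalize_label(value: str) -> str:
    """Wrap a single label in list notation; leave list-like values untouched.

    Hypothetical helper mirroring the formula: values that already start
    with "['" are returned as-is, anything else becomes "['<value>']".
    """
    if value.startswith("['"):
        return value
    return "['" + value + "']"


# Sample values taken from the question (labels assumed to be strings).
labels = ["['A', 'B']", "C"]
normalized = [normalize_label(v) for v in labels]
print(normalized)
```

Like the formula, this only looks at the first two characters, so it does not validate that a value starting with [' is a well-formed list.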

Let me know if that works for you. 

RohitRanga
Level 3
Author

Wow, I was not aware that we could do this. Thanks a lot @AlexT !