How to extract key information (strings) from one column and separate them?

Solved!
Tong
Level 3
I have a column of comments (the original file is Excel) that contains various pieces of information. I would like to separate them into different comment columns/categories. Is that possible with DSS?

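| Process Comment | comment 1 | comment 2 | comment 3 | comment 4 |
|---|---|---|---|---|
| Titan, rework | Titan | | | rework |
| rework, steel | steel | | | |
| no heating, rework | | | | rework |
| with Coating, 20kW | | | 20kw | |
| TC defect, steel | steel | TC defect | | |
| 20kW, Titan | Titan | | 20kw | |
| rework, 15kW | | | 15kw | rework |
| TC defect | | TC defect | | |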
tgb417
Neuron

@Tong 

Here is a quick one-step visual recipe that will get you very close to what you want to do. If you need the columns renamed to comment 1, comment 2, comment 3, ..., you will have to do some additional work.

split_unfold_recipie.jpg

This is a very useful layout for ML Models. 
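For readers who cannot see the screenshot, the idea of splitting a comment on a separator and unfolding each distinct value into its own 0/1 column can be sketched in plain Python. This is only an illustration of the general technique, not DSS's implementation; the function names `split_tokens` and `split_and_unfold` are hypothetical, and the comma separator is assumed from the sample data in the question.

```python
# Sample rows taken from the "Process Comment" column in the question.
comments = [
    "Titan, rework",
    "rework, steel",
    "no heating, rework",
    "with Coating, 20kW",
    "TC defect, steel",
    "20kW, Titan",
    "rework, 15kW",
    "TC defect",
]


def split_tokens(comment, sep=","):
    """Split one comment on the separator and strip surrounding whitespace."""
    return [part.strip() for part in comment.split(sep) if part.strip()]


def split_and_unfold(rows, sep=","):
    """Return (columns, table) with one 0/1 indicator column per distinct token."""
    tokenized = [split_tokens(row, sep) for row in rows]
    # Keep first-seen order so the output columns are deterministic.
    columns = []
    for tokens in tokenized:
        for token in tokens:
            if token not in columns:
                columns.append(token)
    table = [[1 if col in tokens else 0 for col in columns] for tokens in tokenized]
    return columns, table


columns, table = split_and_unfold(comments)
print(columns)
for comment, flags in zip(comments, table):
    print(comment, flags)
```

Each distinct comment word becomes its own column, which is why a column with many different words produces a very wide table.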

Hope this helps.

--Tom

Tong
Level 3
Author

@tgb417 

Thanks a lot for your quick reply. The problem is that in the real data the comments vary widely, so this approach creates more than 100 columns, and then I get an error. Is there a way to extract the 'mode' of those comments, meaning the comment words that appear very often? Comment words that appear only once or twice can be ignored.
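Outside of a visual recipe, one way to keep only the frequent comment words is to count them first and drop the rare ones before creating any columns. The following is a minimal sketch in plain Python, assuming comma-separated comments as in the sample; `MIN_COUNT` and the helper name `split_tokens` are hypothetical, and lowercasing the words is an assumption so that "20kW" and "20kw" count as the same word.

```python
from collections import Counter

# Sample rows taken from the "Process Comment" column in the question.
comments = [
    "Titan, rework",
    "rework, steel",
    "no heating, rework",
    "with Coating, 20kW",
    "TC defect, steel",
    "20kW, Titan",
    "rework, 15kW",
    "TC defect",
]

# Hypothetical threshold: keep words seen in at least this many rows.
# The sample is tiny, so 2 is used here; real data would likely need more.
MIN_COUNT = 2


def split_tokens(comment, sep=","):
    """Split on the separator, strip whitespace, and lowercase each word."""
    return [part.strip().lower() for part in comment.split(sep) if part.strip()]


tokenized = [split_tokens(comment) for comment in comments]

# Count each word at most once per row (dict.fromkeys removes duplicates in order).
counts = Counter(token for tokens in tokenized for token in dict.fromkeys(tokens))

# Words that meet the threshold, most frequent first.
frequent = [token for token, n in counts.most_common() if n >= MIN_COUNT]

# Drop the rare words from every row; only these would be unfolded into columns.
filtered = [[t for t in tokens if t in frequent] for tokens in tokenized]

print(frequent)
print(filtered)
```

With the rare words removed, the number of resulting columns is bounded by the number of frequent words rather than by every distinct word in the data.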

 

tgb417
Neuron

@Tong ,

If you are planning to use the results of this in DSS visual ML, then you might want to use the built-in feature handling and treat this column as text. Another option is to convert this column into a JSON vector.
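As a rough illustration of the second option, one possible reading of "JSON vector" is a JSON array holding the individual comment words. The sketch below shows that conversion in plain Python; the function name `comment_to_json_array` is hypothetical, and whether DSS feature handling expects exactly this shape is an assumption that should be checked against the DSS documentation.

```python
import json


def comment_to_json_array(comment, sep=","):
    """Split a comment on the separator and serialize the words as a JSON array."""
    tokens = [part.strip() for part in comment.split(sep) if part.strip()]
    return json.dumps(tokens)


print(comment_to_json_array("Titan, rework"))
print(comment_to_json_array("TC defect"))
```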

Here is a brief video from the Dataiku Academy that covers feature handling in DSS.

https://academy.dataiku.com/machine-learning-basics-open/522100

--Tom
Tong
Level 3
Author

@tgb417

I am now working with a single column that does not contain so many different comments, and it worked perfectly. Thank you again for your help!
