How to extract key information (strings) from one column and separate them?
I have a column of comments (the original file is Excel) that contain various pieces of information. I would like to separate them into different comment columns/categories. Is that possible with DSS?
Process Comment | comment 1 | comment 2 | comment 3 | comment 4
Titan, rework | Titan | rework | |
rework, steel | steel | | |
no heating, rework | rework | | |
with Coating, 20kW | 20kw | | |
TC defect, steel | steel | TC defect | |
20kW, Titan | Titan | 20kw | |
rework, 15kW | 15kw | rework | |
TC defect | TC defect | | |
Best Answer
-
tgb417
Here is a quick one-step visual recipe that will get you very close to what you want to do. If you need the columns to be named comment 1, comment 2, comment 3, and so on, you will have to do some additional work.
This is a very useful layout for ML Models.
Hope this helps.
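For readers who prefer code, here is a minimal Python sketch of the same general idea: split each comment on commas and create one 0/1 indicator column per distinct comment word. This is not the visual recipe from the answer, only an illustration; the function names `split_comment` and `unfold` are made up for this example, and lower-casing the words (so that "20kW" and "20kw" match) is an assumption.

```python
def split_comment(comment, sep=","):
    """Split one free-text comment into trimmed, lower-cased words."""
    return [part.strip().lower() for part in comment.split(sep) if part.strip()]


def unfold(comments, sep=","):
    """Return (column names, rows of 0/1 flags), one column per distinct word."""
    token_lists = [split_comment(c, sep) for c in comments]
    # Every distinct word becomes a column; sorting keeps the order stable.
    columns = sorted({tok for toks in token_lists for tok in toks})
    rows = [[1 if col in toks else 0 for col in columns] for toks in token_lists]
    return columns, rows


# Sample rows taken from the question.
comments = ["Titan, rework", "rework, steel", "TC defect"]
columns, rows = unfold(comments)
print(columns)  # ['rework', 'steel', 'tc defect', 'titan']
for comment, row in zip(comments, rows):
    print(comment, row)
```

Note that the number of columns grows with the number of distinct comment words, so a column with very varied comments produces a very wide table.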
Answers
-
Thanks a lot for your quick reply. The problem is that in the real data the comments vary a lot: when I use this approach, more than 100 columns are created, and then I get an error. My question is whether I can extract the 'mode' of those comments, meaning the comment words that appear very often. Comment words that appear only once or twice can be ignored.
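One possible way to approach this in code is to count how many rows each comment word appears in and keep only the words above a threshold. The sketch below uses only the Python standard library; the helper names and the `min_count` parameter are hypothetical, and the threshold of 3 simply reflects "once or twice can be ignored".

```python
from collections import Counter


def split_comment(comment, sep=","):
    """Split one free-text comment into trimmed, lower-cased words."""
    return [part.strip().lower() for part in comment.split(sep) if part.strip()]


def frequent_tokens(comments, min_count=3, sep=","):
    """Return the set of words that occur in at least min_count rows."""
    counts = Counter()
    for comment in comments:
        # set() so a word repeated inside one row is counted once.
        counts.update(set(split_comment(comment, sep)))
    return {tok for tok, n in counts.items() if n >= min_count}


def keep_frequent(comment, allowed, sep=","):
    """Drop the rare words from one comment, keeping the original order."""
    return [tok for tok in split_comment(comment, sep) if tok in allowed]


# The sample comments from the question.
comments = [
    "Titan, rework", "rework, steel", "no heating, rework",
    "with Coating, 20kW", "TC defect, steel", "20kW, Titan",
    "rework, 15kW", "TC defect",
]
allowed = frequent_tokens(comments, min_count=3)
print(allowed)  # {'rework'}
print(keep_frequent("no heating, rework", allowed))  # ['rework']
```

Filtering the words this way before creating one column per word keeps the number of generated columns small.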
-
I am now working with a column that does not have so many different comments, and it worked perfectly. Thank you again for your help!
-
tgb417
@Tong, if you are planning to use the results of this in DSS visual ML, you might want to use the built-in feature handling instead, either treating this column as text or converting it into a JSON vector.
Here is a brief Dataiku Academy video that covers feature handling in DSS:
https://academy.dataiku.com/machine-learning-basics-open/522100
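As a rough illustration of the JSON idea, the sketch below turns one comment into a JSON array of its words using Python's standard `json` module. The exact format DSS expects is not stated in this thread, so treat the array-of-strings shape and the helper name `to_json_array` as assumptions.

```python
import json


def to_json_array(comment, sep=","):
    """Serialize the trimmed, lower-cased words of one comment as a JSON array."""
    tokens = [part.strip().lower() for part in comment.split(sep) if part.strip()]
    return json.dumps(tokens)


print(to_json_array("TC defect, steel"))  # ["tc defect", "steel"]
```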