How to extract key information (strings) from one column and separate them?

Tong Registered Posts: 13 ✭✭✭✭

I have a column of comments (the original file is an Excel file) that contain various pieces of information. I would like to separate them into different comment columns/categories. Is that possible with DSS?

Process Comment | Extracted comments (comment 1 to comment 4)
Titan, rework | Titan, rework
rework, steel | steel
no heating, rework | rework
with Coating, 20kW | 20kw
TC defect, steel | steel, TC defect
20kW, Titan | Titan, 20kw
rework, 15kW | 15kw, rework
TC defect | TC defect

Best Answer

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Answer ✓

    @Tong

    Here is a quick one-step visual recipe that will get you very close to what you want. If you need to rename the columns to comment 1, comment 2, comment 3, and so on, you will have to do some additional work.

    [Screenshot: split_unfold_recipie.jpg]

    This is a very useful layout for ML Models.

    Hope this helps.
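
    For readers who want to see the idea outside the visual recipe, here is a minimal Python sketch of a split-and-unfold step: split each comment on a separator, then create one 0/1 column per distinct value. The comma separator, the sample rows (copied from the question), and the helper names `split_tokens` and `split_and_unfold` are assumptions for illustration, not part of DSS.

```python
# Sample rows copied from the question; the separator is assumed to be a comma.
comments = [
    "Titan, rework",
    "rework, steel",
    "no heating, rework",
    "with Coating, 20kW",
    "TC defect, steel",
    "20kW, Titan",
    "rework, 15kW",
    "TC defect",
]


def split_tokens(comment, sep=","):
    """Split one comment on sep and trim whitespace around each token."""
    return [token.strip() for token in comment.split(sep) if token.strip()]


def split_and_unfold(rows, sep=","):
    """Return (column_names, matrix) with one 0/1 column per distinct token."""
    tokenized = [split_tokens(row, sep) for row in rows]
    columns = sorted({token for tokens in tokenized for token in tokens})
    matrix = [[1 if col in tokens else 0 for col in columns] for tokens in tokenized]
    return columns, matrix


columns, matrix = split_and_unfold(comments)
for comment, flags in zip(comments, matrix):
    present = [col for col, flag in zip(columns, flags) if flag]
    print(f"{comment!r} -> {present}")
```

    Because every distinct value becomes its own column, the column count grows with the variety of the comments.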

Answers

  • Tong

    @tgb417

    Thanks a lot for your quick reply. In the real data the comments vary widely, so with this approach more than 100 columns are created, and then I get an error. Can I instead extract the 'mode' of those comments, that is, the comment words that appear very often? Words that appear only once or twice can be ignored.

  • Tong

    @tgb417

    I am now working with a column that does not contain as many different comments, and it worked perfectly. Thank you again for your help!

  • tgb417

    @Tong,

    If you are planning to use the results in DSS visual ML, you might want to use the built-in feature handling, either treating this column as text or converting it into a JSON vector.
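
    As a rough illustration of the second option, one possible reading of "JSON vector" is a JSON array of the values in each comment. The sketch below builds such a string per row; the comma separator and the helper name `comment_to_json_array` are assumptions, and how DSS interprets the resulting column is not shown here.

```python
import json


def comment_to_json_array(comment, sep=","):
    """Turn 'a, b' into the JSON array string '["a", "b"]'."""
    tokens = [token.strip() for token in comment.split(sep) if token.strip()]
    return json.dumps(tokens)


print(comment_to_json_array("TC defect, steel"))  # prints ["TC defect", "steel"]
```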

    Here is a brief video from the Dataiku Academy about feature handling in DSS.

    https://academy.dataiku.com/machine-learning-basics-open/522100
