Submit your inspiring success story or innovative use case to the 2022 Dataiku Frontrunner Awards! ENTER YOUR SUBMISSION

How to split a cell in several columns

Solved!
Romains
Level 1
How to split a cell in several columns

Hello,

Ive an Excel data sheet containing HR data. Each line represent a person. One specific cell contain all the jobs that person had in his career. The cell is composed by several lines, each line has this format :
Date of entrance in the job - Job ID - Job name

The number of lines vary from person to person. One can have just one, another one can have 20.

I would like to transform this cell in 3*x columns, given x the max number of line of the person having the maximum number of jobs / lines.

How to do in DSS please?

I attached a data example.

Thank you.


Operating system used: Windows

0 Kudos
1 Solution
AlexGo
Dataiker
Dataiker

Hi Romains,

Not sure if this is the best way to do it, but I would do the following:

  1. Split the HISTORIQUE_AFFECTATION column on newline "\n" (you might have to do a replace with something like "||" first if it doesn't recognize it). I would also select 'Output as Array (JSON)'
  2. 'Fold array values' of col_1 to new rows
  3. Extract using regular expression - something like
    ([0-9]{4}\pP[0-9]{2}\pP[0-9]{2})\s:\s(\w{1,10})\s-\s(.*)

 

Here are the steps I used if you want to try copy and paste:

 

{
  "type": "shaker-steps",
  "version": "10.0.4-beta2",
  "steps": [
    {
      "preview": false,
      "metaType": "PROCESSOR",
      "disabled": false,
      "type": "FindReplace",
      "params": {
        "output": "",
        "mapping": [
          {
            "from": "\\n",
            "to": "||"
          },
          {
            "from": "",
            "to": ""
          }
        ],
        "normalization": "EXACT",
        "columns": [
          "HISTORIQUE_AFFECTATION"
        ],
        "appliesTo": "SINGLE_COLUMN",
        "stopAfterFirstMatch": false,
        "matching": "PATTERN"
      },
      "alwaysShowComment": false,
      "selected": true
    },
    {
      "preview": false,
      "designTimeReport": {
        "report": {
          "createdColumnCount": 0
        }
      },
      "metaType": "PROCESSOR",
      "disabled": false,
      "type": "ColumnSplitter",
      "params": {
        "inCol": "HISTORIQUE_AFFECTATION",
        "keepEmptyChunks": false,
        "limit": 0,
        "separator": "||",
        "outColPrefix": "col_1_",
        "limitOutput": false,
        "target": "JSON"
      },
      "alwaysShowComment": false,
      "selected": true
    },
    {
      "preview": false,
      "metaType": "PROCESSOR",
      "disabled": false,
      "type": "ArrayFold",
      "params": {
        "column": "col_1_"
      },
      "alwaysShowComment": false,
      "selected": true
    },
    {
      "preview": false,
      "metaType": "PROCESSOR",
      "disabled": false,
      "type": "RegexpExtractor",
      "params": {
        "found_col": false,
        "extractAllOccurrences": false,
        "prefix": "",
        "column": "col_1_",
        "pattern": "([0-9]{4}\\pP[0-9]{2}\\pP[0-9]{2})\\s:\\s(\\w{1,10})\\s-\\s(.*)"
      },
      "alwaysShowComment": false,
      "selected": true
    }
  ]
}

 

 

View solution in original post

2 Replies
AlexGo
Dataiker
Dataiker

Hi Romains,

Not sure if this is the best way to do it, but I would do the following:

  1. Split the HISTORIQUE_AFFECTATION column on newline "\n" (you might have to do a replace with something like "||" first if it doesn't recognize it). I would also select 'Output as Array (JSON)'
  2. 'Fold array values' of col_1 to new rows
  3. Extract using regular expression - something like
    ([0-9]{4}\pP[0-9]{2}\pP[0-9]{2})\s:\s(\w{1,10})\s-\s(.*)

 

Here are the steps I used if you want to try copy and paste:

 

{
  "type": "shaker-steps",
  "version": "10.0.4-beta2",
  "steps": [
    {
      "preview": false,
      "metaType": "PROCESSOR",
      "disabled": false,
      "type": "FindReplace",
      "params": {
        "output": "",
        "mapping": [
          {
            "from": "\\n",
            "to": "||"
          },
          {
            "from": "",
            "to": ""
          }
        ],
        "normalization": "EXACT",
        "columns": [
          "HISTORIQUE_AFFECTATION"
        ],
        "appliesTo": "SINGLE_COLUMN",
        "stopAfterFirstMatch": false,
        "matching": "PATTERN"
      },
      "alwaysShowComment": false,
      "selected": true
    },
    {
      "preview": false,
      "designTimeReport": {
        "report": {
          "createdColumnCount": 0
        }
      },
      "metaType": "PROCESSOR",
      "disabled": false,
      "type": "ColumnSplitter",
      "params": {
        "inCol": "HISTORIQUE_AFFECTATION",
        "keepEmptyChunks": false,
        "limit": 0,
        "separator": "||",
        "outColPrefix": "col_1_",
        "limitOutput": false,
        "target": "JSON"
      },
      "alwaysShowComment": false,
      "selected": true
    },
    {
      "preview": false,
      "metaType": "PROCESSOR",
      "disabled": false,
      "type": "ArrayFold",
      "params": {
        "column": "col_1_"
      },
      "alwaysShowComment": false,
      "selected": true
    },
    {
      "preview": false,
      "metaType": "PROCESSOR",
      "disabled": false,
      "type": "RegexpExtractor",
      "params": {
        "found_col": false,
        "extractAllOccurrences": false,
        "prefix": "",
        "column": "col_1_",
        "pattern": "([0-9]{4}\\pP[0-9]{2}\\pP[0-9]{2})\\s:\\s(\\w{1,10})\\s-\\s(.*)"
      },
      "alwaysShowComment": false,
      "selected": true
    }
  ]
}

 

 

Romains
Level 1
Author

Yes it worked with a replace and fold to new rows, thank you !