How to split a cell in several columns

Options
Romains
Romains Registered Posts: 2 ✭✭✭

Hello,

Ive an Excel data sheet containing HR data. Each line represent a person. One specific cell contain all the jobs that person had in his career. The cell is composed by several lines, each line has this format :
Date of entrance in the job - Job ID - Job name

The number of lines vary from person to person. One can have just one, another one can have 20.

I would like to transform this cell in 3*x columns, given x the max number of line of the person having the maximum number of jobs / lines.

How to do in DSS please?

I attached a data example.

Thank you.


Operating system used: Windows

Best Answer

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
    edited July 17 Answer ✓
    Options

    Hi Romains,

    Not sure if this is the best way to do it, but I would do the following:

    1. Split the HISTORIQUE_AFFECTATION column on newline "\n" (you might have to do a replace with something like "||" first if it doesn't recognize it). I would also select 'Output as Array (JSON)'
    2. 'Fold array values' of col_1 to new rows
    3. Extract using regular expression - something like
      ([0-9]{4}\pP[0-9]{2}\pP[0-9]{2})\s:\s(\w{1,10})\s-\s(.*)

    Here are the steps I used if you want to try copy and paste:

    {
      "type": "shaker-steps",
      "version": "10.0.4-beta2",
      "steps": [
        {
          "preview": false,
          "metaType": "PROCESSOR",
          "disabled": false,
          "type": "FindReplace",
          "params": {
            "output": "",
            "mapping": [
              {
                "from": "\\n",
                "to": "||"
              },
              {
                "from": "",
                "to": ""
              }
            ],
            "normalization": "EXACT",
            "columns": [
              "HISTORIQUE_AFFECTATION"
            ],
            "appliesTo": "SINGLE_COLUMN",
            "stopAfterFirstMatch": false,
            "matching": "PATTERN"
          },
          "alwaysShowComment": false,
          "selected": true
        },
        {
          "preview": false,
          "designTimeReport": {
            "report": {
              "createdColumnCount": 0
            }
          },
          "metaType": "PROCESSOR",
          "disabled": false,
          "type": "ColumnSplitter",
          "params": {
            "inCol": "HISTORIQUE_AFFECTATION",
            "keepEmptyChunks": false,
            "limit": 0,
            "separator": "||",
            "outColPrefix": "col_1_",
            "limitOutput": false,
            "target": "JSON"
          },
          "alwaysShowComment": false,
          "selected": true
        },
        {
          "preview": false,
          "metaType": "PROCESSOR",
          "disabled": false,
          "type": "ArrayFold",
          "params": {
            "column": "col_1_"
          },
          "alwaysShowComment": false,
          "selected": true
        },
        {
          "preview": false,
          "metaType": "PROCESSOR",
          "disabled": false,
          "type": "RegexpExtractor",
          "params": {
            "found_col": false,
            "extractAllOccurrences": false,
            "prefix": "",
            "column": "col_1_",
            "pattern": "([0-9]{4}\\pP[0-9]{2}\\pP[0-9]{2})\\s:\\s(\\w{1,10})\\s-\\s(.*)"
          },
          "alwaysShowComment": false,
          "selected": true
        }
      ]
    }

Answers

Setup Info
    Tags
      Help me…