How to split a cell in several columns

Romains Registered Posts: 2 ✭✭✭


Ive an Excel data sheet containing HR data. Each line represent a person. One specific cell contain all the jobs that person had in his career. The cell is composed by several lines, each line has this format :
Date of entrance in the job - Job ID - Job name

The number of lines vary from person to person. One can have just one, another one can have 20.

I would like to transform this cell in 3*x columns, given x the max number of line of the person having the maximum number of jobs / lines.

How to do in DSS please?

I attached a data example.

Thank you.

Operating system used: Windows

Best Answer

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
    Answer ✓

    Hi Romains,

    Not sure if this is the best way to do it, but I would do the following:

    1. Split the HISTORIQUE_AFFECTATION column on newline "\n" (you might have to do a replace with something like "||" first if it doesn't recognize it). I would also select 'Output as Array (JSON)'
    2. 'Fold array values' of col_1 to new rows
    3. Extract using regular expression - something like

    Here are the steps I used if you want to try copy and paste:

    {"type": "shaker-steps","version": "10.0.4-beta2","steps": [{"preview": false,"metaType": "PROCESSOR","disabled": false,"type": "FindReplace","params": {"output": "","mapping": [{"from": "\\n","to": "||"},{"from": "","to": ""}],"normalization": "EXACT","columns": ["HISTORIQUE_AFFECTATION"],"appliesTo": "SINGLE_COLUMN","stopAfterFirstMatch": false,"matching": "PATTERN"},"alwaysShowComment": false,"selected": true},{"preview": false,"designTimeReport": {"report": {"createdColumnCount": 0}},"metaType": "PROCESSOR","disabled": false,"type": "ColumnSplitter","params": {"inCol": "HISTORIQUE_AFFECTATION","keepEmptyChunks": false,"limit": 0,"separator": "||","outColPrefix": "col_1_","limitOutput": false,"target": "JSON"},"alwaysShowComment": false,"selected": true},{"preview": false,"metaType": "PROCESSOR","disabled": false,"type": "ArrayFold","params": {"column": "col_1_"},"alwaysShowComment": false,"selected": true},{"preview": false,"metaType": "PROCESSOR","disabled": false,"type": "RegexpExtractor","params": {"found_col": false,"extractAllOccurrences": false,"prefix": "","column": "col_1_","pattern": "([0-9]{4}\\pP[0-9]{2}\\pP[0-9]{2})\\s:\\s(\\w{1,10})\\s-\\s(.*)"},"alwaysShowComment": false,"selected": true}]}


Setup Info
      Help me…