How to split a cell in several columns
Hello,
Ive an Excel data sheet containing HR data. Each line represent a person. One specific cell contain all the jobs that person had in his career. The cell is composed by several lines, each line has this format :
Date of entrance in the job - Job ID - Job name
The number of lines vary from person to person. One can have just one, another one can have 20.
I would like to transform this cell in 3*x columns, given x the max number of line of the person having the maximum number of jobs / lines.
How to do in DSS please?
I attached a data example.
Thank you.
Operating system used: Windows
Best Answer
-
AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
Hi Romains,
Not sure if this is the best way to do it, but I would do the following:
- Split the HISTORIQUE_AFFECTATION column on newline "\n" (you might have to do a replace with something like "||" first if it doesn't recognize it). I would also select 'Output as Array (JSON)'
- 'Fold array values' of col_1 to new rows
- Extract using regular expression - something like
([0-9]{4}\pP[0-9]{2}\pP[0-9]{2})\s:\s(\w{1,10})\s-\s(.*)
Here are the steps I used if you want to try copy and paste:
{ "type": "shaker-steps", "version": "10.0.4-beta2", "steps": [ { "preview": false, "metaType": "PROCESSOR", "disabled": false, "type": "FindReplace", "params": { "output": "", "mapping": [ { "from": "\\n", "to": "||" }, { "from": "", "to": "" } ], "normalization": "EXACT", "columns": [ "HISTORIQUE_AFFECTATION" ], "appliesTo": "SINGLE_COLUMN", "stopAfterFirstMatch": false, "matching": "PATTERN" }, "alwaysShowComment": false, "selected": true }, { "preview": false, "designTimeReport": { "report": { "createdColumnCount": 0 } }, "metaType": "PROCESSOR", "disabled": false, "type": "ColumnSplitter", "params": { "inCol": "HISTORIQUE_AFFECTATION", "keepEmptyChunks": false, "limit": 0, "separator": "||", "outColPrefix": "col_1_", "limitOutput": false, "target": "JSON" }, "alwaysShowComment": false, "selected": true }, { "preview": false, "metaType": "PROCESSOR", "disabled": false, "type": "ArrayFold", "params": { "column": "col_1_" }, "alwaysShowComment": false, "selected": true }, { "preview": false, "metaType": "PROCESSOR", "disabled": false, "type": "RegexpExtractor", "params": { "found_col": false, "extractAllOccurrences": false, "prefix": "", "column": "col_1_", "pattern": "([0-9]{4}\\pP[0-9]{2}\\pP[0-9]{2})\\s:\\s(\\w{1,10})\\s-\\s(.*)" }, "alwaysShowComment": false, "selected": true } ] }
Answers
-
Yes it worked with a replace and fold to new rows, thank you !