Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Is it possible to split multiple columns in a prepare recipe where the columns have the same prefix and where the content in a column is comma-separated without having to make multiple steps?
I'm not clear that I understand exactly what you are trying to do.
In the examples you have listed it appears that you are trying to split on commas within a column and you want to make those results into an initially unknown number of additional columns. There are several ways to do this kind of thing with DSS. You could look at the split or extract recipes steps.
You then want to be able to repeat those steps across multiple columns that share a column prefix. (My guess is that you eventually plan to have a lot more than 2 columns or the names of the columns are not known while you are creating the recipe.)
The challenge you may be having is that the split and extract recipes steps only work on a single column at a time. And you either don't know the names of the incoming columns or there are a large enough number of columns that hand-selecting each column for spiting would be challenging. Is this a correct intuition on my part?
Do you know how to program in Python? If so, it may be possible to do what you want to do in a python visual recipe step.
Although, I've created a few python recipe steps to do some wonderful things. It is still a bit of a mysterious art to me how to actually create the kind of results you are describing. (Although, I believe it might be doable.)
You can find the documentation for Python Function steps here.
Here is an extended thread on Python Visual recipe steps.
Here is a knowledge base article on the topic.
Here is an example that is similar to your problem set.
Hope this helps a little bit. Can you clarify for others if the description above is what you are trying to do. Then others may be able to jump in here and give a bit more clarity. Please also let us know how you get on with your challenge. Most of us here enjoying solving puzzles like this.
I got hold on an example of the data in mind.
If you look at this example.
For each column "LID_split_" the column should be split based on the comma - separated values in the cell.
Sorry for not providing the example earlier...
Hmmmm..... I'm looking at the data you are showing here. It almost looks like, what you are showing is an attempt upstream of this data to flatten a hierarchical maybe XML data source.
I'm not an expert in this. However, I'm wondering if going back to the original source with something like xpath might get you a cleaner data parsing. Here is some documentation about Dataiku's XML implementation.
Others please feel free to jump in.
Tnx for you valuable input.
I'll look into the links you've send.
The data source is a DB text dump :o, but would have been easier to handle if it indeed had been xml.
Possibly a bit too simplistic thinking but .. as you state the original file is a DB-text dump. If that's a simple, delimited values file the column descriptions will be on top right ? why not work on the column-specs provided there instead of the data itself ?
a rough idea what i have in mind here:
Pretty sure this could be done with some custom coding directly on your DB-dump, but i'm not yet qualified to start sharing code-examples..