Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
I am attempting to get the below input data into the following output format:
(row1) Food type | Cost |
(row2) Dog Food Section Header | null |
(row3) Dog Food 1 | $10 |
(row4) Dog Food 2 | $12 |
(row5) Dog Food 3 | $14 |
(row6) Cat Food Section Header | null |
(row7) Cat Food 1 | $11 |
(row8) Cat Food 2 | $13 |
(row9) Cat Food 3 | $15 |
(row1) Animal Type | Food type | Cost |
(row2) Dog | Dog Food 1 | $10 |
(row3) Dog | Dog Food 2 | $12 |
(row4) Dog | Dog Food 3 | $14 |
(row5) Cat | Cat Food 1 | $11 |
(row6) Cat | Cat Food 2 | $13 |
(row7) Cat | Cat Food 3 | $15 |
Each food type is its own row. I looked throughout the community but could not find an answer. Thanks for your help - new to Dataiku.
Operating system used: Windows
Welcome to the Dataiku Community.
In looking at your INPUT table I see some rows with 1 column and some with 2.
Rows 1,2,3, 5,6,7 look to have the 2nd and third columns of your output data.
In a Visual recipe, you could filter your rows to only rows that have the second column filled in on your sample input data.
That said I'm not clear what indicates the first column of your output set. Is it:
Depending on the answer to this question. Your may first need to sort out the values for your first column.
In the first case either split the first column or regex, of formula with a search could pull the words Dog and Cat out of your rows with data.
In the second case:
Hope this helps just a bit. Let us know how you are getting on with sorting this data.
Thanks for the quick and detailed response. I clarified my original post, but based on your description I perceive it is the second scenario you described. I want to remove "Dog/Cat Food Section Header" and make it a column so that it becomes a column a user can filter on in the output (Excel).
I also perceive the function I was looking for was the "Fill empty cells with previous/next value" function you mentioned. Thanks. I will try it out.
I have a follow up that might be clearer to understand. In the "Example problem" screenshot, I want to copy "Production" in the new energy type down for all of the "TRUE" values until it hits "Imports". Then, I want it to copy down "Imports" for all of the "TRUE" values until it hits "Exports", etc. In short, I am attempting to create a summarized energy type field to make it easier to filter on.
Is there a way to do this using a visual recipe (i.e. no python code)?
If helpful, I included my original formula to create the new energy type column.
I’m writing from an iPad. I don’t have assess to DSS right at the moment. I’m doing this from memory so I may get a detail or two off by a little bit.
You are doing much of what I would do. In general, I would look to see if the energy production column is empty, and then grab what is in energy type field and put it into a new column as you are doing. There may be an isBlank function that will help with this process.
Then rather than putting True in the new column. I’d likely try to leave the column blank. I suspect that the fill down feature expects blank cells in the column to fill down. You may have to use the special value null in your new column in order to make this work. When this is working you should see a blank set of rows in your new column. Separated by your groupings titles. Then I’d run the fill down process as we have discussed.
Some other hints that might be helpful. DSS formula language is case sensitive. (This always gets in my way.)
There is also a display feature in DSS that will show highlighted space or tab characters. These hidden “white spaces” characters can cause problems with these tasks on messy data.
Alternately, you could use the formula you already have and add a step to replace True with an empty string “” or null .
Then you will likely to use the formula step that removes rows with missing values in the energy production column.
A slightly different approach can be :
The given formula for checking if “Energy” is mentioned in “Energy description” can be added to the mentioned formula-step as these can be nested.
EDIT : link to the documentation on formula language