Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hi everyone, I am trying to create a table with month and cost data (ie Item A costs $50 in Jan 2020), where if there is a null value in a given month it would take the previous month's cost information. Normally, I would use a lag, but that only allows me to reference the previous month's cost. If there are situations where it lags multiple months, I need to figure out how to fill a null value with the last month that had a value. Any ideas? See an example below.
Initial data set
Date: Item: Cost:
Jan 2020 A 10
March 2020 A 11
June 2020 A 12
New data set
Date: Item: Cost:
Jan 2020 A 10
Feb 2020 A 10
March 2020 A 11
April 2020 A 11
May 2020 A 11
June 2020 A 12
Hi, you can use the "Fill empty cells with previous/next value" processor of a prepare recipe. More info here: https://doc.dataiku.com/dss/latest/preparation/processors/up-down-fill.html
Hi, you can use the "Fill empty cells with previous/next value" processor of a prepare recipe. More info here: https://doc.dataiku.com/dss/latest/preparation/processors/up-down-fill.html
How is row order controlled or set with this processor?
... Time Passes ...
As I looked at this it appears that you have to setup a sort recipe prior to a visual recipie with the fill.
I have set it so that it is ordering by item, year and month. I think the fill with previous value should work if the data is ordered correctly. 🙂
Hi, a Sort recipe is the best way to set row orders. In fact, there are two different cases depending on the dataset's type:
The dataset is stored as a file and I am currently using the sort recipe. Will this not apply the sort to the full data set permanently? If not, would I then just do a SQL function to do an order by instead?
Hi,
The sort recipe will always sort it's output dataset. However, for better read/write performance on files datasets, the order won't be preserved by default for datasets located further down in the flow. In order to preserve row order, you can still activate the "Preserve ordering" setting in those datasets (Dataset > Settings > Advanced). NB: as stated in the documentation here (https://doc.dataiku.com/dss/latest/connecting/ordering.html) preserving ordering may decrease performance.
On SQL datasets, you can just set the sort in the same "Preserve ordering" setting so that DSS reads it with the appropriate ORDER BY clause