Scenario successful, but latest rows in dataset missing -> manual update necessary

Solved!
cuezumo
Level 2
Hey everybody, 

I've got a large project that is updated as needed, a couple of times per week, by a comprehensive scenario. (Actually, it's two projects and two scenarios, separating raw data and data cleansing from data analysis.)

The recipes work fine, but they fail to include the latest rows of at least one dataset, so the final data are a few days old rather than up to date. When I trace the problem back, I see that one recipe in the raw-data project was not actually run by the scenario. Mysteriously, its output dataset is displayed as "updated in the past hour" in the flow. When I run that recipe and all dependent downstream recipes manually, everything works fine and the data are up to date.

Does anybody know what might be the problem? 

Additional info about the problematic recipe: It's a PostgreSQL dataset, which I'm editing with a visual Prepare recipe. I'm only parsing the dates in three columns, nothing special.

I'd appreciate your help, thanks in advance! 

Best, cuezumo 

Marlan
Neuron

Hi @cuezumo,

What is the Build mode setting for the relevant Build step in your scenario? The default mode, "Build required datasets", doesn't rebuild SQL datasets if those datasets are at the beginning of a flow (or the entire flow). The solution is to set the Build mode to "Force-rebuild dataset and dependencies", which will rebuild all SQL datasets. This may not be what is going on in your situation at all, but we have certainly seen data refreshes not happen when expected because of the wrong Build mode setting. More on this situation in this product idea.

Marlan
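
For anyone who triggers builds from a script rather than from a visual Build step, here is a minimal sketch of the same idea. It assumes the `dataikuapi` client package and assumes that the job type strings `RECURSIVE_BUILD` and `RECURSIVE_FORCED_BUILD` correspond to the "Build required datasets" and "Force-rebuild dataset and dependencies" modes mentioned above; please check that mapping against your DSS version. The helper names, host, API key, project key, and dataset name are all hypothetical placeholders.

```python
# Assumed mapping between UI build modes and API job types (verify for your DSS version).
SMART_BUILD = "RECURSIVE_BUILD"          # assumed: "Build required datasets"
FORCED_BUILD = "RECURSIVE_FORCED_BUILD"  # assumed: "Force-rebuild dataset and dependencies"


def make_job_definition(dataset_name, build_type=FORCED_BUILD, partition="NP"):
    """Build a job definition dict for one output dataset.

    "NP" is used here as the partition id of a non-partitioned dataset.
    """
    return {
        "type": build_type,
        "outputs": [
            {"id": dataset_name, "type": "DATASET", "partition": partition}
        ],
    }


def start_forced_rebuild(host, api_key, project_key, dataset_name):
    """Submit a forced recursive rebuild; hypothetical helper, needs a DSS instance."""
    import dataikuapi  # imported lazily so the rest of the sketch runs without it

    client = dataikuapi.DSSClient(host, api_key)
    project = client.get_project(project_key)
    return project.start_job(make_job_definition(dataset_name))


# Inspect the definition without contacting any server.
definition = make_job_definition("my_prepared_dataset")  # placeholder dataset name
print(definition)
```

Forcing a recursive rebuild recomputes everything upstream of the target, so it can take noticeably longer than the default mode on a large flow.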


cuezumo
Level 2
Author

Hi Marlan,

Wow, that was quick and efficient, and it solved my problem entirely! Thanks a lot, I appreciate it.

Best, Richard 
