Scenario successful, but latest rows in dataset missing -> manual update necessary

cuezumo

Hey everybody,

I have a large project that a comprehensive scenario updates a couple of times per week, as needed. (Strictly speaking, it's two projects and two scenarios: one for raw data and data cleansing, and one for data analysis.)

The recipes run fine, but they fail to include the latest rows of at least one dataset, so the final data are a few days old instead of up to date. When I trace the problem back, I find one recipe in the raw-data project that the scenario did not actually rerun. Mysteriously, its output dataset is still displayed as "updated in the past hour" in the flow. When I run that recipe and all the dependent ones downstream manually, everything works fine and the data are up to date.

Does anybody know what might be the problem?

Additional info about the problematic recipe: its input is a PostgreSQL dataset, which I process with a visual Prepare recipe. I'm only parsing the dates in three columns, nothing special.

I'd appreciate your help, thanks in advance!

Best, cuezumo

Best Answer

  • Marlan

    Hi @cuezumo

    What is the Build mode setting for the relevant Build step in your scenario? The default mode, "Build required datasets", doesn't rebuild SQL datasets that sit at the beginning of a flow (or of the entire flow). The solution is to set the Build mode to "Force-rebuild dataset and dependencies", which rebuilds all the SQL datasets in the chain. This may not be what is going on in your situation at all, but we have certainly seen data refreshes fail to happen when expected because of the wrong Build mode setting. More on this situation in this product idea.
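To make the difference between the two build modes concrete, here is a toy model in Python. It is a hypothetical sketch, not Dataiku's API or its actual build algorithm: `Dataset`, `ToyFlow`, `external_insert` and the timestamp check are invented for illustration. The key assumption is that rows written into the source PostgreSQL table by an outside process are not visible to the "has an input changed?" check, which is one way a scenario could report success while the latest rows never reach the downstream datasets.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    inputs: list[str] = field(default_factory=list)  # upstream dataset names
    rows: list[str] = field(default_factory=list)    # stand-in for table contents
    seen_change_at: int = 0  # last change the toy engine knows about
    built_at: int = -1       # last time a recipe rebuilt it (-1 = never)


class ToyFlow:
    """Hypothetical model of smart vs. forced builds; NOT Dataiku's real algorithm."""

    def __init__(self, datasets):
        self.ds = {d.name: d for d in datasets}
        self.clock = 0

    def external_insert(self, name, row):
        # Another system writes rows straight into a SQL source table.
        # Assumption: the engine is not notified, so seen_change_at stays put.
        self.ds[name].rows.append(row)

    def build(self, name, force=False):
        """Build `name` and its upstream datasets; return the ones actually rebuilt."""
        rebuilt, visited = [], set()
        self._build(name, force, rebuilt, visited)
        return rebuilt

    def _build(self, name, force, rebuilt, visited):
        if name in visited:
            return  # already handled in this build (diamond-shaped flows)
        visited.add(name)
        d = self.ds[name]
        if not d.inputs:
            return  # source dataset: no recipe produces it
        for i in d.inputs:
            self._build(i, force, rebuilt, visited)
        # "Smart" mode: rerun only if an input changed since the last build.
        stale = any(self.ds[i].seen_change_at > d.built_at for i in d.inputs)
        if force or stale:
            # Toy recipe (think: a Prepare step): copy rows from all inputs.
            d.rows = [r for i in d.inputs for r in self.ds[i].rows]
            self.clock += 1
            d.built_at = d.seen_change_at = self.clock
            rebuilt.append(name)


raw = Dataset("raw_pg", rows=["day1"])
prepared = Dataset("prepared", inputs=["raw_pg"])
final = Dataset("final", inputs=["prepared"])
flow = ToyFlow([raw, prepared, final])

print(flow.build("final"))              # ['prepared', 'final']
flow.external_insert("raw_pg", "day2")  # new rows land in PostgreSQL
print(flow.build("final"))              # [] -> build "succeeds", nothing rerun
print(final.rows)                       # ['day1'] -> latest row missing
print(flow.build("final", force=True))  # ['prepared', 'final']
print(final.rows)                       # ['day1', 'day2']
```

In this model, the smart build skips every recipe because nothing it can observe has changed, while the forced build reruns the whole chain regardless. That mirrors, in simplified form, why switching to "Force-rebuild dataset and dependencies" can pick up the missing rows.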


