My dataset gets wiped (seemingly) periodically

Edujaure Registered Posts: 1

Hello, I've been working with Dataiku for a few months and am still learning it. I created a flow that forecasts upcoming values from historical data.

This is how I imported the historical data into Dataiku: I created a Google BigQuery dataset → put the SQL query in it → loaded all the data → synced it into another dataset → deleted the sync recipe. The historical data is not going to change and the query takes a long time, so I deleted the recipe to keep the query from being re-run every time the flow runs.

This flow runs once a week. From what I've read, a dataset that is built once and never rebuilt should, in theory, never get wiped. Yet, from what I've checked, the dataset goes empty around 5 days after it was last built. The scenario build then fails, and I have to recreate the sync recipe and reload all the data. That is annoying manual work, which is exactly what I wanted to avoid by building a Dataiku flow.

So, has this happened to anyone else? Does anyone know the fix, or what my mistake is? Thanks in advance.

Operating system used: Windows

Answers

  • Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,531 Neuron

    Dataiku itself will never wipe a dataset on its own. Two possible explanations:

    1. Your Dataiku administrator looks for orphan datasets and deletes their data after 5 days to save on cloud costs.
    2. Your BigQuery administrator has set the table retention (expiration) period to 5 days.
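
To check the second possibility yourself, here is a rough sketch using the `google-cloud-bigquery` Python client. It assumes the synced dataset is stored as a BigQuery table and that you have credentials that can read its metadata. `describe_expiration` and `check_table` are names I made up for illustration, and the table ID in the usage note is a placeholder, not a real table.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def describe_expiration(expires: Optional[datetime],
                        now: Optional[datetime] = None) -> str:
    """Summarise a table expiration timestamp in plain words."""
    if expires is None:
        return "no expiration set"
    now = now or datetime.now(timezone.utc)
    remaining = expires - now
    if remaining <= timedelta(0):
        return "already expired"
    return f"expires in {remaining.days} day(s) at {expires.isoformat()}"


def check_table(table_id: str) -> str:
    """Report the expiration settings of one BigQuery table.

    table_id is "project.dataset.table"; the value you pass is up to you.
    """
    # Imported here so describe_expiration works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.get_table(table_id)
    dataset = client.get_dataset(f"{table.project}.{table.dataset_id}")

    # Dataset-level default applied to newly created tables (None if unset).
    default_ms = dataset.default_table_expiration_ms
    if default_ms is None:
        default_text = "no dataset default"
    else:
        default_days = int(default_ms) / 86_400_000  # milliseconds per day
        default_text = f"dataset default is {default_days:g} day(s)"

    return f"{table_id}: {describe_expiration(table.expires)}; {default_text}"
```

Calling something like `print(check_table("my-project.my_dataset.my_table"))` with your own table ID would show whether an expiration is set on the table or inherited from the BigQuery dataset default. If either one comes out at about 5 days, that is most likely your culprit, and your BigQuery administrator can change it.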