Refresh/Reload Dataset

Solved!
m_sch
Level 3

Hello,

 

I'm new to Dataiku and looking forward to working with such an amazing tool.

My first question:

Is it possible to manually refresh/reload a dataset based on a PostgreSQL table?

 

Many thanks!

4 Replies
AlexT
Dataiker

Welcome to the Dataiku Community!

To answer your question: yes, you can manually refresh the sample for a SQL dataset.

When you add a SQL dataset in DSS, it fetches a sample, by default the first 10,000 rows returned. You can refresh this sample as needed by opening the dataset and going to Sample Settings > Save and Refresh Sample.

This also allows you to customize the sample settings by setting filters.

Let me know if this is what you were looking for.

For recipes, a SQL dataset does not need to be "refreshed": you simply re-run the recipe and it uses the current data in the table.

A good resource to help you better understand SQL datasets is: https://academy.dataiku.com/path/core-designer/integration-with-sql-databases-1

 

m_sch
Level 3
Author

Hello @AlexT 

Many thanks for the kind words and your answer!

A few questions came up after your answer.

The sample data shows only the first 10,000 records. But this is just for exploring the data, right? Do recipes, charts, etc. use all the records?

I have a "base table" in PostgreSQL that will be updated manually every day with new data. This is the "base dataset" in Dataiku, followed by recipes, statistics, etc. How can I tell Dataiku to update "everything"? This should/could be a manual step in the beginning.

Thanks!

 

 

 

AlexT
Dataiker

Yes, the sample is only for exploring data and building your recipes/models.

If the table was added as an input dataset, DSS reads it each time a downstream recipe is run. To rebuild or refresh everything, you can simply use a "Recursive Build": https://doc.dataiku.com/dss/latest/flow/building-datasets.html
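For readers who would rather trigger the recursive build from code than from the Flow, here is a minimal sketch using the Dataiku Python API. It assumes the code runs inside DSS (for example in a notebook); the project key MY_PROJECT and the dataset name final_output are placeholders, not names from this thread. The job type RECURSIVE_FORCED_BUILD rebuilds the upstream datasets regardless of whether DSS considers them up to date.

```python
def rebuild_everything(project, dataset_name, job_type="RECURSIVE_FORCED_BUILD"):
    """Rebuild `dataset_name` together with its upstream dependencies.

    `project` is a project handle such as the one returned by
    dataiku.api_client().get_project(...).
    """
    dataset = project.get_dataset(dataset_name)
    # build() starts a job and, by default, waits for it to finish.
    return dataset.build(job_type=job_type)


def main():
    # Assumption: running inside DSS, where the `dataiku` package is available.
    import dataiku

    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT")    # placeholder project key
    rebuild_everything(project, "final_output")   # placeholder dataset name
```

Calling main() from a notebook would then play the role of the manual "update everything" step; pointing it at the last dataset of the Flow is what makes the build recursive over all upstream recipes.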

You can also schedule a scenario to rebuild a dataset; each run will implicitly read any new data added to the input dataset since the last run.
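A scenario can also be started on demand, which may suit the "manual step in the beginning" mentioned above. The sketch below assumes a scenario with a build step has already been created in the project; the scenario id REBUILD_ALL and the project key MY_PROJECT are hypothetical placeholders.

```python
def run_rebuild_scenario(project, scenario_id):
    """Start an existing scenario and block until it has finished."""
    scenario = project.get_scenario(scenario_id)
    # run_and_wait() returns a handle on the finished run; by default it
    # raises an exception if the run does not succeed.
    return scenario.run_and_wait()


def main():
    # Assumption: running inside DSS, where the `dataiku` package is available.
    import dataiku

    project = dataiku.api_client().get_project("MY_PROJECT")  # placeholder
    run_rebuild_scenario(project, "REBUILD_ALL")               # placeholder id
```

Once the manual run works as expected, a time-based trigger on the same scenario would take over the daily refresh without any code change.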

Depending on your input data, you may choose to create a dataset partitioned by "Day", for example. That is more efficient, since each run builds only a subset, e.g. the LAST_DAY, instead of rebuilding the whole dataset each time. For more information, please see:

https://doc.dataiku.com/dss/latest/partitions/sql_datasets.html

https://doc.dataiku.com/dss/latest/partitions/index.html 
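To illustrate the partitioned approach, here is a minimal sketch that builds only the previous day's partition through the Dataiku Python API. It assumes the target dataset is partitioned by day (partition identifiers of the form YYYY-MM-DD); the function names and the `project` handle are illustrative and not taken from this thread.

```python
from datetime import date, timedelta


def previous_day_partition(today=None):
    """Return the day-partition identifier (YYYY-MM-DD) for the day before `today`."""
    today = today or date.today()
    return (today - timedelta(days=1)).strftime("%Y-%m-%d")


def build_previous_day(project, dataset_name, today=None):
    """Build only yesterday's partition of a day-partitioned dataset.

    `project` is a project handle such as the one returned by
    dataiku.api_client().get_project(...).
    """
    partition = previous_day_partition(today)
    dataset = project.get_dataset(dataset_name)
    # Only the requested partition is rebuilt; other days are left untouched.
    return dataset.build(job_type="NON_RECURSIVE_FORCED_BUILD", partitions=partition)
```

For example, previous_day_partition(date(2021, 6, 15)) returns "2021-06-14", so a run on 15 June would rebuild only the data for 14 June.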

 

 

 

 

m_sch
Level 3
Author

Great, now I have enough input to work on the next steps.

 

Many thanks!
