Refresh/Reload Dataset
Hello,
I'm new to Dataiku and looking forward to working with such an amazing tool.
My first question:
Is it possible to manually refresh/reload a dataset based on a PostgreSQL table?
Many thanks!
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,211 Dataiker
Yes, the sample is only used for exploring data and designing your recipes/models. When a recipe actually runs, it processes the full dataset.
If you have a table that was added as an input dataset, DSS will read the table each time a downstream recipe is run. To rebuild or refresh everything, you can simply use a "Recursive Build": https://doc.dataiku.com/dss/latest/flow/building-datasets.html
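If you would rather trigger the same rebuild from a script than from the Flow, here is a minimal sketch using the public `dataikuapi` Python client. The host URL, API key, project key `MY_PROJECT` and dataset name `sales_prepared` are placeholders, and `rebuild_dataset` is just an illustrative helper name, not something DSS ships. I default to a forced recursive build on the assumption that you want everything upstream recomputed; please check the job type strings against the API documentation for your DSS version.

```python
def rebuild_dataset(project, dataset_name, job_type="RECURSIVE_FORCED_BUILD"):
    """Build a dataset together with its upstream dependencies.

    `project` is expected to behave like a dataikuapi DSSProject.
    Other job types you may want instead (assumed from the public API docs):
    "RECURSIVE_BUILD" (only what DSS considers out of date) or
    "NON_RECURSIVE_FORCED_BUILD" (this dataset only).
    """
    dataset = project.get_dataset(dataset_name)
    # wait=True blocks until the job has finished and returns the job handle
    return dataset.build(job_type=job_type, wait=True)


# Usage against a real instance (all values are placeholders):
# import dataikuapi
# client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
# project = client.get_project("MY_PROJECT")
# rebuild_dataset(project, "sales_prepared")
```

Point it at the last dataset in your Flow: a recursive build works backwards from that dataset, so the recipes between it and the PostgreSQL input are re-run as well.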
You can also schedule a scenario to rebuild a dataset. Each run will implicitly read any new data added to the input dataset since the last run.
Depending on your input data, you may choose to create a dataset partitioned by "Day", for example. That will be more efficient, since it will only build a subset (e.g. the LAST_DAY) instead of having to rebuild the whole dataset each time. For more information please see:
https://doc.dataiku.com/dss/latest/partitions/sql_datasets.html
https://doc.dataiku.com/dss/latest/partitions/index.html
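To make the "only build one day" idea concrete, here is a small sketch that computes the identifier of yesterday's partition. It assumes a day-level time dimension whose partition identifiers look like `YYYY-MM-DD`; `previous_day_partition` is a hypothetical helper name, so please adapt it to how your dataset is actually partitioned.

```python
from datetime import date, timedelta


def previous_day_partition(today=None):
    """Return yesterday's partition identifier as YYYY-MM-DD.

    Assumes the dataset is partitioned on a single day-level time dimension.
    """
    today = today or date.today()
    return (today - timedelta(days=1)).isoformat()


print(previous_day_partition(date(2024, 3, 1)))  # 2024-02-29
```

With the `dataikuapi` client, a value like this can be passed through the `partitions` argument of `dataset.build(...)` so that only that day is rebuilt rather than the whole table.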
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,211 Dataiker
Welcome to the Dataiku Community!
To answer your question: yes, you can manually refresh the sample for a SQL dataset.
In DSS, when you add a SQL dataset, it fetches a sample, which by default is the first 10,000 rows returned. You can save and refresh this sample as needed by clicking on the dataset and going to Sample Settings - Save and Refresh Sample.
This also allows you to customize the sample settings by setting filters.
Let me know if this was what you were looking for.
For recipes, a SQL dataset does not need to be "refreshed": you simply re-run the recipe and it uses the current data in the table.
A good resource to help you better understand SQL datasets is: https://academy.dataiku.com/path/core-designer/integration-with-sql-databases-1
-
Hello @AlexT
Many thanks for the kind words and your answer!
A few questions came up after your answer.
The sample data shows only the first 10,000 records. But this is just for exploring the data, right? Do recipes, charts, etc. use all the records?
I have a "base table" in PostgreSQL that will be updated manually every day with new data. This is the "base dataset" in Dataiku, followed by recipes, statistics, etc. How can I tell Dataiku to update "everything"? This could be a manual step in the beginning.
Thanks!
-
Great, now I have enough input to work on the next steps.
Many thanks!