Refresh files dataset
I have many datasets built from xlsx files, and I want to watch these files for updated values or newly added rows. Alternatively, I want to refresh the data sources at a regular interval to pick up the updated sheet.
How can I do that?
Answers
-
tim-wright Partner, L2 Designer, Snowflake Advanced, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 77 Partner
@mmamdouh
First off, welcome to the community. What you are describing is exactly what scenarios are designed to do: perform a set of steps when a certain triggering action occurs. I'm presuming that your Excel files exist in some location that Dataiku has access to and that you have already created a DSS dataset on top of those files. If that is the case, you have some options.
- If you are OK with rebuilding your dataset on a schedule, you can create a time-based trigger for your scenario (every X minutes, hours, or days) and then add a step to the scenario to rebuild your dataset. The scenario will trigger at the specified time interval and build your dataset.
- If you need to rebuild your dataset only when a change in your Excel files is detected, it will be more involved. I'm not aware of any way to do this out of the box. I think you will probably have to write a custom trigger in Python that looks at those files. This trigger would check your files and, when it detects a "change" (you'd be responsible for defining that logic), trigger the scenario to rebuild your dataset.
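As a rough illustration of the "change" logic such a custom trigger would need, here is a minimal sketch that uses only the Python standard library. It assumes the trigger can read the xlsx files from a local path and that a "change" means any difference in file content; the function names `fingerprint` and `changed_files` are hypothetical, not part of Dataiku. How the previous fingerprints are stored between trigger runs is left open.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Map each existing file path to a SHA-256 digest of its bytes."""
    result = {}
    for p in map(Path, paths):
        if p.is_file():
            result[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
    return result


def changed_files(previous, current):
    """Return the sorted paths that were added, removed, or modified."""
    keys = set(previous) | set(current)
    return sorted(k for k in keys if previous.get(k) != current.get(k))


# Inside a DSS custom Python trigger, the result could be used roughly
# like this (assumption: the standard custom-trigger template is in use):
#
#   from dataiku.scenario import Trigger
#   t = Trigger()
#   if changed_files(old_fingerprints, fingerprint(my_paths)):
#       t.fire()
```

Hashing the content rather than comparing modification times avoids firing when a file is re-saved without any actual change, at the cost of reading every file on each check.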
-
Thanks @tim-wright, that's exactly what I want: to create a time-based trigger. How can I do this?
-
tim-wright Partner, L2 Designer, Snowflake Advanced, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 77 Partner
@mmamdouh
You will need to set up a scenario. To do this, from your flow, select the purple triangle (that's what it looks like in 8.0.0) in the top ribbon bar. This should open a dropdown list that includes "Scenarios". Select that from the menu, then follow these steps (generally):
- Create a new scenario.
- Within that scenario, define a "time-based trigger". This defines when the scenario will run.
- Define the steps that are executed when your scenario is triggered. (For you, this will probably be a "build/train" step; make sure to indicate whichever dataset you want rebuilt.)
You can find more specifics in the documentation here: https://doc.dataiku.com/dss/latest/scenarios/index.html
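If you would rather script the build than use the visual "build/train" step, a scenario can also run Python code. The sketch below is only an illustration under assumptions: it relies on `dataiku.scenario.Scenario` and its `build_dataset` method being available inside a DSS scenario Python step, and the helper names and the dataset name `my_excel_dataset` are hypothetical. The time-based trigger itself is still configured in the scenario settings.

```python
def rebuild_datasets(scenario, dataset_names):
    """Ask a scenario-like object to build each dataset, in order.

    `scenario` only needs a `build_dataset(name)` method, which keeps
    this helper usable outside DSS (for example with a stand-in object).
    """
    built = []
    for name in dataset_names:
        scenario.build_dataset(name)
        built.append(name)
    return built


def run_inside_dss(dataset_names):
    """Entry point for a scenario Python step (only works inside DSS)."""
    # Imported lazily because the module exists only in the DSS runtime.
    from dataiku.scenario import Scenario
    return rebuild_datasets(Scenario(), dataset_names)


# In a scenario Python step (hypothetical dataset name):
#   run_inside_dss(["my_excel_dataset"])
```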