I have added a Dataset component to a plugin in Dataiku. It uses the Python API to fetch the last 30 minutes of data from a third-party application.
The idea is to add this dataset to the Flow and reference it from a Plotly visualization in a Dash code webapp. The visualization should always show the latest data, including rows newly inserted on the third-party application side.
To achieve this, I have done the following so far:
1. Created a dataset plugin and added its dataset to the Flow. This is the only orphan node in the Flow.
2. Added a scenario that triggers every 10 minutes to build the above dataset.
3. Created a Dash code webapp and added code to visualize the above dataset with Plotly.
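For step 3, here is a minimal sketch of how the webapp could turn freshly read rows into a Plotly figure. The function name `make_figure` and the column names `timestamp` and `value` are assumptions for illustration; they are not taken from the actual plugin. The figure is returned as a plain dictionary, which `dcc.Graph` accepts for its `figure` property.

```python
def make_figure(rows, x_key="timestamp", y_key="value"):
    """Build a Plotly figure (as a plain dict) from a list of row dicts.

    x_key and y_key are hypothetical column names; adapt them to the
    schema your dataset component actually returns.
    """
    # Sort by the x column so the line is drawn in chronological order.
    ordered = sorted(rows, key=lambda r: r[x_key])
    return {
        "data": [
            {
                "type": "scatter",
                "mode": "lines+markers",
                "x": [r[x_key] for r in ordered],
                "y": [r[y_key] for r in ordered],
            }
        ],
        "layout": {"title": {"text": "Last 30 minutes"}},
    }
```

In the Dash webapp, one option would be to add a `dcc.Interval` component and a callback on its `n_intervals` property that re-reads the dataset (for example with `dataiku.Dataset(...).get_dataframe()`) and returns `make_figure(rows)`, so the chart is redrawn without reloading the page.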
With step 2, I assume that the scenario will trigger every 10 minutes, fetch the newly added data from the upstream data source through the generate_rows method in the custom dataset plugin's Python code, and drop the old data that is more than 30 minutes old. Is this understanding correct?
I tried my best to understand this from the resources available online but was not able to clear up my doubts.
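To make the 30-minute window concrete, here is a minimal sketch of the filtering that generate_rows would need to do. The helper name `rows_in_window` and the `timestamp` field are assumptions; the real plugin code and the third-party API are not shown in this thread.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=30)


def rows_in_window(records, now=None, window=WINDOW):
    """Yield only the records whose 'timestamp' lies in the last `window`.

    `records` is any iterable of dicts with a timezone-aware datetime under
    the (hypothetical) key 'timestamp'.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    for record in records:
        if cutoff <= record["timestamp"] <= now:
            yield record


# Inside the plugin connector, generate_rows could then delegate to it, e.g.:
#     yield from rows_in_window(fetch_from_api())
# where fetch_from_api is a hypothetical function wrapping the third-party API.
```

Because the window is recomputed from the current time on every call, older rows fall out of the result without any explicit delete step.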
Let me know if any clarifications are required!
Operating system used: Windows
Indeed, you can use the dataset component plugin to retrieve data from an external API.
To get "new data", you have to build the dataset. I put "new data" in quotes because:
1. Your component is responsible for acquiring the new data, so you have to check that it actually does.
2. Dataiku only builds a dataset when it considers a build necessary. It is therefore best to use a forced build on this dataset, to be sure that Dataiku actually rebuilds it.
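As an illustration of point 2, a forced build can also be requested from a Python step in the scenario. This is a sketch under assumptions: the helper `force_rebuild` and the dataset name are hypothetical, and the build mode string should be checked against the scenario API documentation for your Dataiku version.

```python
# Build mode that rebuilds the dataset itself without checking whether
# it is already up to date (assumed value; verify for your DSS version).
FORCED_BUILD_MODE = "NON_RECURSIVE_FORCED_BUILD"


def force_rebuild(scenario, dataset_name):
    """Ask the given scenario object to force-build a single dataset."""
    return scenario.build_dataset(dataset_name, build_mode=FORCED_BUILD_MODE)


# In an "Execute Python code" scenario step (this part only runs inside Dataiku):
#     from dataiku.scenario import Scenario
#     force_rebuild(Scenario(), "third_party_last_30_min")  # hypothetical name
```

The same effect should be achievable without code by choosing a forced build mode in the scenario's Build step.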
I think there is no need for more clarification, as you understand the processing well. Let me know if you have specific questions.