How can I build a dataset incrementally using a plugin?

rmnvncnt

I've written a plugin that connects to an API, based on the RaaS example. I enter an argument in the corresponding box and click "Test and get schema", and the dataset is then filled with the data from the API.

How can I do this automatically (using a scenario, for instance), so that it connects to the API every day, passes different arguments, and adds data incrementally to the existing dataset? Is this something the plugin should handle, or can it be managed within a scenario?

Thanks a lot,

Romain



 

Alex_Combessie
Dataiker Alumni

Hello,

Plugins can encompass several types of "Components" (see https://doc.dataiku.com/dss/latest/plugins/reference/plugins-components.html). In the case of a plugin with a "Connector" or "Dataset" component, you can build incrementally by using partitioning: https://doc.dataiku.com/dss/latest/partitions/index.html
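
To illustrate the partitioning idea, here is a minimal sketch in plain Python of how a day partition identifier could be turned into API arguments, so that each partition only pulls its own day of data. In a DSS connector this kind of logic would sit in the method that generates rows, which receives the partition identifier. The `fetch` callable and the `start`/`end` argument names are hypothetical, since they depend on the API being called.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, Iterator


def day_window(partition_id: str) -> Dict[str, str]:
    """Turn a day partition id such as '2024-01-15' into API arguments.

    The 'start' and 'end' names are assumptions; use whatever the API expects.
    """
    day = datetime.strptime(partition_id, "%Y-%m-%d").date()
    return {
        "start": day.isoformat(),
        "end": (day + timedelta(days=1)).isoformat(),
    }


def rows_for_partition(
    partition_id: str,
    fetch: Callable[[Dict[str, str]], Iterable[dict]],
) -> Iterator[dict]:
    """Yield only the rows that belong to the requested partition.

    `fetch` is a hypothetical function that calls the API with the
    given arguments and returns an iterable of row dictionaries.
    """
    for row in fetch(day_window(partition_id)):
        yield row


def fake_fetch(params: Dict[str, str]) -> Iterable[dict]:
    """Stand-in for the real API call, used only for illustration."""
    return [{"day": params["start"], "value": 1}]
```

With this shape, building the partition for a given day fetches only that day, and rebuilding it overwrites that partition alone rather than the whole dataset.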



Having said that, you can also achieve this quickly with a "Custom recipe" component, by using the "append instead of overwrite" option.

The recipe would have one output and no input. Note that "append instead of overwrite" is simpler, but it may lead to duplicates and wrong data being inserted.
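
One way to limit duplicates in append mode is to filter incoming rows against the keys already stored before writing. The following is only a sketch under assumptions: it is plain Python, it assumes each row carries a unique `id` field (a hypothetical name), and it leaves out how the existing keys are read from the output dataset.

```python
from typing import Hashable, Iterable, List


def new_rows_only(
    existing_keys: Iterable[Hashable],
    incoming: Iterable[dict],
    key: str = "id",
) -> List[dict]:
    """Return the incoming rows whose key is not already known.

    Rows without the key field are skipped, and duplicates inside the
    incoming batch itself are dropped as well.
    """
    seen = set(existing_keys)
    fresh = []
    for row in incoming:
        value = row.get(key)
        if value is None or value in seen:
            continue
        seen.add(value)
        fresh.append(row)
    return fresh
```

Only the rows returned by `new_rows_only` would then be appended to the output.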



Generally speaking, when working with APIs, it is recommended to take these cases into account in your code:

- API network failure
- Wrong or empty data being sent by the API
- Duplicate results from the API
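
As a rough sketch of the first two cases, the wrapper below retries a call with exponential backoff and refuses an empty result. It assumes the hypothetical `fetch` callable raises an `OSError` subclass (such as the built-in `ConnectionError` or `TimeoutError`) on network failure; adapt the exception types to the HTTP client actually used. Duplicates would still need a separate key-based filter.

```python
import time
from typing import Callable, List


class EmptyResponseError(Exception):
    """Raised when the API answers but returns no rows."""


def fetch_with_retry(
    fetch: Callable[[], List[dict]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call `fetch`, retrying on network errors with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            rows = fetch()
        except OSError as err:  # assumption: network failures surface as OSError
            last_error = err
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x... the base delay between attempts.
                sleep(base_delay * 2 ** attempt)
            continue
        if not rows:
            # An empty answer is treated as an error rather than written out.
            raise EmptyResponseError("API returned no rows")
        return rows
    raise RuntimeError(f"API unreachable after {attempts} attempts") from last_error
```

Raising on empty data, instead of silently writing nothing, lets the calling recipe or scenario fail visibly.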



Cheers,

Alex

rmnvncnt
Thanks for your answer! I first tried your solution using a custom recipe, but in the end I found it more convenient to use a custom scenario: the scenario writes into a new partition every day and reports errors such as empty data (as you mentioned).
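
For readers who want to try something similar, a custom Python scenario along these lines might look like the sketch below. This is not the scenario described above, only an illustration under assumptions: the dataset is partitioned by day, `dataset_name` is a placeholder, and the `dataiku` import is placed inside the function because that package is only available inside DSS.

```python
from datetime import date
from typing import Optional


def partition_for(day: date) -> str:
    """Format a date as a day partition identifier, e.g. '2024-01-05'."""
    return day.strftime("%Y-%m-%d")


def build_daily_partition(dataset_name: str, day: Optional[date] = None) -> str:
    """Build the partition for `day` (today by default) from a scenario.

    `dataset_name` is a placeholder for the partitioned dataset fed by
    the plugin.
    """
    # Imported here because the dataiku package only exists inside DSS.
    from dataiku.scenario import Scenario

    partition = partition_for(day or date.today())
    Scenario().build_dataset(dataset_name, partitions=partition)
    return partition
```

Scheduling the scenario with a daily time-based trigger would then add one partition per day, and an exception raised during the build marks the run as failed.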