Append instead of Overwrite dataset - API
GS
Hello,
We have created a webapp connected to an API model that does speech classification.
The user accesses the webapp, records their speech, and the model returns the 3 best-fitting classes as its answer.
The user then selects the one that best matches what they said, and we want to store this result in Dataiku for later analysis.
Several users will be using it in parallel, so there will be many results to store and collect.
However, it seems the API we are using to write those results only overwrites the dataset and does not append to it.
Could you please advise on how to do this?
Thanks
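For reference, this is roughly how we write the result today from the webapp backend (a minimal sketch assuming the dataiku Python API; the dataset and column names are illustrative):

```python
import dataiku
import pandas as pd

# One row per validated session: which of the 3 proposed classes the user picked.
# Dataset name and columns are illustrative.
result = pd.DataFrame([{
    "session_id": "abc-123",
    "chosen_class": "greeting",
}])

feedback = dataiku.Dataset("speech_feedback")
# write_with_schema() replaces the whole dataset content,
# so each new session wipes out the previously stored results.
feedback.write_with_schema(result)
```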
Answers
-
Hi,
You can try using partitioning on that dataset, writing to a new partition for every new session.
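A minimal sketch of what that could look like from the webapp backend, assuming a partitioned dataset with one discrete dimension (all names here are illustrative):

```python
import dataiku
import pandas as pd

result = pd.DataFrame([{
    "session_id": "abc-123",
    "chosen_class": "greeting",
}])

feedback = dataiku.Dataset("speech_feedback_partitioned")
# Target a dedicated partition for this session; the exact spec format
# depends on how the partitioning dimension is defined on the dataset.
feedback.set_write_partition("abc-123")
# Only this partition is replaced; other sessions' partitions are untouched.
feedback.write_with_schema(result)
```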
-
Thanks Adrien,
Yes, but that would result in one record per partition, which doesn't sound very efficient.
Any other option?
The perfect solution would be if we could append a new line every time there is a new session. Is that possible?
Thanks
-
That is not possible through the API AFAIK; you'd have to read, append, and write back, but then you have no concurrency guarantees.
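For illustration only, a read/append/write sketch with the dataiku Python API (names are illustrative; note that two sessions doing this at the same time can silently drop each other's rows):

```python
import dataiku
import pandas as pd

feedback = dataiku.Dataset("speech_feedback")

# Read the current content (may fail on an empty or never-built dataset).
try:
    existing = feedback.get_dataframe()
except Exception:
    existing = pd.DataFrame()

new_row = pd.DataFrame([{"session_id": "abc-123", "chosen_class": "greeting"}])

# Rewrite the whole dataset with the extra row appended.
# This is still a full overwrite, so concurrent writers can lose data.
feedback.write_with_schema(pd.concat([existing, new_row], ignore_index=True))
```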
If you're prepared to complicate your flow a bit to get everything into one partition / unpartitioned dataset, you could write into a partitioned dataset, then have a scheduled scenario that runs every day and syncs this partitioned dataset into an unpartitioned one. Either 1. re-sync all partitions in normal (overwrite) mode, or 2. sync all partitions in append mode and then clear those input partitions.
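One possible way to script option 2, assuming the dataikuapi public client and illustrative project, scenario and dataset names (the sync itself would be a standard Sync recipe run in append mode by the scenario):

```python
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com", "YOUR_API_KEY")
project = client.get_project("SPEECH_FEEDBACK")

# Run the daily scenario that syncs the partitioned dataset
# into the unpartitioned one in append mode.
scenario = project.get_scenario("consolidate_feedback")
scenario.run_and_wait()

# Once appended, clear the partitions that were just consolidated
# so they are not appended again on the next run.
partitioned = project.get_dataset("speech_feedback_partitioned")
partitioned.clear(partitions="abc-123,def-456")
```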
-
To add to that, if you partition a SQL dataset, the partitioning is implemented as an additional column. So you actually have a partitioned dataset but an unpartitioned, consolidated SQL table underneath. You can even define another unpartitioned dataset on the same table, but a/ you should set it as read-only to prevent accidental writes, and b/ you lose the logical connection in the flow showing how it's built.
-
Thanks, I will try partitioning with an append-mode Sync.