Append instead of Overwrite dataset - API

GS
Level 2
Hello,

we have created a webapp that is connected to an API model doing speech classification.

The user accesses the webapp and records their speech, and the model returns the 3 best-fitting classes.
The user can select the one that best fits what they said, and we want to store this result in Dataiku for later analysis.
Several users will be using it in parallel, so there will be many results to store, and we need to collect a large number of them.

It now seems that the API we are using to write those results only overwrites the dataset and does not append to it.
Could you please advise on how to do this?

Thanks
5 Replies
AdrienL
Dataiker

Hi,



You can try using partitioning on that dataset, to write to a new partition at every new session.
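
A minimal sketch of how a per-session partition identifier might be generated in the webapp backend. The helper name `new_session_partition_id` and the date-plus-UUID format are assumptions for illustration, not something stated in this thread. In DSS the resulting string would typically be handed to `Dataset.set_write_partition` from the `dataiku` package before writing; that call is left out here so the sketch runs outside DSS.

```python
import uuid
from datetime import datetime, timezone


def new_session_partition_id(now=None):
    """Return a partition identifier that is unique per session.

    Hypothetical helper: the format (UTC date + random hex) is an
    assumption, chosen so parallel sessions never share a partition.
    """
    now = now or datetime.now(timezone.utc)
    # uuid4 makes a collision between concurrent users practically impossible.
    return f"{now:%Y%m%d}_{uuid.uuid4().hex}"


partition_id = new_session_partition_id()
print(partition_id)
```

Because every session writes to its own partition, overwriting that partition never touches another user's result.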

GS
Level 2
Author
Thanks Adrien,
Yes, but that would result in one record per partition, which doesn't sound very efficient.

Any other option?
The perfect solution would be if we could append a new line every time there is a new session. Is that possible?

Thanks
AdrienL
Dataiker
That is not possible through the API AFAIK. You'd have to read/append/write, but then you have no guarantee that concurrent writes are handled safely.
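
To illustrate why read/append/write is unsafe with parallel users, here is a small in-memory simulation. The list `stored_rows` and the helpers `read_all` and `overwrite` are hypothetical stand-ins for an overwrite-only dataset; they are not the Dataiku API.

```python
# In-memory stand-in for a dataset that only supports full overwrite.
# This models the behaviour discussed above; it is not the Dataiku API.
stored_rows = [{"speaker": "u0", "label": "greeting"}]


def read_all():
    """Return a copy of every stored row (the 'read' step)."""
    return list(stored_rows)


def overwrite(rows):
    """Replace the whole content (the 'write' step)."""
    stored_rows[:] = rows


# Two sessions read the same snapshot before either one writes.
snapshot_a = read_all()
snapshot_b = read_all()

overwrite(snapshot_a + [{"speaker": "u1", "label": "question"}])
overwrite(snapshot_b + [{"speaker": "u2", "label": "complaint"}])

# The second overwrite was based on a stale snapshot, so u1's row is gone.
print([row["speaker"] for row in stored_rows])  # ['u0', 'u2']
```

The lost row is the concurrency problem: without a lock or a true append, the last writer silently wins.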

If you're prepared to complicate your flow a bit to get everything into one partition or an unpartitioned dataset, you could write to a partitioned dataset, then have a scheduled scenario that runs every day and syncs this partitioned dataset into an unpartitioned one. The scenario would either 1. re-sync all partitions in normal (overwrite) mode, or 2. sync all partitions in append mode and then remove those input partitions.
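
The logic of option 2 can be sketched with plain Python containers. The function `consolidate_append` and the sample rows are hypothetical; in DSS the same steps would be carried out by a sync recipe in append mode plus a step that clears the input partitions, not by this code.

```python
def consolidate_append(partitions, consolidated):
    """Append every partition's rows to `consolidated`, then drop them.

    `partitions` maps a partition id to its list of rows. Both arguments
    are plain Python containers standing in for DSS datasets (assumption).
    """
    for partition_id in sorted(partitions):
        consolidated.extend(partitions[partition_id])
    # Clearing the inputs keeps the next run from appending them twice.
    partitions.clear()
    return consolidated


partitions = {
    "s1": [{"speaker": "u1", "label": "question"}],
    "s2": [{"speaker": "u2", "label": "complaint"}],
}
consolidated = []

consolidate_append(partitions, consolidated)
# A session recorded after the first run lands in a fresh partition.
partitions["s3"] = [{"speaker": "u3", "label": "greeting"}]
consolidate_append(partitions, consolidated)

print(len(consolidated), len(partitions))  # 3 0
```

Removing the input partitions after each run is what makes append mode safe to repeat; with option 1 (full overwrite) the inputs would be kept and re-synced every time.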
AdrienL
Dataiker
To add to that: if you partition a SQL dataset, the partitioning is implemented as an additional column. So you actually have a partitioned dataset on top of an unpartitioned, consolidated SQL table. You can even define another unpartitioned dataset on that table, but you a/ should set it as read-only to prevent writes to it and b/ lose the logical connection in the flow of how it's built.
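
A small `sqlite3` sketch of that idea, assuming a hypothetical table `results` whose `session_id` column plays the role of the partition column. The table and column names are illustrative only, and SQLite is used purely so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `session_id` stands in for the partition column (hypothetical names).
conn.execute(
    "CREATE TABLE results (session_id TEXT, speaker TEXT, label TEXT)"
)

rows = [
    ("s1", "u1", "question"),
    ("s2", "u2", "complaint"),
    ("s2", "u2", "greeting"),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

# Partitioned view: one partition is the set of rows sharing a column value.
one_partition = conn.execute(
    "SELECT speaker, label FROM results WHERE session_id = ? ORDER BY label",
    ("s2",),
).fetchall()

# Unpartitioned view: the same table read as a whole is already consolidated.
total = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]

print(one_partition, total)
```

Writing one partition only affects the rows carrying that partition value, while a read of the whole table sees every session's results together.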
GS
Level 2
Author
Thanks, I will try the partitioning with an append-mode sync.
