How to get newly appended rows in sync recipe ?

raphaffou
raphaffou Registered Posts: 2 ✭✭✭

Hello, this is not a huge problem but I'm sure there is a way to do this properly.

I'm syncing rows that I append to an SQL table. The "id" column of this table is auto incremented, and thus before syncing, the corresponding row is set to null (in order for it the have a correct id once in the SQL table).

In the case "id" is the only unique identifier, how do I know rows I've just inserted ?

Do I have to fill the id column manually ??? Is there a simple "hacky" way of doing things (getting the last n rows of the table...) or is there an option for this ?

Thank you

Best Answers

  • Grixis
    Grixis PartnerApplicant, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 80 ✭✭✭✭✭
    Answer ✓

    Hey,

    I don't think there is a way to manage it by design with dataiku visual components or something from your 'post-sync' dataset settings.

    For me, the solution could be a custom code script for select the max/last id and descending your dataset. to avoid adding a step for this, perhaps you could do an SQL step directly to SYNC your output table.

  • raphaffou
    raphaffou Registered Posts: 2 ✭✭✭
    edited August 21 Answer ✓

    For anyone reading this and having a similar problem, I just made a python recipe as suggested by this comment. It counts the rows before syncing and outputs the last n rows of the SQL table sorted on id. Pretty inefficient I think but it works.

    However, you still lose some information, for example for two rows like :

    id | date | description (table being talked about here)

    101 | 11/02/24 | empty

    102 | 11/02/24 | empty

    id | name (this one is only here to give some context)

    101 | a

    102 | b

    It becomes clear that these two ids are interchangeable in the upper table as long as you're coherent.

    Thank you all for your answers

Answers

  • LouisDHulst
    LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 54 Neuron

    Hey @raphaffou ,

    Is adding a column in your SQL with the timestamp of the sync possible? That way you can track when data was added. You could do it in a prepare recipe before the Sync, and would have to alter your SQL table.

    You can also try using Metrics. If you calculate the max of "id" and set the metric to be computed each time the SQL table is built, you'll have an a lower and upper bound for "id" at each date. It's a bit more hacky but should work.

Setup Info
    Tags
      Help me…