Identical timestamp on all rows

johntarr
johntarr Registered Posts: 7 ✭✭✭✭

I've got a prep recipe that appends to its output dataset.

I want to add a timestamp to the records each time new rows are appended.

I've tried using now(), but that results in a slightly different timestamp on each row instead of the same timestamp for all records.

Does anyone know how to get the exact same timestamp on all rows?

Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 412 Neuron
    edited July 17

    Hi @johntarr,

    I can reproduce the behavior you are seeing in DSS 10.0.6 when using a "prepare" recipe with the local stream processor. This is not the behavior one would expect!

    Apparently, "now" is calculated for each row individually, and what we are seeing is the delay between the calculations for successive rows or row batches.

    I wondered whether this behavior could be reproduced elsewhere, for example with Python and pandas... and voilà, in some cases it can. Doing this:

    df['now'] = pd.Timestamp.now()

    produces the behavior one would usually expect: a new column is added to the dataframe with a constant value for "now", as one can verify with "df['now'].unique()":

    array(['2022-06-10T19:07:34.906616000'], dtype='datetime64[ns]')

    However, using this other method:

    df['now'] = df.apply(lambda x: pd.Timestamp.now(), axis=1)

    behaves the same way as the visual recipe in Dataiku! When checking with unique(), I get a different timestamp for each row:

    array(['2022-06-10T19:09:53.711466000', '2022-06-10T19:09:53.711627000',
           '2022-06-10T19:09:53.711638000', ...,
           '2022-06-10T19:09:54.276323000', '2022-06-10T19:09:54.276329000',
           '2022-06-10T19:09:54.276335000'], dtype='datetime64[ns]')

    As a "coder", this is something one should perhaps know, especially if nanoseconds are your thing, and it is fairly obvious once you understand how ".apply" works: with axis=1 it calls the function once per row, so the clock is read again for every row. But for a "visual" user, this is not at all the expected behavior.
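To make the difference concrete without depending on real clock jitter, here is a minimal sketch in plain Python. It is not DSS or pandas code: the helper names (make_fake_clock, stamp_per_row, stamp_once) and the fake clock are hypothetical, introduced only for illustration. It contrasts reading the clock once per row with reading it once per batch.

```python
from datetime import datetime, timedelta, timezone
from itertools import count


def make_fake_clock(start, step):
    """Return a clock function that advances by `step` on every call.

    A stand-in for a real clock so the result is deterministic.
    """
    ticks = count()
    return lambda: start + next(ticks) * step


def stamp_per_row(rows, clock):
    # The clock is read once per row, like a row-by-row engine or
    # df.apply(..., axis=1) would do.
    return [{**row, "now": clock()} for row in rows]


def stamp_once(rows, clock):
    # The clock is read a single time and the value is reused for every row.
    batch_time = clock()
    return [{**row, "now": batch_time} for row in rows]


rows = [{"id": i} for i in range(5)]
start = datetime(2022, 6, 10, 19, 9, 53, tzinfo=timezone.utc)
step = timedelta(microseconds=10)

per_row = stamp_per_row(rows, make_fake_clock(start, step))
once = stamp_once(rows, make_fake_clock(start, step))

print(len({r["now"] for r in per_row}))  # 5 distinct timestamps
print(len({r["now"] for r in once}))     # 1 shared timestamp
```

The same idea applies in pandas: assigning a scalar captured once gives every row the same value, while a per-row callable reads the clock again on each call.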

    Perhaps this is a "bug" to report?

    Hope this helps, even though I can't offer a workaround using the Prepare recipe.

  • Nicolas_Servel
    Nicolas_Servel Dataiker Posts: 37 Dataiker

    Edited the answer after checking which version introduced the proposed solution (DSS 10.0.4).

    Hello John,

    As Ignacio mentioned, the DSS local stream engine processes rows one by one and calls the "now()" function for each row, hence giving a slightly different value each time.

    What you want is to retrieve a piece of global information, i.e. the build date of your dataset. Since DSS 10.0.4, it is accessible through the "Enrich record with build information" step. You can specify a "Build date column", whose value corresponds to the build date: it is the same for every row of a given run and differs from one run to the next.

    Then you can easily extract the timestamp from this date column.
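As a rough illustration of that extraction step outside DSS, the sketch below converts a build date string into an epoch timestamp in milliseconds. It assumes the column is named "build_date" and holds ISO 8601 strings with a trailing "Z"; both are assumptions for this example, not something confirmed in this thread, and parse_build_date is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_build_date(value: str) -> datetime:
    """Parse an ISO 8601 string such as '2022-06-10T19:07:34.906Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def to_epoch_ms(moment: datetime) -> int:
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Hypothetical rows from one run: every row carries the same build date.
rows = [
    {"id": 1, "build_date": "2022-06-10T19:07:34.906Z"},
    {"id": 2, "build_date": "2022-06-10T19:07:34.906Z"},
]

for row in rows:
    row["build_epoch_ms"] = to_epoch_ms(parse_build_date(row["build_date"]))

print({row["build_epoch_ms"] for row in rows})
```

Because the build date is shared by all rows of a run, the derived timestamp is shared as well.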

    Hope this helps,

    Best,

    Nicolas Servel

    PS: if you were to run your prepare recipe with the SQL engine (meaning that your input and output are SQL datasets and all the steps are SQL-compatible), your solution with "now()" would work, because in that case "now()" is evaluated only once in the SQL query.

  • Ignacio_Toledo

    Thanks for the proposed solution, @Nicolas_Servel, and for the tip about using a SQL engine instead... I was going to test it myself, but you provided the answer first.

  • johntarr

    Appreciate the info on the enrich build info, but the SQL piece appears to be incorrect. I was getting different results for now(), even with the SQL engine.

  • Ignacio_Toledo

    Interesting! Maybe it depends on which SQL database is being used (PostgreSQL, Oracle, MySQL, etc.)?

  • Nicolas_Servel

    Hello, would you be able to share the SQL code generated for your prepare recipe so that I can check on my side what is going on?

    You can find it by clicking on "View query" above the run button of the prepare recipe.

    Best,

    Nicolas
