write_to_schema or get_dataframe does not keep the same ordering of the dataset

PapaA Registered Posts: 20 ✭✭✭✭

Hi team,

We have noticed that reading from and writing to SQL tables with Dataiku's built-in functionality does not preserve row order. This matters when the order of the dataset affects model development steps such as cross-validation: results cannot be reproduced when the recipe is rerun.

Is this something known to you?

Is there a way to keep the same dataset order throughout the project?

Answers

  • Ludovic_Pénet
    Ludovic_Pénet Dataiker Posts: 7 Dataiker

    Dear PapaA,

    SQL datasets do not preserve the writing order.

    If order really matters, the simplest option is to use another kind of dataset that preserves order, for example one stored in an S3 bucket or in files.

    You can check whether a dataset type supports ordering by using a dataset of that type as the output of a "Sort" recipe: DSS warns you when the dataset type is unable to preserve order.

    Best regards,

    Ludovic
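
If the data has to stay in SQL, one workaround (not part of the answer above, just a minimal sketch) is to store an explicit position column when writing and to sort on it when reading. The example uses Python's standard sqlite3 module as a stand-in for the SQL connection; the table name `samples` and the column name `row_order` are hypothetical.

```python
import random
import sqlite3

# Rows in the order that matters for modelling.
rows = [("a", 0.1), ("b", 0.7), ("c", 0.3), ("d", 0.9)]

# Attach an explicit position to every row before writing.
indexed = [(pos, name, score) for pos, (name, score) in enumerate(rows)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (row_order INTEGER PRIMARY KEY, name TEXT, score REAL)"
)

# Insert in shuffled order to mimic a store that does not keep the write order.
shuffled = indexed[:]
random.Random(0).shuffle(shuffled)
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", shuffled)

# Without ORDER BY the row order is unspecified; sorting on the stored
# position makes the read deterministic.
restored = conn.execute(
    "SELECT name, score FROM samples ORDER BY row_order"
).fetchall()
conn.close()
```

In a Dataiku recipe the same idea would mean adding the position column to the dataframe before writing it and sorting on that column after reading it back, so that cross-validation splits see the same row order on every run.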
