write_to_schema or get_dataframe does not keep the same ordering of the dataset
Hi team,
we have noticed that reading from and writing to SQL tables with the built-in functionality of Dataiku does not preserve row order. This matters when the order of the dataset affects model development steps such as cross-validation, and it makes it impossible to replicate results when the recipe is rerun.
Is this something known to you?
Is there a way to keep the same order of the dataset throughout the project?
Answers
Dear PapaA,
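SQL datasets do not preserve the writing order: a SQL table is an unordered set of rows, so a read without an explicit ORDER BY can return them in any order.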
The simplest thing to do if order really matters is to use another kind of dataset that preserves order, for example files in an S3 bucket or on a filesystem.
You can check whether a dataset type supports ordering by using a dataset of that type as the output of a "Sort" recipe. DSS warns you when the dataset type is unable to preserve order.
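If you have to stay on SQL, one possible workaround (not a built-in DSS feature, just a sketch) is to store the row position in an explicit column before writing and to sort on it after reading. The column name `row_order`, the helper names and the dataset name `my_sql_dataset` below are hypothetical; the DSS calls are shown only as comments because they need a running DSS instance.

```python
import pandas as pd

ORDER_COL = "row_order"  # hypothetical column name, pick one that does not clash


def add_order_column(df, col=ORDER_COL):
    """Return a copy of df with an explicit 0..n-1 row-position column."""
    out = df.reset_index(drop=True).copy()
    out[col] = range(len(out))
    return out


def restore_order(df, col=ORDER_COL):
    """Sort on the stored row position to recover the order at write time."""
    return df.sort_values(col, kind="stable").reset_index(drop=True)


# Assumed usage inside a DSS Python recipe (dataset name is hypothetical):
#   import dataiku
#   ds = dataiku.Dataset("my_sql_dataset")
#   ds.write_with_schema(add_order_column(df))
#   df_again = restore_order(ds.get_dataframe())
```

The extra column can be dropped after sorting if it should not reach the model. Fixing the random seed of the cross-validation split is still needed for fully reproducible results.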
Best regards,
Ludovic