
Sorting dataset for pivot

Is there a way to guarantee a dataset is sorted in order to pivot it?

In the past I had to add an additional Python or SQL recipe to sort the data, but I have realized that even this is no longer fully reliable. Sometimes, after an SQL ORDER BY or a Pandas sort_values, the resulting dataset contains a few rows that are out of order, even though 99% of the rows are sorted correctly. For example, this is what I have right now:

ID Tag
1 automation
100 automation
101 biotech <--- should not be there!
102 automation
152 automation
153 biotech

Since it doesn't matter whether I sort using Python or SQL, I guess it has something to do with how Dataiku works internally. Is there anything I can do?
1 Reply
The fact that Python or SQL does not manage to sort your dataset unfortunately has nothing to do with how Dataiku works internally: we only orchestrate the execution of recipes and queries.

The issue is somewhere else...

In any case, investigating this issue further would require a dataset extract so that we can try to reproduce it.
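
To prepare such an extract, something along these lines might help. It is only a minimal sketch in plain Pandas, not a Dataiku feature: the helper name `out_of_order_rows` and its `context` parameter are made up for illustration, and it assumes the data was meant to be sorted by the Tag column, which the example in the question suggests but does not state. It returns the rows where the sort key drops below the previous row's key, together with their neighbours.

```python
import pandas as pd


def out_of_order_rows(df: pd.DataFrame, by: str, context: int = 1) -> pd.DataFrame:
    """Return rows that break ascending order on `by`, plus `context` neighbours.

    Hypothetical helper for building a small reproduction extract.
    """
    values = df[by].tolist()
    # Positions whose key is strictly smaller than the key of the row above.
    breaks = [i for i in range(1, len(values)) if values[i] < values[i - 1]]

    positions = set()
    for pos in breaks:
        start = max(pos - context, 0)
        stop = min(pos + context, len(values) - 1)
        positions.update(range(start, stop + 1))

    # iloc keeps the original row order and the original index labels.
    return df.iloc[sorted(positions)]


# Rows copied from the question; "Tag" as the sort key is an assumption.
sample = pd.DataFrame(
    {
        "ID": [1, 100, 101, 102, 152, 153],
        "Tag": ["automation", "automation", "biotech",
                "automation", "automation", "biotech"],
    }
)

extract = out_of_order_rows(sample, by="Tag")
print(extract)
```

Running the same check on the input of the sort recipe and again on its output should show whether the order is already wrong right after sorting or only once the dataset is read back.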