Sorting dataset for pivot
UserBird
Is there a way to guarantee a dataset is sorted in order to pivot it?
In the past I had to add an additional Python or SQL recipe to sort the data, but I realized that even this no longer works 100% of the time. Sometimes, after an SQL ORDER BY or a pandas sort_values, the resulting dataset contains random rows that are out of order, even though 99% of the rows are sorted correctly. For example, I currently have this:
ID Tag
1 automation
...
100 automation
101 biotech <--- should not be there!
102 automation
...
152 automation
153 biotech
...
Since the problem occurs whether I sort with Python or SQL, I guess it has something to do with how Dataiku works internally. Is there anything I can do?
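To pinpoint exactly which rows break the order, a check like the one below can be run on the data after sorting. This is a minimal pandas sketch, not a Dataiku feature: the toy data and the helper `find_unsorted_rows` are hypothetical, and it assumes the intended order is by `Tag`, then `ID` (as the example above suggests).

```python
import pandas as pd

# Hypothetical toy data shaped like the example above.
df = pd.DataFrame({
    "ID": [1, 100, 101, 102, 152, 153],
    "Tag": ["automation", "automation", "biotech", "automation", "automation", "biotech"],
})


def find_unsorted_rows(frame, keys):
    """Return the rows whose key tuple is smaller than the previous row's,
    i.e. the positions where the ascending order breaks (hypothetical helper)."""
    key_tuples = list(frame[keys].itertuples(index=False, name=None))
    bad_positions = [
        i for i in range(1, len(key_tuples))
        if key_tuples[i] < key_tuples[i - 1]
    ]
    return frame.iloc[bad_positions]


# Sort by Tag, then ID, and drop the old index so row positions are 0..n-1.
sorted_df = df.sort_values(["Tag", "ID"]).reset_index(drop=True)

print(find_unsorted_rows(sorted_df, ["Tag", "ID"]).empty)  # True
print(find_unsorted_rows(df, ["Tag", "ID"]))  # flags the row right after the misplaced one
```

Note that the helper reports the row where the order breaks (here ID 102, which follows the misplaced biotech row 101), so it is meant as a quick diagnostic rather than a way to identify which row is "wrong".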
Answers
Unfortunately, the fact that Python or SQL does not manage to sort your dataset has nothing to do with how Dataiku works internally: we only orchestrate the execution of the recipes/queries.
The issue is somewhere else...
In any case, investigating this further would require an extract of the dataset so that we can try to reproduce the issue.