Write recipe outputs (rows in initial order)
Hello everyone,
I am working on a recipe where I use "write_with_schema" to write the recipe outputs. I noticed that the rows in the output dataset are in a different order than expected. I tried adding .reset_index(): the index values appear in the output, but still not in the right order. In my case, the order of the rows is crucial.
Is there a way to have an output dataset where the rows are in the correct order using "write_with_schema" or any similar function?
Thank you in advance
Answers
-
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @mouste04,
Can you let us know what type of output dataset (e.g. Snowflake, S3, filesystem) you are writing to, where the output rows appear in a different order from the input dataframe passed to write_with_schema()?
Thanks,
Sarina
-
Hello Sarina ,
Both the input and output datasets are Snowflake datasets, and they are different datasets. A linkage matrix is calculated from the input, and this matrix is the output. Currently, I have managed to create an index for each row and then order the output dataset by it, but I was wondering if there is a better way to do that.
Thanks.
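For readers following along, the workaround described above (an explicit index per row, then ordering on it) could be sketched as follows. This is only an illustration, not the poster's actual code: the helper name add_row_order, the column name row_order, and the sample linkage values are all made up here. An explicit column is needed because a SQL table has no inherent row order, so the dataframe's position has to be stored as data.

```python
import pandas as pd


def add_row_order(df: pd.DataFrame, column: str = "row_order") -> pd.DataFrame:
    """Return a copy of df with an explicit 0..n-1 position column.

    `column` is a hypothetical name chosen for this sketch.
    """
    if column in df.columns:
        raise ValueError(f"column {column!r} already exists")
    # reset_index(drop=True) returns a new frame, so the input is left untouched
    out = df.reset_index(drop=True)
    # Store the current row position as a real column so it survives the write
    out.insert(0, column, list(range(len(out))))
    return out


# Made-up stand-in for the linkage matrix computed in the recipe
linkage = pd.DataFrame(
    {"cluster_a": [3, 0, 5], "cluster_b": [4, 1, 2], "distance": [0.1, 0.4, 0.9]}
)
ordered = add_row_order(linkage)
```

In a recipe, `ordered` would then be the dataframe passed to write_with_schema(), and anything reading the output would sort on row_order to recover the original sequence.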
-
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @mouste04,
I see! For SQL datasets, the order in which rows are read depends on an Advanced dataset setting, the default read ordering. This can be applied to the input and output datasets. Does this help?
Thanks,
Sarina
-
Thank you, this is helpful!
However, I was wondering if there is a more automated way to do that: something that can be done in the backend, or a function I can use instead of "write_with_schema" that does this automatically.
-
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @mouste04,
You could set this field automatically from the API. However, the dataset settings are the only place where the read ordering for a dataset can be configured. If you wanted to do so from the API, you could do something like this:

    import dataiku

    client = dataiku.api_client()
    project = client.get_project('PROJECT_KEY')
    dataset = project.get_dataset('DATASET')

    # Load the dataset settings and edit the raw settings in place
    settings = dataset.get_settings()
    raw_settings = settings.get_raw()

    # Enable the default read ordering and sort on the chosen column
    raw_settings['readWriteOptions']['defaultReadOrdering']['enabled'] = True
    raw_settings['readWriteOptions']['defaultReadOrdering']['rules'] = [{'columnName': 'COLUMN_VALUE', 'asc': True}]

    settings.save()

Thanks,
Sarina
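
If the same ordering has to be applied to several datasets, the snippet in the reply above could be wrapped in a small helper. The sketch below is an assumption-laden illustration, not an official Dataiku utility: the function names are invented, the raw settings layout is taken only from the keys shown in the reply, and the setdefault calls are a defensive guess for datasets where those keys are not present yet.

```python
def enable_read_ordering(raw_settings, column, ascending=True):
    """Turn on default read ordering in a raw dataset settings dict.

    Hypothetical helper: it only touches the keys shown in the reply above.
    """
    options = raw_settings.setdefault("readWriteOptions", {})
    ordering = options.setdefault("defaultReadOrdering", {})
    ordering["enabled"] = True
    ordering["rules"] = [{"columnName": column, "asc": ascending}]
    return raw_settings


def apply_read_ordering(project_key, dataset_name, column, ascending=True):
    """Apply the ordering to a real dataset (only runs where dataiku is available)."""
    import dataiku  # imported here so the helper above stays usable outside DSS

    project = dataiku.api_client().get_project(project_key)
    settings = project.get_dataset(dataset_name).get_settings()
    # get_raw() returns the settings dict that save() persists, so edit it in place
    enable_read_ordering(settings.get_raw(), column, ascending)
    settings.save()


# Dry run on a plain dict, no DSS connection needed
example = enable_read_ordering({"readWriteOptions": {}}, "row_order")
```

Calling apply_read_ordering('PROJECT_KEY', 'DATASET', 'row_order') at the end of the recipe would then set the ordering right after the output is written; row_order is a placeholder for whichever column holds the intended sequence.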