Write recipe outputs (rows in initial order)

mouste04 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 4

Hello everyone,

I am working on a recipe where I use "write_with_schema" to write the recipe outputs. I noticed that the rows in the output dataset are in a different order than expected. I tried adding .reset_index(): the index values appear in the output, but the rows are still not in the right order. In my case, the order of the rows is crucial.

Is there a way to have an output dataset where the rows are in the correct order using "write_with_schema" or any similar function?

Thank you in advance

Answers

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker

    Hi @mouste04,

    Can you let us know what type of output dataset (e.g. Snowflake, S3, filesystem) you are writing to, where the output rows appear in a different order than in the input dataframe passed to write_with_schema()?

    Thanks,
    Sarina

  • mouste04
    mouste04 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 4

    Hello Sarina,

    Both the input and the output are Snowflake datasets, and they are different datasets. A linkage matrix is calculated from the input, and that matrix is the output. For now, I have managed to create an index for each row and then order the output dataset by it so I can proceed, but I was wondering if there is a better way to do that.

    Thanks.
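
The workaround described above (an explicit index per row, then ordering on it) could look roughly like the sketch below. It assumes the recipe builds a pandas DataFrame; the helper `add_row_order`, the column name `row_order`, and the sample values are hypothetical and only illustrate the idea.

```python
import pandas as pd


def add_row_order(df: pd.DataFrame, column: str = "row_order") -> pd.DataFrame:
    """Return a copy of df with an explicit 0-based position column in front."""
    if column in df.columns:
        raise ValueError(f"column {column!r} already exists")
    # reset_index returns a new frame, so the caller's frame is left untouched.
    out = df.reset_index(drop=True)
    out.insert(0, column, list(range(len(out))))
    return out


# Small stand-in for the computed linkage matrix (values are made up).
linkage_df = pd.DataFrame(
    {"cluster_a": [3, 0, 5], "cluster_b": [4, 1, 2]},
    index=[10, 7, 2],
)
ordered_df = add_row_order(linkage_df)
print(ordered_df)

# In a DSS recipe the frame would then be written as usual, for example:
# dataiku.Dataset("OUTPUT_DATASET").write_with_schema(ordered_df)
```

Storing the position as a regular column means the intended order survives in the table itself, so any later read can sort on it regardless of the order in which the database returns rows.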

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker

    Hi @mouste04,

    I see! For SQL datasets, the order in which rows are read depends on the following Advanced dataset setting:

    [Screenshot: Advanced dataset settings, read ordering option]

    This can be applied to the input and output datasets. Does this help?

    Thanks,
    Sarina

  • mouste04
    mouste04 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 4

    Thank you, this is helpful!

    However, I was wondering if there is a more automated way to do that: something that can be done in the backend, or a function I can use instead of "write_with_schema" that does this automatically.

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker
    edited July 17

    Hi @mouste04,

    Setting this in the dataset settings is the only way to control the read ordering for a dataset, but you can set the field programmatically through the API instead of the UI. For example, you could do something like this:

    import dataiku

    # Connect to the DSS public API and look up the dataset to configure
    client = dataiku.api_client()
    project = client.get_project('PROJECT_KEY')
    dataset = project.get_dataset('DATASET')

    settings = dataset.get_settings()
    raw_settings = settings.get_raw()

    # Enable the default read ordering and sort ascending on the chosen column
    raw_settings['readWriteOptions']['defaultReadOrdering']['enabled'] = True
    raw_settings['readWriteOptions']['defaultReadOrdering']['rules'] = [{'columnName': 'COLUMN_VALUE', 'asc': True}]

    # Persist the modified settings
    settings.save()


    Thanks,
    Sarina
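
If the same ordering has to be applied to several datasets, the dictionary update from the snippet above could be wrapped in a small helper. This is only a sketch: the function name `set_default_read_ordering` and the column name `row_order` are hypothetical, the key names are copied from the snippet above, and the use of `setdefault` is an assumption in case the keys are missing from the raw settings.

```python
def set_default_read_ordering(raw_settings, rules):
    """Enable default read ordering in a raw dataset settings dict (in place).

    rules is a list of (column_name, ascending) pairs.
    """
    # setdefault is a defensive assumption: it creates the nested dicts if absent.
    options = raw_settings.setdefault("readWriteOptions", {})
    ordering = options.setdefault("defaultReadOrdering", {})
    ordering["enabled"] = True
    ordering["rules"] = [
        {"columnName": name, "asc": ascending} for name, ascending in rules
    ]
    return raw_settings


# Stand-in for settings.get_raw(); a real dataset has many more keys.
raw = {"readWriteOptions": {}}
set_default_read_ordering(raw, [("row_order", True)])
print(raw["readWriteOptions"]["defaultReadOrdering"])
```

In a real project the dict would come from `settings.get_raw()`, and `settings.save()` would still be needed afterwards to persist the change.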
