Create a SQL output dataset with auto increment id column

esakkiraj

Hi,

I am trying to create a SQL output dataset with an auto-increment ID column (primary key), and then use a Python recipe that writes to this output dataset without supplying the ID column. I could not find any example of modifying the manual schema to create a primary key ID column and writing to it. Is there a way to achieve this in DSS?

TIA.


Answers

  • ATsao, Dataiker Alumni
    edited July 17

    Hi esakkiraj,

    If the goal is to add an "index" column to the output dataset, then you can certainly handle this in Python. What you would need to do is make sure that this index column is added to the pandas dataframe itself (after loading the input dataset as a dataframe) and then write to your output dataset using the standard write_with_schema (which will overwrite the schema). For example, you could do something like:

    import dataiku
    
    # Read the recipe input into a pandas dataframe
    input_dataset = dataiku.Dataset("input_dataset")
    input_df = input_dataset.get_dataframe()
    
    # Move the dataframe's default 0-based row index into a new "index" column
    output_df = input_df.reset_index(level=0)
    
    # Write the recipe output; write_with_schema replaces the output dataset's schema
    output = dataiku.Dataset("output_dataset")
    output.write_with_schema(output_df)
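
If you would rather have a 1-based column named "id" as the first column, the same idea can be sketched in plain pandas. This is only an illustration: add_id_column is a hypothetical helper, and the sample dataframe and its column names are made up to stand in for the result of get_dataframe(). Note that this produces an ordinary integer column in the dataframe, not a database-enforced auto-increment primary key.

```python
import pandas as pd


def add_id_column(df: pd.DataFrame, name: str = "id", start: int = 1) -> pd.DataFrame:
    """Return a new dataframe with a sequential integer column inserted first.

    Hypothetical helper for illustration; it is not part of the dataiku API.
    """
    if name in df.columns:
        raise ValueError(f"column {name!r} already exists")
    # reset_index(drop=True) returns a new frame and discards any existing index
    out = df.reset_index(drop=True)
    # Insert the sequential values at position 0 so the ID is the first column
    out.insert(0, name, list(range(start, start + len(out))))
    return out


# Stand-in for the dataframe read from the input dataset (made-up data)
sample_df = pd.DataFrame({"name": ["a", "b", "c"], "value": [10, 20, 30]})
with_id = add_id_column(sample_df)
print(with_id)
```

In a recipe, you would pass the dataframe read from the input dataset to the helper and hand the result to write_with_schema as in the snippet above.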

    Let me know if that helps!

    Thanks,

    Andrew
