Create a SQL output dataset with auto increment id column

esakkiraj
Level 1
Hi,

 

I am trying to create a SQL output dataset with an auto-increment ID column (primary key), then use a Python recipe that writes to this output dataset without supplying the ID column. I could not find any example of how to modify the manual schema to create a primary key ID column and write to that dataset. Is there a way to achieve this in DSS?

 

TIA.

ATsao
Dataiker

Hi esakkiraj,

If the goal is to add an "index" column to the output dataset, then you can certainly handle this in Python. What you would need to do is make sure that this index column is added as a column to the pandas dataframe itself (after loading the input dataset into a dataframe), and then write to your output dataset using the standard write_with_schema (which will overwrite the schema). For example, you could do something like:

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Read recipe inputs (named input_dataset to avoid shadowing the built-in input())
input_dataset = dataiku.Dataset("input_dataset")
input_df = input_dataset.get_dataframe()

# Turn the dataframe's default 0-based row index into a regular column named "index"
output_df = input_df.reset_index()

# Write recipe outputs
output = dataiku.Dataset("output_dataset")
output.write_with_schema(output_df)
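
If you would rather have the column named like a typical key and start at 1, here is a minimal pandas-only sketch. The helper name add_id_column, the column name "id", and the sample data are all illustrative assumptions, not part of the DSS API. Note that the values are generated in pandas, so this does not create a database-level auto-increment or primary key constraint.

```python
import pandas as pd


def add_id_column(df: pd.DataFrame, name: str = "id", start: int = 1) -> pd.DataFrame:
    """Return a copy of df with a sequential integer column inserted first.

    Hypothetical helper for illustration; not part of the dataiku package.
    """
    if name in df.columns:
        raise ValueError(f"column {name!r} already exists")
    # reset_index(drop=True) returns a new dataframe with a clean 0..n-1 index
    out = df.reset_index(drop=True)
    # Insert the sequential values as the first column
    out.insert(0, name, list(range(start, start + len(out))))
    return out


# Stand-in for the dataframe read from the input dataset
sample_df = pd.DataFrame({"city": ["Paris", "Lyon", "Nice"]})
result_df = add_id_column(sample_df)
```

In the recipe, you would pass the returned dataframe to output.write_with_schema in place of output_df.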

Let me know if that helps!

Thanks,

Andrew
