Create a SQL output dataset with auto increment id column

esakkiraj · April 2021

Hi,

I am trying to create a SQL output dataset with an auto-increment ID column (primary key), and then use a Python recipe that writes to this output dataset without supplying the ID column. I could not find any example of modifying the manual schema to create a primary key ID column and then writing to that dataset. Is there a way to achieve this in DSS?

TIA.

ATsao · April 2021

Hi esakkiraj,

If the goal is to add an "index" column to the output dataset, then you can certainly handle this through Python. What you would need to do is make sure that this index column is added as a column to the pandas dataframe itself (after reading the input dataset into a dataframe), and then write to your output dataset using the standard write_with_schema (which will overwrite the schema). For example, you could do something like:

import dataiku

# Read recipe inputs (named input_dataset to avoid shadowing the built-in input())
input_dataset = dataiku.Dataset("input_dataset")
input_df = input_dataset.get_dataframe()

# Turn the dataframe's 0-based row index into a regular column named "index"
output_df = input_df.reset_index()

# Write recipe outputs (this overwrites the output dataset's schema)
output = dataiku.Dataset("output_dataset")
output.write_with_schema(output_df)
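
If you would rather have the column start at 1 and carry a specific name, here is a minimal pandas-only sketch of the same idea. The helper name add_id_column, the column name "id" and the sample data are only assumptions for illustration, not part of the DSS API:

```python
import pandas as pd


def add_id_column(df, id_name="id", start=1):
    """Return a copy of df with a sequential integer column placed first.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    # Drop whatever index the dataframe had so rows are numbered in their current order
    out = df.reset_index(drop=True)
    # Insert the sequential values as the first column
    out.insert(0, id_name, list(range(start, start + len(out))))
    return out


# Made-up sample data standing in for the recipe's input dataframe
sample_df = pd.DataFrame({"name": ["a", "b", "c"], "value": [10, 20, 30]})
with_id = add_id_column(sample_df)
print(with_id)
```

Keep in mind that these values are generated in pandas at write time, so they are recomputed on every run and are not a database-managed auto-increment primary key.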

Let me know if that helps!

Thanks,

Andrew
