
Write to a specific database from a Python recipe

Solved!
Florent1
Level 1

Hello,

Does someone have an example of a Python recipe for writing a df into a specific database?

Thank you very much !

Florent

3 Replies
AlexT
Dataiker

Hi Florent, 

Writing a df into a specific database is best done by writing your df to a SQL output dataset. The steps would include:

1) Adding a database connection to your SQL database. See the supported SQL databases here: https://doc.dataiku.com/dss/latest/connecting/sql.html

2) Specifying an output dataset within that connection; you can then write the df to the dataset using write_with_schema() or write_dataframe(): https://doc.dataiku.com/dss/latest/code_recipes/python.html#writing-a-pandas-dataframe-in-a-dataset

If the above is not suitable, you can technically use https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html, but this would be done outside of DSS, so any credentials would need to be included in the recipe, and database drivers would have to be handled outside of DSS as well.
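To make the to_sql alternative concrete, here is a minimal self-contained sketch. It uses an in-memory SQLite database purely so the example runs anywhere; for a real database you would replace the connection with a SQLAlchemy engine built from your own credentials, which is exactly the credential handling DSS would otherwise do for you.

```python
# Minimal sketch of pandas.DataFrame.to_sql, entirely outside DSS.
# SQLite in-memory stands in for a real SQL connection here.
import sqlite3
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

conn = sqlite3.connect(":memory:")  # replace with your real connection/engine
df.to_sql("my_table", conn, if_exists="replace", index=False)

# Read it back to confirm the write
round_trip = pd.read_sql("SELECT * FROM my_table", conn)
print(len(round_trip))  # 3
```

Note that if_exists="replace" drops and recreates the table on each run; use "append" if you want to accumulate rows instead.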

Hope this helps!

Florent1
Level 1
Author

Hi Alex,

Thank you very much for your response.

Actually, what I want to do is save a table in a database with a name that varies (with the current date, for example, and other information). So my idea was to define global variables, create a Python recipe, use those global variables and the current date to define the name of the table, and save the table to the database.

I don't know how to do that, and I don't know if it is the best way to do this.

Thank you,

Florent

AlexT
Dataiker

Hi, that seems like a sensible approach if you define your project/global variables. It will result in multiple datasets being visible in the flow; you can move them to a specific flow zone, and you can even remove a dataset without dropping its contents, depending on whether you need to reuse it in the same flow.

You can use something like this within a scenario Python step. It won't work in an actual recipe, but it will work in a scenario or a notebook.

import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# Read the input dataset
my_input_ds = dataiku.Dataset("year")
my_dataframe = my_input_ds.get_dataframe()

# Build a dynamic dataset name from a project variable
ts_var = dataiku.get_custom_variables()["timestamp_sql_table"]
dataset_name = "my_dataset" + "_" + ts_var

connection_name = "Postgres-Localhost"

# Create a new managed dataset stored in the SQL connection
builder = project.new_managed_dataset(dataset_name)
builder.with_store_into(connection_name)
dataset = builder.create()

# Write the dataframe (and its schema) to the new dataset
write_ds = dataiku.Dataset(dataset_name)
write_ds.write_with_schema(my_dataframe)
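For the "name that varies with the current date" part of Florent's question, the string construction itself is plain Python. A hedged sketch (the `dated_name` helper and the `my_dataset` prefix are illustrative, not part of the DSS API; in DSS you would typically write the result into a project variable such as `timestamp_sql_table` and read it back with `dataiku.get_custom_variables()`):

```python
# Sketch: building a dated dataset/table name in plain Python.
from datetime import date

def dated_name(prefix, day=None):
    """Return a name like 'my_dataset_2024_01_31' for the given day (default: today)."""
    day = day or date.today()
    return "{}_{}".format(prefix, day.strftime("%Y_%m_%d"))

print(dated_name("my_dataset", date(2024, 1, 31)))  # my_dataset_2024_01_31
```

Underscores are used rather than dashes so the result stays a valid SQL identifier on most databases without quoting.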

You can also create unmanaged SQL datasets using https://doc.dataiku.com/dss/latest/python-api/datasets-other.html#sql-dataset-programmatic-creation

Let me know if that helps!
