Hello,
Does someone have an example of a Python recipe for writing a DataFrame into a specific database?
Thank you very much !
Florent
Hi Florent,
Writing a DataFrame into a specific database is best done by writing your DataFrame output to a SQL dataset. The steps would be:
1) Add a database connection to your SQL database. The supported SQL databases are listed here: https://doc.dataiku.com/dss/latest/connecting/sql.html
2) Specify an output dataset within that connection, then write the DataFrame to it using write_with_schema() or write_dataframe(): https://doc.dataiku.com/dss/latest/code_recipes/python.html#writing-a-pandas-dataframe-in-a-dataset
If the above is not suitable, you can technically use https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html, but this runs outside of DSS, so any credentials would need to be included in the recipe, and the database drivers would have to be handled outside of DSS as well.
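As a rough illustration of the to_sql route (outside DSS), here is a minimal sketch using an in-memory SQLite database; the DataFrame contents and table name are made up for the example:

```python
import sqlite3
import pandas as pd

# Sample DataFrame to persist (values are illustrative only)
df = pd.DataFrame({"year": [2020, 2021], "revenue": [100.0, 150.0]})

# In a real setup this would be a connection to your actual database,
# with credentials and drivers managed by you rather than by DSS
conn = sqlite3.connect(":memory:")

# Write the DataFrame as a new table, replacing it if it already exists
df.to_sql("my_table", conn, if_exists="replace", index=False)

# Read it back to verify the write
result = pd.read_sql("SELECT * FROM my_table", conn)
print(len(result))  # 2
conn.close()
```

The if_exists="replace" behavior mirrors what you would typically want for a recurring export; use "append" instead if each run should add rows.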
Hope this helps!
Hi Alex,
Thank you very much for your response.
Actually, what I want to do is save a table in a database with a name that varies (including the current date, for example, and other information). My idea was to define global variables, create a Python recipe, and use these global variables together with the current date to build the table name, then save the table to the database.
I don't know how to do that, and I don't know if it is the best way to do this.
Thank you,
Florent
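The date-based naming described above can be sketched in plain Python; the prefix and date format here are just assumptions for illustration (in DSS the prefix would come from a project or global variable):

```python
from datetime import date

# Hypothetical value: in DSS this would be read from a project/global variable
table_prefix = "sales_export"

# Build a table name such as "sales_export_20240115"
table_name = f"{table_prefix}_{date.today():%Y%m%d}"
print(table_name)
```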
Hi, that seems like a sensible approach, provided you have defined your project/global variable. It will result in multiple datasets being visible in the Flow; you can move them to a specific Flow zone, and even remove them without dropping their contents, depending on whether you need to reuse them in the same Flow.
You can use something like this within a scenario Python step. It won't work in an actual recipe, but it would work in a scenario or a notebook.
import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# Read the input dataset into a DataFrame
my_input_ds = dataiku.Dataset("year")
my_dataframe = my_input_ds.get_dataframe()

# Build the dataset name from a project variable holding a timestamp
ts_var = dataiku.get_custom_variables()["timestamp_sql_table"]
dataset_name = "my_dataset" + "_" + ts_var
connection_name = "Postgres-Localhost"

# Create a managed dataset stored in the SQL connection
builder = project.new_managed_dataset(dataset_name)
builder.with_store_into(connection_name)
dataset = builder.create()

# Write the DataFrame (and its schema) into the new dataset
write_ds = dataiku.Dataset(dataset_name)
write_ds.write_with_schema(my_dataframe)
You can also create unmanaged SQL datasets using https://doc.dataiku.com/dss/latest/python-api/datasets-other.html#sql-dataset-programmatic-creation
Let me know if that helps!