Write in specific database from python recipe

Hello,
Does someone have an example of a python recipe for writing a df into a specific database ?
Thank you very much !
Florent
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,258 Dataiker
Hi, That seems like a sensible approach, If you defined your project/global variable. This will result in multiple datasets being visible in the flow you can move them to a specific flow zone and even remove and not drop the contents depending on if you need to reuse these in the same flow.
You can use something like his within a Scenario Python step. This won't work in an actual recipe but would work in scenario or notebook.
import dataiku import pandas as pd, numpy as np from dataiku import pandasutils as pdu client = dataiku.api_client() project = client.get_default_project() # Read recipe inputs my_input_ds = dataiku.Dataset("year") my_dataframe = my_input_ds.get_dataframe() ts_var = dataiku.get_custom_variables()["timestamp_sql_table"] dataset_name = "my_dataset" + "_" + ts_var connection_name = "Postgres-Localhost" builder = project.new_managed_dataset(dataset_name) builder.with_store_into(connection_name) dataset = builder.create() # write_ds = dataiku.Dataset(dataset_name) write_ds.write_with_schema(my_dataframe)
You can also create unmanaged SQL datasets using https://doc.dataiku.com/dss/latest/python-api/datasets-other.html#sql-dataset-programmatic-creation
Let me know if that helps!
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,258 Dataiker
Hi Florent,
Writing a df into a specific database can best be done via writing your df output to a SQL dataset. The steps would include:
1) Adding a database connection to your SQL database. See supported SQL database here: : https://doc.dataiku.com/dss/latest/connecting/sql.html
2) Specifying an output dataset within that connection and you can write the DF to the dataset using write_with_schema() or write_dataframe() https://doc.dataiku.com/dss/latest/code_recipes/python.html#writing-a-pandas-dataframe-in-a-dataset
If the above is not suitable you can technically use https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html but this would be done outside of DSS so any credentials would need to be included in the recipe, JDBC drivers handled outside of DSS as well.
Hope this helps!
-
Hi Alex,
Thank you very much for your response.
Actually what I want to do is save a table in a database with a name that varies (with current date for example and other information). So my idea was to define gloval variables, create a Python recipe and use these global variables and the current date to define the name of the table and save this table to the database.
I don't know how to do that and I don't know if it the best way to do this.
Thank you,
Florent