How can I write a Pandas dataframe in a Database in SQL Server connection?
Suppose I have a Pandas DF and I want to create a new table in a SQL Server connection with all the data in the DF. For Snowflake, I use the DkuSnowpark module and write_with_schema, but I couldn't find something similar for SQL Server.
I tried using SQLAlchemy, but I got a driver error and couldn't find another way.
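For reference, the Snowflake pattern I'm referring to looks roughly like this (dataset names are placeholders):
import dataiku
from dataiku.snowpark import DkuSnowpark

# Existing Snowflake-backed datasets in the flow (placeholder names)
input_ds = dataiku.Dataset("my_input")
output_ds = dataiku.Dataset("my_output")

sp = DkuSnowpark()
snow_df = sp.get_dataframe(input_ds)        # Snowpark dataframe from the input dataset
sp.write_with_schema(output_ds, snow_df)    # writes the table and schema in Snowflake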
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,167 Neuron
Creating a new Python recipe will give you a code snippet that will write a pandas dataframe to a table. Just create the output dataset in your desired connection as part of creating the Python recipe and Dataiku will do the rest.
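As a rough sketch (dataset names are placeholders), the generated recipe boils down to something like this:
import dataiku

# Read the input dataset into a pandas dataframe
input_df = dataiku.Dataset("my_input").get_dataframe()

# Write it to the output dataset; since that dataset sits on your SQL Server
# connection, Dataiku creates the table and pushes the schema for you
output_ds = dataiku.Dataset("my_output")
output_ds.write_with_schema(input_df)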
-
But that creates a dataset in DSS. I want to create a table directly in SQL Server without creating a managed dataset.
-
Turribeach
Why? What exactly are you trying to achieve?
-
I am running a scenario which reads a dataset and generates tables. I want to be able to save those tables in SQL Server.
-
Turribeach
Every Dataiku dataset you create on a SQL connection has an underlying table created for it. It's not clear to me why you don't want to create the Dataiku dataset. You are explaining the step you want to take, but not the overall goal.
-
import dataiku
from dataiku import SQLExecutor2

# Run a query against the SQL connection and get the result as a pandas dataframe
executor = SQLExecutor2(connection="my_db")
foo_query = "select * from foo"
foo_df = executor.query_to_df(foo_query)

# Write the dataframe to a dataset backed by that connection ("foo_table" is a placeholder dataset name)
foo_ds = dataiku.Dataset("foo_table")
foo_ds.write_with_schema(foo_df)
You can do this in a scenario as you mentioned, though I am not sure you can avoid using a managed dataset. I usually just island off tables that are less relevant to the flow in a separate Zone.