NULL instead no value
Hi,
I'm trying to write dataframe into Snowflake table with function write_with_schema. Data comes from excel file, where we have some columns with no value and we want to keep it that way. I used parameters with read_excel and also fillna('') and dataset looks ok (no nulls, only no values/empty), but after using wrIte_with_schema we are getting nulls in those columns. Is there a way to insert data with no value/string lenght 0 instead of nulls?
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Can you check the data type of the column in the data frame and the column in the Snowflake table? Are they the same type?
-
Object type for dataframe and varchar in Snowflake
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
What version of pandas are you using?
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Are you sure you are applying fillna() correctly?
https://stackoverflow.com/a/67356585/10491951
Can you you do a count and group by that column to see all possible values after the fillna()?
-
Yes, dataframe looks ok. I just tried different option to have no value instead on null in dataframe, but it seems, that write_with_schema still converts that to null.
Even if there is '' as a value, it goes as null into Snowflake.
-
Raw is my expected output
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Your Python code doesn't look right to me. This is how you usually write to a Dataset from a Python recipe:
# Write recipe outputs customers_out = dataiku.Dataset("some_poutput_dataset") customers_out.write_with_schema(df)
I don't see where you got OUTPUT from in your code.
-
I have OUTPUT declared in a different cell.
OUTPUT = dataiku.Dataset("OUTPUT_DATAIKU")
Everything works, except no values(''), that somehow are converted to null when using write_with_schema/write_from_dataframe
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Can I suggest you restart your kernel and run from top to bottom again? Or run the code in a recipe to make sure there are now left overs from you playing in the Notebook.
-
Done that, I even checked e.g. fillna('5') and it goes to Snowflake as 5. I don't think it's an issue with python/pandas/dataframe, but specifically with writing to Snowflake and using built-in (write_with_schema/writre_from_dataframe) function to do that. Maybe there is a parameter or different function to use?
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Try write_with_schema(df, drop_and_create=True) and with write_dataframe(df, infer_schema=False)
-
That's not a solution, everything goes to Snowflake as it should, except '' or "" or ' '- still as null