NULL instead no value

marbys
Level 1
NULL instead no value

Hi,

I'm trying to write dataframe into Snowflake table with function write_with_schema. Data comes from excel file, where we have some columns with no value and we want to keep it that way.  I used parameters with read_excel and also fillna('') and dataset looks ok (no nulls, only no values/empty), but after using wrIte_with_schema we are getting nulls  in those columns. Is there a way to insert data with no value/string lenght 0 instead of nulls?

0 Kudos
12 Replies
Turribeach

Can you check the data type of the column in the data frame and the column in the Snowflake table? Are they the same type?

0 Kudos
marbys
Level 1
Author

Object type for dataframe and varchar in Snowflake

0 Kudos
Turribeach

What version of pandas are you using?

0 Kudos
Turribeach

Are you sure you are applying fillna() correctly?

https://stackoverflow.com/a/67356585/10491951

Can you you do a count and group by that column to see all possible values after the fillna()?

0 Kudos
marbys
Level 1
Author

Yes, dataframe looks ok. I just tried different option to have no value instead on null in dataframe, but it seems, that write_with_schema still converts that to null.

429105254_1567162310702382_3338356576566462941_n.png

429041625_1171443807573497_5522633410630892444_n.png

429118700_910385297448119_1749607961959726017_n.png

Even if there is '' as a value, it goes as null into Snowflake. 

0 Kudos
Turribeach

Your Python code doesn't look right to me. This is how you usually write to a Dataset from a Python recipe:

 

# Write recipe outputs
customers_out = dataiku.Dataset("some_poutput_dataset")
customers_out.write_with_schema(df)

 

 I don't see where you got OUTPUT from in your code.

0 Kudos
marbys
Level 1
Author

I have OUTPUT declared in a different cell.

OUTPUT = dataiku.Dataset("OUTPUT_DATAIKU")

Everything works, except no values(''), that somehow are converted to null when using write_with_schema/write_from_dataframe

0 Kudos
Turribeach

Can I suggest you restart your kernel and run from top to bottom again? Or run the code in a recipe to make sure there are now left overs from you playing in the Notebook.

0 Kudos
marbys
Level 1
Author

Done that, I even checked e.g. fillna('5') and it goes to Snowflake as 5. I don't think it's an issue with python/pandas/dataframe, but specifically with writing to Snowflake and using built-in (write_with_schema/writre_from_dataframe) function to do that. Maybe there is a parameter or different function to use?

0 Kudos
Turribeach

Try write_with_schema(df, drop_and_create=True) and with write_dataframe(df, infer_schema=False)

0 Kudos
marbys
Level 1
Author

That's not a solution,  everything goes to Snowflake as it should, except '' or "" or ' '- still as null

0 Kudos
marbys
Level 1
Author

429302783_1048536849586892_5340554447217456188_n.png

Raw is my expected output

0 Kudos