How do I write a null value into MongoDB? It seems like DSS only writes it as a string. I am using a Python script and putting Python "None" into a pandas DataFrame. MongoDB will not register the key if it's an empty string; it needs to be a null data type.
Hi @Desmond,
Indeed, it's not possible to store null values in a MongoDB document from a DataFrame. Python None values are treated as missing attributes, which MongoDB allows as a NoSQL store, so the key is simply left out of the document.
However, if your column is numeric, you can force a null value to be written by setting it to NaN. To do so, you need to use a DatasetWriter.
Here is a code sample:
import dataiku
import numpy as np
dataset = dataiku.Dataset("dataset_name")
dataset.write_schema([{'name' : 'name', 'type' : 'string'},{'name' : 'age', 'type' : 'bigint'}], dropAndCreate=True)
writer = dataset.get_writer()
writer.write_row_array(['Amy', 52])
# NaN in a numeric column is written as a null value
writer.write_row_array(['Hannah', np.nan])
writer.close()
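If the rows you write come from Python data where a missing number is None, a small helper can swap those values for NaN before each call to write_row_array, following the NaN behavior described above. This is a minimal sketch: `nulls_to_nan` is a hypothetical helper (not part of the Dataiku API), and it assumes you know which column positions are numeric in your schema.

```python
import math
from typing import Any, Iterable, List, Sequence


def nulls_to_nan(row: Sequence[Any], numeric_indexes: Iterable[int]) -> List[Any]:
    """Return a copy of row where None in a numeric column becomes NaN.

    Hypothetical helper: per the answer above, the DatasetWriter writes a
    null for a numeric column when it receives NaN, so None is swapped first.
    Non-numeric columns are left untouched.
    """
    numeric = set(numeric_indexes)
    return [
        math.nan if (index in numeric and value is None) else value
        for index, value in enumerate(row)
    ]


# Column 1 ("age") is the bigint column in the example schema.
row = nulls_to_nan(['Hannah', None], numeric_indexes=[1])
# writer.write_row_array(row)  # would then write a null age for Hannah
```

String columns are deliberately left alone, since NaN only makes sense for numeric types.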
Otherwise, if you want to store a null value in a string column, you'll have to write to MongoDB directly, without going through the Dataiku API.
The following code sample can help. It uses the PyMongo Python package to write to a MongoDB database without authentication:
import dataiku
import pymongo
client = dataiku.api_client()
dataset = dataiku.Dataset("dataset_name")
conn = client.get_connection(dataset.get_config()['params']['connection'])
# Read the MongoDB host, port and database from the DSS connection settings
conn_params = conn.get_info()['params']
myclient = pymongo.MongoClient(host=conn_params['host'], port=int(conn_params['port']))
mydb = myclient[conn_params['db']]
# The target collection is the one configured on the DSS dataset
mycol = mydb[dataset.get_config()['params']['collection']]
dataset.write_schema([{'name' : 'name', 'type' : 'string'},{'name' : 'age', 'type' : 'bigint'}], dropAndCreate=True)
mylist = [
{ "name": "Amy", "age": 52},
{ "name": "Hannah", "age": None},
{ "name": None, "age": 45},
{ "name": "Richard", "age": 31}
]
# PyMongo stores Python None as a BSON null value
result = mycol.insert_many(mylist)
print('Inserted {} rows.'.format(len(result.inserted_ids)))
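If your data is already in a pandas DataFrame, missing numeric cells typically come out of `df.to_dict('records')` as float NaN rather than None, and PyMongo stores NaN as a BSON double, not as null. A minimal sketch of a hypothetical helper, `nan_to_none` (not part of PyMongo or Dataiku), that converts NaN to None so each field is inserted as a real null:

```python
import math
from typing import Any, Dict, List


def nan_to_none(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of records with float NaN values replaced by None.

    PyMongo encodes None as BSON null, so the keys are kept in the
    inserted documents instead of holding a NaN number.
    """
    cleaned = []
    for record in records:
        cleaned.append({
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()
        })
    return cleaned


# Records shaped like df.to_dict('records') output
records = [{"name": "Hannah", "age": float("nan")}, {"name": None, "age": 45.0}]
print(nan_to_none(records))
# → [{'name': 'Hannah', 'age': None}, {'name': None, 'age': 45.0}]
```

With a DataFrame `df` (hypothetical here), the insert would then become `mycol.insert_many(nan_to_none(df.to_dict('records')))`. The `isinstance` check also covers NumPy's `float64`, which subclasses Python's `float`.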
Have a good day!