Added on June 11, 2020 11:43AM
How do I write a null value into MongoDB? It seems like DSS only writes it as a string. I am using a Python script and putting Python ‘None’ into a pandas DataFrame. MongoDB will not register the key if it’s an empty string; it needs to be a null data type.
Hi @Desmond,
Indeed, it's not possible to store null values in a MongoDB document using a DataFrame. Python None values are treated as missing attributes, which a NoSQL store like MongoDB allows.
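To make the difference concrete, here is a small sketch in plain Python (no DSS or MongoDB needed). The helper `drop_none_fields` is a hypothetical name: it only imitates the effect described above and is not the actual DSS writer code.

```python
def drop_none_fields(row):
    """Return a copy of `row` without the keys whose value is None.

    Hypothetical helper: it imitates the effect described above, where a
    None coming from a DataFrame ends up as a missing attribute.
    """
    return {key: value for key, value in row.items() if value is not None}


row = {"name": "Hannah", "age": None}

# What you would like to store: the key is present and holds a null.
wanted_document = dict(row)

# What effectively gets stored: the key is absent from the document.
stored_document = drop_none_fields(row)

print(wanted_document)
print(stored_document)
```

The first document keeps the "age" key with a null value, while the second one has no "age" key at all, which is what you are seeing in your collection.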
However, if your column is numerical, you can force a null value to be written by setting it to NaN. To do so, you need to use a DatasetWriter.
Here is a code sample:
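import dataiku
import numpy as np
dataset = dataiku.Dataset("dataset_name")
# Declare the schema; 'age' is numerical so that NaN can stand in for null
dataset.write_schema([{'name' : 'name', 'type' : 'string'},{'name' : 'age', 'type' : 'bigint'}], dropAndCreate=True)
# Get the writer from the dataset defined above
writer = dataset.get_writer()
writer.write_row_array(['Amy', 52])
# NaN in a numerical column is written as a null value
writer.write_row_array(['Hannah', np.nan])
writer.close()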
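If your rows already contain None for missing numbers, you could convert them before calling write_row_array. This is only a minimal sketch, assuming you know the positions of the numerical columns; `none_to_nan` and `numeric_positions` are names made up for the illustration, and `float("nan")` is the same value as `np.nan`.

```python
import math


def none_to_nan(row, numeric_positions):
    """Return a copy of `row` where None in numerical columns becomes NaN.

    Hypothetical helper; `numeric_positions` is the set of column indexes
    that hold numerical values.
    """
    return [
        float("nan") if value is None and index in numeric_positions else value
        for index, value in enumerate(row)
    ]


rows = [["Amy", 52], ["Hannah", None]]
numeric_positions = {1}  # assumption: only the 'age' column is numerical
converted = [none_to_nan(row, numeric_positions) for row in rows]

for row in converted:
    print(row)
    # In a DSS recipe you would pass each row to writer.write_row_array(row)
```

None values in string columns are left untouched by this helper, so they still need the manual approach.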
Otherwise, if you want to store a null value in a string column, you'll have to write the documents manually without going through the Dataiku API.
The following code sample can help you. It uses the PyMongo Python package to write to a MongoDB database without authentication:
import dataiku
import pymongo
# Look up the MongoDB connection used by the dataset
client = dataiku.api_client()
dataset = dataiku.Dataset("dataset_name")
conn = client.get_connection(dataset.get_config()['params']['connection'])
conn_params = conn.get_info()['params']
# Connect directly with PyMongo (no authentication)
myclient = pymongo.MongoClient(host=conn_params['host'], port=conn_params['port'])
mydb = myclient[conn_params['db']]
mycol = mydb[dataset.get_config()['params']['collection']]
dataset.write_schema([{'name' : 'name', 'type' : 'string'},{'name' : 'age', 'type' : 'bigint'}], dropAndCreate=True)
# PyMongo stores Python None as a null value, for any column type
mylist = [
    { "name": "Amy", "age": 52},
    { "name": "Hannah", "age": None},
    { "name": None, "age": 45},
    { "name": "Richard", "age": 31}
]
x = mycol.insert_many(mylist)
print('Inserted {} rows.'.format(len(x.inserted_ids)))
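To check the result afterwards, keep in mind that in MongoDB a filter such as {"name": None} matches documents where the field is null and also documents where the field is missing. Filtering on the BSON type (10 is the type number for null) matches only explicit nulls. Here is a minimal sketch; the helper names are hypothetical, and `collection` is assumed to be a PyMongo collection such as mycol above.

```python
BSON_TYPE_NULL = 10  # BSON type number for null


def explicit_null_filter(field):
    """Build a filter matching only documents where `field` is an explicit null."""
    return {field: {"$type": BSON_TYPE_NULL}}


def count_explicit_nulls(collection, field):
    """Count documents whose `field` is stored as null (not merely missing).

    `collection` is assumed to be a PyMongo collection; any object with a
    compatible count_documents(filter) method works.
    """
    return collection.count_documents(explicit_null_filter(field))


print(explicit_null_filter("name"))
```

For example, count_explicit_nulls(mycol, "name") should include the document inserted with "name": None, but not documents that have no "name" key at all.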
Have a good day!