Writing null value to MongoDB

How do I write a null value into MongoDB? It seems that DSS only writes it as a string. I am using a Python script and putting Python None into a pandas DataFrame. MongoDB will not register the key if the value is an empty string; it needs to be a null data type.

Hi @Desmond ,

Indeed, it's not possible to store null values in a MongoDB document when writing through a DataFrame. Python None values are treated as missing attributes, which MongoDB allows because documents in a collection do not have to share the same fields.

However, if your column is numerical, you can force a null value to be written by setting it to NaN. To do so, you need to use a DatasetWriter.

Here is a code sample:

import dataiku
import numpy as np

dataset = dataiku.Dataset("dataset_name")
dataset.write_schema([{'name': 'name', 'type': 'string'}, {'name': 'age', 'type': 'bigint'}], dropAndCreate=True)

# The writer must be closed so that the rows are flushed; the with block takes care of it
with dataset.get_writer() as writer:
    writer.write_row_array(['Amy', 52])
    writer.write_row_array(['Hannah', np.nan])  # NaN in a numerical column



Otherwise, if you want to store a null value in a string column, you'll have to write the documents manually, without going through the Dataiku API.

The following code sample can help you. It uses the PyMongo Python package to write to a MongoDB database without authentication:

import dataiku
import pymongo

client = dataiku.api_client()
dataset = dataiku.Dataset("dataset_name")
conn = client.get_connection(dataset.get_config()['params']['connection'])

myclient = pymongo.MongoClient(host = conn.get_info()['params']['host'], port = conn.get_info()['params']['port'])
mydb = myclient[conn.get_info()['params']['db']]
mycol = mydb[dataset.get_config()['params']['collection']]

dataset.write_schema([{'name' : 'name', 'type' : 'string'},{'name' : 'age', 'type' : 'bigint'}], dropAndCreate=True)

mylist = [
{ "name": "Amy", "age": 52},
{ "name": "Hannah", "age": None},
{ "name": None, "age": 45},
{ "name": "Richard", "age": 31}
]
x = mycol.insert_many(mylist)
print('Inserted {} rows.'.format(len(x.inserted_ids)))
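
If the rows come from a pandas DataFrame rather than a hand-written list, missing values usually appear as NaN instead of None. PyMongo encodes None as a BSON null, so one option is to convert NaN to None before the insert_many call above. The following is only a minimal sketch: nan_to_none is a hypothetical helper (not part of Dataiku or PyMongo), and it assumes the rows are plain dicts, for example the result of df.to_dict("records").

```python
import math


def nan_to_none(records):
    """Return copies of the records with float NaN values replaced by None.

    Hypothetical helper for illustration; assumes each record is a flat dict.
    """
    cleaned = []
    for record in records:
        cleaned.append({
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()
        })
    return cleaned


# Rows as they might come out of a DataFrame, with NaN marking missing values
rows = [
    {"name": "Amy", "age": 52},
    {"name": "Hannah", "age": float("nan")},
    {"name": float("nan"), "age": 45},
]

documents = nan_to_none(rows)
print(documents)
```

The keys are kept in every document, so passing documents to insert_many should store explicit nulls instead of dropping the fields.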


 Have a good day!
