Writing null value to MongoDB

Desmond Registered Posts: 1 ✭✭✭✭

How do I write a null value into MongoDB? It seems like DSS only writes it as a string. I am using a Python script and putting Python ‘None’ into a pandas DataFrame. MongoDB will not register the key if the value is an empty string; it needs to be a null data type.



  • dimitri
    dimitri Dataiker, Product Ideas Manager Posts: 33 Dataiker
    edited 3:35PM

    Hi @Desmond

    Indeed, it's not possible to store null values in a MongoDB document using a DataFrame. Python None values are treated as missing attributes, which MongoDB allows because, as a NoSQL store, it does not require every document to have the same fields.
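
    To make the distinction concrete, here is a minimal sketch in plain Python dicts (no Dataiku or MongoDB calls) of the two document shapes involved: a missing attribute versus an explicit null. The helper name has_explicit_null and the sample documents are made up for illustration. The two filter dicts at the end follow standard MongoDB query semantics and are only built here, not run against a server.

```python
import json

# Two ways a row without a name can end up as a document.
missing_field = {"age": 45}                # key absent: a "missing attribute"
explicit_null = {"name": None, "age": 45}  # key present, value is null

def has_explicit_null(doc, field):
    """True only if the field exists in the document and its value is None."""
    return field in doc and doc[field] is None

print(json.dumps(missing_field))   # {"age": 45}
print(json.dumps(explicit_null))   # {"name": null, "age": 45}

# Filters you could pass to a PyMongo collection's find() to check the result:
null_or_missing = {"name": None}         # matches null AND missing fields
only_null = {"name": {"$type": 10}}      # BSON type 10 is null: explicit nulls only
```

    If the goal is to confirm that a key was really stored as null, the second filter is the one to use, since the first also matches documents where the key is absent.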

    However, if your column is numerical, you can force writing a null value by setting it to NaN. To do so you need to use a DatasetWriter.

    Here is a code sample:

    import dataiku
    import numpy as np

    dataset = dataiku.Dataset("dataset_name")
    dataset.write_schema([{'name' : 'name', 'type' : 'string'},{'name' : 'age', 'type' : 'bigint'}], dropAndCreate=True)
    # The writer must be closed to flush the rows; the with block takes care of that.
    with dataset.get_writer() as writer:
        writer.write_row_array(['Amy', 52])
        writer.write_row_array(['Hannah', np.nan])  # NaN in the numerical column


    Otherwise, if you want to store a null value in a string column, you'll have to write the documents yourself without going through the Dataiku API.

    The following code sample uses the PyMongo Python package to write to a MongoDB database that does not require authentication:

    import dataiku
    import pymongo

    client = dataiku.api_client()
    dataset = dataiku.Dataset("dataset_name")
    conn = client.get_connection(dataset.get_config()['params']['connection'])

    myclient = pymongo.MongoClient(host = conn.get_info()['params']['host'], port = conn.get_info()['params']['port'])
    mydb = myclient[conn.get_info()['params']['db']]
    mycol = mydb[dataset.get_config()['params']['collection']]

    dataset.write_schema([{'name' : 'name', 'type' : 'string'},{'name' : 'age', 'type' : 'bigint'}], dropAndCreate=True)

    mylist = [
        { "name": "Amy", "age": 52},
        { "name": "Hannah", "age": None},
        { "name": None, "age": 45},
        { "name": "Richard", "age": 31}
    ]

    x = mycol.insert_many(mylist)
    print('Inserted {} rows.'.format(len(x.inserted_ids)))
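
    Since the question mentions a pandas DataFrame: pandas turns None into NaN in numerical columns, so records taken from a DataFrame may carry NaN where a null is wanted. A minimal sketch of a cleaning step, assuming the records are plain dicts (for example from df.to_dict("records")); the helper name nan_to_none is hypothetical and not part of Dataiku or PyMongo:

```python
import math

def nan_to_none(records):
    """Return copies of the records with float NaN values replaced by None."""
    cleaned = []
    for record in records:
        cleaned.append({
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()
        })
    return cleaned

# Sample rows shaped like the list above, with a NaN standing in for a missing age.
rows = [
    {"name": "Amy", "age": 52.0},
    {"name": "Hannah", "age": float("nan")},
    {"name": None, "age": 45.0},
]
print(nan_to_none(rows))
```

    The cleaned list could then be passed to mycol.insert_many(...) in place of mylist, so that the keys are written with None values instead of NaN.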


    Have a good day!
