How to prevent DSS replace NA with null?

kevinhoang84
kevinhoang84 Registered Posts: 7
edited July 16 in Using Dataiku

Hi,

I'm using Python recipe to query and insert data to the output SQL Server dataset as below.

import dataiku
import pandas as pd
from dataiku import SQLExecutor2

# Read recipe inputs
p787PDMItem = dataiku.Dataset("_P787PDMItem_src")
p787PDMItem_df = p787PDMItem.get_dataframe()

# Initialize an empty DataFrame to collect all results
all_data = pd.DataFrame()

# Loop through each item and generate SQL queries
connection_name = "MyConnection"
executor = SQLExecutor2(connection=connection_name)

for item in p787PDMItem_df["ITEM"]:
    #print(f"ITEM: {item}")
    query = f"""
        SELECT Status, Value FROM MyTable
        WHERE PRODUCT_ID = '{item}'
    """

    # Execute the SQL query and fetch the result    
    query_result_df = executor.query_to_df(query)

    # Append the result to the all_data DataFrame
    all_data = all_data.append(query_result_df, ignore_index=True)

# Write the collected data to the _P787PDM dataset
p787PDM = dataiku.Dataset("_P787PDM")
p787PDM.write_dataframe(all_data)

Here are the values of Status column: NA, WS001, WS001. However, the 'NA' value in the output dataset _P787PDM is replaced with 'null'.

The values in the 'Value' column are '004', '002', '001', etc. However, these values are also replaced with 4, 2, 1 in the output dataset.

Do you know how to prevent this replacement in Dataiku?

Tagged:

Best Answer

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron
    Answer ✓

    Hi @kevinhoang84
    ,

    I don't think you can control how NA values are handled in the query_to_df method. I believe this is on Dataiku's list to add but you could it as a suggestion in the Product Ideas area of this site.

    Try using infer_from_schema=True in the query_to_df method to see if that helps with the conversion of '004' to 4 etc. My experience is that it will solve this but your situation may be different.

    Marlan

Answers

  • kevinhoang84
    kevinhoang84 Registered Posts: 7

    The values in the 'Value' column are '004', '002', '001', etc. However, these values are also replaced with 4, 2, 1 in the output dataset. How to prevent this replacement?

  • kevinhoang84
    kevinhoang84 Registered Posts: 7

    Thank you very much, Marlan. infer_from_schema=True helps to remain the leading zeros.

    I have to call the 'fillna' method to revert null to 'NA' as below.

    all_data['Status'].fillna('NA', inplace=True)

Setup Info
    Tags
      Help me…