How to prevent DSS replace NA with null?

kevinhoang84 · June 2024

Hi,

I'm using Python recipe to query and insert data to the output SQL Server dataset as below.

import dataiku
import pandas as pd
from dataiku import SQLExecutor2

# Read recipe inputs
p787PDMItem = dataiku.Dataset("_P787PDMItem_src")
p787PDMItem_df = p787PDMItem.get_dataframe()

# Initialize an empty DataFrame to collect all results
all_data = pd.DataFrame()

# Loop through each item and generate SQL queries
connection_name = "MyConnection"
executor = SQLExecutor2(connection=connection_name)

for item in p787PDMItem_df["ITEM"]:
    #print(f"ITEM: {item}")
    query = f"""
        SELECT Status, Value FROM MyTable
        WHERE PRODUCT_ID = '{item}'
    """

    # Execute the SQL query and fetch the result    
    query_result_df = executor.query_to_df(query)

    # Append the result to the all_data DataFrame
    all_data = all_data.append(query_result_df, ignore_index=True)

# Write the collected data to the _P787PDM dataset
p787PDM = dataiku.Dataset("_P787PDM")
p787PDM.write_dataframe(all_data)

Here are the values of Status column: NA, WS001, WS001. However, the 'NA' value in the output dataset _P787PDM is replaced with 'null'.

The values in the 'Value' column are '004', '002', '001', etc. However, these values are also replaced with 4, 2, 1 in the output dataset.

Do you know how to prevent this replacement in Dataiku?

Marlan · June 2024

Hi @kevinhoang84
,

I don't think you can control how NA values are handled in the query_to_df method. I believe this is on Dataiku's list to add but you could it as a suggestion in the Product Ideas area of this site.

Try using infer_from_schema=True in the query_to_df method to see if that helps with the conversion of '004' to 4 etc. My experience is that it will solve this but your situation may be different.

Marlan

kevinhoang84 · June 2024

The values in the 'Value' column are '004', '002', '001', etc. However, these values are also replaced with 4, 2, 1 in the output dataset. How to prevent this replacement?

kevinhoang84 · June 2024

Thank you very much, Marlan. infer_from_schema=True helps to remain the leading zeros.

I have to call the 'fillna' method to revert null to 'NA' as below.

all_data['Status'].fillna('NA', inplace=True)

How to prevent DSS replace NA with null?

Tags

Best Answer

Answers

Welcome!

Welcome!

Quick Links

Categories

Sign up to take part

How to prevent DSS replace NA with null?

Tags

Best Answer

Answers

Welcome!

Welcome!

Quick Links

Categories