Survey banner
The Dataiku Community is moving to a new home! We are temporary in read only mode: LEARN MORE

How to prevent DSS replace NA with null?

Solved!
kevinhoang84
Level 2
How to prevent DSS replace NA with null?

Hi,

I'm using Python recipe to query and insert data to the output SQL Server dataset as below.

 

import dataiku
import pandas as pd
from dataiku import SQLExecutor2

# Read recipe inputs
p787PDMItem = dataiku.Dataset("_P787PDMItem_src")
p787PDMItem_df = p787PDMItem.get_dataframe()

# Initialize an empty DataFrame to collect all results
all_data = pd.DataFrame()

# Loop through each item and generate SQL queries
connection_name = "MyConnection"
executor = SQLExecutor2(connection=connection_name)

for item in p787PDMItem_df["ITEM"]:
    #print(f"ITEM: {item}")
    query = f"""
        SELECT Status, Value FROM MyTable
        WHERE PRODUCT_ID = '{item}'
    """

    # Execute the SQL query and fetch the result    
    query_result_df = executor.query_to_df(query)

    # Append the result to the all_data DataFrame
    all_data = all_data.append(query_result_df, ignore_index=True)

# Write the collected data to the _P787PDM dataset
p787PDM = dataiku.Dataset("_P787PDM")
p787PDM.write_dataframe(all_data)

 

Here are the values of Status column: NA, WS001, WS001. However, the 'NA' value in the output dataset _P787PDM is replaced with 'null'.

The values in the 'Value' column are '004', '002', '001', etc. However, these values are also replaced with 4, 2, 1 in the output dataset.

Do you know how to prevent this replacement in Dataiku?

 

0 Kudos
1 Solution
Marlan

Hi @kevinhoang84,

I don't think you can control how NA values are handled in the query_to_df method. I believe this is on Dataiku's list to add but you could it as a suggestion in the Product Ideas area of this site.

Try using infer_from_schema=True in the query_to_df method to see if that helps with the conversion of '004' to 4 etc. My experience is that it will solve this but your situation may be different.

Marlan

View solution in original post

0 Kudos
3 Replies
kevinhoang84
Level 2
Author

The values in the 'Value' column are '004', '002', '001', etc. However, these values are also replaced with 4, 2, 1 in the output dataset. How to prevent this replacement?

0 Kudos
Marlan

Hi @kevinhoang84,

I don't think you can control how NA values are handled in the query_to_df method. I believe this is on Dataiku's list to add but you could it as a suggestion in the Product Ideas area of this site.

Try using infer_from_schema=True in the query_to_df method to see if that helps with the conversion of '004' to 4 etc. My experience is that it will solve this but your situation may be different.

Marlan

0 Kudos
kevinhoang84
Level 2
Author

Thank you very much, Marlan. infer_from_schema=True helps to remain the leading zeros.

I have to call the 'fillna' method to revert null to 'NA' as below.

all_data['Status'].fillna('NA', inplace=True)