Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on June 5, 2024 9:37PM
Likes: 0
Replies: 3
Hi,
I'm using Python recipe to query and insert data to the output SQL Server dataset as below.
import dataiku import pandas as pd from dataiku import SQLExecutor2 # Read recipe inputs p787PDMItem = dataiku.Dataset("_P787PDMItem_src") p787PDMItem_df = p787PDMItem.get_dataframe() # Initialize an empty DataFrame to collect all results all_data = pd.DataFrame() # Loop through each item and generate SQL queries connection_name = "MyConnection" executor = SQLExecutor2(connection=connection_name) for item in p787PDMItem_df["ITEM"]: #print(f"ITEM: {item}") query = f""" SELECT Status, Value FROM MyTable WHERE PRODUCT_ID = '{item}' """ # Execute the SQL query and fetch the result query_result_df = executor.query_to_df(query) # Append the result to the all_data DataFrame all_data = all_data.append(query_result_df, ignore_index=True) # Write the collected data to the _P787PDM dataset p787PDM = dataiku.Dataset("_P787PDM") p787PDM.write_dataframe(all_data)
Here are the values of Status column: NA, WS001, WS001. However, the 'NA' value in the output dataset _P787PDM is replaced with 'null'.
The values in the 'Value' column are '004', '002', '001', etc. However, these values are also replaced with 4, 2, 1 in the output dataset.
Do you know how to prevent this replacement in Dataiku?
Hi @kevinhoang84
,
I don't think you can control how NA values are handled in the query_to_df method. I believe this is on Dataiku's list to add but you could it as a suggestion in the Product Ideas area of this site.
Try using infer_from_schema=True in the query_to_df method to see if that helps with the conversion of '004' to 4 etc. My experience is that it will solve this but your situation may be different.
Marlan
The values in the 'Value' column are '004', '002', '001', etc. However, these values are also replaced with 4, 2, 1 in the output dataset. How to prevent this replacement?
Thank you very much, Marlan. infer_from_schema=True helps to remain the leading zeros.
I have to call the 'fillna' method to revert null to 'NA' as below.
all_data['Status'].fillna('NA', inplace=True)