
Issues with dates in Python recipe

Dataiker

Hi,

I had a Python recipe that worked properly in DSS 2.3 but no longer works in DSS 3.0:

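import dataiku

# Read df from a dataset. "date" is a column of type "date" in DSS,
# so df.date is a datetime64 column.
# (Dataset names here are placeholders.)
df = dataiku.Dataset("my_input").get_dataframe()

# do stuff with df

df = df.fillna("")  # fillna returns a new DataFrame, so reassign it

# do stuff with df

dataset = dataiku.Dataset("my_output")
dataset.write_with_schema(df)
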
In DSS 3.0, the "date" column in the output is now a string instead of a date.

Dataiker

Hi,



DSS 3.0 upgraded Pandas to 0.17, and that version does introduce a behavior change in fillna on date columns.



* In DSS 2.3 / Pandas 0.16, calling fillna("") on a date column left the missing entries as NaT ("Not a Time") and kept the datetime dtype; filling with any non-empty string failed.



* In DSS 3.0 / Pandas 0.17, filling a date column with any string, empty or not, now converts the column to the object dtype, which DSS then interprets as a string column.




Filling a whole DataFrame that contains mixed value types with a single value is inherently risky. Both Pandas behaviors are questionable; in the end, you probably want to call fillna only on the columns where it makes sense, with a value of the matching type.
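
A minimal sketch of that approach in plain pandas: pass fillna a dict that maps each column to a default of the column's own type, and leave the date column out so it keeps its datetime dtype (its missing values simply stay NaT). The "label" and "count" columns and their defaults are hypothetical.

```python
import pandas as pd

# Hypothetical example frame with a date, a text and a numeric column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", None]),
    "label": ["a", None],
    "count": [1.0, None],
})

# Fill only the columns where a default makes sense, each with a
# properly-typed value; "date" is not listed, so it is left untouched.
filled = df.fillna({"label": "", "count": 0})

print(pd.api.types.is_datetime64_any_dtype(filled["date"]))  # True
print(filled["count"].tolist())  # [1.0, 0.0]
```

Because the date column never receives a string, it is not converted to object, and DSS should keep treating it as a date.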
