Issues with dates in Python recipe
Hi,
I had a recipe in DSS 2.3 that worked properly but no longer works in DSS 3.0.
# Read df from a dataset. "date" is a column of type "date" in DSS
# df.date is a date column
# do stuff with df
df = df.fillna("")
# do stuff with df
dataset.write_with_schema(df)
In DSS 3.0, the output column is now a string instead of a date.
Answers
Hi,
In DSS 3.0, DSS was upgraded to Pandas 0.17, which indeed introduces a behavior change regarding fillna on date columns.
* In DSS 2.3 / Pandas 0.16, filling a date column with "" filled the column with the "NaT" value ("Not a Time") and kept the dtype. Filling with "anyotherstring" failed.
* In DSS 3.0 / Pandas 0.17, filling a date column with any string, whether empty or not, now triggers a cast of the column to object, which DSS then interprets as a string column.
[Example output for Pandas 0.16 and Pandas 0.17 not preserved.]
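To see which behavior a given environment has, you can compare the column dtypes before and after the fill. This is only a minimal sketch: the sample dataframe and its column names are made up for illustration, and the result of the second print depends on the installed Pandas version, so no expected output is given.

```python
import pandas as pd

# Hypothetical sample data; "date" stands in for the DSS date column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", None]),  # second value becomes NaT
    "label": ["a", None],
})

print(pd.__version__)
print(df.dtypes)       # "date" starts out with a datetime64 dtype

# Fill every column with the same string, as in the recipe above.
filled = df.fillna("")
print(filled.dtypes)   # compare the dtype of "date" with the one printed before
```

If "date" shows up as object after the fill, DSS will write it as a string column.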
Filling a whole dataframe that contains mixed value types with a single value is inherently dangerous. Both behaviors of Pandas are questionable, but in the end you'd probably want to fillna only the columns for which it makes sense, with a properly-typed value.
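One way to do that is to pass fillna a dict mapping column names to fill values, so that only the listed columns are touched. The sketch below assumes a made-up dataframe with "date", "label" and "count" columns; adapt the names and fill values to your own dataset.

```python
import pandas as pd

# Hypothetical sample data; "date" stands in for the DSS date column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", None]),
    "label": ["a", None],
    "count": [1.0, None],
})

# Fill each column with a value of its own type. "date" is not listed, so
# missing dates stay NaT and the column keeps its datetime dtype.
fill_values = {"label": "", "count": 0.0}
filled = df.fillna(value=fill_values)

print(filled.dtypes)
```

Since the date column is never filled with a string, it should still be written as a date by dataset.write_with_schema.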