
Issues with dates in Python recipe

Dataiker

Hi,



I had a recipe that worked properly in DSS 2.3 but no longer works in DSS 3.0:




# Read df from a dataset. "date" is a column of type "date" in DSS
# df.date is a date column

# do stuff with df

df = df.fillna("")

# do stuff with df

dataset.write_with_schema(df)


In DSS 3.0, the output column is now a string instead of a date.

Dataiker

Hi,



DSS 3.0 upgraded Pandas to 0.17, which does introduce a behavior change for fillna on date columns.



* In DSS 2.3 / Pandas 0.16, filling a date column with "" filled the column with the "NaT" value ("Not a Time") and kept the dtype. Filling with "anyotherstring" failed.



* In DSS 3.0 / Pandas 0.17, filling a date column with any string, whether empty or not, now converts the column to the object dtype, which DSS then interprets as a string column.



Pandas 0.16:





Pandas 0.17:





Filling a whole dataframe that contains mixed value types with a single value is inherently dangerous. Both Pandas behaviors are questionable, but ultimately you'd probably want to fillna only the columns for which it makes sense, each with a properly-typed value.
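
A minimal sketch of that per-column approach, assuming a dataframe with one date column, one string column and one numeric column (the column names and default values are made up for the example). Passing a dict to fillna restricts the fill to the listed columns, so the date column is left untouched and keeps its dtype.

```python
import pandas as pd

# Hypothetical sample data with missing values in every column
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", None, "2016-03-15"]),
    "name": ["a", None, "c"],
    "amount": [1.5, None, 3.0],
})

# Fill only the columns where a default makes sense,
# each with a value of the matching type
fill_values = {"name": "", "amount": 0.0}
filled = df.fillna(value=fill_values)

# "date" is not listed, so its missing value stays NaT
print(filled.dtypes)
```

In a recipe, the resulting dataframe could then be passed to write_with_schema as in the original code.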
