Issues with dates in Python recipe

UserBird
UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
edited July 16 in Using Dataiku

Hi,

I had a recipe that worked properly in DSS 2.3 but no longer works in DSS 3.0:


# Read df from a dataset. "date" is a column of type "date" in DSS
# df.date is a date column

# do stuff with df

df = df.fillna("")

# do stuff with df

dataset.write_with_schema(df)

In DSS 3.0, the output column is now a string instead of a date.

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker

    Hi,

    In DSS 3.0, DSS was upgraded to Pandas 0.17, which indeed introduces a behavior change regarding fillna on date columns.

    * In DSS 2.3 / Pandas 0.16, filling a date column with "" filled the column with the "NaT" value ("Not a Time") and kept the dtype. Filling with "anyotherstring" failed.

    * In DSS 3.0 / Pandas 0.17, filling a date column with any string, whether empty or not, now casts the column to the object dtype, which DSS then interprets as a string column.

    Pandas 0.16:

    Pandas 0.17:

    Filling a whole dataframe that contains mixed value types with a single value is inherently dangerous. Both Pandas behaviors are questionable, but in the end you probably want to call fillna only on the columns for which it makes sense, with a properly typed value.
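
As a minimal sketch of that advice, the snippet below fills only selected columns by passing a dict to fillna. The dataframe and the column names "name" and "amount" are hypothetical stand-ins for the recipe's real data, not taken from the original recipe. The date column is left out of the dict on purpose, so its missing values stay NaT and it keeps its datetime dtype.

```python
import pandas as pd

# Hypothetical data standing in for the recipe's input dataset.
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", None, "2016-03-01"]),
    "name": ["a", None, "c"],
    "amount": [1.0, None, 3.0],
})

# Fill only the columns where a replacement makes sense, each with a
# value of the matching type. "date" is not listed, so it is untouched.
df = df.fillna({"name": "", "amount": 0.0})

# Check that the date column is still a datetime column before writing.
date_is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
```

If the check is true, the date column should still be recognized as a date when the dataframe is written back with write_with_schema, assuming DSS infers the column type from the pandas dtype as described above.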
