Matteo Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 3 ✭✭✭
edited July 16 in General Discussion

I'm trying to convert a column of dates from UTC to CET/CEST using a Python recipe. The results are correct when I inspect them in the notebook, but it seems that Dataiku coerces the dates back to UTC when writing the dataframe.

Below is code that reproduces the issue:

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from pandas.tseries.offsets import DateOffset

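# NOTE: randint needs bounds; the 2010-2020 range below is an assumption
random_dates = pd.to_datetime(np.random.randint(
    pd.Timestamp('2010-01-01').value, pd.Timestamp('2020-01-01').value,
    size=10, dtype=np.int64), unit='ns')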

df = pd.DataFrame({'DateTime_raw': random_dates})

#localize as UTC
df['DateTime_UTC'] = df['DateTime_raw'].dt.tz_localize('UTC')

#convert from UTC to CET/CEST
df['DateTime_CET'] = df['DateTime_UTC'].dt.tz_convert('CET')

#convert to string
df['DateTime_UTC_string'] = df['DateTime_UTC'].astype(str)
df['DateTime_CET_string'] = df['DateTime_CET'].astype(str)

This is one row of the output:

2012-03-18T23:26:35.403Z ['DateTime_UTC']
2012-03-18T23:26:35.403Z ['DateTime_CET']
2012-03-18 23:26:35.403958002+00:00 ['DateTime_UTC_string']
2012-03-19 00:26:35.403958002+01:00 ['DateTime_CET_string']

The CET timezone is preserved only when the date is stored as a string. Apparently, this is Dataiku's default behavior. Is there a way to keep the timezone without storing the column as a string? Is it possible to disable this behavior?
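
To make the question concrete, here is a minimal pandas-only sketch (no Dataiku calls) using the timestamp from the sample output above. It shows that tz_convert leaves the underlying instant unchanged, which would explain why both datetime columns come out identical once they are written as UTC. It also shows two candidate workarounds. The variable names are illustrative, and how Dataiku handles a timezone-naive column on write is an assumption that has not been verified here.

```python
import pandas as pd

# One fixed UTC instant, taken from the sample output (millisecond precision)
utc = pd.Series(pd.to_datetime(['2012-03-18 23:26:35.403'])).dt.tz_localize('UTC')
cet = utc.dt.tz_convert('CET')

# tz_convert changes only the zone used for display, not the stored instant
same_instant = bool((utc == cet).all())

# Workaround 1: keep the offset by formatting the column as a string
cet_string = cet.dt.strftime('%Y-%m-%dT%H:%M:%S%z')

# Workaround 2: drop the timezone but keep the local wall-clock time
# (assumption: untested whether Dataiku writes this naive column unchanged)
cet_naive = cet.dt.tz_localize(None)

print(same_instant, cet_string.iloc[0], cet_naive.iloc[0])
```

With the second option the column no longer carries any offset, so the zone has to be tracked separately, for example in the column name.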

Thanks in advance,



