Dates localization coerced to UTC
Hello,
I'm trying to convert a column of dates from UTC to CET/CEST in a Python recipe. The results are correct when I run the code in a notebook, but it seems that Dataiku coerces the dates back to UTC when writing the dataframe.
Below is code that reproduces the issue:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from pandas.tseries.offsets import DateOffset

np.random.seed(0)
random_dates = pd.to_datetime(
    np.random.randint(
        pd.Timestamp('2000-01-01').value,
        pd.Timestamp('2022-12-31').value,
        size=10),
    unit='ns')
df = pd.DataFrame({'DateTime_raw': random_dates})

# localize as UTC
df['DateTime_UTC'] = df['DateTime_raw'].dt.tz_localize('UTC')

# convert to CET/CEST
df['DateTime_CET'] = df['DateTime_UTC'].dt.tz_convert('CET')

# convert to string
df['DateTime_UTC_string'] = df['DateTime_UTC'].astype(str)
df['DateTime_CET_string'] = df['DateTime_CET'].astype(str)
This is one row of the output as stored in the dataset:
2012-03-18T23:26:35.403Z ['DateTime_UTC']
2012-03-18T23:26:35.403Z ['DateTime_CET']
2012-03-18 23:26:35.403958002+00:00 ['DateTime_UTC_string']
2012-03-19 00:26:35.403958002+01:00 ['DateTime_CET_string']
The CET offset is kept only if the date is stored as a string. Apparently, this is the default behavior in Dataiku. Is there a way to keep the time zone other than storing the column as a string? Is it possible to disable this coercion?
Thanks in advance,
Matteo
Answers
Alexandru (Dataiker)
Hi,
Indeed, this is currently the expected behavior. As you've already found, the way to preserve the exact date format, including the time zone offset, is to store these columns as strings.
https://doc.dataiku.com/dss/latest/preparation/dates.html#timezones-handling
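For illustration, here is a minimal sketch of that string workaround in plain pandas, with no Dataiku API involved. It assumes the strings are written in ISO 8601 form with their offset and parsed back in a downstream recipe. The helper names to_local_string and from_local_string and the sample data are hypothetical, not part of Dataiku or pandas.

```python
import pandas as pd

# Hypothetical sample data: two UTC instants, one falling in CET (winter)
# and one in CEST (summer), so the stored offsets differ.
df = pd.DataFrame({
    "DateTime_UTC": pd.to_datetime(
        ["2012-03-18 23:26:35", "2012-07-01 12:00:00"]
    ).tz_localize("UTC")
})


def to_local_string(series, tz="CET"):
    """Render a tz-aware series as ISO 8601 strings that carry the local offset."""
    return series.dt.tz_convert(tz).map(lambda ts: ts.isoformat())


def from_local_string(series, tz="CET"):
    """Parse offset-bearing strings back into a tz-aware datetime series."""
    # utc=True lets pandas parse a column that mixes +01:00 and +02:00 offsets.
    return pd.to_datetime(series, utc=True).dt.tz_convert(tz)


# Column to write to the output dataset as a string.
df["DateTime_CET_string"] = to_local_string(df["DateTime_UTC"])

# What a downstream recipe could do after reading the string column back.
restored = from_local_string(df["DateTime_CET_string"])
```

The string column keeps the local wall-clock time and offset, while the restored series represents the same instants as the original UTC column, so nothing is lost by the round trip.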