New Academy Certification: ML Practitioner Live NOW! Learn more

Keep a column as ARRAY when synchronizing a PostgreSQL table into a dataset

Level 1
Keep a column as ARRAY when synchronizing a PostgreSQL table into a dataset

I tried to sync a PostgreSQL table in a dataset but columns of type `array` are converted to type `string`.

I need to keep those columns as `array` to be able to use functions such as `unnest` or `string_agg` in a further recipe.

If I change the schema manually in the tab type of the dataset, I get this error when running the job:




Can't handle column type ARRAY in SQL


I am using Dataiku DSS version 3.1.0 with a PostgreSQL 9.4 database



Does anyone have an idea?

0 Kudos
2 Replies
Dataiker
Dataiker
Hi,

This is not directly possible in DSS which doesn't support at all the array type of PostgreSQL. However, if you let DSS store the array as string, it will store it as JSON text. PostgreSQL then provides many functions to handle JSON texts, treat them as arrays, extract elements, ...: https://www.postgresql.org/docs/9.4/static/functions-json.html
0 Kudos
Level 1
Author
Thanks, I'll dig into that solution.
0 Kudos