
Hive Partitioned Table

I have imported a partitioned Hive table through the command line utility. The partitioning pattern is recognised and I can list the partitions. However, the column that the table is partitioned on does not show up in the schema of the dataset. Any tips?


1 Reply
Hi Uli, this is actually normal. DSS has two partitioning models: "files-based" and "column-based". In files-based partitioning, the general rule is that the partitioning dimensions don't appear in the data files.

That's the case here: the data files don't actually contain the partitioning dimension. The DSS schema represents the physical schema of the data, so the partitioning dimensions don't appear there either.
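
To make this concrete, here is a minimal Python sketch of the files-based idea, assuming the usual Hive directory layout where each partition is a `key=value` folder. The table location, file contents and the `partition_values` helper are hypothetical, made up for illustration; this is not DSS or Hive code.

```python
from pathlib import PurePosixPath


def partition_values(file_path: str) -> dict:
    """Collect key=value directory components from a Hive-style file path."""
    values = {}
    # Only look at the directories, not the file name itself.
    for part in PurePosixPath(file_path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


# Hypothetical table location and file contents, for illustration only.
path = "/warehouse/events/dt=2015-06-01/country=FR/part-00000"
rows_in_file = [
    {"user_id": 1, "action": "click"},
    {"user_id": 2, "action": "view"},
]

# Columns physically stored in the data file: no "dt", no "country".
file_columns = sorted(rows_in_file[0])

# The partition values exist only in the directory names.
partitions = partition_values(path)

print(file_columns)
print(partitions)
```

The rows stored in the file carry only their own columns, while the partitioning dimensions live in the path. That is why a schema describing the physical content of the files does not list them.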

Hive's behavior is something of a hybrid: the partitioning columns are not fully considered part of the schema, but a virtual column is automatically created.
This behavior is not without problems: for example, you can't do "create table as select *" since that makes the partitioning dimension "appear".

DSS does not do that: the partitioning dimensions don't appear when you explore a dataset. At the moment, unfortunately, it's not possible to know which partition a record belongs to. We're thinking about ways to improve this.