Hive Partitioned Table

I have imported a partitioned Hive table through the command-line utility. The partitioning pattern is recognised and I can list the partitions. However, the column that the table is partitioned on does not show up in the schema of the dataset. Any tips?

thanks

uli

Answers

  • Dataiker, Dataiku DSS Core Designer, Registered, Moderator Posts: 753 Dataiker
    Hi Uli, this is actually normal. There are two partitioning models in DSS: "files-based" and "column-based". In files-based partitioning, the general rule is that the partitioning dimensions don't appear in the data files.

    That's the case here: the data files don't actually contain the partitioning dimension. The DSS schema represents the physical schema of the data, so the partitioning dimensions don't appear there either.

    Hive has a fairly hybrid behavior: the partitioning columns are not fully considered as part of the schema, but a virtual column is automatically created.
    This behavior is not without problems: for example, you can't do "create table as select *" since that makes the partitioning dimension "appear".

    DSS does not do that: the partitioning dimensions don't appear when you explore a dataset. At the moment, unfortunately, it's not possible to know which partition a record belongs to. We're thinking about ways to improve this.
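
To make the files-based model described above concrete: in a Hive-style layout, the partition value is encoded in the directory names (`key=value`), not in the records themselves. The following is a minimal sketch in plain Python, outside DSS, of how the partition values could be recovered from a file path and attached to the rows read from that file. The path, the column names (`dt`, `country`) and both helper functions are hypothetical and only serve as illustration; this is not a DSS or Hive API.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict:
    """Extract key=value directory components from a Hive-style path."""
    values = {}
    # Only look at the directories, not at the file name itself.
    for part in PurePosixPath(path).parent.parts:
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


def with_partition_columns(rows, path):
    """Return copies of the rows with the partition values added as columns."""
    parts = partition_values(path)
    return [{**row, **parts} for row in rows]


# Hypothetical file of a table partitioned by `dt` and `country`.
path = "/warehouse/events/dt=2015-06-01/country=FR/part-00000"

# Rows as stored in the file: they carry no `dt` or `country` field.
rows = [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 7}]

enriched = with_partition_columns(rows, path)
print(enriched[0])
# {'user': 'a', 'clicks': 3, 'dt': '2015-06-01', 'country': 'FR'}
```

The original rows are left untouched, which mirrors the point made in the answer: the physical schema of the files does not include the partitioning dimensions, and any such column has to be derived from the path.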
