Hive Partitioned Table

ubethke Registered Posts: 21 ✭✭✭✭
I have imported a partitioned Hive table through the command-line utility. The partitioning pattern is recognised and I can list the partitions. However, the column that the table is partitioned on does not show up in the schema of the dataset. Any tips?

thanks

uli

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Hi Uli, this is actually normal. DSS has two partitioning models: "files-based" and "column-based". In files-based partitioning, the general rule is that the partitioning dimensions don't appear in the data files.

    That's the case here: the data files don't actually contain the partitioning dimension. The DSS schema represents the physical schema of the data, so the partitioning dimensions don't appear there either.

    Hive has a fairly hybrid behavior: the partitioning columns are not fully considered part of the schema, but a virtual column is automatically created for each of them.
    This behavior is not without problems: for example, you can't simply do "create table as select *", since that makes the partitioning dimension "appear" as a regular column.

    DSS does not create such a virtual column: the partitioning dimensions don't appear when you explore a dataset. At the moment, unfortunately, it's not possible to know which partition a record belongs to. We're thinking about ways to improve this.
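
To make the files-based model described above concrete, here is a minimal Python sketch. It assumes Hive's default directory layout, where each partition is a `key=value` folder; the path, the column names, and both helper functions are hypothetical and are not part of DSS or Hive. It only illustrates that the partition value lives in the file path rather than in the record, and what a Hive-style "virtual column" roughly amounts to.

```python
from pathlib import PurePosixPath


def partition_values_from_path(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a Hive-style file path."""
    values = {}
    # Skip the final component: it is the data file, not a partition directory.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


def with_partition_columns(row: dict, path: str) -> dict:
    """Return a copy of the row with the partition values added as extra columns."""
    return {**row, **partition_values_from_path(path)}


# Hypothetical example: the data file holds only the physical columns.
path = "/warehouse/events/dt=2015-01-31/country=FR/part-00000"
row = {"user_id": 42, "action": "click"}

# Physical schema, as stored in the file: no "dt" or "country" column.
print(sorted(row))

# Partition values recovered from the directory names.
print(partition_values_from_path(path))

# Roughly what "select *" exposes once the virtual columns are added.
print(with_partition_columns(row, path))
```

The first print shows only the physical columns, which corresponds to the schema that DSS displays; the last one shows the wider row, which is why "select *" in Hive returns more columns than the files contain.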