Hive Partitioned Table
ubethke
I have imported a partitioned Hive table through the command-line utility. The partitioning pattern is recognised and I can list the partitions. However, the column that the table is partitioned on does not show up in the schema of the dataset. Any tips?
thanks
uli
Answers
Hi Uli, this is actually normal. DSS has two partitioning models: "files-based" and "column-based". In files-based partitioning, the general rule is that the partitioning dimensions don't appear in the data files.
That's the case here: the data files don't actually contain the partitioning dimension. The DSS schema represents the physical schema of the data, so the partitioning dimensions don't appear there either.
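To illustrate why the column is missing from the data files, here is a minimal Python sketch. It assumes the usual Hive directory layout, where each partition is a `key=value` directory. The table location, the partition keys (`dt`, `country`) and the helper name `partition_values` are made up for the example; this is not how DSS itself is implemented.

```python
import re

# In a Hive-style layout, partition values are encoded in directory names
# (key=value), not inside the data files themselves.
PARTITION_SEGMENT = re.compile(r"^([^=/]+)=([^/]*)$")


def partition_values(path):
    """Return the partition key/value pairs encoded in a Hive-style path."""
    values = {}
    for segment in path.strip("/").split("/"):
        match = PARTITION_SEGMENT.match(segment)
        if match:
            values[match.group(1)] = match.group(2)
    return values


# Hypothetical table location and file name, for illustration only.
example_path = "/warehouse/sales/dt=2015-01-01/country=FR/part-00000"
print(partition_values(example_path))
```

The file `part-00000` holds only the non-partition columns. The values of `dt` and `country` can be recovered only from the path, which is why they are not part of the physical schema.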
Hive's behavior is something of a hybrid: the partitioning columns are not fully considered part of the schema, but Hive automatically creates a virtual column for each of them.
This behavior is not without problems. For example, you can't simply do "create table as select *", since that makes the partitioning dimension "appear" as a regular column in the result.
DSS does not do that: the partitioning dimensions don't appear when you explore a dataset. At the moment, unfortunately, it's not possible to know which partition a record belongs to. We're thinking about ways to improve this.