
The partitioning column does not display in dataiku

Solved!
boumezrag
Level 2
Hi everybody,

I am having a real problem: when I import a table from Hive, the partitioning column is not displayed in Dataiku.

Any help, please?
1 Solution
fchataigner2
Dataiker
Hi,

Partitioning columns in Hive are logical columns: their values are not stored in the data files but come from the path between the table root directory on HDFS and the files containing the data for a given partition value. In DSS, when you retrieve data from the dataset corresponding to the Hive table, you pass a list of values for the partitioning columns, and the data is filtered on these values. You can see the list of existing values of the partitioning columns in the Status tab, or in the Sampling panel on the left of your Explore tab.

Regards,
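
To illustrate the point about partition values living in the path rather than in the data files, here is a minimal Python sketch. It assumes the usual Hive directory layout, where each partition level is a `column=value` directory under the table root. The table location, the column names `dt` and `country`, and the helper `partition_values_from_path` are hypothetical and only serve the example.

```python
from pathlib import PurePosixPath


def partition_values_from_path(table_root, file_path):
    """Return {column: value} read from the column=value directories under table_root."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(table_root))
    values = {}
    # Every directory between the table root and the data file is one partition level.
    for directory in relative.parts[:-1]:
        column, _, value = directory.partition("=")
        values[column] = value
    return values


# Hypothetical table partitioned by dt and country.
example = partition_values_from_path(
    "/warehouse/sales",
    "/warehouse/sales/dt=2020-01-01/country=FR/part-00000",
)
print(example)  # {'dt': '2020-01-01', 'country': 'FR'}
```

The data file itself carries only the non-partitioning columns, which is consistent with seeing one column fewer in the dataset's column list.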

boumezrag
Level 2
Author
Thank you for your answer.
To be honest, I didn't understand. In the Status tab we can see the list of all columns, but not the partitioning one.
My question is: I have a table with 13 columns (including the partitioning column), but I can see only 12. How can I display all 13 columns?

Thanks in advance.
fchataigner2
Dataiker
Since you imported a partitioned Hive table as a DSS dataset, you should have a partitioning scheme defined in the dataset's Partitioning tab (under its Settings), with the missing column as a dimension.
In the Status tab, you can choose to display as Partition table; the display is then a table with the partition identifiers as row identifiers. A partition identifier is a '|'-separated list of the values of the partitioning columns.
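
As a rough illustration of the '|'-separated identifier format, the following Python sketch splits an identifier back into one value per dimension and builds it again. The dimension names `dt` and `country` and both helper functions are hypothetical; the only thing taken from the explanation above is that the values are joined with '|' in the order of the partitioning columns.

```python
def parse_partition_id(partition_id, dimensions):
    """Split a '|'-separated partition identifier into {dimension: value}."""
    values = partition_id.split("|")
    if len(values) != len(dimensions):
        raise ValueError(
            f"expected {len(dimensions)} values, got {len(values)}: {partition_id!r}"
        )
    return dict(zip(dimensions, values))


def format_partition_id(values_by_dimension, dimensions):
    """Join the dimension values with '|' in the order given by dimensions."""
    return "|".join(values_by_dimension[name] for name in dimensions)


# Hypothetical dataset with two partitioning dimensions.
dimensions = ["dt", "country"]
parsed = parse_partition_id("2020-01-01|FR", dimensions)
print(parsed)  # {'dt': '2020-01-01', 'country': 'FR'}
print(format_partition_id(parsed, dimensions))  # 2020-01-01|FR
```

With a single partitioning column, as in the table discussed here, the identifier is simply the value of that column.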
boumezrag
Level 2
Author
I attached a screenshot.
When you said "and the display will be a table with the partition identifiers as row identifiers", is that what my screenshot shows?
boumezrag
Level 2
Author

This is my screenshot.
fchataigner2
Dataiker
The values of the partition column can indeed be seen on the left.
boumezrag
Level 2
Author
So there is no way to display this column with the others? Sorry for asking so many questions.
fchataigner2
Dataiker
This is not possible at the moment. But:
- you can specify which values of this partition column you want when you browse or build a dataset
- you can always access the column and its data via HiveServer2 (i.e. in a SQL notebook, or in a Hive recipe when you set the engine to HiveServer2 in the Advanced tab)
boumezrag
Level 2
Author
Thank you so much,
I get it now 😉
