Check out the first Dataiku 8 Deep Dive focusing on Productivity on October 29th Read More

Issues with Hive MapTypes

Level 1
Issues with Hive MapTypes

Hello there!

First time poster, so don't hesitate to ask for more informations.

I'm currently working with a HDFS/Hive table that contains a field of type "map" with the following pattern:

mapcol
{"key1":0, "key2":1, "key3":3, "key4", 50}

 

This is with a field that only contains keys in character format and returns values in integer format.

When using DBeaver or other means (Hue, DBviz) to access the values of this map column, we usually do that kind of syntax in HiveQL:

select mapcol['key1'] as variable1

from table

which yields the value of the line for "key1" (in above example, 0).

But in DSS when I want the values for specifics keys in a HiveQL recipe, I get the following error:

  Validation failed: in column mapcol: In column mapcol: cannot handle a map without explicit keys and values types to Hive

Thing is, I have no idea what this error refers to. I've checked the Hive doc and I don't know what I'm missing.

For further informations, I'm using DSS 5.2, and I've manually set my HDFS table schema for this specific column to "map" because when DSS detects automatically the format it assumes its a "string" type format. Which causes some false positives when validating my code later on.

Now, when I press "run anyway", the whole thing works flawlessly. But the fact that I cannot validate my data means I have to manually create the output table schema of the recipe, which is not great when I want to grab lots of values for keys of a map column.

Is there any prior to that type of issue with map types in DSS ? Any workaround that could come to mind ?

Thanks

0 Kudos
1 Reply
Level 5

Hi @rnorm, welcome to the community!

I was trying to reproduce the validation error that you are seeing. Apparently you have to be really careful with the "storage type" of your `mapcol`, in the HDFS dataset.

I created a table just like the one you shared, and at first the "storage type" was of type "string". So I changed it to "map" with the "Edit column schema" as is shown in the next screenshot:

Selection_354.png

After that I was able to run the query you provided without errors! (When I run it first, before checking the column storage type, I did got the same validation error as you)

Hope this can help a bit

Cheers!

0 Kudos