Issues with Hive MapTypes

rnorm
rnorm Registered Posts: 9 ✭✭✭✭

Hello there!

First time poster, so don't hesitate to ask for more informations.

I'm currently working with a HDFS/Hive table that contains a field of type "map" with the following pattern:

mapcol
{"key1":0, "key2":1, "key3":3, "key4", 50}

This is with a field that only contains keys in character format and returns values in integer format.

When using DBeaver or other means (Hue, DBviz) to access the values of this map column, we usually do that kind of syntax in HiveQL:

select mapcol['key1'] as variable1

from table

which yields the value of the line for "key1" (in above example, 0).

But in DSS when I want the values for specifics keys in a HiveQL recipe, I get the following error:

Validation failed: in column mapcol: In column mapcol: cannot handle a map without explicit keys and values types to Hive

Thing is, I have no idea what this error refers to. I've checked the Hive doc and I don't know what I'm missing.

For further informations, I'm using DSS 5.2, and I've manually set my HDFS table schema for this specific column to "map" because when DSS detects automatically the format it assumes its a "string" type format. Which causes some false positives when validating my code later on.

Now, when I press "run anyway", the whole thing works flawlessly. But the fact that I cannot validate my data means I have to manually create the output table schema of the recipe, which is not great when I want to grab lots of values for keys of a map column.

Is there any prior to that type of issue with map types in DSS ? Any workaround that could come to mind ?

Thanks

Best Answer

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
    Answer ✓

    Hi @rnorm
    , welcome to the community!

    I was trying to reproduce the validation error that you are seeing. Apparently you have to be really careful with the "storage type" of your `mapcol`, in the HDFS dataset.

    I created a table just like the one you shared, and at first the "storage type" was of type "string". So I changed it to "map" with the "Edit column schema" as is shown in the next screenshot:

    Selection_354.png

    After that I was able to run the query you provided without errors! (When I run it first, before checking the column storage type, I did got the same validation error as you)

    Hope this can help a bit

    Cheers!

Answers

  • rnorm
    rnorm Registered Posts: 9 ✭✭✭✭

    Thanks for the reply (I saw the notifications just now!). That's what I figured as well, it seems DSS is more comfortable with some storage formats (strings) than others (varchar and other exotic things). I'll keep that in mind going forward so we actually have our IS team look up the storage format so we don't have compatibility issues.

Setup Info
    Tags
      Help me…