Spark schema: Cannot handle an ARRAY without specified content type

UserBird
Dataiker, Alpha Tester, Posts: 535
Hi,

I get this error message when training a model in DSS with Spark MLLib.

However, in the "Script" tab I have set the meaning of the column to "Text". Why does DSS still think it's an array?

Answers

  • Clément_Stenac
    Dataiker, Dataiku DSS Core Designer, Registered, Posts: 753
    When DSS trains the model, before applying the preparation script, it needs to load the original dataset as a Spark dataframe. It therefore needs to transform the schema of the dataset to a Spark schema, which requires content types for arrays.

    Only then is the preparation script applied, and the meanings taken into account.

    --> In the specific case of Spark MLLib, you need to make sure that the storage type of the column in the dataset is set to string, in addition to setting the meaning. The sketch below shows why Spark cannot accept an array column without an element type.
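
    A minimal sketch of the underlying constraint, assuming plain PySpark outside DSS (the column names "user_id" and "tags" are hypothetical): Spark's ArrayType must always be given an element type, so a dataset column declared as an array without a content type cannot be mapped to a Spark schema at all.

        from pyspark.sql import SparkSession
        from pyspark.sql.types import StructType, StructField, StringType, ArrayType

        spark = SparkSession.builder.appName("schema-demo").getOrCreate()

        # Every array column in a Spark schema must declare its element (content) type;
        # there is no untyped ArrayType.
        schema = StructType([
            StructField("user_id", StringType(), True),
            StructField("tags", ArrayType(StringType()), True),  # content type given: OK
        ])

        df = spark.createDataFrame([("u1", ["a", "b"])], schema=schema)
        df.printSchema()

        # ArrayType() with no element type is not even constructible:
        # it raises a TypeError because elementType is a required argument.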