
Spark schema: Cannot handle an ARRAY without specified content type

UserBird
Dataiker
Hi,

I get this error message when training a model in DSS with Spark MLlib.

However, in the "script" tab, I have set the column's meaning to "Text". Why does DSS still think it's an array?
Clément_Stenac
Dataiker
When DSS trains the model, before applying the preparation script, it needs to load the original dataset as a Spark dataframe. It therefore needs to transform the schema of the dataset to a Spark schema, which requires content types for arrays.

Only then is the preparation script applied, and the meanings taken into account.

So, in the specific case of Spark MLlib, you need to make sure that the column's storage type in the dataset schema is set to string, in addition to setting the meaning.
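To illustrate why the meaning alone does not help, here is a minimal pure-Python sketch of a schema conversion that looks only at storage types. It is not DSS's actual code: the function names, the column dictionary layout (including the `arrayContent` and `meaning` keys) and the type mapping are all assumptions made for the example.

```python
# Hypothetical sketch of a "dataset schema -> Spark schema" conversion.
# None of these names come from DSS; they only illustrate the ordering issue.

# Assumed mapping from storage types to Spark SQL type names.
SIMPLE_TYPES = {
    "string": "StringType",
    "bigint": "LongType",
    "double": "DoubleType",
    "boolean": "BooleanType",
}


def to_spark_type(column):
    """Convert one column description to a Spark type name.

    Only the storage type ("type") is consulted. The "meaning" key is
    ignored on purpose: in this sketch, as in the explanation above,
    meanings only matter later, when the preparation script runs.
    """
    storage_type = column["type"]
    if storage_type == "array":
        content = column.get("arrayContent")  # assumed key name
        if content is None:
            raise ValueError(
                "Cannot handle an ARRAY without specified content type "
                "(column %r)" % column.get("name")
            )
        return "ArrayType(%s)" % to_spark_type(content)
    return SIMPLE_TYPES[storage_type]


def to_spark_schema(columns):
    """Convert a list of column descriptions to (name, Spark type) pairs."""
    return [(c["name"], to_spark_type(c)) for c in columns]


# Storage type is still "array": conversion fails despite the "Text" meaning.
failing = [{"name": "tags", "type": "array", "meaning": "Text"}]

# Storage type changed to "string": conversion succeeds.
fixed = [{"name": "tags", "type": "string", "meaning": "Text"}]
```

In this sketch, `to_spark_schema(failing)` raises the error even though the meaning is "Text", while `to_spark_schema(fixed)` succeeds, which mirrors the fix of changing the storage type rather than only the meaning.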