
Error while executing spark

Level 3

We are getting the following errors while executing Spark:

sparlerro.PNG

7 Replies
Dataiker

Hi,

You would need to attach your entire log and/or job diag, not just a part of the error message.

If it is not possible for confidentiality reasons, please submit this as a support ticket, with an attached job diagnosis. Please see https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html for more details.

Level 3
Author

Here is the log. Kindly help us. Thanks in advance.

 

Dataiker

Hello,

  1. In your test set for this machine learning task, do you happen to have "false" and/or "true" as target for the prediction?
  2. If no, are the schemas or format of your train dataset and test dataset different, especially around the target column?
  3. If no, on each of those datasets, can you click on the target column > Analyse and send us a screenshot for both?

 

Level 3
Author

Thanks for your reply.

Yes, we are using true/false as the prediction target. Does DSS not support TRUE/FALSE? If not, how can we do it?

sprkerro.PNG

Dataiker
DSS does support Yes/No, 0/1, true/false, etc. But the values need to have the same form on the train set and test set, or in the case of an evaluation recipe, on the original train & test set and on the scored set. Training a model on true/false, then evaluating on Yes/No won't work.

I only see one screenshot. Is it from the scored dataset? What were the values of that column for the train and test sets?
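
To make the "same form" requirement concrete, here is a minimal sketch in plain Python (not a DSS API) that compares the distinct target values of two datasets. The function name `label_mismatch` and the sample values are hypothetical, chosen only for illustration; in practice you would feed it the target column of your train set and of your test or scored set.

```python
def label_mismatch(train_labels, test_labels):
    """Return the target labels that appear in only one of the two collections."""
    # Compare labels as stripped strings so stray whitespace is not reported as a mismatch.
    train_set = {str(v).strip() for v in train_labels if v is not None}
    test_set = {str(v).strip() for v in test_labels if v is not None}
    return {
        "only_in_train": sorted(train_set - test_set),
        "only_in_test": sorted(test_set - train_set),
    }


# Hypothetical example: a model trained on true/false, evaluated on Yes/No.
train_target = ["true", "false", "true"]
scored_target = ["Yes", "No", "Yes"]
report = label_mismatch(train_target, scored_target)
print(report)
```

If both lists in the report are empty, the two datasets use the same label forms; any entry points to a value that one side has and the other does not.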
Level 1

Hi Adrein,

Glad to see a quick response from you.

We still have a few concerns. Below are our responses to the points provided:

a. Values need to have the same form on the train set and test set, or in the case of an evaluation recipe, on the original train & test set and on the scored set.

--> Yes, we are using the same form, Yes/No, in every dataset in the recipe. This is consistent across our project. With the Dataiku internal engine the recipe ran successfully end to end, but when we ran the same recipe with Spark MLLib, we hit this issue.

What were the values of that column for the train & tests sets?

I am attaching the values for both the train and test sets.

Please help us in resolving this.

Dataiker

I was able to reproduce your issue. It seems that with the MLLib engine, if the target of a binary classification is not true/false, you need to force its Meaning to Text in the Script part of the analysis in which you train your model, before training.

Another more robust solution would be to use a Prepare recipe on your dataset to change it to true/false.
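
As a rough illustration of that conversion, here is a minimal sketch in plain Python. It is not the Prepare recipe itself and uses no DSS API; the function name `to_true_false` and the accepted spellings are assumptions made for the example, so adjust them to the values actually present in your target column.

```python
# Assumed spellings for each class; extend these sets to match your data.
TRUE_FORMS = {"yes", "y", "1", "true"}
FALSE_FORMS = {"no", "n", "0", "false"}


def to_true_false(value):
    """Map a Yes/No style target value to the string 'true' or 'false'."""
    key = str(value).strip().lower()
    if key in TRUE_FORMS:
        return "true"
    if key in FALSE_FORMS:
        return "false"
    # Fail loudly instead of silently creating a third class.
    raise ValueError(f"Unexpected target value: {value!r}")


# Hypothetical sample of a Yes/No target column.
normalized = [to_true_false(v) for v in ["Yes", "No", " yes "]]
print(normalized)
```

Apply the same mapping to the train, test, and scored datasets so that all of them end up with identical label forms.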

Screen Shot 2020-05-23 at 11.58.21.png