Error while executing spark

Swapnali · May 2020

We are getting the following errors while executing Spark.

Clément_Stenac · May 2020

Hi,

You would need to attach your entire log and/or job diag, not just a part of the error message.

If it is not possible for confidentiality reasons, please submit this as a support ticket, with an attached job diagnosis. Please see https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html for more details.

Swapnali · May 2020

Here is the log. Kindly help us. Thanks in advance.

AdrienL · May 2020

Hello,

In your test set for this machine learning task, do you happen to have "false" and/or "true" as target for the prediction?
If not, are the schemas or formats of your train dataset and test dataset different, especially around the target column?
If not, on each of those datasets, can you click on the target column > Analyse and send us a screenshot for both?

Swapnali · May 2020

Thanks for your reply.

Yes, we are using true/false prediction. Does it not support TRUE/FALSE? If not, how can we do it?

AdrienL · May 2020

DSS does support Yes/No, 0/1, true/false, etc. But the values need to have the same form on the train set and test set, or in the case of an evaluation recipe, on the original train & test set and on the scored set. Training a model on true/false, then evaluating on Yes/No won't work.
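The requirement above (same label form on every dataset) can be checked before training. The following is only a minimal sketch in plain Python, not a DSS API: the function name `compare_target_labels` and the sample values are hypothetical, and it assumes the target columns have already been read into lists.

```python
def compare_target_labels(train_values, test_values):
    """Compare the distinct target labels found in two datasets.

    Returns a tuple (match, only_in_train, only_in_test).
    """
    train_labels = set(train_values)
    test_labels = set(test_values)
    return (
        train_labels == test_labels,
        train_labels - test_labels,
        test_labels - train_labels,
    )


# Hypothetical target columns: trained on true/false, evaluated on Yes/No.
train_target = ["true", "false", "true"]
scored_target = ["Yes", "No", "Yes"]

match, only_in_train, only_in_test = compare_target_labels(train_target, scored_target)
print(match)                  # False: the label forms differ
print(sorted(only_in_train))  # labels seen only in the train set
print(sorted(only_in_test))   # labels seen only in the scored set
```

A mismatch reported here points to the kind of true/false versus Yes/No inconsistency described above.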

I only see one screenshot; is it from the scored dataset? What were the values of that column for the train & test sets?

Amit_Singh · May 2020

Hi Adrien,

Glad to see a quick response from you.

We still have a few concerns. Below are our responses to the points provided:

a. Values need to have the same form on the train set and test set, or in the case of an evaluation recipe, on the original train & test set and on the scored set.

--> Yes, we are using the same form, Yes/No, on every dataset in the recipe. This is consistent throughout our project. With the Dataiku internal engine, processing worked end to end. When we ran the same recipe, which had executed successfully, on Spark MLLib, we hit this issue.

b. What were the values of that column for the train & test sets?

I am attaching the same for train and test.

Please help us in resolving this.

AdrienL · May 2020

I was able to reproduce your issue. It seems that for the MLLib engine, if the target of a binary classification is not true/false, you should force its Meaning to Text in the Script part of the analysis in which you train your model (before training the model).

Another, more robust solution would be to use a Prepare recipe on your dataset to change the target values to true/false.
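To illustrate the kind of mapping such a preparation step would apply, here is a rough sketch in plain Python. It is not the Prepare recipe itself; the function `normalize_target` and the accepted label sets are assumptions made for this example and should be adjusted to the values actually present in the target column.

```python
# Assumed label spellings; extend or trim to match the real data.
TRUE_LABELS = {"yes", "y", "1", "true"}
FALSE_LABELS = {"no", "n", "0", "false"}


def normalize_target(value):
    """Map a Yes/No style label to the string 'true' or 'false'."""
    key = str(value).strip().lower()
    if key in TRUE_LABELS:
        return "true"
    if key in FALSE_LABELS:
        return "false"
    # Fail loudly rather than silently passing through an unknown label.
    raise ValueError(f"Unrecognized target label: {value!r}")


# Hypothetical target column using the Yes/No form.
target = ["Yes", "No", "yes", " NO "]
print([normalize_target(v) for v in target])
# ['true', 'false', 'true', 'false']
```

Applying the same mapping to the train, test, and scored datasets keeps the label form identical everywhere.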

