ML Process died (exit code 139)

UserBird (Dataiker, Alpha Tester)

From the log:

python(5787,0x70000fcc5000) malloc: *** error for object 0x7ff8fe7317e0: incorrect checksum for freed object - object was probably modified after being freed.
[2018/01/24-14:07:02.303] [Exec-158] [INFO] [dku.utils] - *** set a breakpoint in malloc_error_break to debug
[2018/01/24-14:07:02.304] [Kernel-159-monitor-159] [INFO] [dku.kernels] - Process done with code 134
[2018/01/24-14:07:02.408] [MRT-153] [ERROR] [dku.analysis.prediction] - Processing failed
com.dataiku.dip.exceptions.ProcessDiedException: ML process died (exit code: 134)
    at com.dataiku.dip.exceptions.ProcessDiedException.getExceptionOnProcessDeath(ProcessDiedException.java:46)
    at com.dataiku.dip.kernels.DSSKernelBase.getExceptionOnProcessDeath(DSSKernelBase.java:129)
    at com.dataiku.dip.analysis.coreservices.AnalysisMLKernel.executeCommand(AnalysisMLKernel.java:105)
    at com.dataiku.dip.analysis.ml.prediction.PyRegularNoSavePredictionHandler$TrainAdditionalThread.process(PyRegularNoSavePredictionHandler.java:118)
    at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:66)

Is this a caching issue?
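For reference, the exit codes in this thread can be read with the common POSIX convention that a status above 128 means "terminated by signal number (code - 128)". We assume here that this is how DSS reports the status. Under that reading, 134 is 128 + 6 (SIGABRT), which fits malloc aborting the process after it detects the corrupted heap, and the 139 in the title would be 128 + 11 (SIGSEGV). The sketch below decodes a status that way. The helper name `describe_exit_code` is hypothetical, and the signal numbers assume Linux or macOS (Windows numbers SIGABRT differently).

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status with the common '128 + signal number' convention.

    Assumes a POSIX system (Linux/macOS), where SIGABRT is 6 and SIGSEGV is 11.
    """
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # No signal with that number on this platform: report the raw status.
            return f"exit status {code} (no matching signal for {code - 128})"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


for code in (134, 139):
    print(code, "->", describe_exit_code(code))
# 134 -> killed by SIGABRT (signal 6)
# 139 -> killed by SIGSEGV (signal 11)
```

A process can also legitimately call `exit(134)`, so this is a strong hint rather than proof. Here the preceding malloc message makes the abort reading the likely one.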

Answers

  • Clément_Stenac (Dataiker, Dataiku DSS Core Designer)
    Hi,

    This looks like a memory corruption bug in one of the underlying numerical computation libraries (numpy, pandas, BLAS, ...). Is it reproducible? Is it also reproducible with other algorithms on this dataset? Could you share details about your setup? Are you at liberty to share this dataset?
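The following sketch shows one way to check the first two questions outside DSS: rerun the failing fit several times in fresh Python interpreters and count the exit statuses. Swapping in a different algorithm gives the comparison. `TRAIN_SNIPPET` is a hypothetical placeholder that you would replace with code that loads the dataset and fits the model. It enables the standard-library `faulthandler` module so that a crash on SIGSEGV or SIGABRT prints the Python stack that was active. This can help narrow down whether numpy, pandas or a BLAS call was running. Note that `subprocess` reports a child killed by signal N as `-N`, not as 128 + N.

```python
import subprocess
import sys
from collections import Counter

# Hypothetical placeholder for the failing step: replace with code that loads
# the same dataset and fits the same algorithm (or another one, to compare).
TRAIN_SNIPPET = """
import faulthandler
faulthandler.enable()  # dump the Python stack if the process dies on SIGSEGV, SIGABRT, ...
print("train step placeholder")
"""


def run_once(snippet: str) -> int:
    """Run `snippet` in a fresh interpreter and return its exit status.

    On POSIX, subprocess reports a child killed by signal N as -N
    (e.g. -6 for SIGABRT, -11 for SIGSEGV).
    """
    proc = subprocess.run([sys.executable, "-c", snippet],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        # Any faulthandler traceback from the child is on its stderr.
        sys.stderr.write(proc.stderr)
    return proc.returncode


def tally(snippet: str, runs: int = 5) -> Counter:
    """Repeat the run and count how often each exit status occurs."""
    return Counter(run_once(snippet) for _ in range(runs))


if __name__ == "__main__":
    print(tally(TRAIN_SNIPPET, runs=3))            # Counter({0: 3}) for the placeholder
    print(tally("import os; os.abort()", runs=1))  # Counter({-6: 1}) on Linux/macOS
```

A mix of `0` and negative statuses across identical runs would be consistent with intermittent memory corruption rather than a deterministic error in the data. A crash with only one algorithm would point toward the library that algorithm relies on.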