Hello, I am evaluating DSS.
I am not an expert in it. I am trying to load a project that some colleagues created in an earlier version of DSS (I think it was 4.0).
I am able to import the project (which contains Hadoop and Spark steps). The problem comes when I try to build the whole flow.
I am receiving this error:
[11:54:31] [INFO] [dku.utils] - raise Exception("Base package %s is too recent: version %s was found. %s. You should not install overriding versions of DSS base packages. Run '$DATADIR/bin/pip uninstall %s'" % (name, p.__version__, error_details, name))
[11:54:31] [INFO] [dku.utils] - Exception: Base package pandas is too recent: version 0.23.0 was found. Expected version 0.20.X. You should not install overriding versions of DSS base packages. Run '$DATADIR/bin/pip uninstall pandas'
[11:54:31] [INFO] [dku.flow.activity] - Run thread failed for activity compute_Elbow_Table_NP
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.throwSubprocessError(AbstractCodeBasedActivityRunner.java:373)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:363)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:276)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:32)
    at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:56)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:352)
[11:54:31] [INFO] [dku.flow.activity] running compute_Elbow_Table_NP - activity is finished
[11:54:31] [ERROR] [dku.flow.activity] running compute_Elbow_Table_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.throwSubprocessError(AbstractCodeBasedActivityRunner.java:373)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:363)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:276)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:32)
    at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:56)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:352)
Do you have any suggestions?
Thanks, Bill
Hi Bill,
(I'm a big fan of your movies)
It seems someone changed the version of pandas installed in Dataiku's built-in Python environment. As the error message indicates, your version of DSS expects pandas 0.20.X in that environment, and the Python recipe fails when it finds a newer one (0.23.0 here). Can you or your Dataiku admin run the following shell command:
$DATADIR/bin/pip uninstall pandas
where $DATADIR is the data directory of your Dataiku DSS node (see https://www.dataiku.com/learn/guide/getting-started/dss-concepts/the-dss-datadir.html).
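If you want to confirm which pandas version the built-in environment actually loads, before and after the uninstall, a small script along these lines may help. It is only a rough sketch, not something shipped with DSS: the file name check_pandas.py and the helper names are made up for illustration, the expected "0.20" series is taken from your error message, and it assumes $DATADIR/bin/python is the interpreter of the built-in environment (next to the bin/pip used above).

```python
# Hypothetical helper (not part of DSS). Assumed usage:
#   $DATADIR/bin/python check_pandas.py
# It reports which pandas version this interpreter imports and whether it
# belongs to the release series DSS expects.
import sys

# Taken from the error message: "Expected version 0.20.X".
EXPECTED_SERIES = "0.20"


def matches_series(version, series=EXPECTED_SERIES):
    """Return True if `version` (e.g. "0.20.3") is in the `series` line (e.g. "0.20")."""
    major_minor = ".".join(version.split(".")[:2])
    return major_minor == series


def main():
    """Print the pandas version seen by this interpreter; return True if it matches."""
    try:
        import pandas
    except ImportError:
        print("pandas is not importable from %s" % sys.executable)
        return False
    print("pandas %s loaded from %s" % (pandas.__version__, pandas.__file__))
    if matches_series(pandas.__version__):
        print("OK: matches the expected %s.X series" % EXPECTED_SERIES)
        return True
    print("Mismatch: DSS expects pandas %s.X here" % EXPECTED_SERIES)
    return False


if __name__ == "__main__":
    main()
```

The printed file path is worth a look too: if it points somewhere other than the DSS data directory, the newer pandas may be coming from a different location than the one the uninstall command touches.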
Cheers,
Alex