Issues with pandas version
Hello, I am evaluating DSS.
I am not an expert in it. I am trying to load a project that some colleagues created in an earlier version (I think it was 4.0).
I am able to import the project (which contains Hadoop and Spark steps). The problem occurs when I try to build the whole flow.
I am receiving this error:
[11:54:31] [INFO] [dku.utils] - raise Exception("Base package %s is too recent: version %s was found. %s. You should not install overriding versions of DSS base packages. Run '$DATADIR/bin/pip uninstall %s'" % (name, p.__version__, error_details, name))
[11:54:31] [INFO] [dku.utils] - Exception: Base package pandas is too recent: version 0.23.0 was found. Expected version 0.20.X. You should not install overriding versions of DSS base packages. Run '$DATADIR/bin/pip uninstall pandas'
[11:54:31] [INFO] [dku.flow.activity] - Run thread failed for activity compute_Elbow_Table_NP
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.throwSubprocessError(AbstractCodeBasedActivityRunner.java:373)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:363)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:276)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:32)
    at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:56)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:352)
[11:54:31] [INFO] [dku.flow.activity] running compute_Elbow_Table_NP - activity is finished
[11:54:31] [ERROR] [dku.flow.activity] running compute_Elbow_Table_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.throwSubprocessError(AbstractCodeBasedActivityRunner.java:373)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:363)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:276)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:32)
    at com.dataiku.dip.recipes.code.python.PythonRecipeRunner.run(PythonRecipeRunner.java:56)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:352)
Do you have any suggestions?
Thanks, Bill
Answers
-
Hi Bill,
(I'm a big fan of your movies)
It seems someone changed the version of pandas installed in the built-in Python environment of Dataiku. As indicated in the error message, Dataiku requires pandas version 0.20.X to work. Can you or your Dataiku admin run the following shell command:
$DATADIR/bin/pip uninstall pandas
Where $DATADIR is the data directory of your Dataiku DSS node, see https://www.dataiku.com/learn/guide/getting-started/dss-concepts/the-dss-datadir.html
Cheers,
Alex
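Before and after uninstalling, it can help to confirm which pandas the built-in environment actually picks up. The following is a minimal sketch, not DSS code: `matches_expected` and `installed_pandas` are hypothetical helper names, and the wildcard handling is only a guess at what "0.20.X" means in the error message. It assumes the script is run with the Python interpreter of the DSS built-in environment (for example a `python` launcher next to `pip` in `$DATADIR/bin`, which this thread does not confirm).

```python
def matches_expected(found_version, expected_pattern):
    """Return True if found_version fits a pattern such as '0.20.X'.

    A component equal to 'X' (any case) is treated as a wildcard.
    This is an illustrative check, not the one DSS itself performs.
    """
    found_parts = found_version.split(".")
    expected_parts = expected_pattern.split(".")
    if len(found_parts) < len(expected_parts):
        return False
    for found, expected in zip(found_parts, expected_parts):
        if expected.upper() == "X":
            continue
        if found != expected:
            return False
    return True


def installed_pandas():
    """Return (version, file location) of the importable pandas, or None."""
    try:
        import pandas
    except Exception:  # pandas missing or broken in this interpreter
        return None
    return pandas.__version__, pandas.__file__


if __name__ == "__main__":
    info = installed_pandas()
    if info is None:
        print("pandas is not importable from this interpreter")
    else:
        version, location = info
        print("pandas %s loaded from %s" % (version, location))
        print("matches 0.20.X: %s" % matches_expected(version, "0.20.X"))
```

The printed file location shows whether pandas comes from the DSS installation or from an overriding copy installed later.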
-
Hi, thanks for your appreciation!
In fact I act as admin here; we are just testing.
I already tried to uninstall pandas, which succeeded, and then to install the version requested by the error message (0.20), which failed.
It says it cannot install that version because some file is missing.
You can find the full message, including the log, here: https://pastebin.com/HhdwaTLh
Thanks for your help!
-
Hmmm. The logs indicate that some system packages are missing. You need to install the system development tools and the Python interpreter header files. More info:
https://doc.dataiku.com/dss/latest/installation/python.html#additional-prerequisites
-
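The two prerequisites named in the reply above (a compiler from the development tools, and the Python header files) can be checked with a short script before retrying the install. This is a rough sketch under assumptions: `find_on_path` and `check_build_prerequisites` are hypothetical names, the compiler names `gcc` and `cc` are guesses at what your distribution provides, and the header check only applies to the interpreter that runs the script, so it should be the same Python that DSS uses.

```python
import os
import sysconfig


def find_on_path(name):
    """Return the full path of an executable found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def check_build_prerequisites(compilers=("gcc", "cc")):
    """Report whether Python.h and a C compiler appear to be available."""
    # Directory where this interpreter expects its C header files.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    compiler = None
    for name in compilers:
        compiler = find_on_path(name)
        if compiler is not None:
            break
    return {
        "python_header": header,
        "python_header_found": os.path.isfile(header),
        "compiler": compiler,
    }


if __name__ == "__main__":
    report = check_build_prerequisites()
    print("Python.h expected at: %s" % report["python_header"])
    print("Python.h found: %s" % report["python_header_found"])
    print("C compiler: %s" % (report["compiler"] or "none found on PATH"))
```

If either check fails, the missing system packages still need to be installed as described in the linked documentation; the script only reports, it does not install anything.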
Ok, thanks.
I installed the development tools, but since the install was slow, I went for a coffee.
Unfortunately, the power went out. I restarted the system, but now it seems that DSS's internal DB is corrupted.
In backend.log I find a very long stack trace; the interesting message, in my opinion, is:
Caused by: java.lang.IllegalStateException: Reading from nio:/opt/dataiku/databases/flow_state.mv.db failed; file length 499712 read length 1024 at 509121 [1.4.195/1]
How can I restore/discard this DB?
Thanks
-
Hi, you can stop DSS, remove the corrupted DB, and start DSS again.
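For the "remove" step, a cautious variant is to rename the file instead of deleting it, so it can be put back if needed. The sketch below is only an illustration: `move_aside` is a hypothetical helper, renaming rather than deleting is a substitution for the advice above, and the path in the comment is the one from the error message quoted earlier in this thread. It assumes DSS has already been stopped and that DSS recreates the database on the next start, which the reply implies but does not spell out.

```python
import os
import shutil
import tempfile
import time


def move_aside(db_path, suffix=None):
    """Rename db_path to '<db_path>.corrupted-<suffix>' and return the new path.

    Run this only while DSS is stopped.
    """
    if not os.path.isfile(db_path):
        raise IOError("no such file: %s" % db_path)
    if suffix is None:
        suffix = time.strftime("%Y%m%d-%H%M%S")
    backup_path = "%s.corrupted-%s" % (db_path, suffix)
    if os.path.exists(backup_path):
        raise IOError("backup already exists: %s" % backup_path)
    shutil.move(db_path, backup_path)
    return backup_path


if __name__ == "__main__":
    # Demonstration on a throwaway file; on the real system the argument
    # would be the file named in the error, for example
    # /opt/dataiku/databases/flow_state.mv.db
    demo_dir = tempfile.mkdtemp()
    demo_db = os.path.join(demo_dir, "flow_state.mv.db")
    open(demo_db, "w").close()
    print("moved to %s" % move_aside(demo_db))
    shutil.rmtree(demo_dir)
```

Once DSS has started cleanly and the flow builds again, the renamed file can be deleted.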
-
Hi, did you solve your issue?