Running DSS Jupyternotebook with Python3+

Roger_Yu · September 2020

Hi:

When install the DataScienceStudio app and running python script in jupyter notebook, I find that the notebook kernel is using python 2 version. I try to upgrading the python version based on the documentation: https://doc.dataiku.com/dss/latest/installation/python.html, but after upgrade, migration and restart the DSS service, I find the python kernel is still python 2, I'm wondering what is happening?

What I'm doing is like this:

1. Stop the local service by

XXXX/Library/DataScienceStudio/dss_home/bin/dss stop

2. Using the command to upgrade

XXXX/installer.sh -u -d XXXX/Library/DataScienceStudio/dss_home -C /Users/rogeryu/opt/anaconda3/bin/python3

3. Wait for the migration to be finished sucessfully

4. Start the DSS service like XXXX/Library/DataScienceStudio/dss_home/bin/dss start

5. Reopen the jupyter notbook and find kernel is still python 2

6. Go to the data dir to check the following sub-dir, all python version is still python 2

cd /XXXX/Library/DataScienceStudio/dss_home/condaenv/bin/ && ls -ltrF | grep -i python

Sergey · September 2020

Hi,

This is because the path to the python binary you have used (/Users/rogeryu/opt/anaconda3/bin/python3) will be ignored as DSS creates its own conda_env using python2.7 just based on -C parameter. And before rebuilding python env, you need to move existing datadir/pyenv to another location, because otherwise there will be no actual rebuild (we do check if the dir is present).

Moving forward, you cannot use -C and -P python3.6 parameters at the same time as using Anaconda Python for the builtin environment is only supported with Python 2.7 as stated in the doc you mentioned.

The last recommendation is not to switch base python version on existing DSS instances as this might break the user's code. For just installed DSS that's fine though.

With having all this said, the solution for this case will be to create conda 3.6 code_env as stated here and use it in Jupiter notebooks.

Roger_Yu · September 2020

Hi Sergeyd:

I find the place to setup the condo-env isolated in DSS system, thank you. Now I met a new problem. I'm trying to use SMBO in hyper parameter tuning. When I selected the Bayesian search, and I define a customized CV method like stratifiedKFold or repeatedStratifiedKFold, I find the autoML cannot work correctly and it consistently ask me to provide a 'cv' variable (But default K-Fold method works well). I think it can be a bug, would you please also have a try on it? Thank you very much

Roger_Yu · September 2020

Add more debug information, I tried to debug like the attached log-file:

/Applications/DataScienceStudio.app/Contents/Resources/kit/python/dataiku/doctor/prediction/common.py

Line 817, in command logger.info("Using custom CV: %s" % cv), the cv variable cannot be found, I try to add debug before it like this:

code = grid_search_params["code"]
try:
    logger.info(code)
    exec (code)
except Exception as e:
    logger.error(repr(e))

And I found mystery happen: the code actually can be printed, as the attached file shows, and exec command try/except can pass, but after that cv variable is totally missed, and I try run in my commuter for getting the cv, everything seems to work exactly well

Nicolas_Servel · September 2020

Hello Roger,

Thanks for reporting this.

After some investigation, we confirm that using custom code cross validation strategy does not work with python 3.

As you suspected, this is due to the exec call, that is very brittle, and in particular, that has a different behaviour in python 2 and in python 3.

I'll be passing the issue to the R&D team, which will work on fixing it for a future version of DSS.

Hope this helps,

Best regards,

Roger_Yu · September 2020

ok, thanks a lot. I think it is already enough for using the default cross validation method for stratifiedKfold. If possible, please also kindly help to let me know if you find the root cause, thanks a lot. I tried dir(), the variable is exactly in the memory but only cannot be called.

Running DSS Jupyternotebook with Python3+

Answers

Categories

Setup Info

Tags