Python recipe failing in a Python env
Hey guys!
I set up a Python code env in order to install specific packages, but the following recipe fails when executed in that env:
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # Read recipe inputs
    df_train_asis = dataiku.Dataset("df_train_asis")
    df_train_asis_df = df_train_asis.get_dataframe()
    df_test_asis = dataiku.Dataset("df_test_asis")
    df_test_asis_df = df_test_asis.get_dataframe()

    # Compute recipe outputs
    # TODO: Write here your actual code that computes the outputs
    # NB: DSS supports several kinds of APIs for reading and writing data. Please see doc.

    df_results_asis_df = df_train_asis_df # Compute a Pandas dataframe to write into df_results_asis

    # Write recipe outputs
    df_results_asis = dataiku.Dataset("df_results_asis")
    df_results_asis.write_with_schema(df_results_asis_df)
The error message:
Job failed: The Python process failed (exit code: 1)
Error type:com.dataiku.dip.exceptions.ProcessDiedException
The script works fine when run in the builtin DSS env. Any idea?
Operating system used: MacOS
Best Answer
-
I tried with an online Dataiku license and got the same error. The problem was indeed the Python version: only 3.6 works properly.
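For anyone hitting the same thing, it can help to log which interpreter the code env really runs before blaming the recipe. This is only a minimal sketch using the standard library; the helper name `describe_interpreter` is made up for illustration and is not part of the Dataiku API.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the interpreter running this code."""
    return {
        "version": platform.python_version(),  # full version string of this interpreter
        "executable": sys.executable,          # path to the python binary in use
        "platform": platform.platform(),       # OS and architecture string
    }


if __name__ == "__main__":
    # Print one fact per line so it is easy to spot in a job log.
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Pasted at the top of a recipe, this shows whether the job is really running on the Python version the env was created with.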
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,909 Neuron
The recipe looks fine. How exactly did you set up the Python environment? Which Python version is it using?
-
I'm using Python 3.9. The only thing I changed was adding joblib, category_encoders and imblearn to Requested packages (Pip), and the installation went fine. Currently installed packages:
    backcall==0.2.0
    category-encoders==2.6.2
    certifi==2023.7.22
    charset-normalizer==3.3.0
    decorator==5.1.1
    idna==3.4
    imbalanced-learn==0.11.0
    imblearn==0.0
    ipykernel==4.8.2
    ipython==7.34.0
    ipython-genutils==0.2.0
    jedi==0.19.1
    joblib==1.3.2
    jupyter-client==5.2.4
    jupyter_core==4.11.2
    matplotlib-inline==0.1.6
    numpy==1.23.5
    packaging==23.2
    pandas==1.1.5
    parso==0.8.3
    patsy==0.5.3
    pexpect==4.8.0
    pickleshare==0.7.5
    prompt-toolkit==3.0.39
    ptyprocess==0.7.0
    Pygments==2.16.1
    python-dateutil==2.8.1
    pytz==2020.5
    pyzmq==22.3.0
    requests==2.31.0
    scikit-learn==1.3.1
    scipy==1.11.3
    simplegeneric==0.8.1
    six==1.16.0
    statsmodels==0.14.0
    threadpoolctl==3.2.0
    tornado==5.1.1
    traitlets==4.3.3
    urllib3==2.0.6
    wcwidth==0.2.8
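To compare the package list above with what the recipe process actually sees, the versions can also be printed from inside the recipe. A rough sketch, assuming nothing beyond the standard library (with a `pkg_resources` fallback for interpreters older than 3.8); `installed_versions` is a hypothetical helper name, not a Dataiku function.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # older interpreters such as 3.6
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def version(name):
        """Fallback lookup through setuptools' pkg_resources."""
        return get_distribution(name).version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Packages the recipe in this thread imports or that were requested in the env.
    for name, ver in installed_versions(["pandas", "numpy", "imbalanced-learn"]).items():
        print(f"{name}=={ver}")
```

If a version printed here differs from the env's package list, the job is probably not running in the env you expect.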
-
Turribeach
Make sure Python 3.9 is allowed to open in the macOS settings.
-
Turribeach
How did you install Python 3.9 on your Mac?
-
Turribeach
I ran this recipe on my Mac on 3.9 and it works fine. Your code doesn't look bad to me. Can you remove one of the inputs to test?
    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # Read recipe inputs
    customers = dataiku.Dataset("customers")
    customers_df = customers.get_dataframe()

    # Compute recipe outputs from inputs
    # TODO: Replace this part by your actual code that computes the output, as a Pandas dataframe
    # NB: DSS also supports other kinds of APIs for reading and writing data. Please see doc.

    customers_out_df = customers_df # For this sample code, simply copy input to output

    # Write recipe outputs
    customers_out = dataiku.Dataset("customers_out")
    customers_out.write_with_schema(customers_out_df)
-
Done, I got the same. Code:
    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # Read recipe inputs
    df_train_asis = dataiku.Dataset("df_train_asis")
    df_train_asis_df = df_train_asis.get_dataframe()

    # Compute recipe outputs
    # TODO: Write here your actual code that computes the outputs
    # NB: DSS supports several kinds of APIs for reading and writing data. Please see doc.

    df_check_env_df = df_train_asis_df # Compute a Pandas dataframe to write into df_check_env

    # Write recipe outputs
    df_check_env = dataiku.Dataset("df_check_env")
    df_check_env.write_with_schema(df_check_env_df)
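Since the recipe fails even with a single input, one way to narrow it down further is to check whether the imports themselves succeed in the code env. Exit code 1 is what the Python interpreter returns when it stops on an uncaught exception, so the traceback is the useful part. The sketch below is only an illustration; `check_imports` is a made-up helper, not something DSS provides.

```python
import importlib
import traceback


def check_imports(module_names):
    """Try to import each module; map name -> None on success, traceback text on failure."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception:  # broad on purpose: not every broken install raises ImportError
            results[name] = traceback.format_exc()
    return results


if __name__ == "__main__":
    # The three modules the recipe in this thread imports.
    for name, error in check_imports(["dataiku", "pandas", "numpy"]).items():
        print(f"{name}: {'OK' if error is None else 'FAILED'}")
        if error:
            print(error)
```

If all three import cleanly, the failure is more likely in the read or write step, and the same try/except pattern can be wrapped around `get_dataframe()` and `write_with_schema()` in turn.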
-
Turribeach
What connection are your datasets on?
-
@Turribeach
Amazon S3.
-
Turribeach
What about if you move your datasets to a file system connection?
-
@Turribeach
I'm not able to do that; I'm using a third-party Dataiku instance. Is it possible that the instance is not set up to work properly with Python versions > 3.6?
-
Turribeach
Anything is possible, but without being an admin you can't really solve anything. I suggest you take up the issue with your Dataiku administrator.