Python process failed during building knowledge bank

Ajoshi005
Ajoshi005 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3
edited July 16 in Using Dataiku

I am trying to build a knowledge bank from PDF docs (about 2 pages each) using the OpenAI embedding model ADA-002. I am using the visual recipes; the text splitter runs successfully, but while running the embedding recipe (build knowledge bank) I get this error message:

Oops: an unexpected error occurred

The Python process failed (exit code: 1). More info might be available in the logs.

Please see our options for getting help

Logs:

[11:37:43] [INFO] [dip.exec.resultHandler] - Did not find a specific error from error files or logs, falling back on return code
[11:37:43] [INFO] [dku.ml.distributed.pool] - Closing worker pool pool-pge76po0rmu1ajbt
[11:37:43] [INFO] [dku.ml.distributed.service] - Unregistered worker pool: pool-pge76po0rmu1ajbt
[11:37:43] [INFO] [dku.flow.activity] - Run thread failed for activity compute_Extracted_text_RBC_HSBC_embedded_1_NP
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
 at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:29)
 at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:70)
 at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:99)
 at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner$1.run(RAGEmbeddingRecipeRunner.java:124)
 at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner.run(RAGEmbeddingRecipeRunner.java:104)
 at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[11:37:43] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_1_NP - activity is finished
[11:37:43] [ERROR] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_1_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
 at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
 at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:29)
 at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:70)
 at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:99)
 at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner$1.run(RAGEmbeddingRecipeRunner.java:124)
 at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner.run(RAGEmbeddingRecipeRunner.java:104)
 at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[11:37:43] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_1_NP - Executing default post-activity lifecycle hook
[11:37:43] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_1_NP - Done post-activity tasks

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 296 Dataiker

    Hi @Ajoshi005,

    Unfortunately, the logs you captured here are not verbose enough to serve as a meaningful starting point for troubleshooting. However, as noted, more info might be available in the logs. I would recommend navigating to your project -> Jobs -> select the embedding job -> Actions -> View full job log. Working up from the bottom of the log where the job fails, please scan for any helpful hints, such as code env package version errors.


    Please feel free to provide any additional logs that you think may be helpful.

    Thanks!

  • Ajoshi005
    Ajoshi005 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3
    edited July 17

    Thank you for the quick response. I realised that I was running the Embed recipe without installing a RAG Python environment. However, I am unable to find the RAG Python 3.9 env when trying to create one. The options available are Python 3.7 and Python 3.11 (experimental), neither of which has the option to add the RAG packages. I tried to install the RAG packages (langchain, pinecone, FAISS, etc.) in the 3.7 env as shown in the Dataiku tutorial, and it still fails; the error log is attached, and the relevant error appears to be the import failure below (a quick check is sketched after the log lines):

    [2024/01/26-12:40:41.776] [null-err-42] [INFO] [dku.utils] - from langchain.vectorstores import FAISS, Pinecone, Chroma
    [2024/01/26-12:40:41.776] [null-err-42] [INFO] [dku.utils] - ImportError: cannot import name 'Pinecone' from 'langchain.vectorstores' (/Users/akashjoshi/Library/DataScienceStudio/dss_home/code-envs/python/RAG/lib/python3.7/site-packages/langchain/vectorstores/__init__.py)
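
    A quick way to confirm the mismatch is to run the same import in a notebook that uses this RAG code env. This is only a diagnostic sketch; it just prints whichever langchain version happens to be installed and retries the import that the embedding recipe attempts:

    import langchain
    print(langchain.__version__)  # some releases do not expose Pinecone under langchain.vectorstores

    # This is the exact import the embedding recipe attempts:
    from langchain.vectorstores import FAISS, Pinecone, Chroma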


  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,059 Neuron

    As a side note, this thread covers how to get different Python interpreters, such as Python 3.9, enabled in Dataiku:

    https://community.dataiku.com/t5/Setup-Configuration/Best-Practices-for-Updating-Python/m-p/38870

    This may be a better way of solving your problem.

  • Ajoshi005
    Ajoshi005 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Registered Posts: 3

    I was able to resolve the issue. This might be helpful for anyone trying to implement RAG in the trial version of DSS (12.5). Here is what I did (with a quick sanity check after the steps):

    - Under Code Envs, I created a new env with the Python 3.11 (experimental) version.

    - Selected Core packages version: Pandas 1.5 (Python 3.8 and above).

    - Added the RAG packages through the "Add sets of packages" option:

    (langchain==0.0.270
    pydantic<2
    chromadb
    faiss-cpu
    pinecone-client)
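
    As a sanity check, the packages can be exercised together in a notebook that uses this env, outside the visual recipe. A minimal sketch, assuming an OpenAI API key is configured; the text chunks and query here are placeholders, not the actual PDF contents:

    # Minimal sanity check for the new env (placeholder chunks/query; assumes OPENAI_API_KEY is available)
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import FAISS

    embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
    store = FAISS.from_texts(["first chunk of the PDF", "second chunk of the PDF"], embeddings)
    print(store.similarity_search("first chunk", k=1))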
