[2024/01/26-12:40:28.638] [ActivityExecutor-35] [INFO] [dku] running compute_Extracted_text_RBC_HSBC_embedded_NP - ----------------------------------------
[2024/01/26-12:40:28.638] [ActivityExecutor-35] [INFO] [dku] running compute_Extracted_text_RBC_HSBC_embedded_NP - DSS startup: jek version:12.4.2
[2024/01/26-12:40:28.638] [ActivityExecutor-35] [INFO] [dku] running compute_Extracted_text_RBC_HSBC_embedded_NP - DSS home: /Users/akashjoshi/Library/DataScienceStudio/dss_home
[2024/01/26-12:40:28.638] [ActivityExecutor-35] [INFO] [dku] running compute_Extracted_text_RBC_HSBC_embedded_NP - OS: Mac OS X 10.16 x86_64 - Java: Temurin 1.8.0_322
[2024/01/26-12:40:28.631] [ActivityExecutor-35] [INFO] [dku.flow.jobrunner] running compute_Extracted_text_RBC_HSBC_embedded_NP - Allocated a slot for this activity!
[2024/01/26-12:40:28.639] [ActivityExecutor-35] [INFO] [dku.flow.jobrunner] running compute_Extracted_text_RBC_HSBC_embedded_NP - Run activity
[2024/01/26-12:40:28.661] [ActivityExecutor-35] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Executing default pre-activity lifecycle hook
[2024/01/26-12:40:28.678] [ActivityExecutor-35] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Checking if sources are ready
[2024/01/26-12:40:28.681] [ActivityExecutor-35] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Will check readiness of GENAIUSECASERBCHSBCTRANSITION.Extracted_text_RBC_HSBC p=NP
[2024/01/26-12:40:28.710] [ActivityExecutor-35] [INFO] [dku.datasets.file] running compute_Extracted_text_RBC_HSBC_embedded_NP - Building Filesystem handler config: {"connection":"filesystem_managed","path":"GENAIUSECASERBCHSBCTRANSITION/Extracted_text_RBC_HSBC","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[2024/01/26-12:40:28.711] [ActivityExecutor-35] [DEBUG] [dku.datasets.fsbased] running compute_Extracted_text_RBC_HSBC_embedded_NP - getReadiness: will enumerate partition
[2024/01/26-12:40:28.712] [ActivityExecutor-35] [INFO] [dku.datasets.ftplike] running compute_Extracted_text_RBC_HSBC_embedded_NP - Enumerating Filesystem dataset prefix=
[2024/01/26-12:40:28.712] [ActivityExecutor-35] [DEBUG] [dku.datasets.fsbased] running compute_Extracted_text_RBC_HSBC_embedded_NP - Building FS provider for dataset handler: GENAIUSECASERBCHSBCTRANSITION.Extracted_text_RBC_HSBC
[2024/01/26-12:40:28.727] [ActivityExecutor-35] [DEBUG] [dku.datasets.fsbased] running compute_Extracted_text_RBC_HSBC_embedded_NP - FS Provider built
[2024/01/26-12:40:28.728] [ActivityExecutor-35] [DEBUG] [dku.fs.local] running compute_Extracted_text_RBC_HSBC_embedded_NP - Enumerating local filesystem prefix=/
[2024/01/26-12:40:28.731] [ActivityExecutor-35] [DEBUG] [dku.fs.local] running compute_Extracted_text_RBC_HSBC_embedded_NP - Enumeration done nb_paths=1 size=1885
[2024/01/26-12:40:28.732] [ActivityExecutor-35] [DEBUG] [dku.datasets.fsbased] running compute_Extracted_text_RBC_HSBC_embedded_NP - getReadiness: enumerated partition, found 1 paths, computing hash
[2024/01/26-12:40:28.733] [ActivityExecutor-35] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Checked source readiness GENAIUSECASERBCHSBCTRANSITION.Extracted_text_RBC_HSBC -> true
[2024/01/26-12:40:28.734] [ActivityExecutor-35] [DEBUG] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Computing hashes to propagate BEFORE activity
[2024/01/26-12:40:28.735] [ActivityExecutor-35] [DEBUG] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Recorded 1 hashes before activity run
[2024/01/26-12:40:28.735] [ActivityExecutor-35] [DEBUG] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Building recipe runner of type
[2024/01/26-12:40:28.743] [ActivityExecutor-35] [DEBUG] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Recipe runner built, will use 1 thread(s)
[2024/01/26-12:40:28.744] [ActivityExecutor-35] [DEBUG] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Preparing execution thread: com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner@5f87b888
[2024/01/26-12:40:28.745] [ActivityExecutor-35] [DEBUG] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Starting execution thread: Thread[Thread-22,5,main]
[2024/01/26-12:40:28.745] [ActivityExecutor-35] [DEBUG] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Execution threads started, waiting for activity end
[2024/01/26-12:40:28.752] [FRT-39-FlowRunnable] [INFO] [dku.flow.activity] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Run thread for activity compute_Extracted_text_RBC_HSBC_embedded_NP starting
[2024/01/26-12:40:28.755] [FRT-39-FlowRunnable] [INFO] [dku.recipes.nlp.rag_embedding] act.compute_Extracted_text_RBC_HSBC_embedded_NP - RAG Embededing recipe runner started
[2024/01/26-12:40:28.782] [FRT-39-FlowRunnable] [INFO] [dku.venv.selector] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Select code env lang=PYTHON projectSelection={"mode":"EXPLICIT_ENV","preventOverride":false,"envName":"RAG"} globalDefault=null
[2024/01/26-12:40:28.782] [FRT-39-FlowRunnable] [INFO] [dku.recipes.nlp.rag_embedding] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Run embedding in code env RAG
[2024/01/26-12:40:28.794] [FRT-39-FlowRunnable] [INFO] [dku.ml.distributed.service] act.compute_Extracted_text_RBC_HSBC_embedded_NP - New worker pool created: pool-4ckiy4ovw3invev9
[2024/01/26-12:40:28.800] [FRT-39-FlowRunnable] [INFO] [dku.datasets.file] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Building Filesystem handler config: {"connection":"filesystem_managed","path":"GENAIUSECASERBCHSBCTRANSITION/Extracted_text_RBC_HSBC","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[2024/01/26-12:40:28.801] [FRT-39-FlowRunnable] [DEBUG] [dku.datasets.fsbased] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Building FS provider for dataset handler: GENAIUSECASERBCHSBCTRANSITION.Extracted_text_RBC_HSBC
[2024/01/26-12:40:28.802] [FRT-39-FlowRunnable] [DEBUG] [dku.datasets.fsbased] act.compute_Extracted_text_RBC_HSBC_embedded_NP - FS Provider built
[2024/01/26-12:40:28.804] [FRT-39-FlowRunnable] [INFO] [dku.datasets.file] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Building Filesystem handler config: {"connection":"filesystem_managed","path":"GENAIUSECASERBCHSBCTRANSITION/Extracted_text_RBC_HSBC","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[2024/01/26-12:40:28.806] [FRT-39-FlowRunnable] [DEBUG] [dku.datasets.fsbased] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Building FS provider for dataset handler: GENAIUSECASERBCHSBCTRANSITION.Extracted_text_RBC_HSBC
[2024/01/26-12:40:28.808] [FRT-39-FlowRunnable] [DEBUG] [dku.datasets.fsbased] act.compute_Extracted_text_RBC_HSBC_embedded_NP - FS Provider built
[2024/01/26-12:40:28.832] [FRT-39-FlowRunnable] [INFO] [dku.code.projectLibs] act.compute_Extracted_text_RBC_HSBC_embedded_NP - EXTERNAL LIBS FROM GENAIUSECASERBCHSBCTRANSITION is {"gitReferences":{},"pythonPath":["python"],"rsrcPath":["R"],"importLibrariesFromProjects":[]}
[2024/01/26-12:40:28.833] [FRT-39-FlowRunnable] [INFO] [dku.code.projectLibs] act.compute_Extracted_text_RBC_HSBC_embedded_NP - chunkFolder is /projects/GENAIUSECASERBCHSBCTRANSITION/lib/R
[2024/01/26-12:40:28.836] [FRT-39-FlowRunnable] [INFO] [dku.recipes.code.base] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Writing dku-exec-env for local execution in /Users/akashjoshi/Library/DataScienceStudio/dss_home/jobs/GENAIUSECASERBCHSBCTRANSITION/Build_Extracted_text_RBC_HSBC_embedded__NP__2024-01-26T17-40-23.228/compute_Extracted_text_RBC_HSBC_embedded_NP/rag-embedding-recipe/pyrun7vG6JPqImX7Q/remote-run-env-def.json
[2024/01/26-12:40:28.844] [FRT-39-FlowRunnable] [INFO] [dku.code.envs.resolution] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Executing Python activity in env: RAG
[2024/01/26-12:40:28.855] [FRT-39-FlowRunnable] [INFO] [dku.flow.abstract.python] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Execute activity command: ["/Users/akashjoshi/Library/DataScienceStudio/dss_home/code-envs/python/RAG/bin/python","-u","-m","dataiku.llm.rag.rag_embedding_recipe","/Users/akashjoshi/Library/DataScienceStudio/dss_home/knowledge-banks/GENAIUSECASERBCHSBCTRANSITION/ZGFFw4He","GENAIUSECASERBCHSBCTRANSITION.Extracted_text_RBC_HSBC"]
[2024/01/26-12:40:28.856] [FRT-39-FlowRunnable] [INFO] [dku.flow.abstract.python] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Attached worker pool to com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner$1 recipe runner: pool-4ckiy4ovw3invev9
[2024/01/26-12:40:28.860] [FRT-39-FlowRunnable] [INFO] [dku.security.process] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Starting process (regular)
[2024/01/26-12:40:28.865] [FRT-39-FlowRunnable] [INFO] [dku.security.process] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Process started with pid=76367
[2024/01/26-12:40:28.873] [FRT-39-FlowRunnable] [DEBUG] [dku.resourceusage] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Reporting start of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"GENAIUSECASERBCHSBCTRANSITION","jobId":"Build_Extracted_text_RBC_HSBC_embedded__NP__2024-01-26T17-40-23.228","activityId":"compute_Extracted_text_RBC_HSBC_embedded_NP","activityType":"recipe","recipeType":"nlp_llm_rag_embedding","recipeName":"compute_Extracted_text_RBC_HSBC_embedded"},"type":"LOCAL_PROCESS","id":"t4MeDBKhK1JNfqsm","startTime":1706290828869,"localProcess":{"cpuCurrent":0.0,"cpuAverageOverPast60Seconds":0.0}}
[2024/01/26-12:40:28.878] [process-resource-monitor-76367-44] [DEBUG] [dku.resource] - Process stats for pid 76367: {"pid":76367,"commandName":"/Users/akashjoshi/Library/DataScienceStudio/dss_home/code-envs/python/RAG/bin/python","cpuCurrent":0.0,"cpuAverageOverPast60Seconds":0.0,"vmRSSTotalMBS":0}
[2024/01/26-12:40:41.769] [null-err-42] [INFO] [dku.utils] - Traceback (most recent call last):
[2024/01/26-12:40:41.773] [null-err-42] [INFO] [dku.utils] - File "/Users/akashjoshi/Library/DataScienceStudio/Python/python3.7-20220516/lib/python3.7/runpy.py", line 193, in _run_module_as_main
[2024/01/26-12:40:41.774] [null-err-42] [INFO] [dku.utils] - "__main__", mod_spec)
[2024/01/26-12:40:41.774] [null-err-42] [INFO] [dku.utils] - File "/Users/akashjoshi/Library/DataScienceStudio/Python/python3.7-20220516/lib/python3.7/runpy.py", line 85, in _run_code
[2024/01/26-12:40:41.775] [null-err-42] [INFO] [dku.utils] - exec(code, run_globals)
[2024/01/26-12:40:41.775] [null-err-42] [INFO] [dku.utils] - File "/Users/akashjoshi/Library/DataScienceStudio/kits/dataiku-dss-12.4.2-osx/python/dataiku/llm/rag/rag_embedding_recipe.py", line 12, in <module>
[2024/01/26-12:40:41.776] [null-err-42] [INFO] [dku.utils] - from langchain.vectorstores import FAISS, Pinecone, Chroma
[2024/01/26-12:40:41.776] [null-err-42] [INFO] [dku.utils] - ImportError: cannot import name 'Pinecone' from 'langchain.vectorstores' (/Users/akashjoshi/Library/DataScienceStudio/dss_home/code-envs/python/RAG/lib/python3.7/site-packages/langchain/vectorstores/__init__.py)
[2024/01/26-12:40:41.836] [FRT-39-FlowRunnable] [DEBUG] [dku.resourceusage] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"GENAIUSECASERBCHSBCTRANSITION","jobId":"Build_Extracted_text_RBC_HSBC_embedded__NP__2024-01-26T17-40-23.228","activityId":"compute_Extracted_text_RBC_HSBC_embedded_NP","activityType":"recipe","recipeType":"nlp_llm_rag_embedding","recipeName":"compute_Extracted_text_RBC_HSBC_embedded"},"type":"LOCAL_PROCESS","id":"t4MeDBKhK1JNfqsm","startTime":1706290828869,"localProcess":{"pid":76367,"commandName":"/Users/akashjoshi/Library/DataScienceStudio/dss_home/code-envs/python/RAG/bin/python","cpuCurrent":0.0,"cpuAverageOverPast60Seconds":0.0,"vmRSSTotalMBS":0}}
[2024/01/26-12:40:41.843] [FRT-39-FlowRunnable] [INFO] [dip.exec.resultHandler] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Did not find a specific error from error files or logs, falling back on return code
[2024/01/26-12:40:41.851] [FRT-39-FlowRunnable] [INFO] [dku.ml.distributed.pool] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Closing worker pool pool-4ckiy4ovw3invev9
[2024/01/26-12:40:41.851] [FRT-39-FlowRunnable] [INFO] [dku.ml.distributed.service] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Unregistered worker pool: pool-4ckiy4ovw3invev9
[2024/01/26-12:40:41.859] [FRT-39-FlowRunnable] [INFO] [dku.flow.activity] act.compute_Extracted_text_RBC_HSBC_embedded_NP - Run thread failed for activity compute_Extracted_text_RBC_HSBC_embedded_NP
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:29)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:70)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:99)
    at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner$1.run(RAGEmbeddingRecipeRunner.java:124)
    at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner.run(RAGEmbeddingRecipeRunner.java:104)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2024/01/26-12:40:42.010] [ActivityExecutor-35] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - activity is finished
[2024/01/26-12:40:42.013] [ActivityExecutor-35] [ERROR] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Activity failed
com.dataiku.dip.exceptions.ProcessDiedException: The Python process failed (exit code: 1). More info might be available in the logs.
    at com.dataiku.dip.dataflow.common.CodeBasedThingHelper.throwSubprocessError(CodeBasedThingHelper.java:23)
    at com.dataiku.dip.dataflow.exec.JobExecutionResultHandler.handleExecutionResult(JobExecutionResultHandler.java:29)
    at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:70)
    at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeModule(AbstractPythonRecipeRunner.java:99)
    at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner$1.run(RAGEmbeddingRecipeRunner.java:124)
    at com.dataiku.dip.recipes.nlp.rag_embedding.RAGEmbeddingRecipeRunner.run(RAGEmbeddingRecipeRunner.java:104)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2024/01/26-12:40:42.014] [ActivityExecutor-35] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Executing default post-activity lifecycle hook
[2024/01/26-12:40:42.052] [ActivityExecutor-35] [INFO] [dku.flow.activity] running compute_Extracted_text_RBC_HSBC_embedded_NP - Done post-activity tasks
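
The traceback in the log shows the Python process exiting at `from langchain.vectorstores import FAISS, Pinecone, Chroma`, with `Pinecone` reported as not importable from the `langchain` package installed in the `RAG` code env. A minimal diagnostic sketch follows, assuming it is run with that code env's interpreter (the `.../code-envs/python/RAG/bin/python` path in the log). The helper name `probe_names` is hypothetical and is not part of DSS or langchain; it only reports which of the three names the installed module exposes and does not fix anything.

```python
import importlib


def probe_names(module_name, names):
    """Report which attribute names a module exposes.

    Returns a dict mapping each name to True (present), False (absent),
    or None when the module itself could not be imported.
    Hypothetical helper for diagnosis only.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        print("cannot import {}: {}".format(module_name, exc))
        return {name: None for name in names}

    result = {}
    for name in names:
        try:
            # Some packages resolve names lazily, so a lookup may itself
            # raise ImportError rather than AttributeError.
            getattr(module, name)
            result[name] = True
        except (AttributeError, ImportError):
            result[name] = False
    return result


if __name__ == "__main__":
    # The three names are taken from the failing import line in the log.
    report = probe_names("langchain.vectorstores", ["FAISS", "Pinecone", "Chroma"])
    for name, status in report.items():
        print(name, status)
```

If `Pinecone` comes back `False` while the others are `True`, the installed `langchain` version in the code env likely does not match what DSS 12.4.2's `rag_embedding_recipe.py` expects; that reading is an inference from the traceback, not something the log states.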