Job failed: Error in python process: At line 16: Python process is running remotely

Ashlin · March 2022

Dear Experts,

I am currently using Dataiku Online. I am trying to read the cleaned dataset from a managed folder, train a model on it, and store the model in another folder.

I am new to the Dataiku API and would appreciate some help here.

Then I want to use the model in Dataiku to test it and create an API.

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import gensim
import nltk
from gensim.models import Word2Vec
from nltk import word_tokenize



# Read recipe inputs
CleansedData = dataiku.Folder("YEM8QpBl")
#CleansedData_info = CleansedData.get_info()
Source_Path = CleansedData.get_path()
path_Of_CSV = os.path.join(folder_path, "CleanDataForSentence2Vec.csv")  
df = pd.read_csv(path_of_csv)#Problem here
# get array of titles
titles = df['title'].values.tolist()

# tokenize the each title
tok_titles = [word_tokenize(title) for title in titles]

# refer to here for all parameters:
# https://radimrehurek.com/gensim/models/word2vec.html
model = Word2Vec(tok_titles, sg=1, size=100, window=5, min_count=5, workers=4,
                 iter=100)


#model.save('./data/job_titles.model')


# Write recipe outputs #Problem here too
Model = dataiku.Folder("rlACbXYw");
path = Model.get_path();
model.save(path/'job_titles.model')
        
        
Model_info = Model.get_info()

Please find the code above!

Do suggest the needful!

Br

Ash

Alexandru · March 2022

Hi Ash,

Based on the context from another channel, what is happening here is that your managed folders are not local, so get_path() cannot be used on them.

This means you will need to use the managed folder read/write APIs instead, e.g. get_download_stream and upload_stream. Please see some suggested changes in the code below.

https://knowledge.dataiku.com/latest/courses/folders/managed-folders.html

Let me know if that works for you!

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import gensim
import nltk
from gensim.models import Word2Vec
from nltk import word_tokenize
from io import BytesIO


# Read recipe inputs
CleansedData = dataiku.Folder("YEM8QpBl")

#CleansedData_info = CleansedData.get_info()
# get_path() only works for folders on the local filesystem, so it is not used here
#Source_Path = CleansedData.get_path()

# you can also use list_paths_in_partition() to list the files in the folder

# change to something like 
with CleansedData.get_download_stream("/CleanDataForSentence2Vec.csv") as stream:
     df = pd.read_csv(stream)

# get array of titles
titles = df['title'].values.tolist()

# tokenize the each title
tok_titles = [word_tokenize(title) for title in titles]

# refer to here for all parameters:
# https://radimrehurek.com/gensim/models/word2vec.html
# note: gensim 4.x renames size -> vector_size and iter -> epochs
model = Word2Vec(tok_titles, sg=1, size=100, window=5, min_count=5, workers=4,
                 iter=100)


#model.save('./data/job_titles.model')


# Write recipe outputs #Problem here too
#change this part to something like 
Model = dataiku.Folder("rlACbXYw")
# path = Model.get_path()  # not available for non-local folders; upload via a stream instead

bytes_container = BytesIO()
model.save(bytes_container)
bytes_container.seek(0)

Model.upload_stream("saved_model.model", bytes_container)        
        
Model_info = Model.get_info()
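
As a small illustration of the list_paths_in_partition() hint in the comments above, here is a minimal sketch for finding the CSV files in a managed folder before opening one with get_download_stream. The helper name find_csv_paths and the FakeFolder stand-in are hypothetical and exist only so the snippet runs outside DSS; in a recipe you would pass a real dataiku.Folder instead.

```python
def find_csv_paths(folder):
    """Return the paths of all CSV files in a managed folder.

    `folder` is expected to behave like dataiku.Folder, i.e. to provide
    list_paths_in_partition(), which returns paths such as "/file.csv".
    """
    return [p for p in folder.list_paths_in_partition()
            if p.lower().endswith(".csv")]


class FakeFolder:
    """Hypothetical stand-in for dataiku.Folder, for illustration only."""

    def __init__(self, paths):
        self._paths = list(paths)

    def list_paths_in_partition(self):
        return list(self._paths)


demo = FakeFolder(["/CleanDataForSentence2Vec.csv", "/notes.txt"])
csv_paths = find_csv_paths(demo)
print(csv_paths)
```

The returned paths can be passed as-is to get_download_stream, which avoids hard-coding a file name that may differ slightly in the folder.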

Ashlin · March 2022

Hello @AlexT,

Thanks, it worked! I have a few more questions:

1) Can we reverse the process? I want to load the model back from the folder. How can I achieve that? Is it the same as with streams? Could you share a sample snippet?

2) Can I publish the model to the model repository with code? A snippet would be appreciated.

Thanks and Regards,

Gabriel

Model_Folder.upload_stream("saved_model.model", bytes_container)

Timmy · April 2023

Hi, I'm really interested in how to load the pretrained model back after saving it to a folder like this. I have tried many ways but had no luck.

Alexandru · April 2023

Hi @Timmy, @Ashlin,
Sorry for the delayed response.

I would suggest having a look at the new capabilities and considering MLflow models, which you can export, import, and use in DSS:

https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-as-a-mlflow-model

https://doc.dataiku.com/dss/latest/mlops/mlflow-models/importing.html

There are also export-to-Python capabilities now that you could use:

https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-to-python

https://developer.dataiku.com/latest/tutorials/machine-learning/model-export/python-export-cli/index.html

Also, as a general rule, if you have a new question, it's best to start a new thread, as posts on an already solved thread may be missed.

Timmy · April 2023

Hi, I have fixed it now. I can load the trained word embedding model as a pickle file.
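
For anyone landing here later, a minimal sketch of the pickle round trip through a managed folder, using only the upload_stream and get_download_stream calls from earlier in the thread. The helper names and the InMemoryFolder class are hypothetical; the class just mimics those two dataiku.Folder methods so the snippet runs outside DSS. It assumes the file in the folder is a plain pickle, which, as far as I understand, is also what gensim's model.save(bytes_container) writes when given a file-like object.

```python
import pickle
from io import BytesIO


def save_object_to_folder(folder, path, obj):
    """Pickle `obj` into memory and upload it to a managed folder."""
    buffer = BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)  # rewind so the upload starts at the first byte
    folder.upload_stream(path, buffer)


def load_object_from_folder(folder, path):
    """Download a pickled object from a managed folder and restore it.

    Only unpickle files you trust: unpickling can execute arbitrary code.
    """
    with folder.get_download_stream(path) as stream:
        data = stream.read()  # read everything into memory first
    return pickle.loads(data)


class InMemoryFolder:
    """Hypothetical stand-in for dataiku.Folder, for illustration only."""

    def __init__(self):
        self._files = {}

    def upload_stream(self, path, f):
        self._files[path] = f.read()

    def get_download_stream(self, path):
        return BytesIO(self._files[path])


demo_folder = InMemoryFolder()
save_object_to_folder(demo_folder, "saved_model.model", {"vector_size": 100})
restored = load_object_from_folder(demo_folder, "saved_model.model")
print(restored)
```

In a recipe, demo_folder would be replaced by dataiku.Folder("rlACbXYw") and the dictionary by the trained model. Reading the whole stream into memory before unpickling keeps the sketch independent of which file methods the download stream supports.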
