Job failed: Error in python process: At line 16: Python process is running remotely

Solved!
Ashlin
Level 1
Dear Experts,

I am currently using Dataiku Online. I am trying to read the cleaned dataset, train a model on it, and store the trained model in a folder. I then want to use the model in Dataiku for testing and to create an API.

I am new to the Dataiku API and would appreciate some help.

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import gensim
import nltk
from gensim.models import Word2Vec
from nltk import word_tokenize



# Read recipe inputs
CleansedData = dataiku.Folder("YEM8QpBl")
#CleansedData_info = CleansedData.get_info()
Source_Path = CleansedData.get_path()
path_Of_CSV = os.path.join(folder_path, "CleanDataForSentence2Vec.csv")  
df = pd.read_csv(path_of_csv)#Problem here
# get array of titles
titles = df['title'].values.tolist()

# tokenize the each title
tok_titles = [word_tokenize(title) for title in titles]

# refer to here for all parameters:
# https://radimrehurek.com/gensim/models/word2vec.html
model = Word2Vec(tok_titles, sg=1, size=100, window=5, min_count=5, workers=4,
                 iter=100)


#model.save('./data/job_titles.model')


# Write recipe outputs #Problem here too
Model = dataiku.Folder("rlACbXYw");
path = Model.get_path();
model.save(path/'job_titles.model')
        
        
Model_info = Model.get_info()

 

Please find the code above. Any suggestions are welcome.

Best regards,

Ash

AlexT
Dataiker

Hi Ash, 

Based on the context from another channel, what is happening here is that your managed folders are not local, so get_path() cannot return a filesystem path for them.

This means you will need to use the managed folder read/write APIs instead, e.g. get_download_stream and upload_stream. Please see some suggested changes in the code below.

https://knowledge.dataiku.com/latest/courses/folders/managed-folders.html

 

Let me know if that works for you!

 

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import gensim
import nltk
from gensim.models import Word2Vec
from nltk import word_tokenize
from io import BytesIO


# Read recipe inputs
CleansedData = dataiku.Folder("YEM8QpBl")

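# CleansedData_info = CleansedData.get_info()
# get_path() only works for folders on the local filesystem, so it is not used here:
# Source_Path = CleansedData.get_path()

# You can also use list_paths_in_partition() to list the files in the folder.

# Read the CSV through a download stream instead of a local path
with CleansedData.get_download_stream("/CleanDataForSentence2Vec.csv") as stream:
    df = pd.read_csv(stream)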

# get array of titles
titles = df['title'].values.tolist()

# tokenize the each title
tok_titles = [word_tokenize(title) for title in titles]

# refer to here for all parameters:
# https://radimrehurek.com/gensim/models/word2vec.html
model = Word2Vec(tok_titles, sg=1, size=100, window=5, min_count=5, workers=4,
                 iter=100)


#model.save('./data/job_titles.model')


# Write recipe outputs
Model = dataiku.Folder("rlACbXYw")
# path = Model.get_path()  # not available for non-local folders

# Serialize the model into an in-memory buffer, then upload it to the folder
bytes_container = BytesIO()
model.save(bytes_container)
bytes_container.seek(0)

Model.upload_stream("saved_model.model", bytes_container)

Model_info = Model.get_info()
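
If the exact file name inside the folder is not known, list_paths_in_partition() can be used to look it up before opening a download stream. The following is only a minimal sketch: FakeFolder is a hypothetical stand-in that mimics the two folder methods used, so the snippet runs outside DSS, and find_paths is an illustrative helper name, not part of the Dataiku API. In DSS you would pass dataiku.Folder("YEM8QpBl") instead.

```python
from io import BytesIO


class FakeFolder:
    """Hypothetical stand-in for dataiku.Folder (assumption, for illustration).

    It mimics only list_paths_in_partition and get_download_stream.
    """

    def __init__(self, files):
        self._files = dict(files)

    def list_paths_in_partition(self):
        # Paths inside a managed folder are returned with a leading slash.
        return sorted(self._files)

    def get_download_stream(self, path):
        # BytesIO can be used in a "with" block, like the real stream.
        return BytesIO(self._files[path])


def find_paths(folder, suffix):
    """Return the paths in the folder that end with the given suffix."""
    return [p for p in folder.list_paths_in_partition() if p.endswith(suffix)]


folder = FakeFolder({
    "/CleanDataForSentence2Vec.csv": b"title\ndata engineer\n",
    "/notes.txt": b"not a csv",
})

csv_paths = find_paths(folder, ".csv")

# Open the first match through a stream and peek at the header line.
with folder.get_download_stream(csv_paths[0]) as stream:
    first_line = stream.readline().decode("utf-8").strip()
```

In the recipe itself, the stream would be passed to pd.read_csv(stream) rather than read line by line.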

 

 

Ashlin
Level 1
Author

Hello @AlexT ,

 

Thanks, it worked! I have a few more questions:

1) Can we reverse the process? I want to load the model back from the folder. How can I achieve that? With streams as well? Could you share a sample snippet?

2) Can I publish the model to the model repository with code? A snippet, please.

Thanks and Regards,

Gabriel

 

Model_Folder.upload_stream("saved_model.model", bytes_container)

Timmy
Level 2

Hi, I'm really interested in how to load the pretrained model back after saving it to a folder like this. I have tried many ways, but with no luck.

AlexT
Dataiker

Hi @Timmy , @Ashlin ,
Sorry for the delayed response.

I would suggest having a look at the new capabilities and considering MLflow models, which you can export, import, and use in DSS:

https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-as-a-mlflow-model


https://doc.dataiku.com/dss/latest/mlops/mlflow-models/importing.html

There are also export-to-Python capabilities now:

https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-to-python

https://developer.dataiku.com/latest/tutorials/machine-learning/model-export/python-export-cli/index...

Also, just generally, if you have new questions, it's always best to start a new thread, as a post on an already solved thread may be missed.

 

Timmy
Level 2

Hi, I have fixed it now. I can load the trained word embedding model as a pickle file.
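
The code for this fix is not shown in the thread. As a rough sketch of what loading through pickle might look like: the object is pickled into a buffer and uploaded, then read back with get_download_stream and unpickled. InMemoryFolder is a hypothetical stand-in for dataiku.Folder so the snippet runs anywhere, save_to_folder and load_from_folder are illustrative helper names, and the dictionary stands in for the trained model. Whether a file written by model.save(bytes_container) unpickles the same way depends on the gensim version, so treat that as an assumption to verify.

```python
import pickle
from io import BytesIO


class InMemoryFolder:
    """Hypothetical stand-in for dataiku.Folder (assumption, for illustration).

    It mimics only upload_stream and get_download_stream.
    """

    def __init__(self):
        self._files = {}

    def upload_stream(self, path, stream):
        # Store the uploaded bytes under the given path.
        self._files[path] = stream.read()

    def get_download_stream(self, path):
        # BytesIO can be used in a "with" block, like the real stream.
        return BytesIO(self._files[path])


def save_to_folder(folder, path, obj):
    """Pickle obj into an in-memory buffer and upload it to the folder."""
    buffer = BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)
    folder.upload_stream(path, buffer)


def load_from_folder(folder, path):
    """Download the file's bytes from the folder and unpickle them."""
    with folder.get_download_stream(path) as stream:
        return pickle.loads(stream.read())


folder = InMemoryFolder()  # in DSS: dataiku.Folder("rlACbXYw")
toy_model = {"engineer": [0.1, 0.2]}  # placeholder for the trained model

save_to_folder(folder, "saved_model.model", toy_model)
restored = load_from_folder(folder, "saved_model.model")
```

Only unpickle files you wrote yourself, since pickle can execute arbitrary code on load.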
