Job failed: Error in python process: At line 16: Python process is running remotely
Dear Experts,
I am currently using Dataiku Online. I am trying to read the cleaned dataset, train a model on it, and store the model in a managed folder.
I am new to the Dataiku API and would appreciate some help here.
I then want to use the model in Dataiku to test it and create an API.
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import gensim
import nltk
from gensim.models import Word2Vec
from nltk import word_tokenize

# Read recipe inputs
CleansedData = dataiku.Folder("YEM8QpBl")
#CleansedData_info = CleansedData.get_info()
Source_Path = CleansedData.get_path()
path_Of_CSV = os.path.join(folder_path, "CleanDataForSentence2Vec.csv")
df = pd.read_csv(path_of_csv)#Problem here

# get array of titles
titles = df['title'].values.tolist()
# tokenize the each title
tok_titles = [word_tokenize(title) for title in titles]
# refer to here for all parameters:
# https://radimrehurek.com/gensim/models/word2vec.html
model = Word2Vec(tok_titles, sg=1, size=100, window=5, min_count=5, workers=4, iter=100)
#model.save('./data/job_titles.model')

# Write recipe outputs
#Problem here too
Model = dataiku.Folder("rlACbXYw");
path = Model.get_path();
model.save(path/'job_titles.model')
Model_info = Model.get_info()
Please find the code above!
Do suggest the needful!
Br
Ash
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
Hi Ash,
Based on the context from another channel, what is happening here is that your managed folders are not local.
This means you will need to use the managed folder read/write APIs instead, e.g. get_download_stream and upload_stream. Please see some suggested changes in the code below.
https://knowledge.dataiku.com/latest/courses/folders/managed-folders.html
Let me know if that works for you!
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import gensim
import nltk
from gensim.models import Word2Vec
from nltk import word_tokenize
from io import BytesIO

# Read recipe inputs
CleansedData = dataiku.Folder("YEM8QpBl")
#CleansedData_info = CleansedData.get_info()
# get_path() only works for local folders, so it is not called here
# you can also use list_paths_in_partition() to list the files in the folder
# change to something like
with CleansedData.get_download_stream("/CleanDataForSentence2Vec.csv") as stream:
    df = pd.read_csv(stream)

# get array of titles
titles = df['title'].values.tolist()
# tokenize each title
tok_titles = [word_tokenize(title) for title in titles]
# refer to here for all parameters:
# https://radimrehurek.com/gensim/models/word2vec.html
model = Word2Vec(tok_titles, sg=1, size=100, window=5, min_count=5, workers=4, iter=100)
#model.save('./data/job_titles.model')

# Write recipe outputs
# change this part to something like
Model = dataiku.Folder("rlACbXYw")
bytes_container = BytesIO()
model.save(bytes_container)  # write the model into the in-memory buffer
bytes_container.seek(0)  # rewind so the upload starts at the first byte
Model.upload_stream("saved_model.model", bytes_container)
Model_info = Model.get_info()
Answers
-
Hello @AlexT,
Thanks, it worked! I have a few more questions:
1) Can we reverse the process? I want to load the model back from the folder. How can I achieve that? Is it the same approach, using streams? Can you share a sample snippet?
2) Can I publish the model to the model repository with code? A snippet, please.
Thanks and Regards,
Gabriel
Model_Folder.upload_stream("saved_model.model", bytes_container)
-
Hi, I'm really interested in how to load the pretrained model back after saving it to a folder like this. I have tried many ways, but with no luck.
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
Hi @Timmy, @Ashlin,
Sorry for the delayed response.
I would suggest you have a look at the new capabilities and consider leveraging MLflow models, which you can export/import and use in DSS: https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-as-a-mlflow-model
https://doc.dataiku.com/dss/latest/mlops/mlflow-models/importing.html
There is also an export-to-Python capability now:
https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-to-python
https://developer.dataiku.com/latest/tutorials/machine-learning/model-export/python-export-cli/index.html
Also, just generally, if you have a new question, it's always best to start a new thread, as a post in an already solved thread may be missed.
-
Hi, I have fixed it now. I can load the trained word embedding model as a pickle file.
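For anyone landing here later, the pickle-based round trip mentioned above might look roughly like the sketch below. It is not taken from the posts in this thread: the helper names (save_pickle_to_folder, load_pickle_from_folder, download_to_temp_file) are made up for illustration, and `folder` is assumed to be a dataiku.Folder, of which only get_download_stream and upload_stream are used. As far as I know, gensim writes a plain pickle when model.save() is given a file handle, which would explain why loading it back with pickle works, but please verify that against your gensim version.

```python
import pickle
import shutil
import tempfile
from io import BytesIO


def save_pickle_to_folder(folder, name, obj):
    """Pickle obj in memory and upload it to the managed folder as `name`."""
    buffer = BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)  # rewind so the upload starts at the first byte
    folder.upload_stream(name, buffer)


def load_pickle_from_folder(folder, name):
    """Download `name` from the managed folder and unpickle it.

    Only unpickle files that you (or someone you trust) wrote.
    """
    with folder.get_download_stream(name) as stream:
        # Read everything first, in case the remote stream is not seekable.
        data = stream.read()
    return pickle.loads(data)


def download_to_temp_file(folder, name, suffix=""):
    """Copy `name` to a local temporary file and return its path.

    Useful for loaders that insist on a filesystem path.
    The caller is responsible for deleting the file afterwards.
    """
    with folder.get_download_stream(name) as stream:
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            shutil.copyfileobj(stream, tmp)
            return tmp.name


# Inside DSS (not run here), usage would look something like:
#   import dataiku
#   model_folder = dataiku.Folder("rlACbXYw")
#   model = load_pickle_from_folder(model_folder, "saved_model.model")
```

If a loader needs a real file path rather than a stream, download_to_temp_file gives you a local copy to point it at; remember to delete the temporary file once the model is loaded.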