Job failed: Error in python process: At line 16: Python process is running remotely

Ashlin Registered Posts: 2 ✭✭✭
edited July 16 in General Discussion

Dear Experts,

I am currently using Dataiku Online. I am trying to read the cleaned dataset, train a model on it, and store the trained model in a managed folder.

I am new to the Dataiku API and would appreciate some help here.

Afterwards, I want to use the model in Dataiku to test it and create an API.

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import gensim
import nltk
from gensim.models import Word2Vec
from nltk import word_tokenize



# Read recipe inputs
CleansedData = dataiku.Folder("YEM8QpBl")
#CleansedData_info = CleansedData.get_info()
Source_Path = CleansedData.get_path()
path_Of_CSV = os.path.join(folder_path, "CleanDataForSentence2Vec.csv")  
df = pd.read_csv(path_of_csv)#Problem here
# get array of titles
titles = df['title'].values.tolist()

# tokenize the each title
tok_titles = [word_tokenize(title) for title in titles]

# refer to here for all parameters:
# https://radimrehurek.com/gensim/models/word2vec.html
model = Word2Vec(tok_titles, sg=1, size=100, window=5, min_count=5, workers=4,
                 iter=100)


#model.save('./data/job_titles.model')


# Write recipe outputs #Problem here too
Model = dataiku.Folder("rlACbXYw");
path = Model.get_path();
model.save(path/'job_titles.model')
        
        
Model_info = Model.get_info()

Please find the code above!

Please suggest what needs to change.

Best regards,

Ash

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    edited July 17 Answer ✓

    Hi Ash,

    Based on the context from another channel, what is happening here is that your managed folders are not local: they are not stored on the local filesystem, so they cannot be accessed through a local path.

    This means you will need to use the managed folder read/write APIs instead, e.g. get_download_stream and upload_stream. Please see the suggested changes in the code below.

    https://knowledge.dataiku.com/latest/courses/folders/managed-folders.html

    Let me know if that works for you!

    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu
    import gensim
    import nltk
    from gensim.models import Word2Vec
    from nltk import word_tokenize
    from io import BytesIO


    # Read recipe inputs
    CleansedData = dataiku.Folder("YEM8QpBl")

    # Do not call get_path() here: it only works when the folder is on the
    # local filesystem, and it is what raises the "running remotely" error.
    # To find file names, you can also use list_paths_in_partition().

    # Read the CSV through a download stream instead of a local path
    with CleansedData.get_download_stream("/CleanDataForSentence2Vec.csv") as stream:
        df = pd.read_csv(stream)

    # get the list of titles
    titles = df['title'].values.tolist()

    # tokenize each title
    tok_titles = [word_tokenize(title) for title in titles]

    # refer to here for all parameters:
    # https://radimrehurek.com/gensim/models/word2vec.html
    # (gensim 4.x renamed size to vector_size and iter to epochs)
    model = Word2Vec(tok_titles, sg=1, size=100, window=5, min_count=5, workers=4,
                     iter=100)


    # Write recipe outputs
    Model = dataiku.Folder("rlACbXYw")

    # Save the model into an in-memory buffer, rewind it, then upload it
    bytes_container = BytesIO()
    model.save(bytes_container)
    bytes_container.seek(0)

    Model.upload_stream("saved_model.model", bytes_container)

    Model_info = Model.get_info()
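
To try the stream pattern from the code above outside DSS, here is a minimal sketch that uses only the standard library. The class InMemoryFolder and the helpers upload_pickled and download_to_tempfile are hypothetical names made up for this illustration; they are not part of the dataiku package. The stand-in only mimics the two calls used in this thread, upload_stream(path, f) and get_download_stream(path), and a plain dict is pickled in place of a trained model.

```python
import io
import os
import pickle
import shutil
import tempfile


class InMemoryFolder:
    """Hypothetical stand-in for a managed folder; NOT part of dataiku.

    It mimics only upload_stream(path, f) and get_download_stream(path)
    so the pattern can be run without a DSS instance.
    """

    def __init__(self):
        self._files = {}

    def upload_stream(self, path, f):
        # Reads from the stream's current position to its end.
        self._files[path] = f.read()

    def get_download_stream(self, path):
        return io.BytesIO(self._files[path])


def upload_pickled(folder, remote_path, obj):
    """Serialize obj into memory and upload it without using a local path."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)  # rewind; otherwise the upload would read zero bytes
    folder.upload_stream(remote_path, buffer)


def download_to_tempfile(folder, remote_path):
    """Copy a remote file into a local temp file and return its path.

    Handy for loaders that only accept a filesystem path.
    """
    suffix = os.path.splitext(remote_path)[1]
    with folder.get_download_stream(remote_path) as stream:
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(stream, tmp)
            return tmp.name


folder = InMemoryFolder()
upload_pickled(folder, "saved_model.model", {"vector_size": 100, "window": 5})

# Download to a temporary local file, load it, then clean up.
local_path = download_to_tempfile(folder, "saved_model.model")
with open(local_path, "rb") as fh:
    restored = pickle.load(fh)
os.remove(local_path)
print(restored)
```

Inside DSS, a dataiku.Folder object would take the place of InMemoryFolder. The temp-file step is there because some loaders expect a real path; Word2Vec.load(local_path) should work the same way, assuming the model was saved as a single file as in the answer above.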
