Hi There,
I am trying to use the TextBlob package within a Dataiku recipe.
More specifically, I'm trying to create a Python recipe that translates a column "Description" from Russian to English using this package.
I'm basing this on a script I found in the context of a Kaggle competition:
https://www.kaggle.com/gunnvant/russian-to-english-translate-with-progress-bar
I wanted to see how I could incorporate this into a Dataiku recipe (I took out the references to the progress bar, which I don't need here).
--------------------------------------------------------------
My input is "translate_2", which consists of two columns:
-"ID": integers
-"Description": Russian words, with a few missing values
My output is "output"
----------------------------------------------------------------------
I have reworked the code into the version below to integrate it into Dataiku:
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import sys
import textblob

# Read recipe inputs
train_Raw_filtered = dataiku.Dataset("translate_2")
x = train_Raw_filtered.get_dataframe()

# Takes a data frame as input, then fills missing descriptions with недостающий (Russian for "missing")
def desc_missing(x):
    if x['Description'].isnull().sum()>0:
        x['Description'].fillna("недостающий",inplace=True)
        return x
    else:
        return x

x=desc_missing(x)

# Translate
def translate(x):
    try:
        return textblob.TextBlob(x).translate(to="en")
    except:
        return x

x=translate(x)

# Map to new column
def map_translate(x):
    x['en_desc']=x['Description']
    return x

x=map_translate(x)

# Write recipe outputs to dataiku
train_Raw_Translated = dataiku.Dataset("output")
train_Raw_Translated.write_with_schema(x)
The code runs without error. It does impute the "missing" value, but I can't get the actual translation written
to the Dataiku recipe output. The new column just inherits the original values:
When I look at the logs, I find this line, which I don't know how to interpret at this point:
Bottom line:
Any help would be appreciated.
Thanks a million.
Kind Regards,
Tim
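
One possible explanation, offered as a sketch rather than a confirmed diagnosis: translate(x) receives the whole data frame instead of a single string. As far as I know, TextBlob accepts only a string, so the call would raise an error, the bare except would swallow it, and the data frame would come back unchanged. map_translate then only copies "Description" into "en_desc". The example below reproduces that pattern with pandas alone. fake_translate is a hypothetical stand-in for textblob.TextBlob(text).translate(to="en") so that it runs without TextBlob or network access, and its small lookup table is made up for illustration.

```python
import pandas as pd


# Hypothetical stand-in for textblob.TextBlob(text).translate(to="en").
# Assumption: like TextBlob, it accepts a single string and rejects anything else.
def fake_translate(text):
    if not isinstance(text, str):
        raise TypeError("expected a string")
    lookup = {"привет": "hello", "недостающий": "missing"}  # illustrative only
    return lookup.get(text, text)


def translate(value):
    try:
        return fake_translate(value)
    except TypeError:
        # The error is swallowed, so unsupported input comes back unchanged.
        return value


df = pd.DataFrame({"ID": [1, 2], "Description": ["привет", None]})
df["Description"] = df["Description"].fillna("недостающий")

# Pattern from the recipe: the whole data frame is passed in,
# the call fails silently, and the same object is returned.
unchanged = translate(df)

# Per-row alternative: apply the function to each cell of the column
# and store the result in the new column.
df["en_desc"] = df["Description"].apply(translate)
```

In the real recipe, the equivalent would presumably be to apply the TextBlob-based function to x['Description'] and assign the result to x['en_desc'], wrapping the result in str() since TextBlob's translate returns a TextBlob object rather than a plain string. Catching a specific exception, or at least logging it, would also make failures like this visible in the job log.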