I work with Dataiku. I have a Jupyter notebook that works, and now I want to use the same code in a Python recipe.
`data_df` is the name of my dataframe and `output_gen_python` is the name of my dataset in Dataiku.
I get this error:
> Job failed: Error in Python process: At line 158: <class 'NameError'>: name 'data_df' is not defined
Here is my code:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from datetime import datetime, timedelta
# Read recipe inputs
batches_types_copy = dataiku.Dataset("batches_types_copy")
batches_types_copy_df = batches_types_copy.get_dataframe()
Last_hour_extract = dataiku.Dataset("Last_hour_extract")
last_hour_extract_df = Last_hour_extract.get_dataframe()
class OutputMode(object):
    ...
class IDCalculation_I:
    def _preGenerateID(self, outputMode, data_df):
        ...
    def generateID(self, outputMode, data_df):
        pass
class IDCase1(IDCalculation_I):
    def generateID(self, outputMode, data_df):
        ...
        return data_df
class IDCase2(IDCalculation_I):
    def generateID(self, outputMode, data_df):
        ...
        return data_df
class Fingerprinter(object):
    def __init__(self, outputMode):
        self._outputMode = outputMode
    def _generateID(self, data_df):
        return self._outputMode.getCaseID().generateID(self._outputMode, data_df)
    def run(self, data_df):
        # GenerateID
        data_df = self._generateID(data_df)
        return data_df
    def __str__(self):
        return str(self._outputMode)
outputMode = OutputMode('EEA','06:00:00','08:00:00',pytz.timezone('Europe/Paris'),CONST_MODE_CONT,IDCase1())
fp_calculator = Fingerprinter(outputMode)
output_gen_python_df = data_df # Compute a Pandas dataframe to write into output_gen_python
# Write recipe outputs
output_gen_python = dataiku.Dataset("output_gen_python")
output_gen_python.write_with_schema(output_gen_python_df)
Can we use object-oriented programming with Python classes in a Python recipe, or only plain functions?
Hello @Data_ing_solv,
You can definitely use object-oriented programming in a Python recipe. From the error and your code, I guess the problem is at this line:
output_gen_python_df = data_df # Compute a Pandas dataframe to write into
"data_df" is not defined here. Maybe you wanted to use one of your input dataframes: "batches_types_copy_df" or "last_hour_extract_df".
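To illustrate the point, here is a minimal sketch (not the original recipe) of why a name that only exists inside a function or method raises a `NameError` when used at the top level of a script. The function `build_frame` and the sample data are hypothetical.

```python
import pandas as pd


def build_frame():
    # data_df is a local variable: it only exists while this function runs.
    data_df = pd.DataFrame({"equipement": ["A", "B"]})
    return data_df


try:
    # No module-level variable called data_df exists, so this line fails.
    output_df = data_df
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")

# Using the value the function returns works as expected.
output_df = build_frame()
```

The dataframe has to be returned from the function (or created at module level) before it can be assigned to the recipe output.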
Hope this helps
Thank you for your answer. In fact, I merge my two dataframes "batches_types_copy_df" and "last_hour_extract_df" to create a new one named "data_df". Is that not possible?
Hi,
You can do that of course, but I didn't see any code that does this merge. You can have a look at the pandas documentation on how to merge or concatenate two dataframes: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
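As a rough illustration of the two operations mentioned, the sketch below merges two small dataframes on a shared column and concatenates two dataframes row-wise. The dataframes and the column names `key`, `x` and `y` are made up for the example.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b"], "y": [10, 20]})

# Left merge: keep every row of `left`, add matching columns from `right`.
# Rows of `left` with no match in `right` get NaN in the new columns.
merged = left.merge(right, how="left", on="key")

# Concatenation: stack dataframes with the same columns on top of each other.
stacked = pd.concat([left, left], ignore_index=True)

print(merged)
print(stacked)
```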
Hope this helps
I do this in my _preGenerateID function:
data_df = batches_types_df.merge(right=last_hour_extract_df, how='left', on='equipement')
But Dataiku does not recognize my dataframe, whereas the Jupyter notebook does.
Without the whole code it's hard for me to help you. From what I understand, you should create the "data_df" variable in the global scope and not in the scope of the method "_preGenerateID".
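A minimal sketch of that suggestion, under several assumptions: the two input dataframes are replaced by small made-up ones (in a recipe they would come from `dataiku.Dataset(...).get_dataframe()`), and `Fingerprinter` is a simplified stand-in for the class in the question with placeholder ID logic. One possible reason the notebook worked is that `data_df` had already been defined in an earlier cell, but that is only a guess.

```python
import pandas as pd

# Stand-ins for the recipe inputs; the columns are invented for the example.
batches_types_copy_df = pd.DataFrame(
    {"equipement": ["E1", "E2"], "batch_type": ["x", "y"]}
)
last_hour_extract_df = pd.DataFrame({"equipement": ["E1"], "value": [10.0]})


class Fingerprinter:
    """Simplified stand-in for the class in the question."""

    def run(self, data_df):
        data_df = data_df.copy()
        # Placeholder for the real ID generation logic.
        data_df["id"] = range(len(data_df))
        return data_df


# data_df is created at module level, so it is defined before it is used.
data_df = batches_types_copy_df.merge(
    last_hour_extract_df, how="left", on="equipement"
)

# The processed dataframe is what the method returns, not a variable inside it.
output_gen_python_df = Fingerprinter().run(data_df)
print(output_gen_python_df)
```

The key design choice is that methods receive the dataframe as an argument and return the result, so nothing at the top level depends on a variable that only exists inside a method.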
I do all my processing inside my method. I can try joining my two dataframes first and storing the result in a "data_df" variable that I pass in as input. I think that could resolve the problem.
Hi,
The error means that your code is referencing data_df before it is defined.
If you could share the full stack trace, and perhaps which line is actually line 158 in your code, it may help us understand where and why this is happening.
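If the job log does not already show the full trace, one way to capture it is with the standard `traceback` module. The sketch below is only an illustration: wrapping the code in a hypothetical `main` function is an assumption, and the undefined name is deliberate.

```python
import traceback


def main():
    # Deliberately references a name that was never defined.
    output_df = data_df
    return output_df


try:
    main()
    trace_text = ""
except NameError:
    # format_exc() returns the full stack trace as a string,
    # including the file name and line number of the failing statement.
    trace_text = traceback.format_exc()
    print(trace_text)
```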