Write a pandas dataframe to a dataset

Data_ing_solv

I work in Dataiku and have a Jupyter notebook that works; now I want to move this code into a Python recipe.

`data_df` is the name of my dataframe and `output_gen_python` is the name of my dataset in Dataiku.

I get this error:

> Job failed: Error in Python process: At line 158: <class 'NameError'>: name 'data_df' is not defined

Here is my code:

 

 

    import dataiku
    import pandas as pd, numpy as np
    import pytz  # used below for pytz.timezone('Europe/Paris')
    from dataiku import pandasutils as pdu
    from datetime import datetime, timedelta

    # Read recipe inputs
    batches_types_copy = dataiku.Dataset("batches_types_copy")
    batches_types_copy_df = batches_types_copy.get_dataframe()
    Last_hour_extract = dataiku.Dataset("Last_hour_extract")
    last_hour_extract_df = Last_hour_extract.get_dataframe()


    class OutputMode(object):
        ...

    class IDCalculation_I:
        def _preGenerateID(self, outputMode, data_df):
            ...

        def generateID(self, outputMode, data_df):
            pass

    class IDCase1(IDCalculation_I):
        def generateID(self, outputMode, data_df):
            ...
            return data_df

    class IDCase2(IDCalculation_I):
        def generateID(self, outputMode, data_df):
            ...
            return data_df

    class Fingerprinter(object):
        def __init__(self, outputMode):
            self._outputMode = outputMode

        def _generateID(self, data_df):
            return self._outputMode.getCaseID().generateID(self._outputMode, data_df)

        def run(self, data_df):
            # GenerateID
            data_df = self._generateID(data_df)
            return data_df

        def __str__(self):
            return str(self._outputMode)


    outputMode = OutputMode('EEA', '06:00:00', '08:00:00', pytz.timezone('Europe/Paris'), CONST_MODE_CONT, IDCase1())
    fp_calculator = Fingerprinter(outputMode)

    output_gen_python_df = data_df  # Compute a Pandas dataframe to write into output_gen_python

    # Write recipe outputs
    output_gen_python = dataiku.Dataset("output_gen_python")
    output_gen_python.write_with_schema(output_gen_python_df)

 

 

 

Data_ing_solv
Author

Can we use object-oriented programming with Python classes in a Python recipe, or only plain functions?

louisplt
Dataiker

Hello @Data_ing_solv,

You can definitely use object-oriented programming in a Python recipe. Judging from the error and your code, I guess the problem is at this line:

`output_gen_python_df = data_df # Compute a Pandas dataframe to write into`

`data_df` is not defined at this point. Maybe you meant to use one of your input dataframes: `batches_types_copy_df` or `last_hour_extract_df`.

Hope this helps

Data_ing_solv
Author

Thank you for your answer. In fact, I merge my two dataframes `batches_types_copy_df` and `last_hour_extract_df` to create a new one named `data_df`. Can't I do that?

louisplt
Dataiker

Hi,

You can do that, of course, but I didn't see any code that does this merge. Have a look at the Pandas documentation on how to merge or concatenate two dataframes: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
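For reference, a left join like the one discussed in this thread could look roughly like the sketch below. The two toy dataframes and their `batch_type`/`value` columns are hypothetical stand-ins for `batches_types_copy_df` and `last_hour_extract_df`; only the join key `equipement` is taken from the thread.

```python
import pandas as pd

# Hypothetical stand-ins for the two recipe inputs; only the key
# column 'equipement' comes from the thread.
batches_types_copy_df = pd.DataFrame({
    "equipement": ["E1", "E2", "E3"],
    "batch_type": ["A", "B", "A"],
})
last_hour_extract_df = pd.DataFrame({
    "equipement": ["E1", "E3"],
    "value": [10.5, 7.2],
})

# Left join: keep every row of the left frame and attach matching rows
# from the right one. Rows with no match (E2 here) get NaN in 'value'.
data_df = batches_types_copy_df.merge(
    last_hour_extract_df, how="left", on="equipement"
)
print(data_df)
```

Because this assignment runs at module level, `data_df` exists for every line that follows it in the recipe.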

Hope this helps

Data_ing_solv
Author

I do this in my `_preGenerateID` method:

`data_df = batches_types_df.merge(right=last_hour_extract_df, how='left', on='equipement')`

 

But Dataiku does not recognize my dataframe, unlike the Jupyter notebook.

louisplt
Dataiker

Without the whole code it's hard for me to help you. From what I understand, you should create the `data_df` variable in the global scope, not inside the scope of the method `_preGenerateID`.
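A minimal sketch of that advice, assuming the merge is moved into a method that *returns* its result: a variable assigned inside a method is local to that call, so the caller has to bind the returned dataframe to a module-level name before using it. `Preprocessor`, `merge_inputs`, and the toy data are hypothetical.

```python
import pandas as pd


class Preprocessor:
    """Hypothetical class: the merge happens inside a method and is returned."""

    def merge_inputs(self, left_df, right_df):
        # This data_df is LOCAL to the method call; it disappears on return.
        data_df = left_df.merge(right_df, how="left", on="equipement")
        return data_df


left = pd.DataFrame({"equipement": ["E1", "E2"], "batch_type": ["A", "B"]})
right = pd.DataFrame({"equipement": ["E1"], "value": [10.5]})

prep = Preprocessor()
prep.merge_inputs(left, right)   # result discarded: no global data_df is created
print("data_df" in globals())    # False when run as a fresh script

# Bind the returned dataframe to a module-level name, then use it.
data_df = prep.merge_inputs(left, right)
output_gen_python_df = data_df
print(output_gen_python_df.shape)  # (2, 3)
```

In the recipe itself, `output_gen_python_df` would then be written with `output_gen_python.write_with_schema(output_gen_python_df)`, as in the original code.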

Data_ing_solv
Author

I do all my processing in my method. I could try joining my two dataframes first and storing the result in `data_df`, then passing it as input to my methods. I think that could resolve the problem.

AlexT
Dataiker

Hi,

The error means that your code references `data_df` before it is defined.
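The ordering issue can be seen in isolation: Python looks a name up when the line executes, so a reference that runs before the assignment fails even if the assignment appears later in the file. `data_df_demo` is a hypothetical name used only for this illustration.

```python
import pandas as pd

# Referencing the name before any assignment raises NameError.
try:
    output_df = data_df_demo
except NameError as exc:
    error_message = str(exc)

print(error_message)  # name 'data_df_demo' is not defined

# Once the assignment has run, the same lookup succeeds.
data_df_demo = pd.DataFrame({"equipement": ["E1"]})
output_df = data_df_demo
```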

If you could share the full stack trace, and tell us which line is actually line 158 in your code, it may help us understand where and why this is happening.

 

Data_ing_solv
Author

Here are the logs, in case they help you understand my problem:
