Dysfunction of the python recipe

sebastienH
Level 2

Hello,

 

I put this code in my Python recipe:

 

loop_issues1 = dataiku.Dataset("loop_issues1")

loop_issues1_df = loop_issues1.get_dataframe()

loop_issues1_df.drop_duplicates(subset='limit',keep=False)

loop_issues1.write_dataframe(loop_issues1_df)

 

The "drop_duplicates" call doesn't work when I check the "append instead of overwrite" option.

I don’t understand why.

 

I need this option for the rest of my code.

 

But apparently it prevents the code from dropping anything!

 

 

Thank you for your help

3 Replies
Nicolas_Servel
Dataiker

Hello Sebastien,

Like many other pandas functions, "drop_duplicates" by default leaves the input dataframe unchanged: it returns a new dataframe with the duplicates dropped. So you need to either:

* reassign the return value to your dataframe

* or use the "inplace" argument of the function to apply the change to the input dataframe directly

 

With your code, it would look like:

 

import dataiku

loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
# Reassign the result: drop_duplicates returns a new dataframe
loop_issues1_df = loop_issues1_df.drop_duplicates(subset='limit', keep=False)
loop_issues1.write_dataframe(loop_issues1_df)

or

import dataiku

loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
# inplace=True modifies loop_issues1_df itself and returns None
loop_issues1_df.drop_duplicates(subset='limit', keep=False, inplace=True)
loop_issues1.write_dataframe(loop_issues1_df)

 

You can have a look at the function documentation for more information.
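
To see the difference without touching a dataset, here is a minimal pandas-only sketch. The sample values and the "issue" column are made up for illustration; only the "limit" column name comes from your code. Note that keep=False drops every row whose "limit" value is duplicated, including the first occurrence.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'limit' comes from the thread.
df = pd.DataFrame({"limit": [10, 10, 20, 30], "issue": ["a", "b", "c", "d"]})

# Without reassignment the returned dataframe is discarded and df is unchanged.
df.drop_duplicates(subset="limit", keep=False)
rows_without_reassign = len(df)

# Reassigning (here to a new name) keeps the deduplicated result.
deduped = df.drop_duplicates(subset="limit", keep=False)
rows_reassigned = len(deduped)

# inplace=True modifies the dataframe itself and returns None.
df_inplace = df.copy()
result = df_inplace.drop_duplicates(subset="limit", keep=False, inplace=True)

print(rows_without_reassign, rows_reassigned, len(df_inplace), result)
```

In this sketch both rows with limit 10 are removed, so the deduplicated dataframes keep only the rows with limits 20 and 30.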

Hope this helps

Best

 

 

sebastienH
Level 2
Author

Hello Nicolas,

Thank you for your answer.

Yes, I had already tried one of your solutions. Neither of them works.

I'm looking for a function which would replace the "append instead of overwrite" option.

Thank you for your help

 

 

Nicolas_Servel
Dataiker

Hello Sebastien,

Can you describe your use case more precisely?

In the context of a Python recipe, "append instead of overwrite" just means that write_dataframe() appends the content of the dataframe to the dataset, instead of replacing the dataset's current content with the content of the dataframe.
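
As a rough illustration of why the duplicates can still appear, here is a pandas-only simulation of the two modes. It assumes the recipe writes back to the same dataset it read from, as in your code; the sample values are made up, and pd.concat only stands in for the append behaviour (it is not a Dataiku call).

```python
import pandas as pd

# Hypothetical current content of the dataset.
stored = pd.DataFrame({"limit": [10, 10, 20]})

# What the recipe computes before calling write_dataframe().
deduped = stored.drop_duplicates(subset="limit", keep=False)

# Overwrite mode (simulated): the dataset content is replaced by the dataframe.
after_overwrite = deduped.reset_index(drop=True)

# Append mode (simulated): the dataframe rows are added after the existing rows,
# so the original duplicated rows are still in the dataset.
after_append = pd.concat([stored, deduped], ignore_index=True)

print(after_overwrite["limit"].tolist())
print(after_append["limit"].tolist())
```

Under these assumptions the appended dataset still contains both rows with limit 10, plus a second copy of the row with limit 20, which would look as if nothing had been dropped.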

 

Best regards
