
Dysfunction of the python recipe

sebastienH
Level 2

Hello,

 

I put this code in my Python recipe:

 

loop_issues1 = dataiku.Dataset("loop_issues1")

loop_issues1_df = loop_issues1.get_dataframe()

loop_issues1_df.drop_duplicates(subset='limit',keep=False)

loop_issues1.write_dataframe(loop_issues1_df)

 

The "drop_duplicates" call doesn't work when I check the "append instead of overwrite" option.

I don't understand why.

 

I need this option for the rest of my code.

 

But apparently, it prevents the code from dropping anything!

 

 

Thank you for your help

Nicolas_Servel
Dataiker

Hello Sebastien,

Like many other pandas functions, "drop_duplicates" by default does not modify your dataframe: it returns a new dataframe with the duplicates removed and leaves the original unchanged. So you need to either:

* reassign the return value to your dataframe

* or use the "inplace" argument of the function to apply the change to the input dataframe

 

With your code, it would look like:

 

import dataiku

loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
# Reassign the result: drop_duplicates returns a new dataframe
loop_issues1_df = loop_issues1_df.drop_duplicates(subset='limit', keep=False)
loop_issues1.write_dataframe(loop_issues1_df)

or

import dataiku

loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
# inplace=True modifies loop_issues1_df directly (the call then returns None)
loop_issues1_df.drop_duplicates(subset='limit', keep=False, inplace=True)
loop_issues1.write_dataframe(loop_issues1_df)
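
To check the behaviour outside of DSS, here is a minimal sketch in plain pandas. The small dataframe and the variable names are made up for illustration; only the "limit" column name comes from the code above.

```python
import pandas as pd

# Hypothetical sample data: the value 1 appears twice in "limit".
df = pd.DataFrame({"limit": [1, 1, 2, 3], "value": ["a", "b", "c", "d"]})

# Calling drop_duplicates without using its return value leaves df unchanged.
df.drop_duplicates(subset="limit", keep=False)
rows_after_discarded_call = len(df)

# Reassigning the result keeps only the rows whose "limit" value is unique,
# because keep=False drops every member of a duplicated group.
deduplicated = df.drop_duplicates(subset="limit", keep=False)
rows_after_reassignment = len(deduplicated)
remaining_limits = deduplicated["limit"].tolist()

print(rows_after_discarded_call, rows_after_reassignment, remaining_limits)
```

Note that with keep=False, both rows with limit 1 are removed, not just the second one.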

 

You can have a look at the function documentation for more information.

Hope this helps

Best

 

 

sebastienH
Level 2
Author

Hello Nicolas,

Thank you for your answer.

Yes, I already tried your solutions, but neither of them works.

I'm looking for a function which would replace the "append instead of overwrite" option.

Thank you for your help

 

 


Hello Sebastien,

Can you describe your use case more precisely?

In the context of a Python recipe, "append instead of overwrite" just means that write_dataframe() appends the content of the dataframe to the dataset instead of replacing the dataset's current content with it.
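
The difference between the two modes can be simulated in plain pandas. This is only a sketch of the semantics described above, not of how DSS implements them: pd.concat stands in for the append, and the sample data and variable names are hypothetical. Under that assumption, if the deduplicated rows are appended to a dataset that still holds the original rows, the duplicates remain in the dataset, which may be what is observed here.

```python
import pandas as pd

# Hypothetical current content of the dataset.
existing = pd.DataFrame({"limit": [1, 1, 2, 3]})

# The dataframe the recipe would write after dropping duplicates.
deduplicated = existing.drop_duplicates(subset="limit", keep=False)

# Overwrite: the dataset content becomes exactly the written dataframe.
after_overwrite = deduplicated.reset_index(drop=True)

# Append (simulated with concat): the written rows are added after the
# rows that were already in the dataset, so the old duplicates are kept.
after_append = pd.concat([existing, deduplicated], ignore_index=True)

print(after_overwrite["limit"].tolist())
print(after_append["limit"].tolist())
```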

 

Best regards
