I put this code in my Python recipe:

import dataiku

loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()



The drop_duplicates call doesn't work when I check the "Append instead of overwrite" option.

I don't understand why.

I need this option for the rest of my code, but apparently it prevents the code from dropping anything!

    Hello Sebastien,

    Like many other pandas functions, drop_duplicates by default works on a copy of your dataframe: it drops the duplicates from the copy and returns it, leaving the original dataframe unchanged. So you need to either:

    * reassign the return value to your dataframe, or

    * use the inplace=True argument to apply the change directly to the input dataframe.
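A minimal pandas-only sketch of the two options (no Dataiku involved). Only the "limit" column name comes from the thread; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows with limit == 1 are duplicated.
df = pd.DataFrame({"limit": [1, 1, 2, 3], "value": ["a", "b", "c", "d"]})

# Default behaviour: a new DataFrame is returned and df itself is untouched.
deduped = df.drop_duplicates(subset="limit", keep=False)
print(len(df), len(deduped))  # 4 2

# inplace=True modifies df directly and returns None instead of a DataFrame.
result = df.drop_duplicates(subset="limit", keep=False, inplace=True)
print(result, len(df))  # None 2
```

Calling drop_duplicates without either reassigning or passing inplace=True therefore has no visible effect on df.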

    With your code, either option would look like this:

    import dataiku

    loop_issues1 = dataiku.Dataset("loop_issues1")
    loop_issues1_df = loop_issues1.get_dataframe()
    # Option 1: reassign the deduplicated copy returned by drop_duplicates
    loop_issues1_df = loop_issues1_df.drop_duplicates(subset='limit', keep=False)
    loop_issues1.write_dataframe(loop_issues1_df)

    or:

    import dataiku

    loop_issues1 = dataiku.Dataset("loop_issues1")
    loop_issues1_df = loop_issues1.get_dataframe()
    # Option 2: modify the dataframe in place (drop_duplicates then returns None)
    loop_issues1_df.drop_duplicates(subset='limit', keep=False, inplace=True)
    loop_issues1.write_dataframe(loop_issues1_df)
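Note that keep=False, as used in the snippets above, removes every row whose "limit" value appears more than once, rather than keeping one copy. A small pandas sketch with made-up values shows the difference from the default keep="first":

```python
import pandas as pd

# Hypothetical sample data: the value 1 appears twice in "limit".
df = pd.DataFrame({"limit": [1, 1, 2, 3]})

first = df.drop_duplicates(subset="limit")  # keep="first" is the default
none_kept = df.drop_duplicates(subset="limit", keep=False)

print(first["limit"].tolist())      # [1, 2, 3]
print(none_kept["limit"].tolist())  # [2, 3]
```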

    See the pandas documentation for drop_duplicates for more information.

    Hello Nicolas,

    Thank you for your answer.

    Yes, I had already tried one of your solutions, but neither of them works.

    I'm looking for a function that could replace the "Append instead of overwrite" option.

    Hello Sebastien,

    Can you describe your use case more precisely?

    In the context of a Python recipe, "Append instead of overwrite" just means that write_dataframe() appends the content of the dataframe to the dataset, instead of replacing the dataset's current content with it.
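A rough pandas simulation of what this implies when, as in the code above, the recipe reads and writes the same dataset. This is a sketch under assumptions: pd.concat stands in for the append write, the helper names run_recipe_append and run_recipe_overwrite are hypothetical, and the stored values are made up.

```python
import pandas as pd

# Hypothetical rows already stored in the dataset before the recipe runs.
existing = pd.DataFrame({"limit": [1, 1, 2]})


def run_recipe_append(stored: pd.DataFrame) -> pd.DataFrame:
    """Simulate one run with "Append instead of overwrite" checked:
    read the dataset, dedupe the dataframe, then append it to the stored rows."""
    df = stored.drop_duplicates(subset="limit", keep=False)
    return pd.concat([stored, df], ignore_index=True)


def run_recipe_overwrite(stored: pd.DataFrame) -> pd.DataFrame:
    """Simulate one run in overwrite mode: the written dataframe replaces the stored rows."""
    return stored.drop_duplicates(subset="limit", keep=False)


print(run_recipe_append(existing)["limit"].tolist())     # [1, 1, 2, 2]
print(run_recipe_overwrite(existing)["limit"].tolist())  # [2]
```

Under this assumption, the deduplication does happen in the dataframe, but the rows already stored in the dataset are never removed, so the appended output can contain even more duplicates than before.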

