Dysfunction of the python recipe

sebastienH Registered Posts: 7 ✭✭✭✭


I put this code in my Python recipe:

loop_issues1 = dataiku.Dataset("loop_issues1")

loop_issues1_df = loop_issues1.get_dataframe()



The "drop_duplicates" call doesn't work when I check the "append instead of overwrite" option.

I don’t understand why.

I need this option for the rest of my code.

But apparently it prevents the code from dropping anything!

Thank you for your help


  • Nicolas_Servel
    Nicolas_Servel Dataiker Posts: 37 Dataiker

    Hello Sebastien,

    Like many other pandas functions, "drop_duplicates" by default creates a copy of your dataframe, drops the duplicates from that copy and returns it. The original dataframe is left unchanged. So you need to either:

    * reassign the return value to your dataframe

    * or use the "inplace" argument of the function to apply the change to the input dataframe

    With your code, it would look like:

    loop_issues1 = dataiku.Dataset("loop_issues1")
    loop_issues1_df = loop_issues1.get_dataframe()
    loop_issues1_df = loop_issues1_df.drop_duplicates(subset='limit', keep=False)
    loop_issues1.write_dataframe(loop_issues1_df)


    loop_issues1 = dataiku.Dataset("loop_issues1")
    loop_issues1_df = loop_issues1.get_dataframe()
    loop_issues1_df.drop_duplicates(subset='limit', keep=False, inplace=True)
    loop_issues1.write_dataframe(loop_issues1_df)
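To see the difference between the two variants outside of Dataiku, here is a minimal pandas-only sketch. The sample values and the "value" column are made up for illustration; only the 'limit' column name and the drop_duplicates arguments come from the thread.

```python
import pandas as pd

# Hypothetical sample data; only the 'limit' column name comes from the thread.
df = pd.DataFrame({"limit": [1, 1, 2, 3, 3], "value": ["a", "b", "c", "d", "e"]})

# Calling drop_duplicates without using its result leaves df unchanged.
df.drop_duplicates(subset="limit", keep=False)
rows_after_ignored_call = len(df)

# Variant 1: reassign the returned copy.
# keep=False removes every row whose 'limit' value appears more than once.
deduplicated = df.drop_duplicates(subset="limit", keep=False)

# Variant 2: inplace=True modifies the dataframe itself and returns None.
df_inplace = df.copy()
result = df_inplace.drop_duplicates(subset="limit", keep=False, inplace=True)

print(rows_after_ignored_call, len(deduplicated), len(df_inplace), result)
```

Note that with keep=False none of the duplicated rows survive, not even the first occurrence, so only the row with a unique 'limit' value remains in both variants.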

    You can have a look at the function documentation for more information.

    Hope this helps


  • sebastienH
    sebastienH Registered Posts: 7 ✭✭✭✭

    Hello Nicolas,

    Thank you for your answer.

    Yes, I had already tried your solutions, but neither of them works.

    I'm looking for a function which would replace the "append instead of overwrite" option.

    Thank you for your help

  • Nicolas_Servel
    Nicolas_Servel Dataiker Posts: 37 Dataiker

    Hello Sebastien,

    Can you describe your use case more precisely?

    In the context of a Python recipe, the "append instead of overwrite" option just means that write_dataframe() appends the content of the dataframe to the dataset instead of replacing the dataset's current content with it.
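One possible reading of the symptom, assuming the recipe reads from and writes to the same dataset, is that the deduplicated rows are added after the rows already stored, so the old duplicates are still there. The sketch below only simulates this with pandas: simulate_write is a hypothetical stand-in, not a Dataiku function, and the sample data is invented.

```python
import pandas as pd

def simulate_write(existing, new_rows, append):
    """Hypothetical stand-in for a dataset write; not part of the Dataiku API."""
    if append:
        # Append mode: the new rows are added after the existing content.
        return pd.concat([existing, new_rows], ignore_index=True)
    # Overwrite mode: the existing content is replaced by the new rows.
    return new_rows.reset_index(drop=True)

# Invented content standing in for what the dataset already holds.
stored = pd.DataFrame({"limit": [1, 1, 2]})

# The deduplication itself works as expected on the dataframe.
cleaned = stored.drop_duplicates(subset="limit", keep=False)

overwritten = simulate_write(stored, cleaned, append=False)
appended = simulate_write(stored, cleaned, append=True)

print(overwritten["limit"].tolist())
print(appended["limit"].tolist())
```

In this simulation the overwrite result contains only the unique row, while the append result still contains the original duplicates plus an extra copy of the unique row, which would look as if nothing had been dropped.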

    Best regards
