Hello,
I put this code in my Python recipe:
loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
loop_issues1_df.drop_duplicates(subset='limit',keep=False)
loop_issues1.write_dataframe(loop_issues1_df)
The "drop_duplicates" doesn't work when I check the option "append instead of overwrite".
I don't understand why.
I need this option for the rest of my code.
But apparently it prevents the code from dropping anything!
Thank you for your help
Hello Sebastien,
Like many other pandas functions, "drop_duplicates" by default leaves your dataframe untouched: it returns a new dataframe with the duplicates removed. So you need to either:
* reassign the return value to your dataframe
* or use the "inplace" argument of the function to apply the change to the input dataframe
With your code, it would look like:
loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
loop_issues1_df = loop_issues1_df.drop_duplicates(subset='limit',keep=False)
loop_issues1.write_dataframe(loop_issues1_df)
or
loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
loop_issues1_df.drop_duplicates(subset='limit',keep=False, inplace=True)
loop_issues1.write_dataframe(loop_issues1_df)
You can have a look at the function documentation for more information.
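To see the difference outside of a recipe, here is a minimal pandas-only sketch. The small dataframe and the "value" column are made up for illustration; only the "limit" column name and the drop_duplicates arguments come from the code above.

```python
import pandas as pd

# Made-up sample data: the value 1 appears twice in "limit"
df = pd.DataFrame({"limit": [1, 1, 2, 3], "value": ["a", "b", "c", "d"]})

# Return value discarded: df itself is not modified
df.drop_duplicates(subset="limit", keep=False)
rows_after_discard = len(df)

# Reassigning the return value keeps the deduplicated result
deduped = df.drop_duplicates(subset="limit", keep=False)
rows_after_reassign = len(deduped)

# inplace=True modifies the dataframe it is called on
df_inplace = df.copy()
df_inplace.drop_duplicates(subset="limit", keep=False, inplace=True)
rows_after_inplace = len(df_inplace)

print(rows_after_discard, rows_after_reassign, rows_after_inplace)
```

Note that keep=False removes every row whose "limit" value occurs more than once, so both rows with limit 1 are dropped, not just the second one.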
Hope this helps
Best
Hello Nicolas,
Thank you for your answer.
Yes, I already tried both of your solutions, but neither of them works.
I'm looking for a function which would replace the "append instead of overwrite" option.
Thank you for your help
Hello Sebastien,
Can you describe your use case more precisely?
In the context of a Python recipe, the "append instead of overwrite" option just means that when you call write_dataframe(), the content of the dataframe is appended to the dataset instead of replacing the dataset's current content.
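As a rough illustration of what this could mean for the code above, here is a pandas-only simulation. It assumes the recipe reads from and writes to the same dataset, as in the original snippet. The "stored" dataframe is a hypothetical stand-in for the rows already in the dataset, and pd.concat only mimics the append behaviour; it is not what Dataiku runs internally.

```python
import pandas as pd

# Hypothetical stand-in for the rows already stored in the dataset
stored = pd.DataFrame({"limit": [1, 1, 2, 3]})

# What the recipe computes after reading the dataset
deduped = stored.drop_duplicates(subset="limit", keep=False)

# Overwrite mode (simulated): the dataset content is replaced by the dataframe
after_overwrite = deduped.reset_index(drop=True)

# Append mode (simulated): the dataframe rows are added to the stored rows
after_append = pd.concat([stored, deduped], ignore_index=True)

print(len(after_overwrite), len(after_append))
```

If this matches your setup, the rows removed by drop_duplicates are still present in the dataset after the write, and the rows that were kept now appear twice, which would look as if nothing had been dropped.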
Best regards