Dysfunction of the python recipe
Hello,
I put this code in my Python recipe:
loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
loop_issues1_df.drop_duplicates(subset='limit',keep=False)
loop_issues1.write_dataframe(loop_issues1_df)
The "drop_duplicates" call doesn't work when I check the "append instead of overwrite" option.
I don’t understand why.
I need this option for the rest of my code.
But apparently, it prevents the code from dropping anything!
Thank you for your help
Answers
-
Hello Sebastien,
Like many other pandas functions, "drop_duplicates" by default creates a copy of your dataframe, drops the duplicates from that copy, and returns it. The original dataframe is left unchanged, so you need to either:
* reassign the return value to your dataframe
* or pass "inplace=True" to the function to apply the change to the input dataframe
With your code, it would look like:
loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
loop_issues1_df = loop_issues1_df.drop_duplicates(subset='limit', keep=False)
loop_issues1.write_dataframe(loop_issues1_df)
or
loop_issues1 = dataiku.Dataset("loop_issues1")
loop_issues1_df = loop_issues1.get_dataframe()
loop_issues1_df.drop_duplicates(subset='limit', keep=False, inplace=True)
loop_issues1.write_dataframe(loop_issues1_df)
You can have a look at the function documentation for more information.
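To see the difference outside of Dataiku, here is a minimal pandas-only sketch. The sample data and the "value" column are made up for illustration; only the "limit" column name comes from your code.

```python
import pandas as pd

# Hypothetical sample data: the value 1 appears twice in "limit".
df = pd.DataFrame({"limit": [1, 1, 2, 3], "value": ["a", "b", "c", "d"]})

# Return value discarded: df itself is left unchanged.
df.drop_duplicates(subset="limit", keep=False)
rows_without_reassign = len(df)

# Option 1: reassign the returned copy.
# keep=False drops every row whose "limit" value occurs more than once.
deduped = df.drop_duplicates(subset="limit", keep=False)

# Option 2: modify the dataframe in place (done on a copy here so df stays intact).
df_inplace = df.copy()
df_inplace.drop_duplicates(subset="limit", keep=False, inplace=True)

print(rows_without_reassign, len(deduped), len(df_inplace))
```

With keep=False, both rows where "limit" equals 1 are removed, so only the rows with 2 and 3 remain in the deduplicated dataframes.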
Hope this helps
Best
-
Hello Nicolas,
Thank you for your answer.
Yes, I already tried your solutions, but neither of them works.
I'm looking for a function which would replace the "append instead of overwrite" option.
Thank you for your help
-
Hello Sebastien,
Can you describe your use case more precisely?
In the context of a Python recipe, the "append instead of overwrite" option just means that when you call write_dataframe(), the content of the dataframe is appended to the dataset instead of replacing the dataset's current content. The rows already stored in the dataset, duplicates included, are therefore kept, whatever you drop from the dataframe.
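As a rough illustration, the two write modes can be simulated with plain pandas. This is only a sketch: it assumes the dataset behaves like a table of rows, uses made-up data, and does not call the Dataiku API at all.

```python
import pandas as pd

# Hypothetical rows already stored in the dataset.
existing = pd.DataFrame({"limit": [1, 1, 2]})

# The dataframe after a correct drop_duplicates: only the row with limit == 2 is left.
deduped = existing.drop_duplicates(subset="limit", keep=False)

# Simulated "append" mode: the written rows are added to what is already stored,
# so the old duplicates are still present.
after_append = pd.concat([existing, deduped], ignore_index=True)

# Simulated "overwrite" mode: the stored content is replaced by the dataframe.
after_overwrite = deduped.reset_index(drop=True)

print(after_append["limit"].tolist())
print(after_overwrite["limit"].tolist())
```

If the goal is to accumulate rows over several runs and still remove duplicates, one possible approach would be to combine the old and new rows in pandas, deduplicate the result, and write it back in overwrite mode.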
Best regards