I use a Python recipe to run drop_duplicates on some columns, and I see that the types of the output columns are altered.
This seems to be specific to pandas, which converts an integer column containing NA values into a float column.
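To illustrate the behavior outside of any recipe, here is a minimal sketch in plain pandas. The column names and values are made up for the example; it only shows that the float type comes from the missing value at load time, not from drop_duplicates itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an integer-like column with one missing value.
df = pd.DataFrame({"id": [1, 2, 2, np.nan], "label": ["a", "b", "b", "c"]})

# NumPy integers cannot represent a missing value, so pandas stores the column as float64.
print(df["id"].dtype)

# drop_duplicates keeps the dtype it receives; the column was already float before this call.
deduped = df.drop_duplicates(subset=["id", "label"])
print(deduped["id"].dtype)
```

If this matches what happens in the recipe, the integers are already floats as soon as the dataset is read into a dataframe.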
It turns out that the get_dataframe function accepts an argument that works around this and locks my column types:
But when there is a missing value it crashes, because NA is not a valid value for the integer format.
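The crash is probably of the same kind as the one below, which can be reproduced in plain pandas. This is only a sketch with made-up values, and it assumes the forced type is a standard NumPy integer. The nullable "Int64" dtype shown at the end is a pandas feature; whether get_dataframe can be made to use it is not something this example claims.

```python
import numpy as np
import pandas as pd

# Hypothetical values: integers stored as floats because of the missing entry.
values = pd.Series([1.0, 2.0, np.nan])

# A plain NumPy int64 column cannot hold a missing value, so the strict cast fails.
try:
    values.astype("int64")
    strict_cast_failed = False
except ValueError:
    strict_cast_failed = True

# The nullable extension dtype "Int64" (capital I) keeps integers and stores the gap as <NA>.
nullable = values.astype("Int64")
print(strict_cast_failed, nullable.dtype)
```

One possible workaround, under these assumptions, is to let pandas load the column as float and then cast it to "Int64" before writing the output.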
I think I have found an alternative:
Embed my Python code in a PySpark recipe, to avoid the type conversion caused by pandas.
But I don't know how it works: maybe pandas no longer does the operation in RAM, so the conversion is not needed anymore?