I use a Python recipe to run drop_duplicates on some columns, and I see that the types of the output columns are altered.
This seems to be specific to pandas, which converts an integer column containing NA values to float.
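To illustrate the behaviour, here is a minimal sketch in plain pandas, outside of any recipe. The column names and values are made up for the example, and the cast to the nullable "Int64" dtype is only one possible workaround, not necessarily the one recommended for a Python recipe.

```python
import pandas as pd

# Hypothetical data: an integer column with one missing value.
df = pd.DataFrame({"id": [1, 2, 2, None], "label": ["a", "b", "b", "c"]})

# The default integer dtype cannot hold a missing value,
# so pandas stores the column as float64 with NaN.
print(df["id"].dtype)

# drop_duplicates itself does not change the dtype; the column is already float.
deduped = df.drop_duplicates(subset=["id", "label"])

# Possible workaround: cast to the nullable integer dtype (capital "I"),
# which keeps integers and represents the gap as <NA>.
restored = deduped.astype({"id": "Int64"})
print(restored.dtypes)
```

If this matches what happens in the recipe, the float conversion takes place when the data is loaded into a DataFrame, not in drop_duplicates.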
OK
It turns out that the get_dataframe function accepts an argument that is supposed to work around this and lock my column types:
infer_with_pandas=False
But when there is a missing value, it crashes because NA does not fit the expected integer format.
log :
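A similar failure can be reproduced in plain pandas, which may be what happens behind the scenes here. This is only a sketch under that assumption: the series below is made up, and it is not the actual log from the recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical column: integer values with one missing entry.
values = pd.Series([1.0, 2.0, np.nan])

# Forcing a strict (non-nullable) integer type fails,
# because int64 has no way to represent a missing value.
try:
    values.astype("int64")
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print(f"cast failed: {exc}")
```

In other words, forcing the declared integer type only works as long as the column has no missing values.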
I think I have found an alternative:
Embed my Python code in a PySpark recipe, so as to avoid the conversion done by pandas.
But I don't know how it works: maybe pandas is no longer used in memory for the operation, so nothing needs to be converted anymore?