How to use the get_dataframe function in a Python recipe

Grixis6 Registered Posts: 15 ✭✭✭✭✭
edited July 16 in Using Dataiku

I use a Python recipe to run drop_duplicates on some columns, and I see that the types of the output columns are altered.

This seems to be specific to pandas, which converts an integer column containing NA values to float.

OK

So it seems that the get_dataframe function has an argument that can be added to overcome this and lock my column types:

infer_with_pandas=False

But when a value is missing, it crashes because NA does not fit the integer type.

Log:

Job failed: Error in Python process: At line 11: <class 'ValueError'>: Integer column has NA values in column 6
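
The behaviour described above can be reproduced with plain pandas, outside Dataiku. The sketch below is only an illustration: the CSV content and the column names `id` and `qty` are made up, and the nullable `Int64` dtype is a pandas-level workaround that this thread does not confirm for get_dataframe. It shows the three cases: inferred types (integer with NA becomes float), a strict `int64` type (the same kind of ValueError as in the log), and the nullable integer type.

```python
import io

import pandas as pd

# Hypothetical sample data: "qty" is an integer column with missing values,
# and the row (2, <missing>) appears twice.
CSV = "id,qty\n1,10\n2,\n2,\n3,30\n"

# 1) Let pandas infer the types: the missing values force "qty" to float64.
inferred = pd.read_csv(io.StringIO(CSV))
print(inferred["qty"].dtype)

# 2) Force a strict NumPy integer type: pandas refuses because of the NA values.
try:
    pd.read_csv(io.StringIO(CSV), dtype={"qty": "int64"})
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
print(strict_error)

# 3) Use the pandas nullable integer type: values stay integers, NA stays NA.
nullable = pd.read_csv(io.StringIO(CSV), dtype={"qty": "Int64"})
deduped = nullable.drop_duplicates(subset=["id", "qty"])
print(deduped)
```

If the DataFrame has already been loaded with a float column, `df["qty"].astype("Int64")` gives the same nullable integer column, provided every non-missing value is a whole number.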


Answers

  • Grixis6
    Grixis6 Registered Posts: 15 ✭✭✭✭✭

    I think I have found an alternative:

    Embed my Python code in a PySpark recipe, so that the type conversion done by pandas does not happen.

    But I don't know how it works: maybe pandas is no longer used in RAM for the operation, so no conversion is needed any more?
