Writing a PySpark DataFrame to a partitioned dataset

Gipple Registered Posts: 5

I am trying to write the output of a PySpark recipe to a partitioned dataset, but I am receiving this error:

Py4JJavaError: An error occurred while calling o261.savePyDataFrame.
: com.dataiku.common.server.APIError$APIErrorException: Illegal time partitioning value : 'null' with a MONTH partitioning

This is how I am trying to write it:

# Write recipe outputs
CCW_Output_Windows = dataiku.Dataset("CCW_Output_Windows")
dkuspark.write_with_schema(CCW_Output_Windows, CCW_Output_df)

I haven't been able to find any documentation on this. Thanks

Answers

  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 293 Dataiker

    Hi @Gipple,

    It appears that the recipe is finding a null where it expects a MONTH partition value (the error reports 'null' as the time partitioning value). Please check your data, as well as how you are selecting the partitions to build from:

    [Screenshot: Screenshot 2024-04-26 at 11.59.17 AM.png]

    Kind Regards,

    Jordan
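
As a rough illustration of the "check your data" suggestion, here is a minimal sketch in plain Python that flags values which could not serve as a MONTH partition identifier. It assumes month identifiers follow the YYYY-MM form (for example 2024-04). The helper name invalid_month_partitions and the sample values are hypothetical and are not part of the Dataiku or PySpark APIs.

```python
import re

# Assumption: a MONTH partition identifier looks like "2024-04" (YYYY-MM).
MONTH_ID = re.compile(r"^\d{4}-(0[1-9]|1[0-2])$")


def invalid_month_partitions(values):
    """Return the values that cannot be used as a MONTH partition identifier."""
    bad = []
    for value in values:
        # None (a missing value) and any string that is not YYYY-MM,
        # including the literal "null", are reported as invalid.
        if value is None or not MONTH_ID.match(str(value)):
            bad.append(value)
    return bad


# Hypothetical sample: two usable months and three problem values.
sample = ["2024-03", None, "2024-13", "null", "2024-04"]
print(invalid_month_partitions(sample))
```

In the recipe itself, the input list could come from the distinct values of whichever column drives the partitioning in CCW_Output_df (for instance via select(...).distinct().collect()); that column name is not given in this thread. If the list comes back empty, the data is probably fine and the partition selection on the recipe is the more likely cause.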
