Writing a PySpark dataframe to a partitioned dataset

edited July 2024 in Using Dataiku

I am trying to write the output of a PySpark recipe to a partitioned dataset, but I am receiving an error:

Py4JJavaError: An error occurred while calling o261.savePyDataFrame.
: com.dataiku.common.server.APIError$APIErrorException: Illegal time partitioning value : 'null' with a MONTH partitioning

This is how I am trying to write it:

# Write recipe outputs
CCW_Output_Windows = dataiku.Dataset("CCW_Output_Windows")
dkuspark.write_with_schema(CCW_Output_Windows, CCW_Output_df)

I haven't been able to find any documentation on this. Thanks

Answers

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Posts: 297

    Hi @Gipple,

    It appears that the recipe is finding a null where it expects a month partition value. Please check your data as well as how you are selecting the partitions to build from:
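    A quick way to confirm the data side of this is to count rows whose partition column is missing before writing. Here is a minimal sketch on plain Python rows (the column name "month" and the sample values are assumptions for illustration):

    ```python
    # Hypothetical sample of the rows feeding the MONTH partition column.
    rows = [
        {"month": "2024-05", "value": 10},
        {"month": None,      "value": 20},  # a row like this would trigger the error
        {"month": "2024-06", "value": 30},
    ]

    # Count rows whose partition value is missing.
    bad_rows = [r for r in rows if r["month"] is None]
    print(len(bad_rows))  # → 1
    ```

    The equivalent check on the actual Spark dataframe would be a filter on the partition column's null values before calling write_with_schema.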

    (Screenshot: selecting the partition to build for the output dataset in the recipe's run options)

    Kind Regards,

    Jordan
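
If the recipe is being run outside a normal flow build (for example from a notebook), the target partition is not supplied automatically and can end up null. A minimal sketch of pinning it explicitly, assuming the Dataset.set_write_partition method of the Dataiku Python API; the partition identifier "2024-07" is a hypothetical MONTH value:

```python
import dataiku
from dataiku import spark as dkuspark

# Write recipe outputs
CCW_Output_Windows = dataiku.Dataset("CCW_Output_Windows")

# Explicitly set which MONTH partition this write targets.
# In a flow-run recipe this normally comes from the build request instead.
CCW_Output_Windows.set_write_partition("2024-07")

dkuspark.write_with_schema(CCW_Output_Windows, CCW_Output_df)
```

When the recipe runs as part of a flow build, the partition to write is taken from the partitions selected in the build dialog, so the usual fix is to make sure a partition value is actually specified there rather than setting it in code.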
