I am using Dataiku to create partitions on an HDFS dataset as the result of a Spark recipe. I noticed that if the dataframe in the preceding recipe contains the column you are partitioning by, Spark throws an IllegalArgumentException (here both the dataframe column and the partition column are called 'res'):
com.dataiku.common.server.APIError$SerializedErrorException: At line 58: <class 'pyspark.sql.utils.IllegalArgumentException'>: requirement failed: Schema of output dataset GUNIT_SPARK.test does not contain dataframe column res
I think this is the result of the Dataiku Spark bindings doing something strange with the schema in combination with Spark's write.parquet() method. If I drop the partition column in the recipe itself, I don't get this error.
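For reference, the workaround amounts to dropping the partition column before the write so the dataframe schema matches the output dataset's (non-partition) schema. A minimal sketch of that column-filtering logic in plain Python (column names here are hypothetical, not from my real dataset):

```python
def columns_to_write(df_columns, partition_columns):
    """Return the dataframe columns minus the partition columns,
    mirroring what df.drop(*partition_columns) does in PySpark."""
    partition_set = set(partition_columns)
    return [c for c in df_columns if c not in partition_set]

# The dataframe from the preceding recipe still carries 'res',
# but the partitioned output dataset's schema does not list it.
df_columns = ["id", "value", "res"]
print(columns_to_write(df_columns, ["res"]))  # -> ['id', 'value']
```

In the actual Spark recipe this corresponds to calling `df = df.drop("res")` just before writing to the output dataset, which is the manual fix that makes the error go away.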
What is really going on here? Is there any way to catch this condition in Dataiku before Spark crashes with this error?