Partitioning - partition column duplicated when using SQL query with Athena on S3

tanguy
Level 2
Hi,

I have recently stumbled upon what appears to be a bug with a partitioned table.

I wanted to filter my partitions with a simple SQL statement "SELECT * FROM {my_partitioned_table} WHERE {my_filter}". This works fine in a Spark SQL recipe, but fails in a SQL query recipe (which runs on Athena, as we work with AWS). In the latter case, the partition column reappears in the output schema. IMO, this must be wrong because:

  1. it prevents the computation of metrics of the output table (duplicate column error)
  2. it prevents further SQL recipes (as the "validation" step fails because of a duplicate column error).
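A possible workaround, which I have not verified on DSS, would be to avoid "SELECT *" and list the columns explicitly, leaving the partition column out of the select list. Here is a minimal Python sketch that builds such a statement. The function name `build_select` and the example table and column names are hypothetical, and the double-quoted identifiers assume Athena's Presto-style query syntax.

```python
def build_select(table, columns, partition_columns, where=None):
    """Build a SELECT that lists columns explicitly instead of using *.

    Hypothetical helper: columns named in partition_columns are dropped
    from the select list (compared case-insensitively).
    """
    excluded = {name.lower() for name in partition_columns}
    kept = [name for name in columns if name.lower() not in excluded]
    if not kept:
        raise ValueError("no columns left after excluding partition columns")

    # Assumption: identifiers are quoted with double quotes (Presto-style).
    column_list = ", ".join('"{}"'.format(name) for name in kept)
    query = 'SELECT {} FROM "{}"'.format(column_list, table)

    # The filter is passed through as-is; it may still reference the
    # partition column even though that column is not selected.
    if where:
        query += " WHERE {}".format(where)
    return query


# Example with made-up names: "day" is the partition column.
print(build_select("sales", ["id", "amount", "day"], ["day"], "day = '2021-01-01'"))
```

Whether dropping the partition column from the select list actually avoids the duplicate in the output dataset is an assumption on my side.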

Note: It's a pity that the duplicate column is not explicitly pointed out (this could save the user some time, as inspecting the table schema does not show any duplicate column...).
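To illustrate the kind of check that would have saved me time, here is a small sketch that reports column names occurring more than once in a list of names. The function name `find_duplicate_columns` is hypothetical, the case-insensitive comparison is an assumption, and how the list of names is obtained (for example from the query result rather than from the dataset schema) is left out.

```python
from collections import Counter


def find_duplicate_columns(names):
    """Return the lower-cased column names that occur more than once."""
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


# Example with made-up names: "day" appears twice, once as "Day".
print(find_duplicate_columns(["id", "day", "amount", "Day"]))
```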
