
How to set up partitions to run a Prepare recipe on Spark

gt
Level 1

Hello,

I have data stored in S3 buckets, with each day's data in a date folder (%Y-%M-%D). However, I want to partition my data hourly based on the file name. Each file is named '%Y%M%D_%H...{guid}'. I am able to create hourly partitions using the pattern "/.*/%Y%M%D_%H.*". This creates the partitions, but an error is thrown when I run a recipe on Spark.

I get the error below:

 Invalid partitioning

Can't resolve the path of this partition to a valid folder: /.*/20220817_10.*.

Thanks in advance for any help!

ZachM
Dataiker

Hi @gt ,

Could you please double-check that the partitioning pattern is correct on both the input and output datasets of the recipe?

Based on the error message, I think you may have an extra period at the end of the pattern for the output dataset, for example "/.*/%Y%M%D_%H.*." instead of "/.*/%Y%M%D_%H.*"
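To check locally which files a pattern like this would select, and what a stray trailing period would change, here is a minimal Python sketch. It is not how DSS resolves partitions internally. The helpers `pattern_to_regex` and `partition_id`, the sample file paths, and the `YYYY-MM-DD-HH` identifier format are all assumptions made for illustration.

```python
import re

# Hypothetical mapping from the time tokens used in the thread to regex groups.
# This is an illustration only, not the Dataiku implementation.
TOKENS = {
    "%Y": r"(?P<year>\d{4})",
    "%M": r"(?P<month>\d{2})",
    "%D": r"(?P<day>\d{2})",
    "%H": r"(?P<hour>\d{2})",
}


def pattern_to_regex(pattern):
    """Translate a partitioning pattern into an anchored regular expression.

    Assumes each time token appears at most once in the pattern.
    """
    pieces = []
    for part in re.split(r"(%[YMDH]|\.\*)", pattern):
        if part in TOKENS:
            pieces.append(TOKENS[part])      # time dimension token
        elif part == ".*":
            pieces.append(".*")              # wildcard, kept as-is
        else:
            pieces.append(re.escape(part))   # literal text, e.g. "/" or "_"
    return re.compile("^" + "".join(pieces) + "$")


def partition_id(pattern, path):
    """Return an hourly identifier for the path, or None if it does not match."""
    match = pattern_to_regex(pattern).match(path)
    if match is None:
        return None
    # Assumed identifier layout: YYYY-MM-DD-HH
    return "{year}-{month}-{day}-{hour}".format(**match.groupdict())


if __name__ == "__main__":
    sample = "/2022-08-17/20220817_10_example-guid"  # hypothetical file path
    print(partition_id("/.*/%Y%M%D_%H.*", sample))
    print(partition_id("/.*/%Y%M%D_%H.*.", sample))
```

With the intended pattern the sample path resolves to an hourly identifier. With the extra trailing period the pattern only matches paths that literally end in ".", so the same path is not matched.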

If that doesn't resolve the issue, please post screenshots of the partitioning settings for both the input and output dataset. Also, does the error occur for every partition that's being built, or just for the 20220817_10 partition?

 

Thanks,

Zach

gt
Level 1
Author

Hi @ZachM ,

Thank you for your response.
There is no extra period at the end of the pattern for the output dataset. The same recipe works when run using DSS local stream and fails only on Spark. The error occurs for all partitions.

Below are the screenshots of the partition settings for both input and output datasets.

ZachM
Dataiker

Hi @gt ,

Thank you for providing the screenshots.

Your partition settings look good to me, so I'm not sure what would be causing this error.

Could you please open a support ticket so that we can further assist with this issue? You can open a ticket by going to https://support.dataiku.com/support/tickets/new

In the description, please include a link to this community post, as well as a job diagnostic of the failing job.

To create a job diagnostic, from the job page, click on Actions > Download job diagnosis.
If the resulting file is too large to attach (> 15 MB), you can use https://dl.dataiku.com to send it to us. Please don't forget to send the link that is generated when you upload the file.

Thanks
