
Can I limit the size of my data volume input

Robel

Hello,

I have connected an EMR cluster to Dataiku for data processing, and my data is stored in S3. I want to test how the EMR cluster performs as the data size increases. Is there a way to subset my dataset from the Dataiku side so that I can test EMR with different data sizes?

 

Thank you,

Robel.

1 Solution
Clément_Stenac
Dataiker

Hi,

I think you're looking for the sampling recipe. You could create three sampling recipes from your original dataset, extracting (for example) 1%, 5% and 30% of the input, and write their outputs to S3, which is the preferred source for EMR. You can then use any of these sampled datasets for your EMR tests.
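
To illustrate the idea outside of DSS, here is a minimal sketch in plain Python of drawing random samples at those three ratios. It is not how the sampling recipe is implemented; the `sample_fraction` helper and the `rows` list are hypothetical stand-ins for the recipe and your dataset.

```python
import random


def sample_fraction(rows, fraction, seed=0):
    """Keep each row independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [row for row in rows if rng.random() < fraction]


# Hypothetical stand-in for the full input dataset.
rows = list(range(100_000))

# One sample per test size, mirroring the 1%, 5% and 30% example.
samples = {f: sample_fraction(rows, f) for f in (0.01, 0.05, 0.30)}

for fraction, sample in samples.items():
    print(f"{fraction:.0%}: {len(sample)} rows")
```

Because every ratio reuses the same seed, the smaller samples are subsets of the larger ones, so each test size extends the previous one rather than drawing unrelated rows.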
