
Pig scripts directly with AWS S3 inputs

Solved!
Romain_NIO
Level 2
Hi,

I recently installed Dataiku DSS Community Edition to test DSS on a Hadoop cluster (AWS EMR). All datasets will be stored in S3 buckets.

When I import my dataset into DSS, the Pig recipe is disabled: "pig not available : dataset is not on HDFS".

Is it planned in future releases to allow DSS users to execute Pig scripts directly on S3 inputs (using Pig's native S3 loaders)?
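
To illustrate, here is roughly the kind of Pig script I have in mind; the bucket name, paths and schema are just placeholders:

-- Load CSV data directly from S3 with Pig's built-in loader (placeholder paths)
raw = LOAD 's3://my-bucket/input/data.csv' USING PigStorage(',')
      AS (id:int, name:chararray);
-- Keep only rows with a valid id
filtered = FILTER raw BY id IS NOT NULL;
-- Write the result back to S3
STORE filtered INTO 's3://my-bucket/output/' USING PigStorage(',');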

At the moment, my workaround is to use a "sync" recipe to copy datasets from S3 to HDFS.

Thanks 🙂
Romain.
3 Replies
FlorianD
Dataiker
Hi Romain,

Sure, that makes sense. In your case, what is the format of the files stored in S3?


Romain_NIO
Level 2
Author
Hi Florian,

Thank you for your answer.

Our datasets will be in different formats depending on the project; I expect JSON, CSV, or Avro.

One question about DSS and data cleaning: is there a "default" DSS format? I mean, when I sync a MySQL database to HDFS or a CSV file to HDFS (or when I apply a "prepare" recipe), is a default format such as Avro applied?
UserBird
Dataiker
Hi Romain,

When you create a new HDFS dataset, the default format is CSV. However, you can change this in the dataset's format settings.