Common Crawl S3
mattmagic
Registered Posts: 12 ✭✭✭✭
I am currently trying to connect the Common Crawl S3 bucket to Dataiku.
I have tried different configurations. However, I am not sure what to enter as "Access Key" and "Secret Key". I assume these are not my private AWS credentials.
Does anyone have experience with that?
Thanks,
Matthew
Best Answer
-
Hi,
thanks for your patience. Somehow, I can't manage to connect to the commoncrawl bucket.
My most recent error is the following:
So I am really unsure whether the bucket can be accessed from Dataiku at all.
Answers
-
Hi,
Credentials-less access to S3 is not supported. However, since the "commoncrawl" bucket is public, using your private AWS credentials will work.
-
Hi,
this is my current setup:
However, when adding an S3 dataset I get the following error:
"Could not list buckets: The request signature we calculated does not match the signature you provided. Check your key and signing method. (Service: Amazon S3; Status Code: 403; Error Code: SignatureDoesNotMatch; Request ID: BFBCCF653E7B199D)"
-
A Google search suggests an issue with your credentials: https://stackoverflow.com/questions/2777078/amazon-mws-request-signature-calculated-does-not-match-the-signature-provided
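To illustrate why a mistyped secret key shows up as SignatureDoesNotMatch rather than as a plain "access denied", here is a minimal sketch of how a request signature is derived from the secret key. It assumes the request is signed with AWS Signature Version 4; it is not Dataiku's code. The function names, the date, and the string being signed are made up for illustration, and the secret is the placeholder value from the AWS documentation.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the key-derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the Signature Version 4 signing key from the secret key."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign_string(secret_key: str, string_to_sign: str,
                date_stamp: str = "20240101",   # illustrative date
                region: str = "us-east-1",
                service: str = "s3") -> str:
    """Return the hex signature for an (illustrative) string to sign."""
    key = derive_signing_key(secret_key, date_stamp, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Placeholder secret from the AWS docs, not a real credential.
    secret = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    to_sign = "example-string-to-sign"  # hypothetical; real requests sign a canonical request hash

    correct = sign_string(secret, to_sign)
    pasted_with_space = sign_string(secret + " ", to_sign)

    # A single stray character in the secret yields a completely different signature,
    # which the server rejects because it recomputes the signature with the real secret.
    print(correct == pasted_with_space)
```

So it is worth re-pasting both keys and checking for stray whitespace or a truncated secret before looking at permissions.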
-
Even though the bucket is public, if your AWS key does not have full permissions (i.e. if it belongs to a restricted IAM user), you need to grant it explicit access to the commoncrawl bucket. Attach the following policy to your IAM user:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1503647467000",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::commoncrawl/*",
        "arn:aws:s3:::commoncrawl"
      ]
    }
  ]
}
-
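If you need the same read-only policy for another bucket, a small sketch like the following can generate it. The helper name build_read_only_policy is hypothetical, and the optional "Sid" field is left out; the actions and resource ARNs mirror the policy above.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Build a read-only IAM policy document for a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    # s3:GetObject applies to the objects inside the bucket
                    f"arn:aws:s3:::{bucket}/*",
                    # s3:ListBucket applies to the bucket itself
                    f"arn:aws:s3:::{bucket}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy so it can be pasted into the IAM console or saved to a file.
    print(json.dumps(build_read_only_policy("commoncrawl"), indent=2))
```

The printed JSON can be pasted into the IAM console, or saved as policy.json and attached with something like `aws iam put-user-policy --user-name <your-user> --policy-name <a-name> --policy-document file://policy.json`, where the user and policy names are placeholders.
-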
That works, thanks a lot!