Dataset to be saved at root of connection from Dataiku to Amazon S3, how to save my dataset properly?

mikempc · November 2021

I want to store the results of the data I processed with Dataiku in an Amazon S3 bucket. I already have a connection from which I can read data, but when I try to create a dataset from Dataiku in S3, I get a strange warning and no files are written.

So what did I miss in order to save a dataset into my Amazon S3 bucket? Is it because I am in path restriction mode, as described in the connection paths handling documentation?

Operating system used: Windows

dgraham · November 2021

Hi @mikempc,

The message: "Selected path does not exist" means the path specified in the S3 connection is invalid (Note: “/Bucket/Path-in-bucket” must exist in S3).

Please confirm that the path in the “Path restrictions” section of the S3 connection is correct. Please also verify that there is no leading or trailing whitespace in the "Bucket" and "Path in bucket" fields of the S3 connection.
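As a quick illustration of the whitespace check, here is a minimal Python sketch. It is not part of DSS: the function name check_s3_connection_fields and the sample values are hypothetical, and it only inspects the two strings as you would copy them from the connection settings. It does not contact S3, so it cannot tell you whether the path actually exists.

```python
def check_s3_connection_fields(bucket, path_in_bucket):
    """Return a list of whitespace problems found in the two connection fields.

    Hypothetical helper: the arguments are the raw strings copied from the
    "Bucket" and "Path in bucket" fields of the S3 connection.
    """
    problems = []
    for label, value in (("Bucket", bucket), ("Path in bucket", path_in_bucket)):
        # strip() removes leading and trailing whitespace; any difference
        # means the field contains stray characters at one of its ends.
        if value != value.strip():
            problems.append(f"{label} has leading or trailing whitespace: {value!r}")
    return problems


if __name__ == "__main__":
    # Sample values are made up for illustration.
    for problem in check_s3_connection_fields("my-bucket ", "/data/out"):
        print(problem)
```

An empty list means neither field has stray whitespace; you would still need to confirm in S3 that the bucket and path exist.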

mikempc · November 2021

Oh, okay, sure, now it makes sense @dgraham. But I have two questions:

- I have now set an actual path and am defining the schema. However, couldn't I use the schema of an existing dataset in my flow?
- I created a test dataset with a "test" column but it seems to be read-only. How can I change that so I can write into it?

If I can't write the dataset produced by my last recipe directly to S3 from Dataiku, I was thinking about an alternative: rather than creating a new dataset, downloading the file locally, pushing it manually to my Amazon S3 bucket, and then updating it.

Thanks!

dgraham · November 2021

Hi @mikempc,

When you initially create a recipe, the schema of the input dataset is copied to the output dataset, and each time you save the recipe, the output schema is automatically recomputed. If the computed schema does not match the current output dataset schema, a warning popup appears, as described in the DSS documentation.
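To make the idea of a schema mismatch concrete, here is a small Python sketch that compares two schemas. It is an illustration only, not how DSS computes or checks schemas internally: the function name schema_differences is hypothetical, and representing a schema as a list of {"name", "type"} dictionaries is an assumption made for the example.

```python
def schema_differences(current, computed):
    """List the differences between a dataset's current schema and a computed one.

    Assumption for this sketch: each schema is a list of dicts with
    "name" and "type" keys, e.g. {"name": "test", "type": "string"}.
    """
    current_types = {col["name"]: col["type"] for col in current}
    computed_types = {col["name"]: col["type"] for col in computed}

    differences = []
    # Columns that appear in the computed schema: new, or with a changed type.
    for name, new_type in computed_types.items():
        if name not in current_types:
            differences.append(f"added: {name}")
        elif current_types[name] != new_type:
            differences.append(
                f"type changed: {name} ({current_types[name]} -> {new_type})"
            )
    # Columns that exist today but are missing from the computed schema.
    for name in current_types:
        if name not in computed_types:
            differences.append(f"removed: {name}")
    return differences
```

An empty result corresponds to the case where the schemas match and no warning would be needed; any entry corresponds to the kind of mismatch that triggers the popup.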

It's not possible to copy the schema of an arbitrary dataset from the Flow to another dataset. However, it's possible to propagate the schema of an upstream dataset to all downstream datasets by using the schema propagation tool.

In general, transformations on datasets in Dataiku DSS (e.g. an aggregation or a join) are performed via recipes. In advanced cases where an editable dataset is needed, the "Push to editable" recipe can be used. This recipe copies a regular dataset (input) to an editable dataset (output) while keeping the changes made in the output dataset.
