Creating a dataset from S3 and SQL
When we create a dataset from S3 or a SQL table, is the dataset created on the local filesystem, or does the data remain in S3 or the SQL table and get streamed to Dataiku?
Best Answer
Hi,
It is important to understand that a dataset in DSS is essentially just a "pointer" to the underlying data. If your data lives in S3 or in a SQL database, it stays in that datastore and DSS does not copy it locally. Whenever DSS uses the dataset (for example, as the input or output of a recipe), it connects to the underlying storage and reads from or writes to it directly.
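To make the "pointer" idea concrete, here is a minimal, self-contained Python sketch. It is not DSS code and does not use the `dataiku` API: `DatasetPointer`, `iter_rows` and `write_rows` are hypothetical names, and an in-memory SQLite database stands in for the external SQL datastore. The object records only which connection and table it refers to; rows are fetched in batches when a consumer iterates over it, and writes go straight back to the underlying table.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class DatasetPointer:
    """Hypothetical stand-in for a dataset backed by a SQL table.

    It holds only a connection and a table name (assumed trusted),
    never a local copy of the rows.
    """
    conn: sqlite3.Connection
    table: str

    def iter_rows(self, batch_size: int = 1000) -> Iterator[Tuple]:
        # Rows are pulled from the database lazily, batch by batch, only
        # when a consumer (e.g. a recipe) iterates over the dataset.
        cursor = self.conn.execute(f'SELECT * FROM "{self.table}"')
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                return
            yield from batch

    def write_rows(self, rows: Iterable[Tuple]) -> None:
        # Writes are sent straight to the underlying table.
        n_cols = len(self.conn.execute(f'PRAGMA table_info("{self.table}")').fetchall())
        placeholders = ", ".join("?" * n_cols)
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                f'INSERT INTO "{self.table}" VALUES ({placeholders})', rows
            )


if __name__ == "__main__":
    # An in-memory SQLite database plays the role of the external datastore.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    db.execute("CREATE TABLE big_orders (id INTEGER, amount REAL)")
    orders = DatasetPointer(db, "orders")
    big_orders = DatasetPointer(db, "big_orders")

    orders.write_rows([(1, 9.5), (2, 20.0), (3, 4.25)])
    print(sum(amount for _, amount in orders.iter_rows()))  # 33.75

    # A "recipe": read the input dataset, filter it, write the output dataset.
    big_orders.write_rows([r for r in orders.iter_rows() if r[1] > 5])
    print(sorted(big_orders.iter_rows()))  # [(1, 9.5), (2, 20.0)]
```

Real DSS connectors are more involved (S3, for instance, stores files rather than tables), but the principle is the same: the dataset object describes where the data lives, and the data itself stays in S3 or the database.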
Typically, a dataset is stored locally only if you create something like a managed filesystem dataset or an Upload files dataset. I hope this helps; you may also find our DSS concepts documentation useful.
Thanks,
Andrew