-
Save pandas dataframe to .csv in managed S3 folder
Hi Dataiku Team, I have a quick question related to managed S3 folders. I have a dataframe which I want to save as a .csv file in a managed S3 folder. Reading the documentation, it sounds to me that I have to store the .csv file in a local folder on the DSS server first, and then upload it like this: handle =…
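A minimal sketch of the managed-folder route without an intermediate local file, assuming the `dataiku` package's `Folder.upload_stream`, which accepts a file-like object (the folder name, target path, and dataframe below are placeholders):

```python
import io

import dataiku
import pandas as pd

# Placeholder dataframe; in practice this is the dataframe you already have.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# "my_s3_folder" is a hypothetical managed folder name pointing at S3.
folder = dataiku.Folder("my_s3_folder")

# Serialize the dataframe to CSV in memory and stream it into the folder,
# so nothing needs to be written to the DSS server's local disk first.
buffer = io.BytesIO(df.to_csv(index=False).encode("utf-8"))
folder.upload_stream("exports/my_dataframe.csv", buffer)
```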
-
Cannot compute metrics on S3 datasets automatically
I'm currently trying to keep track of metrics on a partitioned S3 dataset. I've used the "Autocompute after build" option, but the metrics are not computed when the dataset changes. I've also tried to compute the metrics using the API (here I'm interested in the record count): client = dataiku.api_client() current_project =…
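A sketch of triggering the computation and reading the record count back through the public API, assuming the standard `compute_metrics` / `get_last_metric_values` methods on the dataset handle (the dataset name and partition identifier are placeholders):

```python
import dataiku

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

# Hypothetical partitioned dataset name.
dataset = project.get_dataset("my_partitioned_dataset")

# Explicitly trigger metric computation on one partition, then read the results.
dataset.compute_metrics(partition="2024-01-01")
metrics = dataset.get_last_metric_values(partition="2024-01-01")

# "records:COUNT_RECORDS" is the built-in record-count metric id.
record_count = metrics.get_metric_by_id("records:COUNT_RECORDS")
print(record_count)
```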
-
Run recipe
Hi Team, I have downloaded a file from AWS S3 and created a recipe (rename and add a new column), but I am getting an error while executing the script: com.dataiku.dip.datasets.fs.HTTPDatasetHandler cannot be cast to com.dataiku.dip.datasets.fs.AbstractFSDatasetHandler. Logs may contain additional information. Additional technical…
-
Export dataset as one file to S3
Hello, Currently, DSS exports the dataset into 4 files `out-s....csv.gz` on S3. Is it possible to write only a single file? Regards,
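One workaround sketch (not a quoted answer from the thread, and the dataset and folder names are hypothetical): stream the dataset through a single managed-folder writer from a Python recipe, so exactly one CSV object lands on S3 instead of several split files:

```python
import dataiku

# Hypothetical input dataset and managed S3 folder.
ds = dataiku.Dataset("my_output_dataset")
folder = dataiku.Folder("my_s3_folder")

# Stream the dataset in chunks through one writer, producing a single file.
with folder.get_writer("export/out.csv") as writer:
    first_chunk = True
    for chunk in ds.iter_dataframes(chunksize=100000):
        # Only the first chunk carries the header row.
        writer.write(chunk.to_csv(index=False, header=first_chunk).encode("utf-8"))
        first_chunk = False
```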
-
Common Crawl S3
I am currently trying to connect the Common Crawl S3 bucket to Dataiku. I have tried different configurations; however, I am not sure what to enter as "Access Key" and "Secret Key". I guess it is not my private AWS credentials. Does anyone have experience with that? Thanks, Matthew
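For context, Common Crawl does not issue its own key pair: the data sits in the public `commoncrawl` bucket, which ordinary AWS credentials (or anonymous requests outside DSS) can read. A small boto3 sketch, just to illustrate the anonymous-access point:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) client: no access key or secret key at all.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a few objects from the public Common Crawl bucket.
response = s3.list_objects_v2(Bucket="commoncrawl", Prefix="crawl-data/", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"])
```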
-
How to access S3 files from a Jupyter notebook using Spark, or define Spark's external packages
What I know is to run pyspark --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1 so that I can access S3 files. How do I configure this in DSS?
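A sketch of the configuration-property equivalent of the `--packages` flag, `spark.jars.packages`, shown here on a notebook-built session; in DSS this property would more typically go into a named Spark configuration managed by an administrator rather than the notebook itself (the bucket and path below are placeholders):

```python
from pyspark.sql import SparkSession

# spark.jars.packages is the config equivalent of "pyspark --packages ...".
spark = (
    SparkSession.builder
    .appName("s3a-access")
    .config("spark.jars.packages",
            "com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1")
    .getOrCreate()
)

# Hypothetical S3 path, read through the s3a filesystem provided by hadoop-aws.
df = spark.read.csv("s3a://my-bucket/path/to/file.csv", header=True)
df.show(5)
```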