-
How to output to / update my Snowflake table using Dataiku
I have a Snowflake table, and I've set up the connection and everything looks good. Dataiku requires me to create a dataset from that Snowflake table that I can use as my input / output. The issue is that I have that dataset as my output, and when I run my flow I can see my results, but it isn't actually outputting to my…
-
Why does my DELETE based on a date column not work?
I want to execute a DELETE statement on my BigQuery table based on the date. I am programming in Python and using the BigQuery library: from google.cloud import bigquery. I have two tables in BigQuery (tableA and tableB). Both have a date column called "fecha". Using this column and the date '2023-09-30', I want to update…
-
Use of Weights&Biases (wandb)
I am using Dataiku as part of a collaboration with an airline. However, on my own computer, I was doing parametric studies with Weights&Biases, which is quite comprehensive. I am wondering if I can use Weights&Biases from within Dataiku, which should be possible since wandb operates in the cloud. Thanks! Gordon Operating…
-
How to write to an Azure Blob Service from Dataiku API endpoint code
Hello there, I am trying to save the contents of a JSON object from the API endpoint code. Currently my code looks like this. The code is written in a Python function as part of an API endpoint which receives data in the form of a JSON object. client = dataikuapi.DSSClient(host, apiKey) client._session.verify = False project =…
-
Read PDF with tabula on S3
Hi, I am following this tutorial to work with PDFs and managed folders: https://knowledge.dataiku.com/latest/code/managed-folders/tutorial-managed-folders.html But reading the PDF with tabula doesn't work; I get this error message: UnsupportedOperation: seek. My managed folder is in S3. How can I read this file?
-
Google Cloud
I added google-cloud to my environment so I can use BigQuery. But when I tried to import the bigquery module into a Dataiku notebook, it came up with ModuleNotFoundError. import google.cloud --------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call…
-
Dataiku and BigQuery: Limitations
We are using Dataiku exclusively in Google Cloud Platform (GCP from now on), mostly with Google Cloud Storage buckets (GCS from now on) and Google BigQuery (BQ from now on), Google's data warehousing solution. The basic functionality works, but we have found several limitations in Dataiku which I would like to share with the…
-
Decryption of PGP-encrypted S3 file
Does anyone have experience decrypting a .pgp-encrypted file stored in an AWS S3 location?
-
Automatically reload schema from table
Hi, I created a Python recipe that defines a SQL query and runs it on a BigQuery table through the Python BigQuery API. My recipe looks like this: DataikuException: java.lang.ClassCastException: Cannot cast com.dataiku.dip.datasets.sql.ManagedSQLTableDatasetTestHandler to…
-
Disk-based partitioned dataset performance
Dear all, I am adding hourly partitions to a file-partitioned dataset. In the past 2.5 months, the dataset has grown to about 2.8 million records. I will continue to use the dataset the way I currently do. When I started using this approach, building 100,000 records took eight minutes. The build time has…