-
ELT best practices? (workspace, intermediate data sets, views...)
We are preparing a SAS to Databricks migration and we are considering Dataiku as a low-code ETL for non-technical users. Dataiku feels very close to an awesome experience but there are a few issues that make me worry about the sustainability of such an approach. Do you have recommended best practices to mitigate those…
-
Elasticsearch index with custom settings?
Hi community! I am wondering how I can create an Elasticsearch index with custom settings for the analyzer, filter and tokenizer. The documentation (doc.dataiku.com/dss/latest/connecting/elasticsearch.html) mentions "you can use an index template before building the managed dataset for the first time", however, it does not…
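As a rough illustration of what an index template with custom analysis settings can look like, here is a minimal sketch that only builds and prints the JSON body. It assumes the composable template format accepted by `PUT _index_template/<name>` in recent Elasticsearch versions; older clusters use the legacy `_template` endpoint with `settings` at the top level. The index pattern and the names `my_tokenizer`, `my_stop` and `my_analyzer` are hypothetical, and how the pattern should match the index that DSS creates is not covered here.

```python
import json


def build_index_template(index_pattern):
    """Build a composable index template body with custom analysis settings.

    All component names below are hypothetical placeholders.
    """
    return {
        "index_patterns": [index_pattern],
        "template": {
            "settings": {
                "analysis": {
                    # Custom tokenizer: fixed-size 3-character n-grams
                    "tokenizer": {
                        "my_tokenizer": {"type": "ngram", "min_gram": 3, "max_gram": 3}
                    },
                    # Custom token filter: English stop words
                    "filter": {
                        "my_stop": {"type": "stop", "stopwords": "_english_"}
                    },
                    # Custom analyzer wiring the tokenizer and filters together
                    "analyzer": {
                        "my_analyzer": {
                            "type": "custom",
                            "tokenizer": "my_tokenizer",
                            "filter": ["lowercase", "my_stop"],
                        }
                    },
                }
            }
        },
    }


# Hypothetical pattern; adjust it to the index name the managed dataset will use.
template_body = build_index_template("my_dss_index*")
print(json.dumps(template_body, indent=2))
```

The printed body would be sent to the cluster before the managed dataset is built for the first time, so that the new index picks up the settings on creation.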
-
SQLExecutor2 does not handle Snowflake ARRAY data type: bug?
Hello, I am using the SQLExecutor2 to read a temporary table and write to a Snowflake dataset in a Python recipe. Here is the column data type: {"type":"ARRAY","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false} Here is my Python code: Expression type does not match column data type, expecting…
-
Using SQLExecutor2 inside shared library
Hi, I would like to execute some raw SQL queries, such as inserting rows directly into an Oracle database. Based on various community discussions, I chose to use SQLExecutor2. My code is below: from dataiku import SQLExecutor2 import dataiku def test(): # get the needed data to prepare the query # for example, load…
-
Dataiku's Python environment is using a version of the GNU C Library (glibc) that is too old.
I can't import the pyg-lib and torch-sparse packages for pytorch-geometric, because Dataiku's Python environment is using a version of the GNU C Library (glibc) that is too old. Both pyg-lib and torch-sparse depend on glibc version 2.29 or newer, but the current system has an older version. I got an error while importing:…
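To confirm which glibc version a given Python environment actually sees, a quick check along these lines may help. It is a minimal sketch using the standard library's `platform.libc_ver()`; the helper names are made up for illustration, and the 2.29 threshold is the one quoted in the post.

```python
import platform

REQUIRED_GLIBC = (2, 29)  # minimum version quoted in the post


def parse_version(text):
    """Turn a dotted version such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_recent_enough(version_text, required=REQUIRED_GLIBC):
    """Return True if the given glibc version is at least `required`."""
    return parse_version(version_text) >= required


# libc_ver() returns (library name, version); both are empty strings
# when the C library cannot be identified (e.g. on non-glibc systems).
lib, version = platform.libc_ver()
if lib == "glibc" and version:
    print(f"glibc {version} detected, meets 2.29 requirement: "
          f"{glibc_is_recent_enough(version)}")
else:
    print("glibc version could not be detected on this platform")
```

Running this inside the same code environment that fails to import the packages shows whether the limitation comes from the host system rather than from the Python packages themselves.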
-
Visual ML - model with multiple features
Hello, is it possible for the AutoML Prediction Visual ML recipe to have multiple targets? Currently, I can only create a prediction model on a single feature/target, but I plan to predict two features, as I need both predicted values for the custom metric scoring that I will be using for this model. Thank you.…
-
How to retrieve the test dataset used in the trained model with Python?
Hello everyone, I am working on Dataiku, primarily using their API. I have trained my model and would like to retrieve the dataset that was used for testing via the API methods. Despite trying several methods, including get_train_info(), I am unable to obtain the test dataset. I don't want to export it; I just want to…
-
Fixed copy of Python-based scenario that did not copy the script
How can we effectively replicate this issue in order to conduct thorough testing during the version upgrade process? Specifically, what steps should we follow to ensure that the issue is accurately reproduced, and what testing methodologies can we apply to assess its impact in the new version? Operating system used:…
-
<class 'json.decoder.JSONDecodeError'> when evaluating a deployed Random Forest model
How to replicate: Using Windows 10, download the latest Dataiku DSS on-premise version (13.2.3). Create a new project, upload any dataset with a "target" column containing binary values. Click the dataset - Lab - AutoML Prediction - Quick Prototype - Train a Random Forest model on "target", using default settings. Deploy the…
-
How to resolve labelling dashboard displaying data already labelled?
For a new project I have set up a Webapp for labelling tabular data. I added the Webapp as a tile in a dashboard. For me, both the Webapp and the dashboard function properly. Yet, when I share the dashboard with a team member, it shows the dataset as already labelled and does not allow any further labelling of the data. Do…