-
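Meaning of the factor variables after applying PCA in AutoML clustering, and how to check their correspondence with the original variables
Thank you for your help. I have a question about the AutoML clustering feature. In AutoML clustering, I selected "Perform dimensionality reduction (PCA)" under Dimensionality Reduction and ran the training. The results screen shows variables such as factor0, factor1, and so on. Is it correct to understand that these are the variables after reduction by PCA (that is, the features actually used to train the clustering)? I would also like to check the relationship (loadings, contributions, etc.) between each PCA factor and the original input variables.…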
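In standard PCA, each factor is a linear combination of the input variables, and the loadings are the weights of that combination. The sketch below computes them with NumPy on made-up data. It assumes the inputs are standardized before PCA, which may not match AutoML's internal preprocessing exactly, and `pca_loadings` is a hypothetical helper. In scikit-learn, the corresponding quantities are `PCA.components_` and `PCA.explained_variance_ratio_`.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Hypothetical helper: PCA via SVD on standardized columns.

    Returns (scores, components, explained_ratio), where components[k, j] is
    the weight of original variable j in factor k. Signs are arbitrary.
    Assumes no column is constant (its standard deviation would be zero).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column to zero mean and unit variance (assumed preprocessing)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T  # one column per factor: factor0, factor1, ...
    explained_ratio = S**2 / np.sum(S**2)
    return scores, components, explained_ratio[:n_components]


# Made-up data: columns 0 and 1 are strongly correlated, column 2 is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.1, size=200), rng.normal(size=200)])
scores, components, ratio = pca_loadings(X, n_components=2)
for k, weights in enumerate(components):
    print(f"factor{k}: loadings={np.round(weights, 2)}, explained={ratio[k]:.2f}")
```

With this data, factor0 should load almost equally on the first two columns and barely on the third, reflecting the correlation built into the example.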
-
Censored Regression
Modelers often encounter censored data, that is, data whose values are only known to fall above x or below y. There are some typical approaches to building a regression model on such data, but these are currently not available in Visual ML. As such, interval-censored regression, Tobit regression, or censored…
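Outside Visual ML, a Tobit model can be fitted by maximum likelihood in a Python recipe or notebook. The following is a minimal sketch, assuming left-censoring at a single known bound (0 by default), normal errors with constant variance, and SciPy being available; `fit_tobit` is a hypothetical helper, not a DSS or SciPy API. Interval-censored rows would instead contribute Φ((upper − μ)/σ) − Φ((lower − μ)/σ) to the likelihood.

```python
import numpy as np
from scipy import optimize, stats


def fit_tobit(X, y, lower=0.0):
    """Hypothetical helper: left-censored (Tobit type I) regression by maximum likelihood.

    Rows with y <= lower are treated as censored at `lower`. Assumes normal
    errors with constant variance. Include a column of ones in X for an intercept.
    Returns (beta, sigma).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cens = y <= lower

    def neg_log_lik(params):
        beta, sigma = params[:-1], np.exp(params[-1])  # log-parameterized so sigma > 0
        mu = X @ beta
        # Observed rows contribute the normal density of y
        ll_obs = stats.norm.logpdf(y[~cens], loc=mu[~cens], scale=sigma)
        # Censored rows contribute P(latent y <= lower)
        ll_cens = stats.norm.logcdf((lower - mu[cens]) / sigma)
        return -(ll_obs.sum() + ll_cens.sum())

    # Start from the ordinary least-squares fit, which is biased under censoring
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    start = np.append(beta0, np.log(y.std()))
    result = optimize.minimize(neg_log_lik, start, method="BFGS")
    return result.x[:-1], float(np.exp(result.x[-1]))


# Simulated example: latent y* = 1 + 2x + noise, observed y = max(y*, 0)
rng = np.random.default_rng(42)
x = rng.normal(size=2000)
y = np.maximum(1.0 + 2.0 * x + rng.normal(size=2000), 0.0)
X = np.column_stack([np.ones_like(x), x])
beta, sigma = fit_tobit(X, y)
print("beta:", np.round(beta, 2), "sigma:", round(sigma, 2))
```

Optimizing log σ rather than σ keeps the search unconstrained, which lets a plain BFGS run handle it.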
-
Can the hyperparameters change for each newly trained model with each new dataset?
Dear dataikuler, thanks for reading my question. My problem is this: when I retrain my model on a different dataset (for example, my first dataset covers 12/10/2024 to 12/10/2025 and my second covers 30/11/2024 to 30/11/2025) and then deploy the second model, I check the hyperparameters of each version and see that all of them…
-
How to retrieve the test dataset used by a trained model with Python?
Hello everyone, I am working on Dataiku, primarily using their API. I have trained my model and would like to retrieve the dataset that was used for testing via the API methods. Despite trying several methods, including get_train_info(), I am unable to obtain the test dataset. I don't want to export it; I just want to…
-
Make Auto-Typing an on/off option with the default set to “Off”
I would like Auto-Typing to be an option that can be turned on and off, with the default being “Off”. This feature is changing my unit serial numbers (230836735F) to a float (2.30836735E8), which causes me to lose records when joining on the unit serial number field in a later step. This will cause my…
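A similar failure mode is easy to reproduce in pandas, whose type inference drops the leading zeros from all-digit serials such as 00123. The sketch below is a pandas analogy, not a DSS setting, and uses made-up data; it shows the usual workaround of declaring the join key as a string before joining.

```python
import io

import pandas as pd

units_csv = "serial,model\n00123,A\n04560,B\n"
events_csv = "serial,status\n00123,ok\n04560,fail\n"

# Default inference parses all-digit serials as integers, dropping leading zeros
inferred = pd.read_csv(io.StringIO(units_csv))
print(inferred["serial"].tolist())  # [123, 4560]

# Pinning the key column to str keeps identifiers exactly as written
units = pd.read_csv(io.StringIO(units_csv), dtype={"serial": str})
events = pd.read_csv(io.StringIO(events_csv), dtype={"serial": str})
joined = units.merge(events, on="serial", how="inner")
print(joined)
```

Typing the key explicitly on both sides means the join compares identical string values, so no rows are silently dropped by a type mismatch.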
-
Does DSS have a recipe for imbalanced samples, such as SMOTE?
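For context, SMOTE creates synthetic minority-class rows by interpolating between a minority point and one of its nearest minority neighbours. A maintained implementation is `imblearn.over_sampling.SMOTE` from the imbalanced-learn package (`SMOTE().fit_resample(X, y)`). The NumPy sketch below only illustrates the core rule; `smote` is a hypothetical helper with no handling of categorical features, and resampling should be applied to the training split only, never the test set.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Hypothetical helper implementing the basic SMOTE interpolation rule.

    Each synthetic sample lies on the segment between a random minority point
    and one of its k nearest minority-class neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)
    # Pairwise squared Euclidean distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude each point as its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


# Made-up example: 10 minority points in 2-D, oversampled with 40 synthetic ones
rng = np.random.default_rng(1)
X_minority = rng.normal(loc=3.0, size=(10, 2))
synthetic = smote(X_minority, n_new=40, k=3)
print(synthetic.shape)  # (40, 2)
```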
-
<class 'json.decoder.JSONDecodeError'> when evaluating a deployed Random Forest model
How to replicate: On Windows 10, download the latest on-premise version of Dataiku DSS (13.2.3). Create a new project and upload any dataset with a "target" column containing binary values. Click the dataset > Lab > AutoML Prediction > Quick Prototype, then train a Random Forest model on "target" using the default settings. Deploy the…
-
Using sample.py after exporting a model as Python
Hello, I’m trying to use sample.py after unzipping the archive of a model I exported. The model is a LightGBM model with a feature selection step, and the DSS version is 12.6.5. However, the Python script crashes after the dummifier step with the error: Indexed_matrix.py Line 35 in _remap_key Remapped_key = (key[0],…
-
Support for 2-way partial dependence plots
I'd love to see support for 2-way partial dependence plots in model summary reports, to get insight into how the interaction of two features affects the model's output. This would give deeper insight into feature behavior in the model at hand. See here, under 4.1.1, for the sklearn implementation: 4.1. Partial Dependence and…
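A 2-way partial dependence surface can be computed directly by fixing two features at each pair of grid values and averaging the model's predictions over the data. A minimal NumPy sketch follows; `partial_dependence_2way` and `toy_model` are hypothetical. For fitted scikit-learn estimators, `PartialDependenceDisplay.from_estimator(estimator, X, features=[(0, 1)])` plots the same kind of surface.

```python
import numpy as np


def partial_dependence_2way(predict, X, i, j, grid_i, grid_j):
    """Hypothetical helper: brute-force 2-way partial dependence.

    For each grid pair (a, b), set feature i to a and feature j to b in every
    row of X, then average the model's predictions over all rows.
    """
    X = np.asarray(X, dtype=float)
    surface = np.empty((len(grid_i), len(grid_j)))
    for r, a in enumerate(grid_i):
        for c, b in enumerate(grid_j):
            X_mod = X.copy()
            X_mod[:, i] = a
            X_mod[:, j] = b
            surface[r, c] = np.mean(predict(X_mod))
    return surface


def toy_model(Z):
    # Toy model with an explicit interaction between features 0 and 1
    return Z[:, 0] * Z[:, 1] + Z[:, 2]


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
grid = np.linspace(-2.0, 2.0, 5)
surface = partial_dependence_2way(toy_model, X, 0, 1, grid, grid)
print(np.round(surface, 2))
```

For this toy model the surface equals grid[r] * grid[c] plus the mean of feature 2, so its saddle shape is the interaction; an additive model would instead give rows that are parallel shifts of one another.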
-
Why is my RAG chat application not retrieving the correct embedded chunks when responding to a chat query?