Discussions - Dataiku Community

Latest Activity

Using custom python model for Clustering (Agglomerative Clustering)
Hi all, I have a question regarding custom Python models for a clustering task. I am trying to do something really basic: run Agglomerative Clustering with a different metric and linkage method (both included natively in sklearn). At the moment, I seem to be unable to do so with the default model in Dataiku, so I…
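For readers following this thread, here is a minimal sketch of what the post describes on the scikit-learn side only: Agglomerative Clustering with a non-default metric and linkage. It is not taken from the thread and says nothing about how Dataiku wires a custom model in. The helper name `make_clusterer`, the toy data, and the chosen metric and linkage are all assumptions for illustration. The fallback to `affinity` is there because older scikit-learn releases used that keyword instead of `metric`.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def make_clusterer(n_clusters=2, metric="manhattan", linkage="average"):
    """Hypothetical helper: build an AgglomerativeClustering estimator.

    Newer scikit-learn releases take the distance as `metric`; older ones
    called the same option `affinity`, so fall back if `metric` is rejected.
    """
    try:
        return AgglomerativeClustering(
            n_clusters=n_clusters, metric=metric, linkage=linkage
        )
    except TypeError:
        return AgglomerativeClustering(
            n_clusters=n_clusters, affinity=metric, linkage=linkage
        )


# Toy data (made up): two well-separated groups of three points each.
X = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
     [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]]
)

model = make_clusterer(n_clusters=2, metric="manhattan", linkage="average")
labels = model.fit_predict(X)
print(labels)
```

Note that `linkage="ward"` only works with the Euclidean metric, so any other metric has to be paired with `"average"`, `"complete"`, or `"single"` linkage.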
Feature handling Dummy encoding
Dataiku's category handling = Dummy encoding, with the drop-dummy option, seems to use the level with the least exposure/volume as the dropped dummy. Q1. Is there a way to set this dummy manually instead of using Dataiku's default method? I want to avoid the category handling = custom preprocessing option. Q2. Using Variable type =…
Set the random state in visual ML models
I have an ongoing project in production that I intend to replace with another project currently in development. As part of this transition, I find myself comparing a dataset that has undergone scoring from a model in each project. Initially, I anticipated the model scores to be identical or, at the very least, very…
Trouble Training new Models in an existing Project
Hey there, I am having trouble training new models in an existing project. Whether I update an existing recipe or deploy the newly trained model in a new visual tool in the flow, whenever I try to score a dataset I get the following error: Error in python process: <class…
How to Create a Batch Inference API for a Model?
Hello Dataiku Community, I'm looking for guidance on how to set up a batch inference API for a machine learning model. Specifically, I want to create an API endpoint that can take a batch of data and return predictions from my model. Here are a few details about my setup: - I have a trained model. - I want to provide it…
Hyperparameter Optimization
Is this training error, rather than validation (or test) set error? Because the graphs exhibit very little evidence of overfitting, even when model complexity is maximized according to either hyperparameter.
Hyperparameter Tuning // Grid Search Always Picking Biggest Numbers
I am using the Random Forest algorithm and trying to tune its hyperparameters with Grid Search. Grid Search picks the largest hyperparameter values every time. I am sharing pictures of my problem. Is there a reason for this behavior?
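One way to investigate a question like this outside the visual interface is to compare training and validation scores for every grid point. The sketch below uses plain scikit-learn, not Dataiku's own grid search. The synthetic dataset, the grid values, and the variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data (made up) so the example is self-contained.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical grid: two values per hyperparameter, four combinations.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 8]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    return_train_score=True,  # keep training scores next to validation scores
)
search.fit(X, y)

# Collect (params, mean train score, mean validation score) per combination.
results = list(
    zip(
        search.cv_results_["params"],
        search.cv_results_["mean_train_score"],
        search.cv_results_["mean_test_score"],
    )
)
for params, train_score, val_score in results:
    print(params, f"train={train_score:.3f}", f"validation={val_score:.3f}")

print("best:", search.best_params_)
```

If the validation score keeps improving up to the largest value in the grid, the search is not necessarily misbehaving; it may simply mean the grid stops before the point where extra trees or depth stop helping.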
Better visibility into sampling and filtering settings
DSS rightly uses sampling and filtering for efficiency in multiple places, including Charts, Statistics, Column Analysis and more. However, it is not immediately clear what the sampling settings are when looking at a chart or a statistical card without explicitly…
Managed folder
Hi all, if I create a managed folder, its ID is generated as a random number. Is this ID fixed once it is created, or does it change automatically? Operating system used: CentOS
Custom callbacks
I'm trying to use a custom callback: callbacks = [ EarlyStopping(monitor='val_accuracy', min_delta=1e-3, patience=5, mode='max', restore_best_weights=True, verbose=1),] It runs OK, but the dashboard does not show epochs, and the running state only shows -> Optimization results will appear as soon as they are available.…