-
Use first row as column headers/column names
I have an Excel file with an unusual layout. I need to skip the first 9 lines and use the 10th line as column names. I have tried using "parse next line as column headers", but it is not working for me. Has anyone faced this type of error? Please let me know how to resolve it. I am pasting input file format below:
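One possible workaround outside the dataset format settings, for example in a Python recipe, is to read the sheet without any header and promote the 10th line yourself. The sketch below assumes pandas; `promote_header` and the sample rows are hypothetical names and data made up for illustration, not taken from the original file. When reading directly from a file, `pd.read_excel(path, header=9)` should give the same result in one step.

```python
import pandas as pd


def promote_header(raw: pd.DataFrame, header_row: int = 9) -> pd.DataFrame:
    """Use the row at 0-based position `header_row` as column names
    and drop every row above it."""
    columns = raw.iloc[header_row].astype(str).str.strip().tolist()
    body = raw.iloc[header_row + 1:].reset_index(drop=True)
    body.columns = columns
    return body


# Stand-in for a sheet read with header=None:
# nine preamble lines, then the header line, then two data rows.
raw = pd.DataFrame(
    [["report preamble", None]] * 9
    + [["id", "amount"], [1, 10.5], [2, 20.0]]
)

table = promote_header(raw, header_row=9)
print(table)
```

Note that `header_row=9` is the 0-based position of the 10th line.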
-
Where to find the confusion matrix, ROC, accuracy, and other metrics in standalone evaluation recipe results
Operating system used: Windows
-
How to calculate percentiles of a column (for example, amount) from P1 to P100 by Segment
How to calculate percentiles of tran_amt from P1 to P100 by Segment? Operating system used: Windows
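A minimal sketch of one way to do this in a Python recipe, assuming pandas and a dataset with `Segment` and `tran_amt` columns as named in the question. The function name `percentiles_by_segment` and the sample data are hypothetical. It relies on pandas' default linear interpolation for quantiles, which may differ slightly from other tools.

```python
import pandas as pd


def percentiles_by_segment(df, value_col="tran_amt", segment_col="Segment"):
    """Return one row per segment and one column per percentile, P1..P100."""
    quantiles = [p / 100 for p in range(1, 101)]
    wide = df.groupby(segment_col)[value_col].quantile(quantiles).unstack()
    # Column labels come back as fractions (0.01, 0.02, ...); rename to P1..P100.
    wide.columns = [f"P{round(q * 100)}" for q in wide.columns]
    return wide


# Hypothetical sample data: segment A holds 1..101, segment B holds 10 and 20.
df = pd.DataFrame(
    {
        "Segment": ["A"] * 101 + ["B"] * 2,
        "tran_amt": list(range(1, 102)) + [10, 20],
    }
)

wide = percentiles_by_segment(df)
print(wide[["P1", "P50", "P100"]])
```

P100 is simply the maximum of each segment, so the output always has 100 percentile columns.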
-
Is learning Dataiku a competitive advantage for aspiring data analysts and data scientists?
I’m Mugesh — an aspiring Data Analyst currently exploring Dataiku. I recently discovered this tool and I must say, its infrastructure and user flow are quite fascinating. As someone working hard to break into the industry, I’m curious to know: Will gaining expertise in Dataiku give me a competitive edge in landing a data…
-
How to set project variables inside a process running on a thread?
Hi there, I am trying to set some project variables using the set_variable() method of the Dataiku API, but it somehow only runs once. Below is my Dash webapp code snippet for reference - import threading import dataiku import time def start_execution_publish_to_pbi(config_id): try: project = dataiku.Project() variables =…
-
Image Rebuild Error
Hi, our instance is running DSS version 13.4.0. I tried rebuilding the image to remove older versions of Python (2.7, 3.6, 3.7, 3.8) using the command below. Unfortunately, it is failing at the stage below. ./bin/dssadmin build-base-image --type container-exec --without-py27 --without-py37 --without-py38 --with-py39…
-
How does the evaluation store threshold actually work?
In the documentation for the evaluation store, when doing a two-class (binary) classification, there is a slider for the threshold used. The documentation for this threshold reads in part: When doing binary classification, most models don’t output a single binary answer, but instead a continuous “score of being positive”.…
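To make the idea of a threshold concrete, here is a generic sketch in plain Python of how a cut-off turns continuous scores into binary predictions and how the confusion matrix shifts when the cut-off moves. This illustrates the general concept only and is not Dataiku's implementation. The function names and sample scores are hypothetical, and treating a score equal to the threshold as positive is an assumption.

```python
def predict_labels(scores, threshold=0.5):
    """Label a row positive (1) when its score reaches the threshold.
    Counting a tie as positive is an assumption of this sketch."""
    return [1 if score >= threshold else 0 for score in scores]


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["tp"] += 1
        elif pred == 1 and truth == 0:
            counts["fp"] += 1
        elif pred == 0 and truth == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical scores and ground truth.
scores = [0.1, 0.4, 0.6, 0.9]
y_true = [0, 1, 0, 1]

strict = confusion_counts(y_true, predict_labels(scores, threshold=0.5))
lenient = confusion_counts(y_true, predict_labels(scores, threshold=0.3))
print(strict)
print(lenient)
```

Lowering the threshold from 0.5 to 0.3 converts one false negative into a true positive in this sample. The scores themselves do not change, only how they are cut.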
-
How to Automate Clustering with Anomaly Detection for Each Partition in Dataiku?
Hello Dataiku Community, I’m working on a project where I’ve partitioned my dataset by category and year. For example, my partitions look like this: Category A | 2021 Category A | 2022 Category A | 2023 Category A | 2024 Category B | 2021 Category B | 2022 Category B | 2023 Category B | 2024 Category C | 2021 Category C |…
-
Use case: sync data
Example: table A holds the most up-to-date data, while table B has not been updated, so there are inconsistencies between them. Suppose table A has 25 rows and table B has only 15: 1. Rows 1 - 10 of table A have the same IDs as rows in table B, but the table B data has not been updated even though the IDs match. 2. Table A 11 - 25 the…
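The post is cut off, so the exact requirement is unknown. Assuming the goal is an upsert (rows in table B are overwritten by the matching rows from table A, and rows that exist only in table A are added), a minimal pandas sketch could look like the following. The function name `upsert`, the `id` key, and the sample values are hypothetical, and both tables are assumed to share the same columns.

```python
import pandas as pd


def upsert(target: pd.DataFrame, source: pd.DataFrame, key: str = "id") -> pd.DataFrame:
    """Return `target` with rows replaced by `source` where the key matches,
    plus any rows that exist only in `source`."""
    # Keep only the target rows whose key does not appear in the source.
    kept = target[~target[key].isin(source[key])]
    merged = pd.concat([source, kept], ignore_index=True)
    return merged.sort_values(key).reset_index(drop=True)


# Hypothetical data: A is up to date, B is stale and has one row A lacks.
table_a = pd.DataFrame({"id": [1, 2, 3], "value": ["a1", "a2", "a3"]})
table_b = pd.DataFrame({"id": [1, 2, 9], "value": ["b1", "b2", "b9"]})

synced = upsert(table_b, table_a)
print(synced)
```

Rows that exist only in table B (id 9 here) are kept in this sketch. If they should be deleted instead, returning a copy of table A would be enough.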
-
Can we use a Kubernetes cluster with the free edition of Dataiku?
Hi, Can we use a Kubernetes cluster with the free edition of Dataiku? Let's say we have a Linux VM in AWS or Azure environment where we have deployed the free edition of DSS Ver 12 or later. Is it possible to use a Kubernetes cluster in this environment to reduce model training time? Thank you. Taka Operating system used:…