Plugins & Extending Dataiku
-
External data catalog integration
Hi everyone, I'm looking for a way to integrate Dataiku with a standalone data catalog tool, for example DataHub. This stems from the fact that some initial data loading and transformation happens inside the DWH through an orchestration tool like Airflow and a transformation tool like dbt. This creates initial datasets that are…
-
Graph approach for firewall logs & rules analysis
I have been using the Dataiku Graph Analytics plugin for a few weeks to analyze firewall rules and logs, following the https://diablohorn.com/2022/04/09/firewall-analysis-a-portable-graph-based-approach/ approach. This approach offered quick benefits: - It helped in understanding configuration issues using a visual approach, much…
-
Excel Multisheet error "problem with content" message
I wanted to share an issue I came across with Dataiku's Multisheet Excel plugin and how I solved it. I had been using the Excel export with a few datasets and was excited to try the multisheet plugin. After exporting the folder and attempting to open the file, I got an error dialog box. It said there was a…
-
New dataset write error
I'm trying to build a piece of code that runs as a recipe. The code builds 2 datasets, one of which is a database connection, and then runs a SQL query to build out the other. My issue is that when I run the recipe and try to write to the dataset, I keep getting a message that it can't be found. It's obviously there…
-
How to pass on inputRoles & outputRoles in recipes.json?
Hi all, I'm trying to write a plugin that doesn't need input or output datasets defined. It turns out Dataiku requires at least the outputRoles, so I went ahead and left that one in the JSON file: //"inputRoles": [// {// "name": "input_dataset",// "arity": "UNARY",// "required": false,// "acceptsDataset": true//…
-
Alternatives to Spark for plain Python
Hello, for our data-intensive recipes, we use PySpark to distribute calculations on a Kubernetes cluster. However, we also have compute-intensive models (e.g. simulation-based ones) that we would like to distribute across multiple machines, and my question is whether Spark is still the best way to do that in DSS. Our…
-
Embedded dynamic select
Hi, I know that we can dynamically populate the SELECT dropdown of a plugin param, but when I try to use it in an OBJECT_LIST param, it doesn't work. My recipe.json looks like this: { ... "paramsPythonSetup": "compute_custom_select.py", "params": [ { "name": "param1", "label": "param1", "type": "OBJECT_LIST",…
-
Using the next page url provided in the "API Connect" Plugin
Hello, I tried different configurations of the "API Connect" plugin to use pagination and point dynamically to the next page, but to no avail. I found this thread on the Dataiku Community: "Solved: Using the next page url provided in the API Plugin". As I understand it, the "Next page URL…
-
[Python API] Code envs doc shows nonexistent functions
Hi, I am using Dataiku 9.0.5 with Python 3.6. I'm trying to use the Python API to gather all the usages of a code env. The documentation lists a method named list_usage that should do the trick, but when I try to use it in a notebook in Dataiku, I get the following error message: AttributeError: 'DSSCodeEnv' object has…