Using Dataiku
- Hi Team, I have created a scenario which consists of 5 steps. Step 1: SQL query to check latest data. Step 2: set project variable as execution start time using now() formula. Step 3: build the dataset. Ste…
Last answer by Turribeach
Project variables are not a good place to store run time data. Scenario variables fit that use case much better. However I feel your post doesn't give enough information of what exactly you are trying to achieve and seems to be trying to reinvent the wheel. Why are you recording start and end times of some partial scenario steps? This information is already stored by Dataiku and available both from Dataiku API and in the Internal Stats dataset.
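As a point of reference, here is a minimal sketch of pulling scenario run times from the Dataiku Python API. The project key and scenario id are placeholders, and the exact field names in the raw run payload can vary between DSS versions:

import dataiku

client = dataiku.api_client()
# "MY_PROJECT" and "MY_SCENARIO" are placeholder ids for illustration
scenario = client.get_project("MY_PROJECT").get_scenario("MY_SCENARIO")

# Each returned DSSScenarioRun exposes the raw run payload on .run,
# which carries timing and outcome information for that run
for run in scenario.get_last_runs(limit=5):
    print(run.run.get("start"), run.run.get("end"),
          run.run.get("result", {}).get("outcome"))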
- I am surprised this is missing from the GUI, but what really surprises me more is that it's not even shown in the logs. The fact that one needs to query the API to get this data should be a good indicat…
- Is there any data architecture diagram available for Dataiku which shows a complete project?
Last answer by Turribeach
In a data warehouse, you make a diagram showing that data will be fetched from different sources; all sources can be shown.
Exactly, and since Dataiku can connect to any possible data source available in most companies, it's impossible to make a data architecture diagram without knowing not only all your sources but also all your outputs. I think what you want is a general Dataiku architecture diagram, which your Dataiku Account Manager/Customer Success Manager should be able to provide.
- Hello, Does anyone have an idea why the "Dataset change" trigger on a scenario doesn't work for me? The dataset that I'm using is created out of files from Azure Blob Storage. Even though a new file a…
- Hello, I noticed that when importing bundles to an automation node, environments are versioned and all old kernels are available for use in Jupyter notebooks, along with all the old environments. Now …
- How do I bring 4 rows of data into 1 row, 4 columns?
Last answer by me2
There are various ways to do that. The easiest way is to use a Prepare Recipe and add a step, Transpose.
https://doc.dataiku.com/dss/11/preparation/processors/transpose.html
Let us know if that doesn't meet your requirements.
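If you would rather do this in a Python recipe than in the Transpose processor, here is a rough pandas equivalent; the column names and sample values are made up for illustration:

import pandas as pd

# Hypothetical input: four rows in a single column
df = pd.DataFrame({"value": ["a", "b", "c", "d"]})

# Reshape the 4 rows into 1 row with 4 columns
wide = pd.DataFrame([df["value"].tolist()],
                    columns=[f"col_{i + 1}" for i in range(len(df))])
print(wide)  # one row: col_1, col_2, col_3, col_4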
- Hello, is there any option to limit audit logs saved locally? We send our logs via the event server to S3 (why are these not sent to S3?), but the logs are generated in the DSS UI. I see this config in the doc, does …
- We recently started to integrate Code Studio with our instance. We used to save HDF5 files in a managed folder, using code like: with pd.HDFStore(pathToFile, mode='r', complevel=COMPRESSION_LEVEL, c…
Last answer by Zach
Hi @kai_fang,

pandas.HDFStore requires a local file path, so in order to load it from a managed folder, we first need to copy it from the folder to a temporary local file. For example:

import os
import shutil
import tempfile

import dataiku
import pandas as pd

folder = dataiku.Folder("MY_FOLDER")

with tempfile.TemporaryDirectory() as temp_dir:
    temp_path = os.path.join(temp_dir, "temp.h5")

    # Copy the file from the remote folder to a temp local file
    with folder.get_download_stream("FILE_IN_FOLDER.h5") as folder_stream:
        with open(temp_path, "wb") as temp_stream:
            shutil.copyfileobj(folder_stream, temp_stream)

    # Load the temp HDF file
    with pd.HDFStore(temp_path, mode="r") as hdf:
        hdf_keys = hdf.keys()

- Hi, In Dataiku, connection settings can be linked to user groups, but I believe they cannot be linked to projects. I have a question regarding this: If UserA belongs to both the Finance and HR user gr…
Last answer by Turribeach
I wasn't aware of this plugin, and it is pretty cool that this can be supported in such a way, but in my view this makes the administration of the security model a bigger overhead. Since the original poster is trying to prevent "unintended data mishandling", my approach would be to monitor projects using the Dataiku Python API, which can easily detect which connections are being used in each project.
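To illustrate that monitoring idea (this is a sketch, not Turribeach's actual script; the dataset parameter layout may differ slightly across DSS versions):

import dataiku

client = dataiku.api_client()

# Report which connections each project's datasets use
for project_key in client.list_project_keys():
    project = client.get_project(project_key)
    connections = set()
    for dataset in project.list_datasets():
        # Most connection-backed dataset types expose the connection name here
        connection = dataset["params"].get("connection")
        if connection:
            connections.add(connection)
    print(project_key, sorted(connections))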
- Hi, I'd like to ensure our code recipes follow secure coding practice by not putting secrets (API keys, passwords, tokens) in the recipe code. Is there a way to do this from Dataiku? Assuming we don't…
Last answer by Turribeach
Hi, there are no built-in features for this but you can search inside recipe code using the Dataiku Python API:
import dataiku

client_handle = dataiku.api_client()
variables_to_search = ['var1', 'var2']
project_handle = client_handle.get_project('some project key')
python_recipes = [i for i in project_handle.list_recipes() if i['type'] in ['python']]

for python_recipe in python_recipes:
    recipe_name = python_recipe['name']
    recipe_handle = project_handle.get_recipe(recipe_name)
    recipe_script = recipe_handle.get_settings().get_payload().lower()
    if recipe_script:
        for var in variables_to_search:
            if var.lower() in recipe_script:
                print(f'Found variable {var} in recipe {recipe_name}')
It shouldn't be too hard to customise this code to search for secrets. The following Python packages should help:
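The package list is cut off in the preview, but as a standard-library starting point, the search above could be pointed at secret-shaped patterns instead of variable names. The regexes below are illustrative assumptions, not an exhaustive detector:

import re

# Hypothetical patterns for common secret shapes; tune for your environment
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_credential": re.compile(
        r"(?i)(?:api[_-]?key|token|password)\s*[=:]\s*['\"][^'\"]{8,}['\"]"),
}

def find_secrets(recipe_script):
    """Return (pattern_name, matched_text) pairs found in recipe code."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(recipe_script):
            hits.append((name, match.group(0)))
    return hits

You could then call find_secrets(recipe_script) in place of the variable check inside the loop above (dropping the .lower() call, since some of these patterns are case-sensitive).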