Using Dataiku

491 - 500 of 5.2k
  • Hi Team, I have created a scenario which consists of 5 steps. Step 1: SQL query to check the latest data. Step 2: set a project variable as the execution start time using the now() formula. Step 3: Build the dataset ste…
    Answered
    Started by obireddy
    Most recent by Turribeach
    0
    1
    Last answer by Turribeach

    Project variables are not a good place to store run time data. Scenario variables fit that use case much better. However I feel your post doesn't give enough information of what exactly you are trying to achieve and seems to be trying to reinvent the wheel. Why are you recording start and end times of some partial scenario steps? This information is already stored by Dataiku and available both from Dataiku API and in the Internal Stats dataset.

  • I am surprised this is missing from the GUI, but what really surprises me more is that it's not even shown in the logs. The fact that one needs to query the API to get this data should be a good indicat…
    Answered ✓
    Started by Turribeach
    Most recent by Turribeach
    2
    4
    Solution by apichery

    @Turribeach This has been fixed in the upcoming 13.0.2 release.

  • Is there any data architecture diagram available for Dataiku which shows a complete project?
    Answered
    Started by BQ
    Most recent by Turribeach
    0
    3
    Last answer by Turribeach

    In a data warehouse, you make a diagram showing that data will be fetched from different sources; all sources can be shown.

    Exactly, and since Dataiku can connect to virtually any data source available in most companies, it's impossible to make a data architecture diagram without knowing all your sources as well as all your outputs. I think what you want is a general Dataiku architecture diagram, which your Dataiku Account Manager/Customer Success Manager should be able to provide.

  • Hello, Does anyone have an idea why the "Dataset change" trigger on a scenario doesn't work for me? The dataset that I'm using is created out of files from Azure Blob Storage. Even though a new file a…
    Answered
    Started by Tomasz
    Most recent by Tomasz
    0
    2
    Last answer by Tomasz

    Yeah, sure! Here goes:

  • Hello, I noticed that when importing bundles to an automation node, environments are versioned and all old kernels are available for use in Jupyter notebooks, along with all the old environments. Now …
    Answered ✓
    Started by marawan
    Most recent by Cory
    1
    4
    Solution by Clément_Stenac

    Hi,

    Unfortunately, there is no API method for this, so you'll indeed need to do it via command-line. We'll take note of your request to inform future developments.

  • How do I bring 4 rows of data into 1 row, 4 columns?
    Answered
    Started by Ange
    Most recent by me2
    0
    1
    Last answer by me2

    There are various ways to do that. The easiest way is to use a Prepare Recipe and add a step, Transpose.

    https://doc.dataiku.com/dss/11/preparation/processors/transpose.html

    Let us know if that doesn't meet your requirements.
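
    For those who prefer a code recipe over the Transpose processor, the same reshaping can be sketched in pandas. The column and value names below are made up for illustration:

```python
import pandas as pd

# Hypothetical input: 4 rows, one value per row
df = pd.DataFrame({"value": [10, 20, 30, 40]})

# Transpose the 4 rows into a single row with 4 columns
wide = df.T.reset_index(drop=True)
wide.columns = [f"value_{i + 1}" for i in range(wide.shape[1])]

print(wide.shape)  # (1, 4)
```

    The Prepare recipe approach stays preferable for non-coders; a code recipe like this is mainly useful when the reshaping is one step of a larger transformation.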

  • Hello, are there any options to limit the audit logs saved locally? We send our logs via the event server to S3 (why are these not sent to S3?), but logs are still generated in the DSS UI. I see this config in the doc, does …
    Answered ✓
    Started by Abdoulaye
    Most recent by Abdoulaye
    0
    4
    Solution by Turribeach

    DSS rotates all logs automatically so you don't need to do anything here. It's worth having local logs in case something happens with your event server.

  • We recently started to integrate Code Studio with our instance. We used to save an HDF5 file in a managed folder, using code like: with pd.HDFStore(pathToFile, mode='r', complevel=COMPRESSION_LEVEL, c…
    Answered
    Started by kai_fang
    Most recent by Zach
    0
    1
    Last answer by Zach

    Hi @kai_fang,

    pandas.HDFStore requires a local file path, so in order to load it from a managed folder, we first need to copy it from the folder to a temporary local file.

    For example:

    import os
    import shutil
    import tempfile

    import dataiku
    import pandas as pd

    folder = dataiku.Folder("MY_FOLDER")

    with tempfile.TemporaryDirectory() as temp_dir:
        temp_path = os.path.join(temp_dir, "temp.h5")

        # Copy the file from the remote folder to a temp local file
        with folder.get_download_stream("FILE_IN_FOLDER.h5") as folder_stream:
            with open(temp_path, "wb") as temp_stream:
                shutil.copyfileobj(folder_stream, temp_stream)

        # Load the temp HDF file
        with pd.HDFStore(temp_path, mode="r") as hdf:
            hdf_keys = hdf.keys()

  • Hi, In Dataiku, connection settings can be linked to user groups, but I believe they cannot be linked to projects. I have a question regarding this: If UserA belongs to both the Finance and HR user gr…
    Answered
    Started by Tosihiro
    Most recent by Turribeach
    0
    3
    Last answer by
    Turribeach
    Last answer by Turribeach

    I wasn't aware of this plugin, and it is pretty cool that this can be supported in such a way, but in my view it makes administration of the security model a bigger overhead. Since the original poster is trying to prevent "unintended data mishandling", my approach would be to monitor projects using the Dataiku Python API, which can easily detect which connections are being used in each project.
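
    A minimal sketch of that monitoring idea: the helper below collects the connection names referenced by a list of dataset summaries. The input shape (dicts with a params/connection entry, roughly what the Dataiku Python API's project.list_datasets() returns) is an assumption here; verify it against your DSS version before relying on it.

```python
def connections_in_use(datasets):
    """Return the set of connection names referenced by dataset summaries.

    Each summary is assumed to be a dict with an optional
    params['connection'] entry (an assumption about the API's shape).
    """
    connections = set()
    for dataset in datasets:
        connection = dataset.get("params", {}).get("connection")
        if connection:
            connections.add(connection)
    return connections

# Example with hypothetical dataset summaries
datasets = [
    {"name": "sales_raw", "params": {"connection": "finance_snowflake"}},
    {"name": "hr_records", "params": {"connection": "hr_postgres"}},
    {"name": "uploaded_file", "params": {}},
]
print(sorted(connections_in_use(datasets)))  # ['finance_snowflake', 'hr_postgres']
```

    Run against every project on a schedule, this gives an audit trail of which projects touch which connections, without changing the security model itself.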

  • Hi, I'd like to ensure our code recipes follow secure coding practices by not putting secrets (API keys, passwords, tokens) in the recipe code. Is there a way to do this from Dataiku? Assuming we don't…
    Answered
    Started by rdagumampan
    Most recent by Turribeach
    0
    1
    Last answer by Turribeach

    Hi, there are no built-in features for this but you can search inside recipe code using the Dataiku Python API:

    import dataiku
    
    client_handle = dataiku.api_client()
    variables_to_search = ['var1', 'var2']
    project_handle = client_handle.get_project('some project key')
    python_recipes = [i for i in project_handle.list_recipes() if i['type'] in ['python']]
    for python_recipe in python_recipes:
      recipe_name = python_recipe['name']
      recipe_handle = project_handle.get_recipe(recipe_name)
      recipe_script = recipe_handle.get_settings().get_payload().lower()
      if recipe_script:
        for var in variables_to_search:
          if var.lower() in recipe_script:
            print(f'Found variable {var} in recipe {recipe_name}')
    

    It shouldn't be too hard to customise this code to search for secrets. The following Python packages should help:

    https://pypi.org/project/ggshield/

    https://pypi.org/project/whispers/
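
    To go one step beyond matching known variable names, the same loop can feed each recipe's payload through a few regular expressions. The patterns below are illustrative assumptions only, not an exhaustive ruleset; tools like ggshield and whispers ship far more complete ones:

```python
import re

# Illustrative patterns only; real scanners ship much larger rulesets
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "assigned_secret": re.compile(
        r"""(?i)(api[_-]?key|token|password|secret)\s*[=:]\s*['"][^'"]{8,}['"]"""
    ),
}

def find_secrets(recipe_script):
    """Return (pattern_name, matched_text) pairs found in recipe code."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(recipe_script):
            hits.append((name, match.group(0)))
    return hits

print(find_secrets('password = "hunter2hunter2"'))
```

    Feeding each recipe's get_settings().get_payload() through find_secrets instead of a plain substring check turns the earlier snippet into a crude secret scanner; expect false positives and tune the patterns to your codebase.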
