How to get project settings

azamora
azamora Partner, Registered Posts: 9 Partner

Hello everyone,

I am trying to build a report covering all of my projects in DSS.

I am using the Python API's list_project_keys() function to get all the projects, and from there I extract the metadata of each project.
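A minimal sketch of that loop, assuming `client` is a `dataikuapi.DSSClient` (or the client returned by `dataiku.api_client()` inside DSS). It only uses `list_project_keys()`, `get_project()` and the project's `get_metadata()`; the helper name `collect_project_metadata` is hypothetical.

```python
def collect_project_metadata(client):
    """Return a dict mapping each project key to that project's metadata dict.

    `client` is assumed to be a dataikuapi.DSSClient (or dataiku.api_client());
    only list_project_keys(), get_project() and get_metadata() are called.
    """
    metadata = {}
    for project_key in client.list_project_keys():
        project = client.get_project(project_key)
        # get_metadata() returns a plain dict (label, description, tags, custom metadata...)
        metadata[project_key] = project.get_metadata()
    return metadata


# Usage inside DSS (requires a running instance, not executed here):
# import dataiku
# all_metadata = collect_project_metadata(dataiku.api_client())
```

Keeping the client as a parameter means the same function works with an admin `DSSClient` (which sees every project) or with the current user's client (which only sees that user's projects).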

As a next step, I want to investigate what other data I can get from the get_settings() method.

I can get the settings object for every project, but I don't know how to extract the data from it.

import dataiku
client = dataiku.api_client()
full_project = client.get_project('ABC')  # 'ABC' is the project key
settings = full_project.get_settings()
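`get_settings()` returns a `DSSProjectSettings` object, and its `get_raw()` method gives the underlying settings as a plain nested Python dict that you can explore like any other dict. The sketch below flattens such a dict into dotted key paths so it is easier to scan; `flatten_settings` is a hypothetical helper, and the keys you actually see will depend on your DSS version.

```python
def flatten_settings(d, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} pairs.

    Lists and empty dicts are kept as leaf values rather than expanded.
    """
    flat = {}
    for key, value in d.items():
        path = prefix + "." + str(key) if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts, extending the dotted path
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value
    return flat


# Usage against a live instance (not executed here):
# raw = settings.get_raw()
# for path, value in sorted(flatten_settings(raw).items()):
#     print(path, "=", value)
```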

Thanks!

Answers

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron
    edited July 17

    Hi @azamora,

    I recently built something similar to what you are describing, and I've pasted the code below. I didn't try to capture everything; instead I focused on the key information. Maybe you can use it as an example to pull from.

    Marlan

    # -*- coding: utf-8 -*-
    from __future__ import absolute_import, division, print_function
    import dataiku
    import dataikuapi
    import pandas as pd
    import datetime
    
    
    ###############################################################
    # Function & Class definitions
    ###############################################################
    
    
    def get_dict_vals(d, keys):
        """Return the values of `keys` in d, in the same order as keys.
    
        List values are joined into a comma-separated string; missing keys become ''.
        """
        vals = []
        for key in keys:
            if key in d:
                val = d[key]
                if isinstance(val, list):
                    vals.append(', '.join(str(v) for v in val))
                else:
                    vals.append(val)
            else:
                vals.append('')
        return vals
    
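    def get_create_modify_vals(d):
        """Return [creator login, creation time, modifier login, modification time, version number].
    
        Timestamps are epoch milliseconds, hence the division by 1e3.
        """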
        if u'creationTag' in d:
            cd = d[u'creationTag']
            cr_user = cd[u'lastModifiedBy'][u'login']
            cr_dtm = datetime.datetime.fromtimestamp(cd[u'lastModifiedOn'] / 1e3)
        else:
            cr_user = ''
            cr_dtm = pd.NaT
    
        md = d[u'versionTag']
        mod_user = md[u'lastModifiedBy'][u'login']
        mod_dtm = datetime.datetime.fromtimestamp(md[u'lastModifiedOn'] / 1e3)
        mod_vers_nbr = md[u'versionNumber']
        
        return [cr_user, cr_dtm, mod_user, mod_dtm, mod_vers_nbr]
    
    ###############################################################
    # Set up
    ###############################################################
    
    client = dataikuapi.DSSClient('http://xx.xx.xx.xx/', 'replace_this_with_api_key_with_admin_permission') # run as admin user for full access
    projects_list = client.list_projects()
    
    ###############################################################
    # Projects and entities included in Projects
    ###############################################################
    
    proj_keys = [u'projectKey', u'name', u'projectStatus', u'shortDesc', u'ownerLogin', u'ownerDisplayName']
    
    ds_main_keys = [u'projectKey', u'name', u'smartName', u'managed', u'type']
    ds_params_keys = [u'connection', u'mode', u'table', u'tableCreationMode']
    
    rcp_keys = [u'projectKey', u'name', u'type']
    
    scen_main_keys = [u'projectKey', u'id', u'name', u'type', u'active']
    scen_other_keys = [u'runAsUser', u'triggerCount', u'reporterCount', u'stepTypes']
    
    create_modify_keys = [u'creationLogin', u'creationDateTime', u'modifyLogin', u'modifyDateTime', u'modifyVersionNumber']
    
    project_attributes = []
    dataset_attributes = []
    recipe_attributes = []
    scenario_attributes = []
    
    for project_info in projects_list:
    
        project_key = project_info['projectKey']
        project = client.get_project(project_key)
    
        project_attributes.append(get_dict_vals(project_info, proj_keys))
        
        # Datasets
        for dataset_info in project.list_datasets():
            ds_main_vals = get_dict_vals(dataset_info, ds_main_keys)
            ds_params_vals = get_dict_vals(dataset_info['params'], ds_params_keys)
            ds_create_modify_vals = get_create_modify_vals(dataset_info)
    
            dataset_attributes.append(ds_main_vals + ds_params_vals + ds_create_modify_vals)
            
        # Recipes
        for recipe_info in project.list_recipes():
            rcp_vals = get_dict_vals(recipe_info, rcp_keys)
            rcp_create_modify_vals = get_create_modify_vals(recipe_info)
    
            recipe_attributes.append(rcp_vals + rcp_create_modify_vals)
        
        # Scenarios
        # Scenario list items only carry a short description, so fetch each
        # scenario's settings object, which has everything else
        for scenario in project.list_scenarios(as_type="objects"):
            settings = scenario.get_settings()
            scenario_info = settings.get_raw()
            
            # Main values
            scen_main_vals = get_dict_vals(scenario_info, scen_main_keys)
            
            # Other values
            effective_run_as = settings.effective_run_as # Pull run as user even if set to "last user"
            trigger_cnt = len(scenario_info['triggers']) # just count for now
            reporter_cnt = len(scenario_info['reporters']) # just count for now
            
            # Sorted, comma-separated list of the distinct step types in the scenario
            steps = scenario_info.get('params', {}).get('steps', [])
            step_type_list = ', '.join(sorted({step['type'] for step in steps}))
    
            scen_other_vals = [effective_run_as, trigger_cnt, reporter_cnt, step_type_list]
    
            # Create/modify values
            scen_create_modify_vals = get_create_modify_vals(scenario_info)
            
    
            scenario_attributes.append(scen_main_vals + scen_other_vals + scen_create_modify_vals)
    
            
    project_df = pd.DataFrame(project_attributes, columns=proj_keys)
    dataset_df = pd.DataFrame(dataset_attributes, columns=ds_main_keys + ds_params_keys + create_modify_keys)
    recipe_df = pd.DataFrame(recipe_attributes, columns=rcp_keys + create_modify_keys)
    scenario_df = pd.DataFrame(scenario_attributes, columns=scen_main_keys + scen_other_keys + create_modify_keys)
    
    ###############################################################
    # Users
    ###############################################################
    
    user_keys = [u'login', u'displayName', u'email', u'enabled', u'groups', u'userProfile']
    
    user_attributes = []
    user_list = client.list_users()
    for user_info in user_list:
        user_vals = get_dict_vals(user_info, user_keys)
        user_attributes.append(user_vals)
    
    user_df = pd.DataFrame(user_attributes, columns=user_keys)
    
    
    ###############################################################
    # Write dataframes to SQL tables
    ###############################################################
    
    dataiku.Dataset("PROJECT").write_with_schema(project_df)
    dataiku.Dataset("DATASET").write_with_schema(dataset_df)
    dataiku.Dataset("RECIPE").write_with_schema(recipe_df)
    dataiku.Dataset("SCENARIO").write_with_schema(scenario_df)
    dataiku.Dataset("USER").write_with_schema(user_df)
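As an alternative to selecting keys by hand with get_dict_vals, pandas' `pd.json_normalize` (pandas 1.0 or later) can flatten a list of nested dicts into a DataFrame with dotted column names such as `params.connection`, filling keys that an item lacks with NaN instead of raising. This is a swapped-in technique, not what the script above does; the helper name `items_to_frame` and the two sample items are made up for illustration and only mimic the shape of dataset list items.

```python
import pandas as pd


def items_to_frame(items, columns=None):
    """Flatten a list of (possibly nested) dicts into a DataFrame.

    Nested keys become dotted columns (params -> connection => 'params.connection').
    Keys missing from an item become NaN. If `columns` is given, the frame is
    reindexed to exactly those columns, in that order.
    """
    df = pd.json_normalize(items)
    if columns is not None:
        df = df.reindex(columns=columns)
    return df


# Illustrative items only; real list items carry many more keys.
sample = [
    {"name": "orders", "type": "PostgreSQL", "params": {"connection": "pg", "table": "orders"}},
    {"name": "upload", "type": "UploadedFiles"},
]
df = items_to_frame(sample, columns=["name", "type", "params.connection"])
print(df)

# Against a live instance, assuming the list items behave like dicts (not executed here):
# dataset_df = items_to_frame(project.list_datasets(), columns=["name", "type", "params.connection"])
```

Reindexing to a fixed column list keeps the same predictable schema as the hand-written key lists, which matters when the frame is written out with write_with_schema.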

  • azamora
    edited July 17

    Hi @Marlan,

    Thanks a lot for the suggestion!

    I tried to run your code, only replacing client = dataikuapi.DSSClient(...) with client = dataiku.api_client().

    But I get the following error:

    KeyError                                  Traceback (most recent call last)
    <ipython-input-6-fc187ab4367f> in <module>()
         87         ds_main_vals = get_dict_vals(dataset_info, ds_main_keys)
         88         ds_params_vals = get_dict_vals(dataset_info['params'], ds_params_keys)
    ---> 89         ds_create_modify_vals = get_create_modify_vals(dataset_info)
         90 
         91         dataset_attributes.append(ds_main_vals + ds_params_vals + ds_create_modify_vals)
    
    <ipython-input-6-fc187ab4367f> in get_create_modify_vals(d)
         38         cr_dtm = pd.NaT
         39 
    ---> 40     md = d[u'versionTag']
         41     mod_user = md[u'lastModifiedBy'][u'login']
         42     mod_dtm = datetime.datetime.fromtimestamp(md[u'lastModifiedOn'] / 1e3)
    
    KeyError: u'versionTag'

  • Marlan

    Hi @azamora,

    Maybe some of your datasets don't have version tags. All of ours do, so we didn't get that error.

    You could handle the version tag the same way as the creation tag, i.e. wrap that logic in "if u'versionTag' in d:" and fall back to empty values otherwise.
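A minimal sketch of that change, keeping the same five-value return shape as get_create_modify_vals but treating versionTag the same way as creationTag. The names `get_create_modify_vals_safe` and `_tag_vals` are hypothetical, and the tag layout (`lastModifiedBy.login`, `lastModifiedOn` in epoch milliseconds, `versionNumber`) is assumed from the original script.

```python
import datetime

import pandas as pd


def _tag_vals(tag):
    """Return (login, datetime) from a creation/version tag dict, or ('', NaT) if absent."""
    if not tag:
        return '', pd.NaT
    login = tag.get('lastModifiedBy', {}).get('login', '')
    ts_ms = tag.get('lastModifiedOn')
    # lastModifiedOn is assumed to be epoch milliseconds, as in the original code
    dtm = datetime.datetime.fromtimestamp(ts_ms / 1e3) if ts_ms is not None else pd.NaT
    return login, dtm


def get_create_modify_vals_safe(d):
    """Like get_create_modify_vals, but tolerates a missing creationTag or versionTag."""
    cr_user, cr_dtm = _tag_vals(d.get('creationTag'))
    version_tag = d.get('versionTag') or {}
    mod_user, mod_dtm = _tag_vals(version_tag)
    mod_vers_nbr = version_tag.get('versionNumber', '')
    return [cr_user, cr_dtm, mod_user, mod_dtm, mod_vers_nbr]
```

Because it returns the same list shape, it can be dropped in wherever get_create_modify_vals is called without changing the create_modify_keys columns.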

    You may well need to analyze what comes back from the various settings calls. For example, you could print dataset_info for each dataset to see what the data looks like. That's what I did.
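Along those lines, counting how often each top-level key appears across a list of items (for example the dataset list items of one project) makes it easy to spot keys such as versionTag that some items lack. `key_coverage` is a hypothetical helper that uses only the standard library.

```python
from collections import Counter


def key_coverage(items):
    """Return sorted (key, count) pairs: how many of `items` contain each top-level key.

    Keys with a count below len(items) are the ones to guard with `in` or .get()
    before indexing.
    """
    counts = Counter()
    for item in items:
        counts.update(item.keys())
    return sorted(counts.items())


# Usage against a live instance (not executed here):
# datasets = project.list_datasets()
# for key, n in key_coverage(datasets):
#     print(key, n, "of", len(datasets))
```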

    I certainly can't guarantee that the logic will work in your environment, since I built it all from examining our own settings. I just wanted to share what I did as a starting point.

    Marlan

  • azamora

    Thanks a lot @Marlan, very helpful!!
