How to get project settings

azamora
Level 1

Hello everyone,

I am trying to build a report with all my projects on DSS.

I am using the Python API and the list_project_keys() function to get all the projects, and from there I am extracting the metadata of each project.

As a next step, I want to investigate what other data I can get from the get_settings() method.

I am able to get all projects settings objects but I don't know how to extract the data from there.

 

import dataiku
client = dataiku.api_client()
full_project = client.get_project('ABC')  # 'ABC' is the project key
settings = full_project.get_settings()
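One way to see what a settings object holds: in dataikuapi, DSSProjectSettings.get_raw() returns the underlying settings as a plain Python dict, which you can explore like any nested dict. The sketch below flattens such a dict into dotted key paths so everything can be scanned at once. flatten_settings is a hypothetical helper, not part of the Dataiku API; the DSS calls are shown only as commented usage because they need a running DSS instance.

```python
def flatten_settings(d, prefix=""):
    """Flatten a nested dict (e.g. the output of settings.get_raw()) into
    {"dotted.key.path": value} pairs. Lists and empty dicts are kept as leaf values."""
    flat = {}
    for key, value in d.items():
        path = "{}.{}".format(prefix, key) if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts, extending the dotted path
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical usage inside DSS (requires the dataiku package):
# import dataiku
# client = dataiku.api_client()
# raw = client.get_project('ABC').get_settings().get_raw()
# for path, value in sorted(flatten_settings(raw).items()):
#     print(path, '=', value)
```

Flattening makes it easy to grep for a setting by name before deciding which keys to pull into a report.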

 

Thanks !

Marlan
Neuron

Hi @azamora,

I recently built something similar to what you are describing, and I've pasted the code below. I didn't try to capture everything; I focused on the key information. Maybe you can use it as an example to pull from.

Marlan

# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function
import dataiku
import dataikuapi
import pandas as pd
import datetime


###############################################################
# Function & Class definitions
###############################################################


def get_dict_vals(d, keys):
    """Returns the values for keys, in the same order as keys ('' for missing keys; lists are comma-joined)"""
    vals = []
    for key in keys:
        if key in d:
            val = d[key]
            if isinstance(val, list):
                vals.append(', '.join(val))
            else:
                vals.append(val)
        else:
            vals.append('')
    return vals

#get_dict_vals(project, get_keys)

def get_create_modify_vals(d):
    
    if u'creationTag' in d:
        cd = d[u'creationTag']
        cr_user = cd[u'lastModifiedBy'][u'login']
        cr_dtm = datetime.datetime.fromtimestamp(cd[u'lastModifiedOn'] / 1e3)
    else:
        cr_user = ''
        cr_dtm = pd.NaT

    md = d[u'versionTag']
    mod_user = md[u'lastModifiedBy'][u'login']
    mod_dtm = datetime.datetime.fromtimestamp(md[u'lastModifiedOn'] / 1e3)
    mod_vers_nbr = md[u'versionNumber']
    
    return [cr_user, cr_dtm, mod_user, mod_dtm, mod_vers_nbr]

#get_create_modify_vals(d)

###############################################################
# Set up
###############################################################

client = dataikuapi.DSSClient('http://xx.xx.xx.xx/', 'replace_this_with_api_key_with_admin_permission') # run as admin user for full access
projects_list = client.list_projects()

###############################################################
# Projects and entities included in Projects
###############################################################

proj_keys = [u'projectKey', u'name', u'projectStatus', u'shortDesc', u'ownerLogin', u'ownerDisplayName']

ds_main_keys = [u'projectKey', u'name', u'smartName', u'managed', u'type']
ds_params_keys = [u'connection', u'mode', u'table', u'tableCreationMode']

rcp_keys = [u'projectKey', u'name', u'type']

scen_main_keys = [u'projectKey', u'id', u'name', u'type', u'active']
scen_other_keys = [u'runAsUser', u'triggerCount', u'reporterCount', u'stepTypes']

create_modify_keys = [u'creationLogin', u'creationDateTime', u'modifyLogin', u'modifyDateTime', u'modifyVersionNumber']

project_attributes = []
dataset_attributes = []
recipe_attributes = []
scenario_attributes = []

for project_info in projects_list:

    project_key = project_info['projectKey']
    project = client.get_project(project_key)

    project_attributes.append(get_dict_vals(project_info, proj_keys))
    
    # Datasets
    for dataset_info in project.list_datasets():
        ds_main_vals = get_dict_vals(dataset_info, ds_main_keys)
        ds_params_vals = get_dict_vals(dataset_info['params'], ds_params_keys)
        ds_create_modify_vals = get_create_modify_vals(dataset_info)

        dataset_attributes.append(ds_main_vals + ds_params_vals + ds_create_modify_vals)
        
    # Recipes
    for recipe_info in project.list_recipes():
        rcp_vals = get_dict_vals(recipe_info, rcp_keys)
        rcp_create_modify_vals = get_create_modify_vals(recipe_info)

        recipe_attributes.append(rcp_vals + rcp_create_modify_vals)
    
    # Scenarios
    # Scenario list items only have a short description; the settings
    # of each scenario object have everything else
    for scenario in project.list_scenarios(as_type="objects"):
        settings = scenario.get_settings()
        scenario_info = settings.get_raw()
        
        # Main values
        scen_main_vals = get_dict_vals(scenario_info, scen_main_keys)
        
        # Other values
        effective_run_as = settings.effective_run_as # Pull run as user even if set to "last user"
        trigger_cnt = len(scenario_info['triggers']) # just count for now
        reporter_cnt = len(scenario_info['reporters']) # just count for now
        
        step_type_list = ''
        if 'params' in scenario_info:
            if 'steps' in scenario_info['params']:
                unique_step_types = set()
                for step in scenario_info['params']['steps']:
                    unique_step_types.add(step['type'])
                step_types = list(unique_step_types)
                step_types.sort()
                step_type_list = ', '.join(step_types)

        scen_other_vals = [effective_run_as, trigger_cnt, reporter_cnt, step_type_list]

        # Create/modify values
        scen_create_modify_vals = get_create_modify_vals(scenario_info)
        

        scenario_attributes.append(scen_main_vals + scen_other_vals + scen_create_modify_vals)

        
project_df = pd.DataFrame(project_attributes, columns=proj_keys)
dataset_df = pd.DataFrame(dataset_attributes, columns=ds_main_keys + ds_params_keys + create_modify_keys)
recipe_df = pd.DataFrame(recipe_attributes, columns=rcp_keys + create_modify_keys)
scenario_df = pd.DataFrame(scenario_attributes, columns=scen_main_keys + scen_other_keys + create_modify_keys)

###############################################################
# Users
###############################################################

user_keys = [u'login', u'displayName', u'email', u'enabled', u'groups', u'userProfile']

user_attributes = []
user_list = client.list_users()
for user_info in user_list:
    user_vals = get_dict_vals(user_info, user_keys)
    user_attributes.append(user_vals)

user_df = pd.DataFrame(user_attributes, columns=user_keys)


###############################################################
# Write dataframes to SQL tables
###############################################################

dataiku.Dataset("PROJECT").write_with_schema(project_df)
dataiku.Dataset("DATASET").write_with_schema(dataset_df)
dataiku.Dataset("RECIPE").write_with_schema(recipe_df)
dataiku.Dataset("SCENARIO").write_with_schema(scenario_df)
dataiku.Dataset("USER").write_with_schema(user_df)

 

azamora
Level 1
Author

Hi @Marlan 

Thanks a lot for the suggestion !

I tried to run your code (the only change was replacing client = dataikuapi.DSSClient(...) with client = dataiku.api_client()),

but I get the error below:

 
KeyError                                  Traceback (most recent call last)
<ipython-input-6-fc187ab4367f> in <module>()
     87         ds_main_vals = get_dict_vals(dataset_info, ds_main_keys)
     88         ds_params_vals = get_dict_vals(dataset_info['params'], ds_params_keys)
---> 89         ds_create_modify_vals = get_create_modify_vals(dataset_info)
     90 
     91         dataset_attributes.append(ds_main_vals + ds_params_vals + ds_create_modify_vals)

<ipython-input-6-fc187ab4367f> in get_create_modify_vals(d)
     38         cr_dtm = pd.NaT
     39 
---> 40     md = d[u'versionTag']
     41     mod_user = md[u'lastModifiedBy'][u'login']
     42     mod_dtm = datetime.datetime.fromtimestamp(md[u'lastModifiedOn'] / 1e3)

KeyError: u'versionTag'

 

Marlan
Neuron

Hi @azamora,

Maybe some of your datasets don't have version tags. All of ours do, so we didn't get an error like that.

You could treat the version tag logic like the creation tag logic. That is, "if u'versionTag' in d:" and so forth.
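Following that suggestion, here is a minimal sketch of get_create_modify_vals that handles versionTag the same way as creationTag, falling back to empty values when either tag is missing. It assumes the tag layout used in the code above (lastModifiedBy.login, lastModifiedOn in epoch milliseconds, versionNumber). _tag_vals is a hypothetical helper, and the sketch returns None instead of pd.NaT for missing timestamps so it does not depend on pandas.

```python
import datetime


def _tag_vals(d, tag_name):
    """Return (login, datetime) for a tag such as 'creationTag' or
    'versionTag', or ('', None) when the tag is absent."""
    tag = d.get(tag_name) or {}
    login = (tag.get('lastModifiedBy') or {}).get('login', '')
    ts = tag.get('lastModifiedOn')
    # Timestamps are epoch milliseconds, as assumed in the original code
    dtm = datetime.datetime.fromtimestamp(ts / 1e3) if ts is not None else None
    return login, dtm


def get_create_modify_vals(d):
    """Same output order as the original: creator, creation time,
    modifier, modification time, version number."""
    cr_user, cr_dtm = _tag_vals(d, 'creationTag')
    mod_user, mod_dtm = _tag_vals(d, 'versionTag')
    mod_vers_nbr = (d.get('versionTag') or {}).get('versionNumber', '')
    return [cr_user, cr_dtm, mod_user, mod_dtm, mod_vers_nbr]
```

If you prefer pd.NaT for missing timestamps, substitute it for None in _tag_vals.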

You may well need to analyze what you get back from the various settings calls. For example, you could print dataset_info for each dataset and see what the data looks like. That's what I did.
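Beyond printing each dataset_info, it can help to count how many items actually carry each top-level key, which makes optional keys such as versionTag stand out. A minimal sketch: key_coverage and optional_keys are hypothetical helpers, and they assume each item behaves like a dict, as dataset_info does in the code above.

```python
from collections import Counter


def key_coverage(items):
    """Count, for each top-level key, how many of the dict-like items contain it.
    Returns (total_items, Counter mapping key -> count)."""
    counts = Counter()
    total = 0
    for item in items:
        total += 1
        counts.update(item.keys())
    return total, counts


def optional_keys(items):
    """Keys present in some items but missing from at least one other."""
    total, counts = key_coverage(items)
    return sorted(k for k, n in counts.items() if n < total)


# Hypothetical usage inside DSS:
# items = list(project.list_datasets())
# print(optional_keys(items))
```

Any key reported by optional_keys is one that needs an "if key in d" guard before it is read.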

I certainly can't guarantee that the logic will work in your setting as I built this all from examining our settings. Just wanted to share what I did as a starting point.

Marlan

azamora
Level 1
Author

Thanks a lot @Marlan, very helpful!! 😁
