How to get project settings
Hello everyone,
I am trying to build a report with all my projects on DSS.
I am using the Python API and the list_project_keys() function to get all the projects, and from there I extract the metadata of each project.
As a next step, I want to investigate what other data I can get from the get_settings() method.
I am able to get the settings object for every project, but I don't know how to extract the data from it.
full_project = client.get_project(project['ABC'])
settings = full_project.get_settings()
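A minimal sketch of one way to get at the data, assuming the object returned by get_settings() is a DSSProjectSettings whose get_raw() method returns the underlying settings as a nested dict. The helper flatten_settings is hypothetical: it turns that nested dict into dotted key paths, which fit more easily into a report table. The DSS calls are shown only in comments because they need a live instance, and the sample dict is made up purely to illustrate the shape.

```python
def flatten_settings(d, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} pairs.

    Non-empty dicts are recursed into; lists, scalars and empty dicts
    are kept as leaf values so the output stays readable.
    """
    flat = {}
    for key, value in d.items():
        path = prefix + "." + str(key) if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten_settings(value, path))
        else:
            flat[path] = value
    return flat


# Usage against a DSS instance (assumption: not runnable outside DSS):
#   import dataiku
#   client = dataiku.api_client()
#   for project_key in client.list_project_keys():
#       raw = client.get_project(project_key).get_settings().get_raw()
#       for path, value in sorted(flatten_settings(raw).items()):
#           print(project_key, path, value)

if __name__ == "__main__":
    # Hypothetical sample; real settings keys will differ.
    sample = {"settings": {"section": {"option": "value"}, "empty": {}}, "items": []}
    print(flatten_settings(sample))
```

Printing the flattened paths for one project first is a cheap way to decide which keys are worth pulling into the report.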
Thanks!
Answers
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron
Hi @azamora,
I recently built something similar to what you are describing, and I've pasted the code below. I didn't try to capture everything; instead I focused on the key information. Maybe you can use it as an example for your own effort.
Marlan
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function

import dataiku
import dataikuapi
import pandas as pd
import datetime

###############################################################
# Function & Class definitions
###############################################################

def get_dict_vals(d, keys):
    """Returns list of values in same order as keys"""
    vals = []
    for key in keys:
        if key in d:
            val = d[key]
            if isinstance(val, list):
                vals.append(', '.join(val))
            else:
                vals.append(val)
        else:
            vals.append('')
    return vals

#get_dict_vals(project, get_keys)

def get_create_modify_vals(d):
    if u'creationTag' in d:
        cd = d[u'creationTag']
        cr_user = cd[u'lastModifiedBy'][u'login']
        cr_dtm = datetime.datetime.fromtimestamp(cd[u'lastModifiedOn'] / 1e3)
    else:
        cr_user = ''
        cr_dtm = pd.NaT

    md = d[u'versionTag']
    mod_user = md[u'lastModifiedBy'][u'login']
    mod_dtm = datetime.datetime.fromtimestamp(md[u'lastModifiedOn'] / 1e3)
    mod_vers_nbr = md[u'versionNumber']

    return [cr_user, cr_dtm, mod_user, mod_dtm, mod_vers_nbr]

#get_create_modify_vals(d)

###############################################################
# Set up
###############################################################

# run as admin user for full access
client = dataikuapi.DSSClient('http://xx.xx.xx.xx/', 'replace_this_with_api_key_with_admin_permission')
projects_list = client.list_projects()

###############################################################
# Projects and entities included in Projects
###############################################################

proj_keys = [u'projectKey', u'name', u'projectStatus', u'shortDesc', u'ownerLogin', u'ownerDisplayName']
ds_main_keys = [u'projectKey', u'name', u'smartName', u'managed', u'type']
ds_params_keys = [u'connection', u'mode', u'table', u'tableCreationMode']
rcp_keys = [u'projectKey', u'name', u'type']
scen_main_keys = [u'projectKey', u'id', u'name', u'type', u'active']
scen_other_keys = [u'runAsUser', u'triggerCount', u'reporterCount', u'stepTypes']
create_modify_keys = [u'creationLogin', u'creationDateTime', u'modifyLogin', u'modifyDateTime', u'modifyVersionNumber']

project_attributes = []
dataset_attributes = []
recipe_attributes = []
scenario_attributes = []

for project_info in projects_list:
    project_key = project_info['projectKey']
    project = client.get_project(project_key)

    project_attributes.append(get_dict_vals(project_info, proj_keys))

    # Datasets
    for dataset_info in project.list_datasets():
        ds_main_vals = get_dict_vals(dataset_info, ds_main_keys)
        ds_params_vals = get_dict_vals(dataset_info['params'], ds_params_keys)
        ds_create_modify_vals = get_create_modify_vals(dataset_info)

        dataset_attributes.append(ds_main_vals + ds_params_vals + ds_create_modify_vals)

    # Recipes
    for recipe_info in project.list_recipes():
        rcp_vals = get_dict_vals(recipe_info, rcp_keys)
        rcp_create_modify_vals = get_create_modify_vals(recipe_info)
        recipe_attributes.append(rcp_vals + rcp_create_modify_vals)

    # Scenarios
    # Scenario items have short description but otherwise scenario objects
    # via settings has everything else
    for scenario in project.list_scenarios(as_type="objects"):
        settings = scenario.get_settings()
        scenario_info = settings.get_raw()

        # Main values
        scen_main_vals = get_dict_vals(scenario_info, scen_main_keys)

        # Other values
        effective_run_as = settings.effective_run_as  # Pull run as user even if set to "last user"
        trigger_cnt = len(scenario_info['triggers'])  # just count for now
        reporter_cnt = len(scenario_info['reporters'])  # just count for now

        step_type_list = ''
        if 'params' in scenario_info:
            if 'steps' in scenario_info['params']:
                unique_step_types = set()
                for step in scenario_info['params']['steps']:
                    unique_step_types.add(step['type'])
                step_types = list(unique_step_types)
                step_types.sort()
                step_type_list = ', '.join(step_types)

        scen_other_vals = [effective_run_as, trigger_cnt, reporter_cnt, step_type_list]

        # Create/modify values
        scen_create_modify_vals = get_create_modify_vals(scenario_info)

        scenario_attributes.append(scen_main_vals + scen_other_vals + scen_create_modify_vals)

project_df = pd.DataFrame(project_attributes, columns=proj_keys)
dataset_df = pd.DataFrame(dataset_attributes, columns=ds_main_keys + ds_params_keys + create_modify_keys)
recipe_df = pd.DataFrame(recipe_attributes, columns=rcp_keys + create_modify_keys)
scenario_df = pd.DataFrame(scenario_attributes, columns=scen_main_keys + scen_other_keys + create_modify_keys)

###############################################################
# Users
###############################################################

user_keys = [u'login', u'displayName', u'email', u'enabled', u'groups', u'userProfile']
user_attributes = []
user_list = client.list_users()
for user_info in user_list:
    user_vals = get_dict_vals(user_info, user_keys)
    user_attributes.append(user_vals)

user_df = pd.DataFrame(user_attributes, columns=user_keys)

###############################################################
# Write dataframes to SQL tables
###############################################################

dataiku.Dataset("PROJECT").write_with_schema(project_df)
dataiku.Dataset("DATASET").write_with_schema(dataset_df)
dataiku.Dataset("RECIPE").write_with_schema(recipe_df)
dataiku.Dataset("SCENARIO").write_with_schema(scenario_df)
dataiku.Dataset("USER").write_with_schema(user_df)
-
Hi @Marlan,
Thanks a lot for the suggestion!
I tried to run your code (I only replaced client = dataikuapi.DSSClient(...) with client = dataiku.api_client()).
But I get the following error:
KeyError                                  Traceback (most recent call last)
<ipython-input-6-fc187ab4367f> in <module>()
     87         ds_main_vals = get_dict_vals(dataset_info, ds_main_keys)
     88         ds_params_vals = get_dict_vals(dataset_info['params'], ds_params_keys)
---> 89         ds_create_modify_vals = get_create_modify_vals(dataset_info)
     90 
     91         dataset_attributes.append(ds_main_vals + ds_params_vals + ds_create_modify_vals)

<ipython-input-6-fc187ab4367f> in get_create_modify_vals(d)
     38         cr_dtm = pd.NaT
     39 
---> 40     md = d[u'versionTag']
     41     mod_user = md[u'lastModifiedBy'][u'login']
     42     mod_dtm = datetime.datetime.fromtimestamp(md[u'lastModifiedOn'] / 1e3)

KeyError: u'versionTag'
-
Marlan
Hi @azamora,
Maybe some of your datasets don't have version tags. All of ours do, so we didn't get an error like that.
You could treat the version tag logic like the creation tag logic, that is, guard it with "if u'versionTag' in d:" and so forth.
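The guard described above could look like the following sketch. The helper get_tag_vals is hypothetical, and it assumes both tags have the shape the code earlier in the thread reads (lastModifiedBy.login, lastModifiedOn in epoch milliseconds, versionNumber on the version tag). It returns None instead of pd.NaT for a missing date so that it only needs the standard library; swap pd.NaT back in if you prefer pandas' missing-datetime marker.

```python
import datetime


def get_tag_vals(d, tag_name):
    """Return (login, datetime) for a creation/version tag, or ('', None) if absent.

    Assumes a tag looks like:
    {'lastModifiedBy': {'login': ...}, 'lastModifiedOn': <epoch milliseconds>}
    """
    tag = d.get(tag_name) or {}
    login = (tag.get('lastModifiedBy') or {}).get('login', '')
    ms = tag.get('lastModifiedOn')
    dtm = datetime.datetime.fromtimestamp(ms / 1e3) if ms is not None else None
    return login, dtm


def get_create_modify_vals(d):
    """Same output order as the original helper, tolerant of missing tags."""
    cr_user, cr_dtm = get_tag_vals(d, 'creationTag')
    mod_user, mod_dtm = get_tag_vals(d, 'versionTag')
    # versionNumber is read from the version tag; '' when the tag is missing
    mod_vers_nbr = (d.get('versionTag') or {}).get('versionNumber', '')
    return [cr_user, cr_dtm, mod_user, mod_dtm, mod_vers_nbr]
```

Because the return list keeps the same five columns, it should drop into the dataset, recipe and scenario loops without changing create_modify_keys.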
You will very likely need to analyze what comes back from the various settings calls. For example, you could print dataset_info for each dataset to see what the data looks like. That's what I did.
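To find out which datasets lack a key before indexing it, a quick check along these lines may help. Both helpers are hypothetical, items_missing assumes each item carries a 'name' entry, and the DSS calls appear only in comments because they need a live instance (they also assume the dataset list items behave like dicts, as the code earlier in the thread treats them).

```python
from collections import Counter


def key_coverage(items):
    """Count how many items contain each top-level key.

    A key whose count is lower than len(items) is missing from some
    items and needs an "if key in d" guard before it is indexed.
    """
    counts = Counter()
    for item in items:
        counts.update(item.keys())
    return counts


def items_missing(items, key, name_key='name'):
    """Return the names of items that lack `key` ('?' if no name_key)."""
    return [item.get(name_key, '?') for item in items if key not in item]


# Usage against a DSS instance (assumption: not runnable outside DSS):
#   import json, dataiku
#   datasets = dataiku.api_client().get_project("ABC").list_datasets()
#   print(len(datasets), key_coverage(datasets))
#   print(items_missing(datasets, 'versionTag'))
#   print(json.dumps(datasets[0], indent=2, default=str))  # inspect one item
```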
I certainly can't guarantee that the logic will work in your setup, since I built it all by examining our own settings. I just wanted to share what I did as a starting point.
Marlan
-
Thanks a lot @Marlan,
very helpful!!