Auditing projects by tags and statuses

Solved!
gskoff
Level 2

Once we have properly tagged our projects (using global tags), what options do we have as an admin to audit which projects have certain tags, or certain project statuses? As a user, we can search by these on the project list page, but is there any way to export, share, or archive the results? Thanks!

Turribeach

Project tags can be searched using the Search DSS box at the top right. Search for a tag, click the DSS Items tab, and you can then filter projects by that tag (see below). I am not aware of any way to search by project status. In any case, both project tags and statuses are available through the Python API, so it's fairly easy to build a dataset containing all of them and then use Explore, Dashboards, or an export to Excel to do whatever filtering you need.

Screenshot 2024-01-22 at 19.06.59.png
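As a minimal sketch of the "build a dataset, then filter" approach, the function below filters a pandas DataFrame of project metadata by tag and/or status and exports the matches as CSV. It assumes one row per project, with a list-valued `project_tags` column and a `project_status` column (the same shape as the `df_project_data` DataFrame built by the script in this thread). The function name `filter_projects` and the sample rows and status values are hypothetical.

```python
import pandas as pd


def filter_projects(df, tag=None, status=None):
    """Return rows whose tag list contains `tag` and whose status equals `status`.

    Either filter is skipped when left as None.
    """
    mask = pd.Series(True, index=df.index)
    if tag is not None:
        # project_tags holds a Python list per row, so test membership row by row
        mask &= df["project_tags"].apply(lambda tags: tag in (tags or []))
    if status is not None:
        mask &= df["project_status"] == status
    return df[mask]


# Hypothetical sample rows, for illustration only
projects = pd.DataFrame({
    "project_key": ["SALES", "CHURN", "SANDBOX"],
    "project_status": ["Production", "Draft", "Draft"],
    "project_tags": [["finance", "pii"], ["marketing"], []],
})

pii_projects = filter_projects(projects, tag="pii")
draft_projects = filter_projects(projects, status="Draft")

# With no path, to_csv returns the CSV text; pass a file path to write it to disk instead
csv_text = pii_projects.to_csv(index=False)
```

Combining both filters (for example `tag="marketing", status="Draft"`) answers the "which projects have tag X and status Y" audit question in one call.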

gskoff
Level 2
Author

Thanks for the reply. Are there any example scripts that use the Python API to pull the metadata for all projects on an instance: project name, tags, status, creator, creation date, and last-modified date?

Turribeach

This should do:

 

import datetime
import dataiku
import pandas as pd

client = dataiku.api_client()


def format_ts(epoch_ms):
    # DSS stores these timestamps as epoch milliseconds; render them as UTC strings
    return datetime.datetime.fromtimestamp(int(epoch_ms) / 1000, tz=datetime.timezone.utc).strftime("%d-%b-%Y %H:%M:%S")


columns = ['project_key', 'project_name', 'project_status', 'project_tags', 'has_active_scenarios',
           'project_owner', 'project_created_by', 'project_created_on', 'project_last_modified_on']
records = []  # one dict per project; the DataFrame is built once at the end

for project_key in client.list_project_keys():
    project = client.get_project(project_key)
    project_status = project.get_settings().get_raw().get('projectStatus')
    project_summary = project.get_summary()

    # Very old projects may have no creationTag, so fall back to empty values
    creation_tag = project_summary.get('creationTag')
    if creation_tag:
        project_created_by = creation_tag['lastModifiedBy']['login']
        project_created_on = format_ts(creation_tag['lastModifiedOn'])
    else:
        project_created_by = ''
        project_created_on = ''

    # Tags set on the project itself, read from the project metadata
    project_tags = project.get_metadata().get('tags', [])

    # 'Yes' if any enabled scenario has at least one enabled trigger
    has_active_scenarios = ''
    for scenario in project.list_scenarios():
        scn_settings_raw = project.get_scenario(scenario['id']).get_settings().get_raw()
        triggers = scn_settings_raw.get('triggers') or []
        if scn_settings_raw.get('active') and any(t.get('active') for t in triggers):
            has_active_scenarios = 'Yes'
            break

    records.append({
        'project_key': project_key,
        'project_name': project_summary['name'],
        'project_status': project_status,
        'project_tags': project_tags,
        'has_active_scenarios': has_active_scenarios,
        'project_owner': project_summary['ownerLogin'],
        'project_created_by': project_created_by,
        'project_created_on': project_created_on,
        'project_last_modified_on': format_ts(project_summary['versionTag']['lastModifiedOn']),
    })

df_project_data = pd.DataFrame(records, columns=columns)
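The `has_active_scenarios` flag in the script above treats a scenario as active only when the scenario itself is enabled and at least one of its triggers is enabled. Here is a minimal sketch of that rule as standalone functions, assuming the raw scenario settings dict has the `active` and `triggers` keys the script reads. The function names and sample dicts are hypothetical; this version also tolerates a missing or null `triggers` entry.

```python
def has_active_trigger(scenario_raw):
    """Return True if a scenario's raw settings are enabled and have an enabled trigger."""
    if not scenario_raw.get("active"):
        return False
    # A missing or null "triggers" entry is treated as "no triggers"
    triggers = scenario_raw.get("triggers") or []
    return any(trigger.get("active") for trigger in triggers)


def project_has_active_scenarios(scenario_raws):
    """Return 'Yes' if any scenario qualifies, else '' (the script's convention)."""
    return "Yes" if any(has_active_trigger(raw) for raw in scenario_raws) else ""


# Hypothetical raw settings, reduced to the keys the check reads
enabled_with_trigger = {"active": True, "triggers": [{"active": False}, {"active": True}]}
enabled_no_trigger = {"active": True, "triggers": []}
disabled = {"active": False, "triggers": [{"active": True}]}
```

Keeping the rule in a pure function like this makes it easy to check against hand-written settings dicts before running the full loop over every project on an instance.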

 

gskoff
Level 2
Author

THANK YOU! This worked almost flawlessly for me. The only minor issue I ran into was that the project_summary['creationTag'] didn't exist for at least one project on our instance. I got around that with a try/except and gave it a default value on the exception.
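For comparison, here is a sketch of the try/except approach described here. It assumes the project summary dict has the `creationTag` → `lastModifiedBy` → `login` and `creationTag` → `lastModifiedOn` nesting that the script reads. The helper name `creation_info`, the default value, and the sample summaries are hypothetical.

```python
import datetime


def creation_info(project_summary, default=""):
    """Return (created_by, created_on) from a project summary, or defaults if absent."""
    try:
        tag = project_summary["creationTag"]
        created_by = tag["lastModifiedBy"]["login"]
        # lastModifiedOn is epoch milliseconds; format it as UTC, as the script does
        created_on = datetime.datetime.fromtimestamp(
            int(tag["lastModifiedOn"]) / 1000, tz=datetime.timezone.utc
        ).strftime("%d-%b-%Y %H:%M:%S")
    except (KeyError, TypeError):
        # Missing keys (KeyError) or a null creationTag (TypeError) fall back to the default
        return default, default
    return created_by, created_on


# Hypothetical summaries: one without creationTag, one with it
old_project = {"name": "Legacy"}
new_project = {"creationTag": {"lastModifiedBy": {"login": "admin"}, "lastModifiedOn": 0}}
```

The if check used in the script handles only a missing or empty `creationTag`. The try/except form also covers a tag that exists but lacks one of the nested keys, at the cost of being less explicit about which case occurred.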

Turribeach

You are welcome. I've updated the script to handle the missing creationTag with an if check. It happens with very old projects that don't have a creationTag; it looks like Dataiku did not create one when those projects were migrated.
