Extract Dataset Names under a TAG

Solved!
sj0071992

Hi Team,

 

My workflow is very large, and I want to extract the names of the datasets under a tag. Is there any way to get the list of datasets under a tag?

 

Thanks in Advance

dima_naboka
Dataiker

Hello,

You can do this from the Datasets menu in the GUI

Screenshot 2021-10-19 at 14.44.37.png

as well as from a project's notebook

import dataiku
client = dataiku.api_client()
project = client.get_project(dataiku.get_custom_variables()["projectKey"])
datasets = project.list_datasets()

tag_name = 'sql_dataset'

# Print every dataset that carries the requested tag
for dataset in datasets:
    # 'tags' is empty for untagged datasets
    if dataset['tags'] and tag_name in dataset['tags']:
        print("dataset '{}' is tagged with '{}'".format(dataset['name'], tag_name))

Screenshot 2021-10-19 at 14.49.19.png 
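
If you need the check for several tags, the loop above can be wrapped in a small helper. This is only a sketch: datasets_with_tag and the sample list are hypothetical names, and the sample merely mimics the 'name' and 'tags' keys read from the items of project.list_datasets() above, so it runs outside DSS.

```python
def datasets_with_tag(datasets, tag_name):
    """Return the names of all datasets whose tag list contains tag_name."""
    return [
        d['name']
        for d in datasets
        # 'tags' may be empty (or missing) for untagged datasets
        if tag_name in (d.get('tags') or [])
    ]


# Hypothetical sample standing in for project.list_datasets()
sample = [
    {'name': 'orders', 'tags': ['sql_dataset', 'raw']},
    {'name': 'customers', 'tags': []},
    {'name': 'events', 'tags': ['raw']},
]

print(datasets_with_tag(sample, 'sql_dataset'))
```

Inside a notebook you would pass the datasets list from the snippet above instead of the sample.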

 

sj0071992
Author

Hi,

 

Can we get the dataset names and their corresponding tags in an Excel sheet?

dima_naboka
Dataiker

Yes, you have several options. For example, collect the output in a pandas DataFrame and write it out with pandas.DataFrame.to_excel().

import dataiku
import pandas as pd

client = dataiku.api_client()
project = client.get_project(dataiku.get_custom_variables()["projectKey"])
datasets = project.list_datasets()


result_dict = {'dataset': [], 'tags': []}
for dataset in datasets:
    # Keep only datasets that have at least one tag
    if dataset['tags']:
        result_dict['dataset'].append(dataset['name'])
        result_dict['tags'].append(dataset['tags'])
df = pd.DataFrame(data=result_dict)
# Relative path: the file lands in the notebook's working directory
df.to_excel('output1.xlsx')

This saves the XLSX file into the DATA_DIR/jupyter-run/dku-workdirs/MY_PROJECT/recipe_name/ folder.

Screenshot 2021-10-19 at 17.02.12.png

P.S. If you are running an older version of DSS, or the code env used to run the notebook uses legacy pandas==0.23, you will need to install xlsxwriter into the corresponding code env and add import xlsxwriter.
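
Another of the options, if installing an Excel writer into the code env is not possible, is a CSV file, which Excel opens as well. The sketch below uses only the Python standard library and writes one row per dataset/tag pair, which makes filtering by tag in the sheet easier. tag_rows, write_tag_csv and the sample list are hypothetical names; the sample only mimics the 'name' and 'tags' keys used above.

```python
import csv
import io


def tag_rows(datasets):
    """Yield one (dataset name, tag) pair per tag, skipping untagged datasets."""
    for d in datasets:
        for tag in d.get('tags') or []:
            yield d['name'], tag


def write_tag_csv(datasets, handle):
    """Write a header plus one row per dataset/tag pair to an open text handle."""
    writer = csv.writer(handle, lineterminator='\n')
    writer.writerow(['dataset', 'tag'])
    writer.writerows(tag_rows(datasets))


# Hypothetical sample standing in for project.list_datasets()
sample = [
    {'name': 'orders', 'tags': ['sql_dataset', 'raw']},
    {'name': 'customers', 'tags': []},
    {'name': 'events', 'tags': ['raw']},
]

# Write to an in-memory buffer here; pass an open file to save to disk
buffer = io.StringIO()
write_tag_csv(sample, buffer)
print(buffer.getvalue())
```

To save a file instead, open it with open('output1.csv', 'w', newline='') and pass that handle together with the datasets list from the notebook.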
