Extract Dataset Names under a TAG

sj0071992
sj0071992 Partner, Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2022, Neuron 2023 Posts: 131 Neuron

Hi Team,

My workflow is very large, and I want to extract the names of the datasets under a tag. Is there any way to get the list of datasets under a tag?

Thanks in advance.

Best Answer

  • dima_naboka
    dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
    edited July 17 Answer ✓

    Yes, you have several options. For example, save the output as a pandas DataFrame and write it out with pandas.DataFrame.to_excel().

    import dataiku
    import pandas as pd

    client = dataiku.api_client()
    project = client.get_project(dataiku.get_custom_variables()["projectKey"])
    datasets = project.list_datasets()

    # Collect the name and tag list of every dataset that has at least one tag
    result_dict = {'dataset': [], 'tags': []}
    for dataset in datasets:
        if dataset['tags']:
            result_dict['dataset'].append(dataset['name'])
            result_dict['tags'].append(dataset['tags'])

    # Write one row per tagged dataset to an Excel file
    df = pd.DataFrame(data=result_dict)
    df.to_excel('output1.xlsx')

    This saves the XLSX file into the DATA_DIR/jupyter-run/dku-workdirs/MY_PROJECT/recipe_name/ folder.

    Screenshot 2021-10-19 at 17.02.12.png

    P.S. If you are running an older version of DSS, or the code env used to run the notebook uses legacy pandas==0.23, you will need to install xlsxwriter into the corresponding code env and import xlsxwriter.
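
The snippet above stores each dataset's whole tag list in a single cell. If one row per (dataset, tag) pair is easier to filter in Excel, a minimal sketch could look like the following. The helper name dataset_tag_rows and the sample data are hypothetical, and the sketch assumes each listed item behaves like a dict with 'name' and 'tags' keys, as in the code above.

```python
def dataset_tag_rows(items):
    """Return one {'dataset': ..., 'tag': ...} row per (dataset, tag) pair."""
    rows = []
    for item in items:
        # Treat a missing or empty tag list as "no tags" (assumption)
        for tag in item.get("tags") or []:
            rows.append({"dataset": item["name"], "tag": tag})
    return rows


# Hypothetical sample data standing in for project.list_datasets()
sample_items = [
    {"name": "orders", "tags": ["sql_dataset", "finance"]},
    {"name": "customers", "tags": ["sql_dataset"]},
    {"name": "scratch", "tags": []},
]

rows = dataset_tag_rows(sample_items)
for row in rows:
    print(row["dataset"], row["tag"])
```

In a notebook, the returned rows could then be passed to pd.DataFrame(rows) and written with to_excel(), in the same way as above.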

Answers

  • dima_naboka
    dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
    edited July 17

    Hello,

    You can do this from the Dataset menu in the GUI

    Screenshot 2021-10-19 at 14.44.37.png

    as well as from a project's notebook

    import dataiku

    client = dataiku.api_client()
    project = client.get_project(dataiku.get_custom_variables()["projectKey"])
    datasets = project.list_datasets()

    tag_name = 'sql_dataset'

    # Print every dataset that carries the requested tag
    for dataset in datasets:
        if dataset['tags'] and tag_name in dataset['tags']:
            print("dataset '{}' is tagged with '{}'".format(dataset['name'], tag_name))
    

    Screenshot 2021-10-19 at 14.49.19.png
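
When several tags need to be checked in a large flow, it may be simpler to build a tag-to-datasets index once and look tags up in it. The following is only a sketch under assumptions: the helper name index_datasets_by_tag and the sample data are hypothetical, and each listed item is assumed to behave like a dict with 'name' and 'tags' keys, as in the notebook code above.

```python
from collections import defaultdict


def index_datasets_by_tag(items):
    """Map each tag to the list of dataset names that carry it."""
    index = defaultdict(list)
    for item in items:
        # Treat a missing or empty tag list as "no tags" (assumption)
        for tag in item.get("tags") or []:
            index[tag].append(item["name"])
    return dict(index)


# Hypothetical sample data standing in for project.list_datasets()
sample_items = [
    {"name": "orders", "tags": ["sql_dataset", "finance"]},
    {"name": "customers", "tags": ["sql_dataset"]},
    {"name": "scratch", "tags": []},
]

tag_index = index_datasets_by_tag(sample_items)
# Look up one tag; an unknown tag yields an empty list
print(tag_index.get("sql_dataset", []))
```

Returning a plain dict keeps lookups of unknown tags from silently adding empty entries, which a defaultdict would do.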

  • sj0071992
    sj0071992 Partner, Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2022, Neuron 2023 Posts: 131 Neuron

    Hi,

    Can we get the dataset names and their corresponding tags in an Excel sheet?
