Add "clear dataset data by tag" macro similar to "delete datasets by tag"

rnorm Registered Posts: 9 ✭✭✭✭

Hi there !

I've been working on my own macro to do this, and with our upgrade from v7 to v9 I saw that a new built-in macro is available that allows users to delete datasets using tags.

How about adding a similar macro to clear the data of datasets using tags? The "Clear intermediate datasets" macro is fine for simple projects, but its options are limited for more complex, multi-level projects (for example, one with multi-source data preparation, training, deployment and prediction). It would be quite useful for users who work in a big data environment.

Tell me what you think !

Cheers,

Pierre

0 votes

Rejected · Last Updated

Comments

  • HarizoR
    HarizoR Dataiker, Alpha Tester, Registered Posts: 138 Dataiker
    edited July 17

    Hi Pierre,

    It would definitely be possible for you to create such a macro! You can retrieve each dataset's tags using the Dataset API, filter on them, and then clear the relevant dataset(s). The code would look like this:

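    import dataiku

    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT_KEY")

    # list_datasets() returns dict-like items describing each dataset
    for item in project.list_datasets():
        if "MY_TAG" in item.get("tags", []):
            print("Clearing dataset {}".format(item["name"]))
            # Fetch a dataset handle, which exposes clear(); the list item itself does not
            project.get_dataset(item["name"]).clear()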

    To package this in a more polished form, you can take inspiration from the source code of the "Clear intermediate datasets" macro.
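
    As a rough sketch of what the filtering part of a more polished version might look like, the helpers below select datasets by one or several tags and default to a dry run. The names `select_by_tags` and `clear_by_tags`, and the `match_all` and `dry_run` options, are hypothetical and not part of any Dataiku API; the only assumption about the input is that each item is dict-like with a "name" and an optional "tags" list, as in the snippet above.

```python
from typing import Callable, Iterable, List, Mapping


def select_by_tags(datasets: Iterable[Mapping], wanted_tags: Iterable[str],
                   match_all: bool = False) -> List[str]:
    """Return the names of datasets carrying the wanted tags.

    With match_all=False a dataset is selected if it has at least one of
    the wanted tags; with match_all=True it must have all of them.
    """
    wanted = set(wanted_tags)
    if not wanted:
        # No tag given: select nothing rather than everything.
        return []
    selected = []
    for item in datasets:
        tags = set(item.get("tags") or [])
        hit = wanted <= tags if match_all else bool(wanted & tags)
        if hit:
            selected.append(item["name"])
    return selected


def clear_by_tags(datasets: Iterable[Mapping], wanted_tags: Iterable[str],
                  clear_fn: Callable[[str], object],
                  match_all: bool = False, dry_run: bool = True) -> List[str]:
    """Call clear_fn(name) for each selected dataset unless dry_run is set.

    Returns the selected names in both cases, so a dry run can be used to
    review what would be cleared.
    """
    names = select_by_tags(datasets, wanted_tags, match_all=match_all)
    if not dry_run:
        for name in names:
            clear_fn(name)
    return names
```

    In a DSS project, `datasets` could be `project.list_datasets()` and `clear_fn` could be `lambda name: project.get_dataset(name).clear()`. Running once with `dry_run=True` and checking the returned names before clearing seems a sensible precaution, since clearing data cannot be undone.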

    Best,

    Harizo

  • Ashley
    Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 161 Dataiker

    Hi @rnorm,

    HarizoR's quick start is a good one ! We'll mark this one as Not Planned.

    Best,

    Ashley
