Added on August 19, 2021 8:48AM
Hi there!
I've been working on my own macro to do this, and with our upgrade from v7 to v9 I saw that a new built-in macro is available that allows users to delete datasets using tags.
How about adding a similar macro that clears the data of datasets selected by tag? The "Clear intermediate datasets" macro is fine for simple projects, but its options are limited for more complex, multi-stage projects (for example, multi-source data preparation, training, deployment and prediction). It would be quite useful for users working in a big data environment.
Tell me what you think!
Cheers,
Pierre
Hi Pierre,
It is definitely possible to create such a macro yourself! You can retrieve the tags through the Dataset API, filter on them, and then clear the matching dataset(s). The code would look like this:
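    import dataiku

    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT_KEY")

    # list_datasets() returns list items (name, tags, ...), not dataset handles
    for dataset in project.list_datasets():
        if "MY_TAG" in dataset["tags"]:
            print("Clearing dataset {}".format(dataset["name"]))
            # Fetch a handle first: the list item itself has no clear() method
            project.get_dataset(dataset["name"]).clear()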
To package that in a more polished form, you can take inspiration from the source code of the "Clear intermediate datasets" macro.
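If it helps, here is a minimal sketch of how the same logic could be wrapped in a reusable function before moving it into a macro. The function name clear_datasets_by_tag and the dry_run flag are hypothetical (they are not part of the Dataiku API); the sketch only assumes that the project object exposes list_datasets() and get_dataset(), as in the snippet above, and that each list item carries "name" and "tags" entries.

```python
def clear_datasets_by_tag(project, tag, dry_run=True):
    """Return the names of datasets tagged with `tag`; clear them unless dry_run.

    `project` is assumed to behave like a Dataiku project handle:
    list_datasets() yields dict-like items with "name" and "tags",
    and get_dataset(name) returns an object with a clear() method.
    """
    matched = []
    for item in project.list_datasets():
        if tag in item["tags"]:
            name = item["name"]
            matched.append(name)
            if not dry_run:
                # Only touch data when the caller explicitly opts in
                project.get_dataset(name).clear()
    return matched


# Hypothetical usage inside DSS (not executed here):
#   import dataiku
#   project = dataiku.api_client().get_project("MY_PROJECT_KEY")
#   print(clear_datasets_by_tag(project, "MY_TAG"))        # dry run: list only
#   clear_datasets_by_tag(project, "MY_TAG", dry_run=False)  # actually clear
```

Defaulting to a dry run is a design choice on my side: on a large project it lets you check which datasets a tag would match before any data is removed.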
Best,
Harizo