Add "clear dataset data by tag" macro similar to "delete datasets by tag"
Hi there!
I've been working on my own macro to do this, and with our upgrade from v7 to v9 I saw that a new built-in macro is available that allows users to delete datasets using tags.
How about adding a similar macro to clear the data of datasets using tags? The "Clear intermediate datasets" macro is fine for simple projects, but its options are limited for more complex, multi-level projects (for example, with multi-source data prep, training, deployment and prediction). It would be quite useful for users who work in a big data environment.
Tell me what you think!
Cheers,
Pierre
Comments
-
Hi Pierre,
It would definitely be possible for you to create such a macro! You can retrieve the tagging information using the Dataset API, apply some filtering and then clear the relevant dataset(s). The code would look like this:
import dataiku

client = dataiku.api_client()
project = client.get_project("MY_PROJECT_KEY")

# list_datasets() returns dict-like items, so get a dataset handle before clearing
for item in project.list_datasets():
    if "MY_TAG" in item["tags"]:
        print("Clearing dataset {}".format(item["name"]))
        project.get_dataset(item["name"]).clear()
To insert that in a more polished format, you can take inspiration from the "Clear intermediate datasets" macro source code.
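As a starting point, the filtering and clearing logic could be wrapped in a small reusable function along these lines. This is only a sketch: the function name `clear_datasets_by_tag` and the `dry_run` flag are made up for illustration and are not part of the Dataiku API, and it assumes the project handle exposes `list_datasets()` and `get_dataset()` as in the snippet above.

```python
def clear_datasets_by_tag(project, tag, dry_run=True):
    """Clear the data of every dataset in `project` that carries `tag`.

    `project` is assumed to behave like the handle returned by
    dataiku.api_client().get_project(...). The function name and the
    `dry_run` flag are hypothetical helpers, not Dataiku built-ins.
    Returns the names of the matching datasets.
    """
    matched = []
    for item in project.list_datasets():
        # Listed items are read like dicts; treat a missing "tags" key as no tags.
        if tag in (item.get("tags") or []):
            matched.append(item["name"])
            if not dry_run:
                # Clearing needs a dataset handle, not the listing item.
                project.get_dataset(item["name"]).clear()
    return matched
```

Calling it first with `dry_run=True` on the handle from `client.get_project("MY_PROJECT_KEY")` returns the names that would be cleared without touching any data; passing `dry_run=False` then performs the clear.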
Best,
Harizo