I've been working on a number of file systems to process data about the files on various disks. The flow zone I've created looks like this.
There are two simple shell scripts that gather data about the same file volume, two Prepare recipes that clean up that data, and one Join recipe that combines both sets of data for that file system.
For each file system I'm processing, I'm creating a new flow zone with the same steps. I have to repeat this for a number of volumes attached to my host, and these volumes will change over time. The only real difference between the zones is the path the files come from.
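To illustrate the "only the path changes" idea, here is a minimal sketch in Python. It is not the shell scripts from the flow above; the function names (`file_stats`, `file_md5`, `join_on_path`) and the column names are hypothetical, chosen only to mirror the two gathering steps and the join. The volume root is the single parameter.

```python
import hashlib
import os
from typing import Dict, Iterable, Iterator, List


def _regular_files(root: str) -> Iterator[str]:
    """Yield the path of every regular, non-symlink file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def file_stats(root: str) -> Iterator[Dict[str, object]]:
    """Fast path: one row of basic stats per file."""
    for path in _regular_files(root):
        st = os.stat(path)
        yield {"path": path, "size_bytes": st.st_size, "mtime": st.st_mtime}


def file_md5(root: str, chunk_size: int = 1 << 20) -> Iterator[Dict[str, str]]:
    """Slow path: one MD5 digest per file, read in chunks to bound memory."""
    for path in _regular_files(root):
        digest = hashlib.md5()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(chunk_size), b""):
                digest.update(chunk)
        yield {"path": path, "md5": digest.hexdigest()}


def join_on_path(
    stats: Iterable[Dict[str, object]], hashes: Iterable[Dict[str, str]]
) -> List[Dict[str, object]]:
    """Left join of stats rows to MD5 rows on the file path."""
    md5_by_path = {row["path"]: row["md5"] for row in hashes}
    return [{**row, "md5": md5_by_path.get(row["path"])} for row in stats]
```

Because the two gathering functions are independent, each can be run on its own schedule, and the same three functions apply to any volume by passing a different `root`.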
Part of the reason for breaking the flow zone into two paths at the beginning is that I want to control when each of these recipes runs. The MD5 shell script takes about 3/4 of an hour for every 100 GB I need to process, whereas pulling the stats about the files takes about 1/10 of that time.
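As a rough worked example of those timings, the sketch below estimates the run time of each path from the volume size. The function name `estimate_hours` is hypothetical, and it assumes the time scales linearly with size, which is only an approximation of the figures quoted above.

```python
from typing import Tuple

MD5_HOURS_PER_100_GB = 0.75   # "about 3/4 of an hour for every 100 GB"
STATS_FRACTION_OF_MD5 = 0.1   # stats take about 1/10 of the MD5 time


def estimate_hours(volume_gb: float) -> Tuple[float, float]:
    """Return (md5_hours, stats_hours) for a volume, assuming linear scaling."""
    md5_hours = volume_gb / 100 * MD5_HOURS_PER_100_GB
    stats_hours = md5_hours * STATS_FRACTION_OF_MD5
    return md5_hours, stats_hours


md5_h, stats_h = estimate_hours(400)
print(f"400 GB: MD5 about {md5_h:.1f} h, stats about {stats_h:.1f} h")
```

For a 400 GB volume this gives roughly 3 hours for the MD5 path and roughly 0.3 hours (about 18 minutes) for the stats path.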
My question for the community: has anyone worked out a way to reuse a single flow zone for multiple datasets? This would save a lot of time copying and managing multiple flow zones, and it would increase reliability because all datasets would get the same processing.
Operating system used: Mac OS 10.15.7
Thank you so much for the post on Community.
DSS has the "Application-as-recipe" feature, which packages a project's flow into a reusable recipe for other datasets. By defining a flow in a project and converting it into an Application-as-recipe, you can reuse the flow for other datasets. This feature might be useful for your use case. Please see this DSS document https://doc.dataiku.com/dss/latest/applications/application-as-recipe.html for the details of this feature.
I hope this helps.
Keiji, Dataiku Technical Support
Application-as-recipe looks promising. However, I feel like I need a few more details about how to use this feature set. Is there any training material or more detailed examples on the use of this feature?
Oh, I think I may have found some additional information that might be useful to me to get started. https://knowledge.dataiku.com/latest/courses/o16n/dataiku-applications/create-app-as-recipe.html
@tgb417 Yes, you can refer to the knowledge base article you mentioned for the details of the feature. Please let us know if you have any further questions regarding the feature.
Keiji, Dataiku Technical Support
How do code updates work in the Application-as-recipe scenario?
I create the application-as-recipe. I put it into production. Now I find a bug in the application or a case that did not work as expected. How do I update that application-as-recipe?
I’ve worked my way through the knowledge base article on Application-as-recipe, and I have been able to make it work as written. However, it is a very specific multi-step procedure for turning a section of the Haiku T-Shirt example into a reusable component.
I’m having a hard time generalizing this specific set of instructions into a broader understanding of Dataiku Applications, and more specifically Application-as-recipe, so that I can create my own to do the kind of thing I’d like to do. Is there a more general set of instructional material on this subject that explains the purpose of each of the steps listed, with an eye to being able to create other such recipes?