Help with Application setup

Solved!
DRamos
Level 1

Hi all!

I'm currently working on a very simple project that I need to deploy as an application. I'm very, very new to the platform, so I'm not sure how to even begin.

I already have a Flow that takes information from an input database, passes every record through an Interactive Decision Tree to label them, and then assigns the records randomly to specific usernames.

My goal is for the app to let a user enter their username and then display the records that were assigned to them after labeling as a table (with the possibility of exporting this table).

How can I proceed? Will this be easier to create in an Application or a Webapp within the project? 

Thanks in advance!

1 Solution

Before progressing further with your solution, you need to understand certain limitations of the Dataiku world. In Dataiku there is no support for running a flow, a scenario or a recipe concurrently. This means that "there can only be one" running at any point in time. In general, flows/scenarios/recipes are meant to run in "batch" mode on a schedule, not on demand where an ad hoc user selects an input and triggers an execution. It's perfectly possible to have a Dataiku WebApp that lets a user trigger a flow/scenario/recipe, but this would only work for the first user who does it. If a subsequent user attempted to run the same flow/scenario/recipe while it is already running, it would most likely fail (the actual outcome depends on which object you are trying to re-run).

So before you continue with your design, you need to consider whether your solution needs to support concurrent runs or not. Sometimes you can get away with no concurrent runs and can therefore use a flow/scenario/recipe design, for instance if you need to support a single user or if the user will only trigger a flow/scenario/recipe once a day.

Where you can't use this design approach, you need to move away from the flow/scenario/recipe "batch" design, look at the API node and deploy an API service. Dataiku API Services allow developers to create APIs for "realtime" use cases, where you provide a set of inputs and want to produce a unique output for that request. Typically these are used in scoring/prediction use cases like predicting the probability of default (ie when a customer applies for a credit card). The API node supports high availability and scalability, which is what you would expect from a service that has to serve multiple concurrent requests/users.
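To make the "only the first user succeeds" behaviour concrete, here is a minimal sketch of a guard a WebApp backend could put in front of the trigger, so that a second request is rejected cleanly instead of failing halfway. This is plain Python, not Dataiku's API: `SingleRunGuard` and the `job` callable are hypothetical names, and `job` stands in for whatever code actually starts your flow/scenario/recipe. It also assumes a single backend process, since the lock is not shared across processes.

```python
import threading


class SingleRunGuard:
    """Allow at most one run of a job at a time; reject overlapping requests."""

    def __init__(self, job):
        # `job` is a hypothetical callable that triggers the flow/scenario/recipe.
        self._job = job
        self._lock = threading.Lock()

    def try_run(self, *args, **kwargs):
        """Return (True, result) if the job ran, or (False, None) if one is already running."""
        # Non-blocking acquire: returns False immediately when a run is in progress.
        if not self._lock.acquire(blocking=False):
            return (False, None)
        try:
            return (True, self._job(*args, **kwargs))
        finally:
            # Always free the guard, even if the job raised an exception.
            self._lock.release()
```

A request handler would call `guard.try_run(username)` and, when the first element is `False`, tell the user that a run is already in progress and to retry later. This only serialises requests; it does not make the design concurrent.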

Now let's talk about Dataiku Applications. Dataiku Applications allow you to design and package your project as a reusable application with customizable inputs. While Dataiku Applications can run concurrently, the use case they are aimed at is not the concurrent realtime use of a flow/scenario/recipe. Dataiku Applications let you re-use a flow design in other projects, so the use case is code reusability. As such, in order to achieve concurrency with Dataiku Applications you have to use the Dataiku Application in different projects. This is not something that fits your use case, where you want to have a single project that serves user requests. I guess you could have a kludge solution where you convert your flow into a Dataiku Application and then create multiple projects all using the same Dataiku Application (ie Project_Slot_1, Project_Slot_2, Project_Slot_3, Project_Slot_n), which would allow you to execute your flow concurrently, and have your WebApp keep track of which one is being used at the moment and choose a "free slot" to execute the user inputs. But this seems like a bad pattern to me.
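For illustration only, the "free slot" bookkeeping the WebApp would need for that kludge could be sketched as follows. This is plain Python under stated assumptions: `SlotPool` and `reserved_slot` are hypothetical helpers, the slot names are just the example project names from above, and actually running the flow in the chosen project is left out. It also assumes a single backend process.

```python
import threading
from contextlib import contextmanager


class SlotPool:
    """Track which project slots are free so each request gets its own project."""

    def __init__(self, slot_names):
        self._free = list(slot_names)
        self._lock = threading.Lock()

    def acquire(self):
        """Return the name of a free slot, or None when every slot is busy."""
        with self._lock:
            return self._free.pop(0) if self._free else None

    def release(self, slot):
        """Hand a slot back once the run in that project has finished."""
        with self._lock:
            self._free.append(slot)


@contextmanager
def reserved_slot(pool):
    """Reserve a slot for the duration of a request and always give it back."""
    slot = pool.acquire()
    try:
        yield slot
    finally:
        if slot is not None:
            pool.release(slot)
```

A handler would wrap each request in `with reserved_slot(pool) as slot:` and report "all slots busy" when `slot` is `None`. The amount of bookkeeping needed just to pick a project is part of why this feels like a bad pattern.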

So now that you understand how these solutions work you should clarify what your requirements are in terms of concurrency.

SarinaS
Dataiker

Hi @DRamos,

An application seems like a good approach for your use case! I would suggest walking through this tutorial for creating a visual application first, which should hopefully help clarify the basics of setting up an application: https://knowledge.dataiku.com/latest/mlops-o16n/dataiku-applications/dku-apps/tutorial-index.html. A webapp would also be reasonable, but applications are definitely built to allow users to input user-specific data, trigger a scenario in the flow, and then receive results, so for your use case I think an application would be the easiest approach.  

Feel free to post here if you have questions about specific parts of your application after walking through the tutorial. 

Thank you, 
Sarina

DRamos
Level 1
Author

Thank you so much, Sarina! 

That tutorial was amazing, it really helped me understand the basics of the tool!

Just one more question: my application currently labels each record in a database and then allows the user to download the labeled file as .csv. Is there any way to display the table in the app?
