How to collect user-provided info in an application

Jason

I'm looking for tips/advice on how best to set up some user data collection as part of an application, and how it will interact with my flow/partitioning scheme. Here's the basic description of what is going on:

I've built a notebook and an accompanying recipe that run a Monte Carlo simulation. The simulation looks at optimizing the use of an item that is the rate-limiting factor in a business process. I'm hoping to create an application through which users can trigger the simulation and later view its results.

The user will need to input several variables that control the fixed and variable parts of the simulation, and these values will then drive the run. The datasets are partitioned by the internal stock number of the item in question.

So my question is this: what is the best mechanism to capture all the variables for the user's desired simulation (things like day of week, upper and lower limits, etc.) and make them available to the simulation script? I need those values to be passed somehow to the Python code that runs the simulation.

I'd like them to be persistent, so that a user who wants to re-run the simulation can load a previous config and modify it. I also want the application to deny a new run for an item if the simulation for that item is already running, while still allowing runs for other items (a simulation takes about an hour).

My current idea is to use the application to capture the data into an editable dataset, trigger the scenario with the correct partition ID, and then use the "set scenario variables" feature to load the parameters into the scenario object, where the simulation script can access them. This should work nicely because the output data would then be written into the corresponding partitions.
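One way to picture this plan is a small function the app calls just before triggering the scenario: it validates one user-supplied config and flattens it into a parameter dict, reusing the stock number as the partition ID. This is a minimal sketch; the field names (`stock_number`, `day_of_week`, `lower_limit`, `upper_limit`) and the `partition_id` key are hypothetical. A dict like this is the kind of thing one might pass to `DSSScenario.run(params=...)` in the Dataiku public API, but check the exact call against the documentation for your DSS version.

```python
# Hypothetical input field names; substitute the simulation's real inputs.
REQUIRED_FIELDS = ("stock_number", "day_of_week", "lower_limit", "upper_limit")


def build_run_params(config: dict) -> dict:
    """Flatten one user-supplied config into scenario run parameters.

    The stock number doubles as the partition ID, so a run only touches
    that item's partitions.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in config]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Cast numeric limits up front so bad input fails here, not mid-simulation.
    lower = float(config["lower_limit"])
    upper = float(config["upper_limit"])
    if lower > upper:
        raise ValueError("lower_limit must not exceed upper_limit")

    return {
        "partition_id": str(config["stock_number"]),
        "day_of_week": str(config["day_of_week"]),
        "lower_limit": lower,
        "upper_limit": upper,
    }
```

Validating before the trigger means a bad config is rejected immediately instead of failing an hour-long run.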

Am I going about this the right way, or is there a much better pattern to follow? I'm anticipating that I can use reader licenses to grant access to users of this application. Is that the best way to manage access?

Thanks,

-Jason

Comments

  • Turribeach
    edited June 19

    What is the best mechanism to capture all the variables related to the user's desired simulation (things like day of week, upper and lower limits etc.) and make them available to the simulation script.

    For this use case I would normally use a Dataiku Webapp. Webapps used to take some time to code, but with GenAI you can now put together something really good very quickly. Have a look at the sample in the documentation if you want to get something working fast.

    I think I want them to be persistent so that if the user wants to re-run the simulation they could load a previous config and modify it.

    This is easy to do in a webapp, as you can write back to a dataset to keep track of the values users enter.
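    As a minimal sketch of that bookkeeping, the class below keeps an append-only history with one row per saved config and returns the latest one for a given item. It is an in-memory stand-in and the column names are hypothetical; in DSS the rows would live in a regular dataset, for example read with `dataiku.Dataset(...).get_dataframe()` and written back with `write_with_schema()`, which typically replaces the dataset contents, so you would append to the existing rows before writing.

```python
from datetime import datetime, timezone


class ConfigHistory:
    """In-memory stand-in for a config dataset: one row per saved config."""

    def __init__(self):
        self.rows = []  # append-only, so later rows are newer

    def save(self, stock_number, user, params):
        # Hypothetical columns: the simulation inputs plus who saved them and when.
        row = {
            **params,
            "stock_number": str(stock_number),
            "user": user,
            "saved_at": datetime.now(timezone.utc).isoformat(),
        }
        self.rows.append(row)
        return row

    def latest(self, stock_number):
        """Return a copy of the most recent config for an item, or None."""
        key = str(stock_number)
        matches = [row for row in self.rows if row["stock_number"] == key]
        return dict(matches[-1]) if matches else None
```

    A user re-running a simulation would load `latest(item)`, edit it in the form, and save it again; older configs stay in the history.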

    I also want the application to deny a new run for the same item if the simulation for that item is already running, but allow runs for other items.

    Denying a run for a simulation that is already running should be no problem, as you can track the runs. But do note that while webapps are "multi-user compatible", scenarios are not: a scenario can only run a single execution at a time, so if you use scenarios you will not be able to have parallel runs.
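    A minimal sketch of the per-item check, assuming the webapp backend is a single Python process: a lock-protected set of stock numbers that currently have an active run. `RunGuard` and its methods are hypothetical names. Because the state lives in memory it is lost if the backend restarts, so in practice you would probably also record run status in a dataset. This guard does not get around the single-execution limit of a scenario mentioned above.

```python
import threading


class RunGuard:
    """Allow at most one active simulation per item; other items are unaffected."""

    def __init__(self):
        self._lock = threading.Lock()  # protects _running across request threads
        self._running = set()

    def try_start(self, stock_number) -> bool:
        """Claim the item; return False if a run for it is already in progress."""
        key = str(stock_number)
        with self._lock:
            if key in self._running:
                return False
            self._running.add(key)
            return True

    def finish(self, stock_number) -> None:
        """Release the item once its run completes or fails."""
        with self._lock:
            self._running.discard(str(stock_number))

    def is_running(self, stock_number) -> bool:
        with self._lock:
            return str(stock_number) in self._running
```

    The webapp would call `try_start` before triggering a run, refuse the request when it returns `False`, and call `finish` when the run ends, whether it succeeded or not.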

    I'm thinking of just using the application to capture data into an editable dataset

    Editable datasets can't be modified using the API, so just use a regular dataset and write the values back to it via Python.

    Then use the "set scenario variables" feature to load the parameters into the scenario object.

    Personally I prefer scenario parameters because they are added directly as variables without extra scenario steps. I covered scenario parameters at length in this community post. You may also want to define your scenario variables as project variables, to prevent errors when recipes are run outside of scenarios.
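    To illustrate the project-variable fallback, here is a minimal sketch of how the simulation script might overlay whatever a scenario run supplies on top of defaults, then cast the values to proper types (variables often arrive as strings). The variable names and default values are hypothetical. In a recipe, the merged variables would typically come from `dataiku.get_custom_variables()`; the sketch takes a plain dict so it runs outside DSS.

```python
# Hypothetical defaults, mirroring what you might declare as project variables.
DEFAULTS = {
    "day_of_week": "Mon",
    "lower_limit": "0",
    "upper_limit": "100",
    "n_iterations": "10000",
}

# How each variable should be typed before the simulation uses it.
CASTS = {"day_of_week": str, "lower_limit": float, "upper_limit": float, "n_iterations": int}


def resolve_params(variables: dict) -> dict:
    """Overlay run-time variables on the defaults and cast them to typed values.

    Keys the simulation does not know about are ignored.
    """
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in variables.items() if k in DEFAULTS})
    return {name: CASTS[name](value) for name, value in merged.items()}
```

    Run from a scenario, the supplied values win; run on its own (for example while debugging the recipe), the script still gets a complete, typed set of parameters.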

    I'm anticipating that I can use reader licenses to grant access to users of this application. Is that the best way to manage access?

    This should be no problem. You can run the Dataiku Webapp with a higher-privileged account that has permission to execute the scenario, write data in the project, etc. Reader users will only need read permission on the project to use the webapp. You can also create a vanity URL for the webapp to make it more user-friendly and remove all the DSS framing around it. Make sure you set the webapp to authenticated, as you don't want it to be public. You could also fetch the user identity inside the webapp and perform any additional permission checks you may want.
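    For the extra permission check, a minimal sketch: a helper that allows a request only if the caller is identified and belongs to at least one allowed group. The group name is hypothetical, and the `auth_info` shape (`authIdentifier`, `groups`) is assumed to match what `dataiku.api_client().get_auth_info_from_browser_headers(request.headers)` returns in a Python-backed webapp; verify this against your DSS version.

```python
def is_authorized(auth_info: dict, allowed_groups: set) -> bool:
    """Return True if the caller is identified and in at least one allowed group.

    auth_info is assumed to look like
    {"authIdentifier": "jdoe", "groups": ["simulation_users", ...]}.
    """
    if not auth_info or not auth_info.get("authIdentifier"):
        return False  # unidentified caller: deny by default
    return bool(set(auth_info.get("groups", [])) & set(allowed_groups))


# Hypothetical group allowed to launch simulations.
ALLOWED_GROUPS = {"simulation_users"}
```

    Denying by default keeps the webapp safe even if the identity lookup returns nothing.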
