As soon as one develops a report for the business, it tends to kick off even more demand.
So instead of a static report it becomes dynamic with sliders, filters and all. This is all currently possible with Shiny, Bokeh or JS. But what if several users need to submit inputs concurrently?
Does anyone have ideas on how to persist small datasets from a webapp without dropping and recreating complete datasets on file storage like GlusterFS or Hadoop, while staying in the Dataiku ecosystem (e.g. not leaving it by using a dedicated, non-attached SQL DB)?
We have something coming in the next release of Dataiku to create applications for business users without coding everything. Stay tuned!
Thanks @jereze for the quick heads-up!
Sounds like you guys are working on RAD. Nice!
My question, though, was more about whether you guys have any ideas on how to modify a dataset from a webapp without recreating it from scratch every time there is a modification. E.g. from Python, R or JS you submit an API call and this directly modifies a row in a dataset. Currently neither the internal nor the public API supports this.
Imagine you render an editable customer order table in the web UI based on a Dataiku dataset and the user modifies a row (one or more columns). How can this now be persisted back to the dataset?
Imho this would be a superb feature for CRUD applications and would kill the need for additional persistent storage outside of Dataiku.
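To make the current workaround concrete, here is a minimal sketch of what a webapp backend ends up doing for a single-row edit. It is plain Python on a list of row dicts so it runs anywhere; the helper `apply_row_edit`, the dataset name `orders` and the column names are made up for illustration. Inside DSS the rows would come from the dataset via the `dataiku` package (I'd expect something like `get_dataframe()` to read and `write_with_schema()` to write back, treat those as assumptions about the wiring).

```python
from copy import deepcopy


def apply_row_edit(rows, key_col, key, changes):
    """Return a new list of rows with `changes` merged into rows where key_col == key."""
    out = deepcopy(rows)  # leave the caller's data untouched
    matched = [row for row in out if row.get(key_col) == key]
    if not matched:
        raise KeyError(f"no row with {key_col}={key!r}")
    for row in matched:
        row.update(changes)
    return out


# Stand-in for the content of a (hypothetical) "orders" dataset.
orders = [
    {"order_id": 1, "customer": "a", "quantity": 5},
    {"order_id": 2, "customer": "b", "quantity": 1},
    {"order_id": 3, "customer": "c", "quantity": 7},
]

# The user changes one cell of one row in the web UI.
edited = apply_row_edit(orders, "order_id", 2, {"quantity": 4})

# Inside DSS, all rows in `edited` would now be written back to the dataset,
# i.e. the complete dataset is replaced although only one row changed.
```

So a one-cell edit costs a full read plus a full write, and two users editing at the same time can overwrite each other's changes, because each one writes back its own complete copy.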
Dataiku is built around the philosophy of not modifying datasets but instead applying recipes and saving outputs (the Flow).
I can suggest some options:
Thanks Jeremy for your suggestions.
Unfortunately none of them would work for us.
Option 1: In our case we would have multiple concurrent user transactions from the web app, so this proposal could be quite challenging.
Option 2a (SQL via Python API): Doesn't work for non-SELECT statements. In our case we would want to execute INSERTs, which this API doesn't allow (even if our storage were SQL-based, see the next point).
Option 2b: Unfortunately our storage is not SQL-based (GlusterFS).
Option 3: Avoiding this option was our main intent, as it has limits when it comes to performance and big data.
I understand you have a design philosophy around datasets, but have you guys thought about enriching the APIs to modify them directly? Real CRUD directly on the data, no matter what the underlying infrastructure is, is essential.