Update Dataset with Dash

rafaelmozo
edited July 2024 in Using Dataiku

Hi there, I am relatively new to Dataiku. I have looked in the forum for a similar question, but I haven't found anything.

I am developing a webapp with Dash. I want the user to be able to modify some values of a dataset and have these changes reflected in the Flow. Based on some user inputs, the webapp displays a dash_table.DataTable with some rows and columns of the imported dataset. The user can modify some values and then click a button to confirm the changes. However, I can't get these changes saved to the dataset. I don't know whether the problem comes from transforming the modified DataTable into updated_dataset (taking the DataTable's 'data' attribute, which is a list of dicts, and converting it into a pandas DataFrame) or from saving the dataset:

dataset.write_dataframe(updated_dataset)

When I run the webapp and make changes to the DataTable, those changes are not reflected in the dataset in the Flow, nor in the DataTable when I refresh or re-run the webapp.
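One pitfall worth ruling out with this pattern: as far as I know, writing a DataFrame to a Dataiku dataset replaces the dataset's contents (assuming it is not in append mode). So if the DataFrame built from the DataTable's 'data' only contains the displayed rows and columns, the rest of the dataset would be lost on write. Below is a minimal pandas sketch of merging the edited rows back into the full dataset before writing. It assumes the table includes a column that uniquely identifies each row; `apply_table_edits` and its `key` parameter are hypothetical names, not part of Dash or Dataiku.

```python
import pandas as pd


def apply_table_edits(full_df: pd.DataFrame, table_records: list, key: str) -> pd.DataFrame:
    """Merge rows edited in a Dash DataTable back into the full dataset.

    full_df       -- the complete dataset, e.g. from dataset.get_dataframe()
    table_records -- the DataTable's 'data' property: a list of dicts, one per row
    key           -- hypothetical column that uniquely identifies each row;
                     it must be among the columns shown in the table
    """
    if not full_df[key].is_unique:
        raise ValueError(f"column {key!r} must uniquely identify rows")
    edited = pd.DataFrame.from_records(table_records)
    if edited.empty:
        return full_df.copy()

    updated = full_df.set_index(key)
    edited = edited.set_index(key)
    # Only rows that already exist in the dataset are updated; new rows are ignored
    rows = edited.index.intersection(updated.index)
    if rows.empty:
        return full_df.copy()
    # Only the columns the table displayed are overwritten
    cols = [c for c in edited.columns if c in updated.columns]
    for col in cols:
        # Cast back to the original dtype, since edited cells may arrive as strings
        updated.loc[rows, col] = edited.loc[rows, col].astype(updated[col].dtype)

    # Restore the original column order (the result gets a fresh RangeIndex)
    return updated.reset_index()[list(full_df.columns)]


# Hypothetical use inside the confirm-button callback (needs a DSS webapp, not run here):
#   full_df = dataset.get_dataframe()
#   new_df = apply_table_edits(full_df, table_data, key="id")
#   dataset.write_with_schema(new_df)
```

Rows that exist in the table but not in the dataset are ignored by this sketch; supporting new rows would need an explicit append step.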

Hope I have stated my problem clearly.

Thank you in advance!


Operating system used: Windows

Answers

  • Sarina

    Hi @rafaelmozo,

    Thank you for your Dash dataset question!

    So I think there are two components to this question:
    (1) ensuring that the underlying dataset is actually being updated by the webapp (in this particular case, it sounds like it may not be, so we'll want to investigate why)
    (2) ensuring that the data displayed in the Dash webapp is refreshed: this part is more specific to the Dash webapp setup, and I'll pass along an example for it here.

    For #1, if you want to pass along your full webapp code, we can take a look at what might be happening.
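In the meantime, one way to narrow down #1 is to read the dataset back right after writing it and compare it with what the webapp intended to write. Below is a minimal pandas sketch, assuming a hypothetical column `key` that uniquely identifies rows; `changes_persisted` is an illustrative helper, not a Dataiku API.

```python
import pandas as pd


def changes_persisted(expected: pd.DataFrame, reloaded: pd.DataFrame, key: str) -> bool:
    """Return True if `reloaded` holds the same rows and values as `expected`.

    Row order is ignored (both frames are sorted by `key`), and dtypes are not
    compared, since data read back from storage may not keep identical dtypes.
    """
    if set(expected.columns) - set(reloaded.columns):
        return False  # a column the webapp wrote is missing after the reload
    exp = expected.sort_values(key).reset_index(drop=True)
    got = reloaded[list(expected.columns)].sort_values(key).reset_index(drop=True)
    try:
        # Raises AssertionError on any shape or value mismatch
        pd.testing.assert_frame_equal(exp, got, check_dtype=False)
    except AssertionError:
        return False
    return True
```

In the webapp callback, this could be called as changes_persisted(new_df, dataset.get_dataframe(), "id") right after the write (with "id" standing in for your own key column). If it returns False, the problem lies in the write step rather than in the Dash refresh.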

    For #2, to refresh your data automatically in a Dash webapp, you'll probably need to use Dash callbacks to reload your dataset at a specific interval.

    Here's a small example that I created based on the following Stack Overflow response.

    Here is my webapp code, which adapts the simple "Hello Dash" example app to plot a bar chart:

    import plotly.express as px
    import dataiku
    from dash import dcc, html  # Dash >= 2.0; older versions use dash_core_components / dash_html_components
    from dash.dependencies import Input, Output

    # NOTE: in a Dataiku Dash webapp, the `app` object is created by DSS,
    # which is why it is used here without being defined.

    # the initial data read
    dataset = dataiku.Dataset("sample_data_prepared")
    df = dataset.get_dataframe()

    # the initial figure
    fig = px.bar(df, x="event_date", y="device_id")

    app.layout = html.Div(children=[
        html.H1(children='Hello Dash'),
        # fires every `interval` milliseconds (600 for the demo;
        # every several minutes is probably more reasonable in practice)
        dcc.Interval(id='graph-update', interval=600, n_intervals=0),
        dcc.Graph(
            id='example-graph',
            figure=fig
        )
    ])

    # callback: re-reads the dataset and rebuilds the figure each time the interval fires
    @app.callback(
        Output('example-graph', 'figure'),
        [Input('graph-update', 'n_intervals')]
    )
    def update_graph(n):
        # n is the number of intervals elapsed so far; it only serves as a trigger
        dataset = dataiku.Dataset("sample_data_prepared")
        df = dataset.get_dataframe()
        return px.bar(df, x="event_date", y="device_id")

    Here is a brief demo of the above, where I update the data through a Prepare recipe and the Dash webapp then picks up the updated data automatically.
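Since the original question uses a dash_table.DataTable rather than a graph, the same interval callback could return the table's `columns` and `data` props instead of a figure. Below is a minimal sketch of that conversion, assuming the standard DataTable format (a list of {"name", "id"} column dicts plus a list of row dicts); `dataframe_to_table_props` is a hypothetical helper name.

```python
import pandas as pd


def dataframe_to_table_props(df: pd.DataFrame) -> dict:
    """Build the `columns` and `data` props that a dash_table.DataTable expects."""
    # Stringify column labels so the column ids match the keys used in `data`
    df = df.rename(columns=str)
    return {
        # each column needs a display name and an id matching the row-dict keys
        "columns": [{"name": c, "id": c} for c in df.columns],
        # one dict per row, keyed by column name
        "data": df.to_dict("records"),
    }


# Hypothetical use, mirroring the interval callback above (component ids are made up):
#   @app.callback(
#       [dash.dependencies.Output('my-table', 'columns'),
#        dash.dependencies.Output('my-table', 'data')],
#       [dash.dependencies.Input('graph-update', 'n_intervals')]
#   )
#   def update_table(n):
#       df = dataiku.Dataset("sample_data_prepared").get_dataframe()
#       props = dataframe_to_table_props(df)
#       return props["columns"], props["data"]
```

One design caveat: refreshing `data` on a timer will overwrite any edits the user has not yet confirmed. For an editable table, it is probably safer to reload the data from the confirm button's callback, after the write, than from a short interval.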


    Thanks,
    Sarina
