Charts Engine

ubethke Registered Posts: 21 ✭✭✭✭
I have two questions around the chart engine:

- In the documentation it says "This allows you to perform visual analytics on very large data extracts". Can this be quantified in terms of GB or number of rows? Would it be unrealistic to aggregate low-cardinality columns in a data set of 100M+ rows? Columnar compression should be able to handle this.

- If the underlying dataset changes in the source system, how can I make sure that the data stays in sync between the source and the DSS server?

Many thanks

Uli

Answers

  • Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753
    Hi Uli,

    The practical limitations of the built-in charts engine are less about raw GB or row counts and more about the time it takes to actually build the columnar cache, the disk space it requires, and the cardinality of the columns (it does not scale very well on very-high-cardinality columns).

    100M rows and low-cardinality columns should definitely be OK.
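
    To give a rough feel for why cardinality matters more than raw row count, here is a small standalone sketch (plain NumPy, not DSS internals; the row and category counts are made up): aggregating a dictionary-encoded column takes one pass over the rows, and the intermediate state only grows with the number of distinct values.

    ```python
    # Illustrative only: not how the DSS charts engine is implemented.
    import numpy as np

    n_rows = 10_000_000        # scale up toward 100M if you have the RAM
    n_categories = 20          # "low cardinality"

    # A dictionary-encoded (columnar-style) column plus a measure to aggregate
    codes = np.random.randint(0, n_categories, size=n_rows, dtype=np.int8)
    values = np.random.rand(n_rows).astype(np.float32)

    # Group-by average: a single pass over the rows, with accumulators
    # whose size is n_categories, not n_rows.
    sums = np.bincount(codes, weights=values, minlength=n_categories)
    counts = np.bincount(codes, minlength=n_categories)
    print(sums / counts)
    ```

    With a very-high-cardinality column, the accumulators (and the chart itself) grow with the number of distinct values, which is where the engine stops scaling well.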

    The cache is automatically dropped for managed datasets (i.e., datasets built by DSS), since DSS knows when they get rebuilt. For source datasets, we have chosen not to try to autodetect changes in the underlying source, because it would be too expensive, so if the data changes in an external dataset, you have to click the "Refresh sample" button.
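
    If you want to avoid manual clicks for data that changes regularly, one option is to read the source through a managed dataset (for example the output of a sync recipe) and rebuild that copy on a schedule, so the charts cache is refreshed along with it. Below is a rough sketch using the public Python API; the host, API key, project key and dataset name are placeholders, and it assumes a recent enough dataikuapi client where DSSDataset.build() is available.

    ```python
    # Sketch: rebuild a managed dataset so its charts cache gets refreshed.
    # All identifiers below are placeholders for your own setup.
    import dataikuapi

    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
    project = client.get_project("MY_PROJECT")

    # e.g. the output of a sync recipe that reads the external source
    dataset = project.get_dataset("my_managed_copy")
    job = dataset.build(job_type="NON_RECURSIVE_FORCED_BUILD")
    print(job.get_status())
    ```

    For a plain external dataset with no managed copy, the "Refresh sample" button remains the way to go.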