How does Dataiku store datasets in the project flow?

p_phanwong
p_phanwong Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 ✭✭✭

Hi,

I've created 2 output datasets: the first is stored in PostgreSQL and the other in a managed folder. I then changed the connection of the second one (the one in the managed folder) to PostgreSQL. It turns out that:

  1. The first one, stored directly in PostgreSQL, does not show a data size in storage.
  2. The second one, stored in a managed folder and then switched to PostgreSQL, does show a data size in storage.

Where was the first output dataset stored? How does it work?

See the reference image in the attachment.

Thank you

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker

    Hi,

    The first dataset is a local managed dataset, so it's stored on the DSS instance and we can calculate the size of its files on disk.

    SQL datasets are stored in an SQL database. We don't calculate their size, as this can depend on various factors specific to the database. If needed, you can find it out by running specific queries directly against Postgres, and potentially add a custom SQL probe.
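
As a rough illustration of the "specific queries" mentioned above, here is a minimal Python sketch that asks PostgreSQL for a table's on-disk size using the built-in `pg_total_relation_size` and `pg_size_pretty` functions. The helper names, and the schema and table names in the usage comment, are hypothetical; `cursor` is assumed to be any DB-API cursor connected to the same PostgreSQL database (for example one from psycopg2).

```python
def relation_size_query(schema, table):
    """Build a query that reports the on-disk size of one PostgreSQL table."""
    # Double-quote the identifiers so mixed-case table names still resolve.
    qualified = '"{}"."{}"'.format(
        schema.replace('"', '""'), table.replace('"', '""')
    )
    # The qualified name is passed as a string literal, so escape single quotes.
    literal = qualified.replace("'", "''")
    return (
        "SELECT pg_total_relation_size('{0}') AS total_bytes, "
        "pg_size_pretty(pg_total_relation_size('{0}')) AS total_pretty"
    ).format(literal)


def fetch_table_size(cursor, schema, table):
    """Run the size query on a DB-API cursor; returns (bytes, readable text)."""
    cursor.execute(relation_size_query(schema, table))
    total_bytes, total_pretty = cursor.fetchone()
    return int(total_bytes), total_pretty


# Hypothetical usage, assuming psycopg2 and a table written by DSS:
#   conn = psycopg2.connect(...)
#   with conn.cursor() as cur:
#       print(fetch_table_size(cur, "public", "MYPROJECT_output"))
```

`pg_total_relation_size` includes indexes and TOAST data, which is one reason the figure is not directly comparable to a file size on disk. The same SELECT statement could probably be adapted for a custom SQL probe on the dataset.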

    For SQL datasets, you would typically want to check other metrics instead, such as the number of rows, available from Status > Metrics on the dataset.
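
The row count shown under Status > Metrics can also be read from Python. The sketch below assumes the Dataiku dataset handle exposes `get_last_metric_values()` and that the returned object offers `get_global_value()`, with `records:COUNT_RECORDS` as the record-count metric id; please check these against the API documentation for your DSS version. The helper name and the dataset name in the usage comment are hypothetical.

```python
# Assumed id of the built-in record-count metric.
RECORD_COUNT_METRIC = "records:COUNT_RECORDS"


def last_record_count(dataset_handle, metric_id=RECORD_COUNT_METRIC):
    """Return the most recently computed row count for a dataset, or None.

    dataset_handle is assumed to be a Dataiku dataset object providing
    get_last_metric_values(). The metric must already have been computed
    (for example from the Status tab); nothing is recomputed here.
    """
    metrics = dataset_handle.get_last_metric_values()
    value = metrics.get_global_value(metric_id)
    return None if value is None else int(value)


# Hypothetical usage inside a DSS notebook or recipe:
#   import dataiku
#   print(last_record_count(dataiku.Dataset("my_postgres_output")))
```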

    Let me know if that helps!
