
How does Dataiku store datasets in the project flow?

p_phanwong
Level 2

Hi,

I've created two output datasets: the first is stored in PostgreSQL and the second in a managed folder. I then changed the connection of the second one (the one in the managed folder) to PostgreSQL. It turns out that:

  1. The first, which is stored directly in PostgreSQL, does not show a data size in storage.
  2. The second, which was stored in a managed folder and then changed to PostgreSQL, does show a data size in storage.

Where was the first output dataset stored? How does this work?

See the reference image in the attachment.

Thank you

 
AlexT
Dataiker

Hi,

The first dataset is a local managed dataset, so it is stored on the DSS instance, and we can calculate the size of its files on disk.

SQL datasets are stored in an SQL database. We don't calculate their size, because it can depend on various factors specific to the database. If needed, you can find it out by running specific queries directly against Postgres, and potentially add a custom SQL probe.
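As a rough illustration of such a query, here is a minimal Python sketch that builds a size query using PostgreSQL's built-in `pg_total_relation_size` and `pg_size_pretty` functions. The helper names, the schema `public`, and the table `my_output` are made up for the example; the real table name depends on how the dataset is configured in its connection. The `%s` placeholders assume a driver such as psycopg2.

```python
def quote_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def table_size_query(schema: str, table: str):
    """Return (sql, params) reporting the on-disk size of one table.

    pg_total_relation_size includes the table's indexes and TOAST data.
    The schema and table names are hypothetical inputs, not values that
    DSS provides to this function.
    """
    qualified = f"{quote_ident(schema)}.{quote_ident(table)}"
    sql = (
        "SELECT pg_total_relation_size(%s::regclass) AS total_bytes, "
        "pg_size_pretty(pg_total_relation_size(%s::regclass)) AS total_pretty"
    )
    return sql, (qualified, qualified)


if __name__ == "__main__":
    sql, params = table_size_query("public", "my_output")
    print(sql)
    print(params)
    # With an open psycopg2 connection (assumed, not shown here):
    #   cur.execute(sql, params); print(cur.fetchone())
```

The same SELECT statement, with the table name written in directly, could serve as a starting point for a custom SQL probe on the dataset.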

For SQL datasets, you would typically want to check other metrics, such as the number of rows, available from Status - Metrics on the dataset.

Let me know if that helps!
