What are the computation settings for your metrics? You can find them in Dataset > Status > Edit > Edit computation settings.
If your dataset comes from HDFS (your case), I advise selecting only the Hive or Impala engine (check with your Hadoop admin whether Impala is installed). Note that Impala should be much faster than Hive.
If your dataset comes from a regular filesystem, then indeed the only way for DSS to compute metrics such as the record count is to stream the entire file, which can take time.
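To give a feel for why this is slow, here is a minimal sketch of what a streamed record count looks like in plain Python. It is not DSS's actual implementation. The function name `count_records` and its parameters are made up for illustration, and it assumes one record per line with no newlines embedded inside quoted fields.

```python
def count_records(path, has_header=True, chunk_size=1 << 20):
    """Count newline-delimited records by reading the whole file in chunks.

    Hypothetical helper for illustration only; assumes one record per line.
    """
    newlines = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Every byte of the file is read once: cost grows with file size.
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a record.
    lines = newlines + (1 if last_byte and last_byte != b"\n" else 0)
    # Do not count the header row as a record.
    return max(lines - 1, 0) if has_header else lines
```

There is no shortcut here: the count is only known once the last byte has been read, so the time taken scales with the size of the file.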