Job taking long time to complete due to 'checking if table exists' step
[2022/06/06-19:03:09.858] [qtp1290795133-42] [DEBUG] [dku.sql.generic] - Checking if table exists by querying meta schema=null table=abc types=["TABLE","VIEW","CALC VIEW"]
[2022/06/06-19:07:45.159] [qtp1290795133-42] [INFO] [dku.sql.generic] - Table null.abc exists
Recently, I noticed that the step logged above is increasing my job duration quite significantly. As the timestamps show, it takes almost 5 minutes to find the table. From further investigation, this slowdown only happens for certain tables, and previously the same step took less than a minute.
This issue does not cause the job to fail; it only increases the processing time. The job runs on the Spark engine.
I have been using Dataiku for almost a year but only started to observe this in the last month.
Appreciate the help.
Note: 'abc' is used in place of the actual table name.
Operating system used: Windows 10
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @farhanromli,
A few questions that may help narrow down the issue:
1) Is the issue intermittent?
2) Does it happen with specific databases only?
3) What Hadoop distro version are you on?
DSS will check for table existence as part of dependency checks before running a job.
What that call actually does is send a get_tables call to the Hive metastore and wait for a response.
In the Hive logs it would look like: OperationHandle [opType=GET_TABLES]
If you are suddenly seeing degraded performance, your database may have reached a very high number of tables, so this specific operation now takes longer. Alternatively, there may be an issue with the Hive metastore that you need to look into further.
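To help answer whether the issue is intermittent or tied to specific tables, one option is to measure how long each existence check takes across several job logs. Below is a minimal Python sketch, not a DSS feature. It assumes the log entries follow the same layout as the two lines quoted in the question, and that the "Checking if table exists" entry and the matching "Table ... exists" entry are written by the same thread. The function name `table_check_durations` and the regular expressions are illustrative choices.

```python
import re
from datetime import datetime

# Timestamp and thread name at the start of an entry, e.g.
# "[2022/06/06-19:03:09.858] [qtp1290795133-42] [DEBUG] ..."
# (layout assumed from the log lines quoted in the question)
ENTRY = re.compile(r"\[(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\] \[([^\]]+)\]")
START = re.compile(r"Checking if table exists by querying meta schema=(\S+) table=(\S+)")
END = re.compile(r"Table (\S+) exists")


def parse_ts(text):
    """Parse a timestamp such as '2022/06/06-19:03:09.858'."""
    return datetime.strptime(text, "%Y/%m/%d-%H:%M:%S.%f")


def table_check_durations(lines):
    """Return a list of (schema.table, seconds) for each completed check."""
    pending = {}  # thread name -> (table label, start time)
    results = []
    for line in lines:
        entry = ENTRY.match(line)
        if not entry:
            continue  # not a log entry in the assumed layout
        ts, thread = parse_ts(entry.group(1)), entry.group(2)
        start = START.search(line)
        if start:
            # Remember when this thread started checking the table.
            pending[thread] = (f"{start.group(1)}.{start.group(2)}", ts)
            continue
        if END.search(line) and thread in pending:
            label, began = pending.pop(thread)
            results.append((label, (ts - began).total_seconds()))
    return results


if __name__ == "__main__":
    sample = [
        '[2022/06/06-19:03:09.858] [qtp1290795133-42] [DEBUG] [dku.sql.generic] - '
        'Checking if table exists by querying meta schema=null table=abc '
        'types=["TABLE","VIEW","CALC VIEW"]',
        '[2022/06/06-19:07:45.159] [qtp1290795133-42] [INFO] [dku.sql.generic] - '
        'Table null.abc exists',
    ]
    for label, seconds in table_check_durations(sample):
        print(f"{label}: {seconds:.1f} s")
```

For the two lines in the question this should report roughly 275 seconds, which matches the "almost 5 mins" observation. Running it over logs from several days would show whether the slow checks cluster on particular tables or times.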
-
Hi
I have actually found a workaround for this.
When we create a table, we have the option either to "Read a database table" or to use "SQL query".
It seems this issue only happens if I go with the former option. If I use SQL query instead, it is able to generate the meta schema.