
Job taking long time to complete due to 'checking if table exists' step

farhanromli
[2022/06/06-19:03:09.858] [qtp1290795133-42] [DEBUG] [dku.sql.generic]  - Checking if table exists by querying meta schema=null table=abc types=["TABLE","VIEW","CALC VIEW"]
[2022/06/06-19:07:45.159] [qtp1290795133-42] [INFO] [dku.sql.generic]  - Table null.abc exists

Recently, I noticed that the step above is significantly increasing my job duration. As the timestamps show, it takes almost 5 minutes to confirm that the table exists. From further investigation, this step only happens for certain tables, and previously it took less than a minute.
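To see which tables are affected and by how much, the two log lines can be paired up and their timestamps subtracted. The following is only a minimal sketch: it assumes the log format shown in the excerpt above, and the function name `table_check_durations` and the thread-based pairing are illustrative choices, not anything provided by DSS.

```python
import re
from datetime import datetime

# Matches lines shaped like the [dku.sql.generic] excerpt above (assumed format).
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\] "
    r"\[(?P<thread>[^\]]+)\] \[\w+\] \[dku\.sql\.generic\]\s+- (?P<msg>.*)$"
)
START_RE = re.compile(
    r"Checking if table exists by querying meta schema=(\S+) table=(\S+)"
)
END_RE = re.compile(r"Table (\S+) exists")


def table_check_durations(lines):
    """Return (table, seconds) for each start/finish pair seen on the same thread."""
    pending = {}  # thread name -> (table, start time)
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f")
        start = START_RE.search(m["msg"])
        if start:
            pending[m["thread"]] = (f"{start[1]}.{start[2]}", ts)
            continue
        end = END_RE.search(m["msg"])
        if end and m["thread"] in pending:
            table, began = pending.pop(m["thread"])
            results.append((table, (ts - began).total_seconds()))
    return results


sample_log = """
[2022/06/06-19:03:09.858] [qtp1290795133-42] [DEBUG] [dku.sql.generic]  - Checking if table exists by querying meta schema=null table=abc types=["TABLE","VIEW","CALC VIEW"]
[2022/06/06-19:07:45.159] [qtp1290795133-42] [INFO] [dku.sql.generic]  - Table null.abc exists
"""

for table, seconds in table_check_durations(sample_log.splitlines()):
    print(f"{table}: {seconds:.3f} s")
```

For the two lines above this reports about 275 seconds (4 min 35 s) for `null.abc`. Running the same function over a full job log would show whether the delay is limited to particular tables.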

 

This issue does not cause the job to fail; it only increases the processing time. The job runs on the Spark engine.

I have been using Dataiku for almost a year but only started observing this last month.

Appreciate the help.

Note: 'abc' replaces the actual table name.


Operating system used: Windows 10

1 Reply
AlexT
Dataiker

Hi @farhanromli ,

A few questions that may help narrow down the issue:

1) Is the issue intermittent?

2) Does it happen with specific databases only?

3) What Hadoop distro version are you on? 

DSS will check for table existence as part of dependency checks before running a job. 

What that call actually does is send a get_tables call to the Hive metastore and wait for a response.

In the Hive logs it would look like: OperationHandle [opType=GET_TABLES]

If you are seeing degraded performance all of a sudden, your database may have reached a very high number of tables, so this specific operation now takes longer, or there may be an issue with the Hive metastore that you need to look into further.
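One rough way to check whether table listing itself is slow, independent of DSS, is to time a listing statement directly against the database. This is a sketch under assumptions: it expects any Python DB-API cursor (for Hive, one obtained from a client library of your choice), the helper name `time_table_listing` is hypothetical, and `SHOW TABLES` is not the same operation as the GET_TABLES call DSS triggers, although both depend on the metastore answering.

```python
import time


def time_table_listing(cursor, list_tables_sql="SHOW TABLES"):
    """Run a table-listing statement on a DB-API cursor.

    Returns (number_of_rows, elapsed_seconds).
    """
    started = time.perf_counter()
    cursor.execute(list_tables_sql)
    rows = cursor.fetchall()
    return len(rows), time.perf_counter() - started
```

If the row count is very large or the elapsed time is in the range of minutes, that points at the metastore or the number of tables rather than at the job itself.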
