Filter doesn't return any rows when it should

boumezrag Registered Posts: 15 ✭✭✭✭
Hi everybody,

I'm trying to filter a table of 2.7M rows in order to get a sample.

Here is what I did:

- I created a filter

- I chose: Filter ON

- Keep only rows that satisfy: All the following conditions

- I entered the condition

- For the sampling, I chose Whole data

When I run it, my filter doesn't return any rows, although it should.

What is the problem?

Thanks in advance

Best Answer

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    Answer ✓

    Hello,

    Thanks for the diagnosis. After investigation, it seems the issue was caused by a case mismatch between the column names in your original Parquet file and those in the Hive table. Your input dataset was generated manually as a Parquet file with the column name "MANDT" (uppercase), and was then imported from Hive into DSS. However, Hive always converts column names to lowercase. Hence, DSS saw the column name as "mandt", which is inconsistent with the name stored in the original Parquet file. As of today, we cannot detect this kind of case automatically.
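
    To make the mismatch concrete, here is a toy sketch in plain Python. It is only an illustration of a case-sensitive column lookup, not how DSS or Parquet readers work internally; the `rows` data and the `filter_rows` helper are made up for the example.

```python
# Illustrative only: a toy case-sensitive column lookup, NOT DSS internals.
# The data and the helper below are hypothetical.
rows = [
    {"MANDT": "100", "value": 1},
    {"MANDT": "200", "value": 2},
]


def filter_rows(rows, column, expected):
    """Keep rows whose `column` equals `expected`, using a case-sensitive key lookup."""
    return [row for row in rows if row.get(column) == expected]


# Column name as reported by Hive (lowercase): the key is never found.
mismatched = filter_rows(rows, "mandt", "100")

# Column name as stored in the file (uppercase): the row is found.
matched = filter_rows(rows, "MANDT", "100")

print(len(mismatched), len(matched))
```

    With the lowercase name, every lookup comes back empty, so the condition never matches and the filter keeps no rows.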

    The preferred solution would be to only generate Parquet files with lowercase column names, so that they are compatible with Hive (and Impala as well).
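
    As a rough sketch of that first option, you could normalize the column names before writing the file. The `lowercase_columns` helper below is hypothetical (it is not part of DSS or of any Parquet library) and only builds the rename mapping, refusing to proceed if two names would collide once lowercased.

```python
def lowercase_columns(names):
    """Map each column name to its lowercase form.

    Raises ValueError if two different names collapse to the same
    lowercase name, since that would silently merge columns.
    """
    mapping = {}
    seen = {}  # lowercase name -> original name that produced it
    for name in names:
        lower = name.lower()
        if lower in seen and seen[lower] != name:
            raise ValueError(
                f"columns {seen[lower]!r} and {name!r} both map to {lower!r}"
            )
        seen[lower] = name
        mapping[name] = lower
    return mapping


mapping = lowercase_columns(["MANDT", "Value", "id"])
print(mapping)
```

    If the file is produced with pandas, for instance, the mapping could be applied with `df.rename(columns=mapping)` before calling `df.to_parquet(...)`. That is an assumption about your pipeline, so adapt it to whatever tool actually writes the file.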

    If that option is not possible, you may try changing the recipe engine from DSS to Hive. In fact, for large datasets, it is recommended to switch the recipe engine to a Hadoop-related one (Spark, Hive or Impala). You should gain performance by pushing the computation down to your Hadoop cluster instead of streaming the data through DSS.

    Cheers,

    Alex
