Filter doesn't return any rows when it should
I'm trying to filter a table of 2.7M rows in order to get a sample.
Here is what I did:
- I created a filter
- I chose: Filter ON
- Keep only rows that satisfy: All the following conditions
- I entered the condition
- For the sampling: I chose Whole data
When I run it, my filter doesn't return any rows, although it should.
What is the problem?
Thanks in advance
Best Answer
-
Hello,
Thanks for the diagnosis. After investigation, it seems the issue was caused by a mismatch in letter case between the column names in your original Parquet file and those in the Hive table. Your input dataset was generated manually as a Parquet file with the column name "MANDT" (uppercase). It was then imported into DSS from Hive. However, Hive always converts all column names to lowercase. Hence, DSS was seeing the column name as "mandt", which is inconsistent with the name stored in the original Parquet file. As of today, we cannot detect this type of case automatically.
The preferred solution would be to only generate Parquet files with lowercase column names, so that they are compatible with Hive (and Impala as well).
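As a rough illustration of that check, here is a minimal Python sketch that lists which column names would need renaming to become lowercase, and flags names that would clash once lowercased. The function name `lowercase_rename_plan` and the sample schema are hypothetical (only "MANDT" comes from this thread), and this is not part of DSS or Hive:

```python
from collections import defaultdict


def lowercase_rename_plan(column_names):
    """Return (renames, collisions) for making column names lowercase.

    renames: maps each name that is not already lowercase to its lowercase form.
    collisions: maps a lowercase name to the original names that would
                clash if they were all lowercased.
    """
    renames = {}
    by_lower = defaultdict(list)
    for name in column_names:
        lowered = name.lower()
        by_lower[lowered].append(name)
        if name != lowered:
            renames[name] = lowered
    collisions = {low: names for low, names in by_lower.items() if len(names) > 1}
    return renames, collisions


# Hypothetical schema; only "MANDT" is taken from the thread.
columns = ["MANDT", "order_id", "Amount"]
renames, collisions = lowercase_rename_plan(columns)
print(renames)
print(collisions)
```

The resulting mapping would be applied with whatever tool writes the Parquet file, before the file is written. A non-empty `collisions` result means that some columns need distinct names first.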
If that option is not possible, you may try changing the recipe engine from DSS to Hive. In fact, for large datasets, it is recommended to change the recipe engine to a Hadoop-based one (Spark, Hive or Impala). You should gain performance by pushing the computation down to your Hadoop cluster instead of having the data streamed to DSS.
Cheers,
Alex
Answers
-
Hello,
Could you please give us more details as to the nature and content of your filter? Have you checked that the value you are filtering on is indeed in the whole dataset?
Cheers,
Alex
-
Thank you Alexandre for your reply,
I tried multiple examples and none of them works.
For example: I selected a column named mandt, then I tried mandt equals 100 (all the rows have the value 100) -> result: 0 rows
I tried mandt is different from 100 -> result: 0 rows
mandt is defined -> result: 0 rows
etc.
:-(
-
Could you tell us how you are creating your filter? Is it a filter in the sampling definition? A step in a recipe? A filter in the view of the sample?
You can attach some screenshots to your comments to show us what you are trying to achieve.
-
Could you please share the job diagnosis after you run the recipe? You can download it in the page of the job, under Actions > Download job diagnosis.
-
Sorry, I downloaded the job diagnosis but I don't know how to share it.
-
You can use any file transfer service you want, for instance WeTransfer.
-
Unfortunately, I can't use these websites; they are all blocked at the company where I'm doing my internship.
-
Can you send it as an email attachment to my address (alexandre.combessie@dataiku.com)? Side note: if your company has subscribed to Dataiku, you can also contact our official support at https://support.dataiku.com. The website answers.dataiku.com is meant for community support.
-
Hello Alex,
Thank you so much for your help, I appreciate it!