-
Missing ID in partitioned group by
Hi, I've got a partitioned dataset with IDs in one column. The dataset registers some transactions: it may well be that some IDs do not appear in all the transactions. I then want to group this dataset and sum the transaction column. Could you please confirm that when I do so, I'm not going to "lose" any of the IDs along…
-
validation failed: Cannot insert into target table because number/types are different.
Hi, I get this message from a Hive recipe on a partitioned dataset stored on HDFS: validation failed: Cannot insert into target table because number/types are different "2018-02": Table inclause-0 has 27 columns, but query has 28 columns. My query is: SELECT * FROM MyTable
-
The partitioning column does not display in Dataiku
Hi everybody, I have a real problem: when I import a table from Hive, the partitioning column does not display in Dataiku. Any help, please?
-
How to stack partitioned datasets with incoherent partitions?
Hi, Let's say we have 2 datasets, partitioned by letters. Dataset 1: partition A, partition B. Dataset 2: partition B, partition C. I would like to get a "summed" dataset, where existing partitions are stacked. Dataset 3: partition A (from 1), partition B (from 1 + 2), partition C (from 2). Simply stacking those…
-
How to remove partitioning?
One of my datasets is partitioned along two dimensions (source and date), but I'd like to retrieve a non-partitioned dataset from it. When I try to run a simple "Sync" recipe using "All available" partitions, the build systematically fails, probably because I don't have a source partition for each existing date. Is it…
-
Identify lines based on partition variable
Hi, I'm creating datasets based on files in an S3 bucket. The files in the bucket are in a single folder, but have several name patterns, such as "blue_01012017.csv", "red_02012017.csv", etc. Using partitioning, I have defined "blue", "red", etc. as a partition variable called "source". This information is not included in…
-
reading file after partitioning
Hello, I have a filesystem organized this way: /folder/YEAR/MONTH/DDHH. I tried to partition at the DDHH level, with one folder per partition. Since it is not a 'regular' structure (such as %Y/%M/%DD/.*), I did the partitioning as %Y/%M/%{dimension_2}/.* and it outputs 718 partitions of 1 file (JSON). After this operation, I…
-
after partitioning, problem reading a JSON file
Hello, I have a filesystem organized this way: /folder/YEAR/MONTH/DDHH. I tried to partition at the DDHH level, with one folder per partition. Since it is not a 'regular' structure (such as %Y/%M/%DD/.*), I did the partitioning as %Y/%M/%{dimension_2}/.* and it outputs 718 partitions of 1 file (JSON). After this operation, I…
-
Partitioning, parallelization and projections with Vertica
Hi, I use DSS v4.0.1. I have a CSV input dataset partitioned by year in files (/%Y_dataset_src), and a recipe for preparing data into a Vertica dataset (partitioned by %Y on a date column). I need parallelization because this job is quite long (20h). The partitioning is OK and the execution works well year by year. When…
-
Data prep and partition
Hi, How can I use partitioning variable substitution in a data preparation recipe (for example in a Formula step)? Thanks. PS: I already asked this question a while ago: http://answers.dataiku.com/159/data-prep-and-partition?show=159#q159