How does overwriting a partitioned dataset work?
nv
Situation:
The dataset is partitioned by year/month/day on HDFS.
Existing data:
year=2016/month=05/
  day=01
  day=02
  day=03
  ...
  day=12
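As a minimal sketch of the layout above, assuming the year=YYYY/month=MM/day=DD naming with zero-padded month and day, the hypothetical helper `partition_path` maps a date to its partition directory (it is an illustration, not a DSS API):

```python
from datetime import date


def partition_path(d: date) -> str:
    """Return the relative HDFS directory for one day's partition (hypothetical helper)."""
    # Zero-pad month and day so the paths match the existing layout (month=05, day=01, ...).
    return f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}"


print(partition_path(date(2016, 5, 12)))  # year=2016/month=05/day=12
print(partition_path(date(2016, 5, 13)))  # year=2016/month=05/day=13
```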
Questions:
- If I rebuild the dataset for 2016-05-12, is only the data under year=2016/month=05/day=12 overwritten, or will all the data under year=2016/... be overwritten?
- If I build the dataset for 2016-05-13, is only year=2016/month=05/day=13 written, with all existing data left unchanged (not overwritten)? Or will all the data under year=2016/... be recalculated?
Best Answer
Hi,
The answer depends on the type of recipe you're using.
If it's an SQL query recipe:
- In both cases, only the selected partition is written (or overwritten).
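A toy Python model of that behaviour, under the assumption that a partitioned dataset can be pictured as a mapping from partition directory to rows (an illustration only, not how DSS is implemented; `partition_path` and `build_partition` are hypothetical): building a partition replaces only its own entry, so rebuilding 2016-05-12 overwrites day=12 and building 2016-05-13 adds day=13, leaving the other days untouched.

```python
from datetime import date


def partition_path(d: date) -> str:
    """Relative directory for one day's partition (hypothetical helper)."""
    return f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}"


def build_partition(dataset: dict, d: date, rows: list) -> None:
    """Write rows for one day, replacing only that partition (toy model, not DSS internals)."""
    # Only the key for the selected partition is touched; all other days stay as they are.
    dataset[partition_path(d)] = list(rows)


# Existing data: days 01..12 of May 2016, one placeholder row each.
dataset = {partition_path(date(2016, 5, day)): [f"old-{day:02d}"] for day in range(1, 13)}

build_partition(dataset, date(2016, 5, 12), ["new-12"])  # rebuild: overwrites day=12 only
build_partition(dataset, date(2016, 5, 13), ["new-13"])  # new day: adds day=13

print(len(dataset))                          # 13
print(dataset["year=2016/month=05/day=01"])  # ['old-01']
print(dataset["year=2016/month=05/day=12"])  # ['new-12']
```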
If it's an SQL script recipe:
- It depends entirely on what your script does. Anything is possible, so you are responsible for deleting and writing the correct partition.
See http://doc.dataiku.com/dss/latest/partitions/sql_recipes.html?highlight=sql script
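For an SQL script, one common way to make a per-partition rebuild safe is delete-then-insert on the target partition. The sketch below uses Python's built-in sqlite3 purely as a stand-in engine; the `source` and `target` tables, their year/month/day/value columns, and `rebuild_day` are all hypothetical. On Hive, the same intent is usually written as `INSERT OVERWRITE TABLE ... PARTITION (year=..., month=..., day=...)`.

```python
import sqlite3


def rebuild_day(conn: sqlite3.Connection, year: str, month: str, day: str) -> None:
    """Overwrite one day in `target` from `source` (hypothetical tables)."""
    params = {"year": year, "month": month, "day": day}
    with conn:  # commit both statements together, or roll back on error
        # Step 1: delete whatever the target partition currently holds.
        conn.execute(
            "DELETE FROM target WHERE year = :year AND month = :month AND day = :day",
            params,
        )
        # Step 2: write the freshly computed rows for that same partition.
        conn.execute(
            "INSERT INTO target (year, month, day, value) "
            "SELECT year, month, day, value FROM source "
            "WHERE year = :year AND month = :month AND day = :day",
            params,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (year TEXT, month TEXT, day TEXT, value TEXT)")
conn.execute("CREATE TABLE target (year TEXT, month TEXT, day TEXT, value TEXT)")
# Existing data: days 01..12 of May 2016 in the target table.
conn.executemany(
    "INSERT INTO target VALUES ('2016', '05', ?, 'old')",
    [(f"{d:02d}",) for d in range(1, 13)],
)
conn.execute("INSERT INTO source VALUES ('2016', '05', '12', 'new')")
conn.commit()

rebuild_day(conn, "2016", "05", "12")
rows = conn.execute("SELECT day, value FROM target ORDER BY day").fetchall()
print(rows[0], rows[-1])  # ('01', 'old') ('12', 'new')
```

Running `rebuild_day` twice still leaves a single row for day=12; without the DELETE step, each run would append another copy.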