I'm trying to overwrite a table using data from another table (with the same schema). I keep running into the issue that both datasets are partitioned and the writer does not like that (same case with the copy_to function).
Here is what I'm trying to do:
# Set the datasets
dataset_target = dataiku.Dataset(dataset, project_key=project_key_target)
dataset_source = dataiku.Dataset(dataset, project_key=project_key_source)

# Overwrite target dataset with source data
with dataset_target.get_writer() as writer:
    for p in dataset_source.list_partitions():
        dataset_source.read_partitions = [p]
        df = dataset_source.get_dataframe()
        dataset_target.set_write_partition(str(p))
        writer.write_dataframe(df)
    writer.close()
I'm getting this error, even though I would think the writer has a partition because of the "set_write_partition":

ERROR:dataiku.core.dataset_write:Exception caught while writing
Traceback (most recent call last):
  File "/data/dataiku/install/dataiku-dss-12.3.1/python/dataiku/core/dataset_write.py", line 353, in run
    self.streaming_api.wait_write_session(self.session_id)
  File "/data/dataiku/install/dataiku-dss-12.3.1/python/dataiku/core/dataset_write.py", line 296, in wait_write_session
    raise Exception(u'An error occurred during dataset write (%s): %s' % (id, decoded_resp["message"]))
Exception: An error occurred during dataset write (D9uuBrAH9P): RuntimeException: A partition ID must be provided, because the dataset myproject.target_table is partitioned
Does anyone know how I could resolve this? I also thought about working around the issue by removing the partitioning from both datasets and restoring it after the copy, but I can imagine more going wrong there, so I'd like to avoid that if possible.
Any help is appreciated, thanks in advance!
Figured it out!
The write partition has to be set on the dataset before the writer is created, so the writer must be opened inside the loop, after the call to set_write_partition. This means we can solve it like this:
# Set the datasets
dataset_target = dataiku.Dataset(dataset, project_key=project_key_target)
dataset_source = dataiku.Dataset(dataset, project_key=project_key_source)

# Overwrite target dataset with source data
for p in dataset_source.list_partitions():
    dataset_source.read_partitions = [p]
    df = dataset_source.get_dataframe()
    # Set the write partition first, then open a fresh writer for it
    dataset_target.set_write_partition(str(p))
    writer = dataset_target.get_writer()  # was dataset.target.get_writer(), a typo
    writer.write_dataframe(df)
    writer.close()
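The pattern above (read one partition, point the target at it, then copy) can be separated from the dataiku specifics. Here is a minimal sketch of that loop as a generic helper — `copy_partitions`, `read_fn`, and `write_fn` are hypothetical names for illustration, not part of the dataiku API:

```python
def copy_partitions(partitions, read_fn, write_fn):
    """Copy each partition from a source to a target, one at a time.

    read_fn(p)        -> returns the data for partition p
    write_fn(p, data) -> writes data into partition p of the target
    Returns the list of partitions copied.
    """
    copied = []
    for p in partitions:
        data = read_fn(p)      # with dataiku: set read_partitions = [p], then get_dataframe()
        write_fn(p, data)      # with dataiku: set_write_partition(str(p)), then a fresh writer
        copied.append(p)
    return copied
```

With the dataiku API, `read_fn` would set `dataset_source.read_partitions = [p]` and return `dataset_source.get_dataframe()`, and `write_fn` would call `dataset_target.set_write_partition(str(p))` before opening the writer, matching the accepted answer's ordering.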