Added on October 16, 2024 2:42PM
Likes: 0
Replies: 2
I have a non-partitioned D1 dataset. The first column "dt_partition" will be used to partition the next dataset. dt_partition is of type integer, representing the month (for example 202409). It only contains one value at a time, so the data will go into a single partition. My database is Snowflake.
At the output of D1, there is a Sync recipe to a D2 dataset, partitioned on dt_partition.
"Partitioned by" = "dt_partition".
"Redispatch partitioning ..." is unchecked because there is only one partition at a time in D1.
Engine = "In-database (SQL)".
"Append instead of overwrite" is unchecked, because all the data for the month is in D1.
Run manually, the recipe works. However, when executed in a scenario, it produces the following error:
NumberFormatException: For input string: "dt_partition".
The scenario is composed of one Build step, of item D2.
"Build mode" = "Build just these items".
Using a Prepare recipe instead of Sync produces the same error in the scenario.
Operating system used: Windows
The cause of the error was this setting in the Sync recipe:
Partitioned by = dt_partition
I thought that "dt_partition" was the name of the column that Dataiku uses for partitioning, but the field actually expects the partition value. Dataiku was looking for a partition literally named "dt_partition" instead of, for example, 202409. Hence the error: a string was supplied where a number was expected. I changed the setting as follows, and it works:
Partitioned by = ${dt_partition}
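To illustrate why the literal text fails while the variable reference works, here is a minimal Python sketch. It is not Dataiku's actual code: `expand_variables` and `resolve_partition` are hypothetical helpers, and it assumes a variable named dt_partition holding the month (for example 202409) is defined where the recipe can see it.

```python
import re


def expand_variables(setting: str, variables: dict) -> str:
    """Replace each ${name} in `setting` with the value of variables[name].

    Hypothetical helper: it only mimics the idea of variable expansion.
    """
    return re.sub(
        r"\$\{(\w+)\}",
        lambda match: str(variables[match.group(1)]),
        setting,
    )


def resolve_partition(setting: str, variables: dict) -> int:
    """Expand variables, then parse the result as an integer partition id."""
    return int(expand_variables(setting, variables))


# Assumed variable: the month currently held in D1.
variables = {"dt_partition": 202409}

# With the ${...} syntax, the name is replaced by its value before parsing.
print(resolve_partition("${dt_partition}", variables))

# Without it, the literal text "dt_partition" is parsed as a number and fails,
# which is the Python analogue of Java's NumberFormatException.
try:
    resolve_partition("dt_partition", variables)
except ValueError as exc:
    print("Failed to parse partition:", exc)
```

In this sketch the only difference between the two calls is the `${...}` wrapper, which matches the one-line change that fixed the recipe.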
Hi,
Is the input dataset file based or SQL?
If the input is a file-based dataset that is not partitioned and the output is partitioned, you would typically need to use redispatch.
If the datasets are SQL datasets, then it doesn't make sense to redispatch. You can simply create a partitioned input dataset on the dt_partition column and then sync to the partitions you want.
https://knowledge.dataiku.com/latest/mlops-o16n/partitioning/concept-redispatch.html
If this doesn't help, please share the job diagnostics from both the manual run and the scenario run so we can look into this further: https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html
Thanks