"NumberFormatException: For input string" in scenario with integer partitioned dataset
I have a non-partitioned D1 dataset. The first column "dt_partition" will be used to partition the next dataset. dt_partition is of type integer, representing the month (for example 202409). It only contains one value at a time, so the data will go into a single partition. My database is Snowflake.
At the output of D1, there is a Sync recipe to a D2 dataset, partitioned on dt_partition.
"Partitioned by" = "dt_partition".
"Redispatch partitioning ..." is unchecked because there is only one partition at a time in D1.
Engine = "In-database (SQL)".
"Append instead of overwrite" is unchecked, because all the data for the month is in D1.
Run manually, the recipe works. However, when executed in a scenario, it produces the following error:
NumberFormatException: For input string: "dt_partition".
The scenario is composed of one Build step, of item D2.
"Build mode" = "Build just these items".
Using a Prepare recipe instead of Sync produces the same error in the scenario.
Operating system used: Windows
Best Answer
-
The cause of the error was this parameter in the Sync recipe:
Partitioned by = dt_partition
I thought that "dt_partition" was the name of the column that Dataiku uses for partitioning, but the field actually expects the partition value. Dataiku was looking for a partition named "dt_partition" instead of, for example, 202409, hence the error about a string where a number was expected. I changed the setting as follows, and it works:
Partitioned by = ${dt_partition}
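To illustrate the difference between the two settings, here is a minimal Python sketch. It is not how DSS is implemented. It assumes that ${dt_partition} refers to a variable holding the month value and that the partition identifier must parse as an integer. The names VARIABLES, expand_variables and resolve_partition are hypothetical, and Python raises ValueError where the Java backend reports NumberFormatException.

```python
import re

# Hypothetical stand-in for the variables available to the recipe; the name
# and value are assumptions taken from the example in the question.
VARIABLES = {"dt_partition": "202409"}

_VAR_PATTERN = re.compile(r"\$\{(\w+)\}")


def expand_variables(setting: str, variables: dict) -> str:
    """Replace each ${name} placeholder with its value; leave other text as-is."""
    return _VAR_PATTERN.sub(lambda m: variables[m.group(1)], setting)


def resolve_partition(setting: str, variables: dict) -> int:
    """Expand placeholders, then interpret the result as an integer partition id."""
    expanded = expand_variables(setting, variables)
    return int(expanded)  # raises ValueError if the text is not numeric


# With the ${...} syntax, the variable's value becomes the partition id.
print(resolve_partition("${dt_partition}", VARIABLES))

# Without it, the literal text "dt_partition" is taken as the partition id.
try:
    resolve_partition("dt_partition", VARIABLES)
except ValueError as err:
    print(f"rejected: {err}")
```

The first call resolves to the month value, while the second fails because the literal text "dt_partition" is not a number, which mirrors the error seen in the scenario.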
Answers
-
Alexandru (Dataiker)
Hi,
Is the input dataset file-based or SQL?
If the input is a file-based dataset that is not partitioned and the output is partitioned, you would typically need to use redispatch.
If the datasets are SQL datasets, then it doesn't make sense to redispatch. You can simply create a partitioned input dataset on the dt_partition column and then sync to the partitions you want.
https://knowledge.dataiku.com/latest/mlops-o16n/partitioning/concept-redispatch.html
If this doesn't help, please share the job diagnostics from both the manual run and the scenario run so we can look into this further: https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html
Thanks