NumberFormatException: For input string: Data type wrongly interpreted on import from Excel

TrevorM
Level 1

I created a dataset by importing an Excel file. The data type was incorrectly identified as integer because the first few records happened to contain values that look like integers. When I change the data type to string, DSS returns the error "NumberFormatException: For input string:" when I try to explore the data.

I have tried deleting the data and reloading the Excel file, but this doesn't work.
The Excel file is small, i.e. fewer than 1,000 rows and 22 columns.

How do I correct this?
Can I force DSS to re-evaluate the data?

3 Replies
Turribeach

When you say "When I change the data type to string", can you please clarify where exactly you changed it (which screen/tab/field/etc.)? In general you should only change data types in a Prepare recipe or in a Code Recipe, where you have control over how to deal with exceptions.
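
As a rough illustration of what "control over how to deal with exceptions" can look like in code, here is a minimal sketch in plain Python. It does not use any DSS API; the function names `to_int_or_none` and `convert_column` and the sample values are hypothetical, made up for this example. The idea is to convert what can be converted and record the rows that fail, instead of letting one bad cell raise an error.

```python
def to_int_or_none(value):
    """Convert a cell to int, returning None instead of raising on bad input."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def convert_column(values):
    """Return (converted values, indices of rows that failed to convert)."""
    converted = [to_int_or_none(v) for v in values]
    bad_rows = [i for i, c in enumerate(converted) if c is None]
    return converted, bad_rows


# Hypothetical cell values: three integer-like strings and one text value.
cells = ["12", " 7 ", "abc", "42"]
converted, bad_rows = convert_column(cells)
print(converted)  # [12, 7, None, 42]
print(bad_rows)   # [2]
```

The list of failing row indices makes it easy to inspect the offending cells before deciding whether the column should stay a string.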

TrevorM
Level 1
Author

I think I have found the cause of the error. I followed these steps:
1. + Dataset
2. Upload a file
3. Select the Excel file. The first 5 rows have data that could be integers. Row 6 has string data, but DSS appears to identify the column as integer.
4. DSS auto-detect identified the data as integer, and I incorrectly saved and proceeded to build recipes and so on.
5. This is a beginner's mistake. What I should have done is go to the schema tab and change the data type.
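
To illustrate the general failure mode described in these steps (a type guessed from only the first rows), here is a small sketch in plain Python. It is an assumption-laden toy, not how DSS auto-detection is actually implemented; the function `infer_type` and the sample column are hypothetical.

```python
def infer_type(values):
    """Return "int" if every value parses as an integer, else "string"."""
    for value in values:
        try:
            int(str(value).strip())
        except ValueError:
            return "string"
    return "int"


# Hypothetical column: five integer-like cells, then text in row 6.
column = ["101", "102", "103", "104", "105", "N/A", "107"]

sample_guess = infer_type(column[:5])  # looks only at the first 5 rows
full_guess = infer_type(column)        # scans the whole column

print(sample_guess)  # int
print(full_guess)    # string
```

A guess based on the first 5 rows says integer, while a full scan says string. Parsing row 6 as an integer then fails, which in Java surfaces as a `NumberFormatException`.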

How do I go about correcting this?
I am unable to overwrite the dataset and changing the data type in the columns view of the dataset doesn't have any effect.

MayeulR
Dataiker

Hello,

Just to be sure I understand: when you say DSS identifies the data as integer, do you mean (1) the storage type or (2) the meaning? (In the Explore view, under the name of the column, the first type is the storage type and the second one is the meaning.)

Best,

Mayeul 
