I'm facing an issue while processing an Excel file, and the error log is very difficult to understand.
I hope someone has faced this issue before and can help me solve it.
[15:20:26] [DEBUG] [com.monitorjbl.xlsx.impl.StreamingWorkbookReader] - Deleting tmp file [/home/dataiku/dataiku/tmp/tmp-15330357160647418901.xlsx]
[15:20:26] [ERROR] [dku.input.push] - Push failed, cleanup resources
java.lang.NumberFormatException: For input string: "1e6"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:652)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.handleEvent(StreamingSheetReader.java:118)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.getRow(StreamingSheetReader.java:71)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.access$200(StreamingSheetReader.java:32)
at com.monitorjbl.xlsx.impl.StreamingSheetReader$StreamingRowIterator.hasNext(StreamingSheetReader.java:402)
at com.dataiku.dip.formats.excel.ExcelFormatExtractor.doExtractStream(ExcelFormatExtractor.java:148)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
at com.dataiku.dip.datasets.AbstractSingleThreadPusher.pushSplits(AbstractSingleThreadPusher.java:177)
at com.dataiku.dip.datasets.UniversalSingleThreadPusher.push(UniversalSingleThreadPusher.java:234)
at com.dataiku.dip.dataflow.exec.stream.SingleThreadFSLikeDatasetRunnable.run(SingleThreadFSLikeDatasetRunnable.java:71)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[15:20:26] [INFO] [dku.output.sql.pglike] - Aborting transaction
[15:20:26] [INFO] [dip.connection.share] - Give connection refCount=1
[15:20:26] [INFO] [dip.connection.share] - > closing connection with failure
[15:20:26] [DEBUG] [dku.connections.sql.provider] - Rollback conn=Dataiku_DB-bLHcHzb
[15:20:26] [DEBUG] [dku.connections.sql.provider] - Close conn=Dataiku_DB-bLHcHzb
[15:20:26] [DEBUG] [dku.resourceusage] - Reporting completion of CRU:{"context":{"type":"JOB_ACTIVITY","authIdentifier":"admin","projectKey":"POLICYDATA","jobId":"Build_OP01_New_prepared__NP__2021-12-10T08-17-38.256","activityId":"compute_OP01_New_prepared_NP","activityType":"recipe","recipeType":"shaker","recipeName":"compute_OP01_New_prepared"},"type":"SQL_CONNECTION","id":"sJbVBA5JxXDiClCt","startTime":1639124261558,"sqlConnection":{"connection":"Dataiku_DB"}}
[15:20:26] [INFO] [dku.flow.activity] - Run thread failed for activity compute_OP01_New_prepared_NP
java.lang.NumberFormatException: For input string: "1e6"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:652)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.handleEvent(StreamingSheetReader.java:118)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.getRow(StreamingSheetReader.java:71)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.access$200(StreamingSheetReader.java:32)
at com.monitorjbl.xlsx.impl.StreamingSheetReader$StreamingRowIterator.hasNext(StreamingSheetReader.java:402)
at com.dataiku.dip.formats.excel.ExcelFormatExtractor.doExtractStream(ExcelFormatExtractor.java:148)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
at com.dataiku.dip.datasets.AbstractSingleThreadPusher.pushSplits(AbstractSingleThreadPusher.java:177)
at com.dataiku.dip.datasets.UniversalSingleThreadPusher.push(UniversalSingleThreadPusher.java:234)
at com.dataiku.dip.dataflow.exec.stream.SingleThreadFSLikeDatasetRunnable.run(SingleThreadFSLikeDatasetRunnable.java:71)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[15:20:26] [INFO] [dku.flow.activity] running compute_OP01_New_prepared_NP - activity is finished
[15:20:26] [ERROR] [dku.flow.activity] running compute_OP01_New_prepared_NP - Activity failed
java.lang.NumberFormatException: For input string: "1e6"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:652)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.handleEvent(StreamingSheetReader.java:118)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.getRow(StreamingSheetReader.java:71)
at com.monitorjbl.xlsx.impl.StreamingSheetReader.access$200(StreamingSheetReader.java:32)
at com.monitorjbl.xlsx.impl.StreamingSheetReader$StreamingRowIterator.hasNext(StreamingSheetReader.java:402)
at com.dataiku.dip.formats.excel.ExcelFormatExtractor.doExtractStream(ExcelFormatExtractor.java:148)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
at com.dataiku.dip.datasets.AbstractSingleThreadPusher.pushSplits(AbstractSingleThreadPusher.java:177)
at com.dataiku.dip.datasets.UniversalSingleThreadPusher.push(UniversalSingleThreadPusher.java:234)
at com.dataiku.dip.dataflow.exec.stream.SingleThreadFSLikeDatasetRunnable.run(SingleThreadFSLikeDatasetRunnable.java:71)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[15:20:26] [INFO] [dku.flow.activity] running compute_OP01_New_prepared_NP - Executing default post-activity lifecycle hook
[15:20:26] [INFO] [dku.flow.activity] running compute_OP01_New_prepared_NP - Removing samples for POLICYDATA.OP01_New_prepared
[15:20:26] [INFO] [dku.flow.activity] running compute_OP01_New_prepared_NP - Done post-activity tasks
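The top of the trace shows `Integer.parseInt` rejecting the string "1e6" inside `StreamingSheetReader.handleEvent`. Below is a minimal Java sketch that reproduces that exception and shows a lenient alternative. It rests on an assumption the log does not confirm: that the failing value is a number written in scientific notation in the sheet XML (for example, row 1,000,000 stored as "1e6"). The class and method names are hypothetical, and this is not the library's actual code.

```java
import java.math.BigDecimal;

// Hypothetical sketch (not the excel-streaming-reader source): reproduces the
// NumberFormatException from the log and shows a lenient alternative parse.
final class RowRefParsing {

    // Strict parse, like the Integer.parseInt call at the top of the stack trace.
    static int parseRowNumberStrict(String r) {
        return Integer.parseInt(r); // throws NumberFormatException for "1e6"
    }

    // Hypothetical lenient parse: accepts plain digits and scientific notation
    // such as "1e6", but rejects non-integral values such as "1.5".
    static int parseRowNumberLenient(String r) {
        // BigDecimal understands exponent notation; intValueExact() throws
        // ArithmeticException if the value has a fractional part or overflows int.
        return new BigDecimal(r.trim()).intValueExact();
    }

    public static void main(String[] args) {
        try {
            parseRowNumberStrict("1e6");
        } catch (NumberFormatException e) {
            System.out.println("strict: " + e.getMessage()); // prints: strict: For input string: "1e6"
        }
        System.out.println("lenient: " + parseRowNumberLenient("1e6")); // prints: lenient: 1000000
    }
}
```

`BigDecimal` is used here rather than `Double.parseDouble` so that a non-integral value like "1.5" is rejected with an `ArithmeticException` instead of being silently truncated.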
Operating system used: Ubuntu
Data quality is not the problem: I tried splitting the file into two parts, and it ran fine.
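One possible explanation consistent with the split working (an assumption, not confirmed by the log) is that the tool that produced the file wrote a large row number such as 1,000,000 in scientific notation ("1e6"), so each half has too few rows to hit it. A .xlsx file is a ZIP archive whose worksheets usually live under `xl/worksheets/`, so a minimal Java sketch like the one below could check the original file for `<row>` references that are not plain integers. The class name and the regex-based approach are hypothetical choices for a one-off diagnostic, not a Dataiku or library feature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic: lists row references ("r" attributes on <row> start tags)
// in each worksheet of an .xlsx that are not plain decimal integers, e.g. "1e6".
final class XlsxRowRefScanner {

    // Matches the r="..." attribute inside a <row ...> start tag (not <rowBreaks> etc.).
    private static final Pattern ROW_REF = Pattern.compile("<row\\b[^>]*?\\sr=\"([^\"]*)\"");

    // Returns the non-integer row references found in one worksheet's XML text.
    static List<String> badRowRefs(String sheetXml) {
        List<String> bad = new ArrayList<>();
        Matcher m = ROW_REF.matcher(sheetXml);
        while (m.find()) {
            String r = m.group(1);
            if (!r.matches("\\d+")) {
                bad.add(r);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        try (ZipFile zip = new ZipFile(args[0])) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                String name = e.getName();
                if (!name.startsWith("xl/worksheets/") || !name.endsWith(".xml")) {
                    continue; // skip non-worksheet parts and .rels files
                }
                try (InputStream in = zip.getInputStream(e)) {
                    // Reads the whole sheet into memory: fine for a one-off check,
                    // but very large sheets may need a streaming XML parser instead.
                    String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    for (String r : badRowRefs(xml)) {
                        System.out.println(name + ": row r=\"" + r + "\"");
                    }
                }
            }
        }
    }
}
```

Run it with the path to the original file as its only argument; each printed line names the worksheet and the offending reference. The regex only inspects `r` attributes on `<row>` start tags, so cell references such as `A1` are not checked.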