I am trying to load Wikipedia's SQL dumps into Dataiku after downloading them. Almost every time, I get this error:
"Tried format mysql_dump but configuration is not OK: Invalid format: Missing 'table' argument in MySQL Dump format"
Occasionally, though, the import works without my changing anything. On the most recent attempt, it gave this error instead:
"Tried format mysql_dump but configuration is not OK:"
When I reopened the dataset after this error, the data was there. However, I cannot analyze or process the data, because jobs that read it fail with errors like this:
[2022/12/19-10:03:01.673] [FRT-34-FlowRunnable] [INFO] [dku] - getCompression filename=enwiki-latest-page_props.sql.gz
[2022/12/19-10:03:01.675] [FRT-34-FlowRunnable] [INFO] [dku.format.mysql] - MySQL dump, starting to process one stream
[2022/12/19-10:03:07.582] [FRT-34-FlowRunnable] [ERROR] [dku.pipeline] - Parallel stream worker failed
java.io.IOException
at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.584] [FRT-34-FlowRunnable] [INFO] [dku.pipeline] - done running
[2022/12/19-10:03:07.585] [FRT-34-FlowRunnable] [INFO] [dku.flow.activity] - Run thread failed for activity compute_WikipediaPages_Properties_NP
java.io.IOException
at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.584] [FRT-35-FlowRunnable] [INFO] [dku.flow.stream] - Parallel streamer done
[2022/12/19-10:03:07.586] [FRT-35-FlowRunnable] [INFO] [dku.flow.activity] - Run thread done for activity compute_WikipediaPages_Properties_NP
[2022/12/19-10:03:07.738] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - activity is finished
[2022/12/19-10:03:07.739] [ActivityExecutor-29] [ERROR] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Activity failed
java.io.IOException
at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.739] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Executing default post-activity lifecycle hook
[2022/12/19-10:03:07.741] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Removing samples for WIKIPEDIASYNC.WikipediaPages_Properties
[2022/12/19-10:03:07.743] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Done post-activity tasks
Does anyone know why this happens and how I can solve it?
Operating system used: Windows
Hi @hakverir,
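Since the first error complains about a missing 'table' argument, it may help to check which table names the dump actually declares, so you know what value that setting could take. Below is a minimal sketch that runs outside Dataiku and uses only the Python standard library. The helper name `list_dump_tables` is hypothetical, and it assumes the dump declares its tables with ordinary `CREATE TABLE` statements at the start of a line, as mysqldump normally writes them. It does not call any Dataiku API and is not a confirmed fix for the `java.io.IOException` above.

```python
import gzip
import re

# Assumption: tables are declared as  CREATE TABLE `name` (  at the start of a line.
CREATE_TABLE_RE = re.compile(
    r"\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?([^`\s(]+)`?",
    re.IGNORECASE,
)


def list_dump_tables(path):
    """Return the table names declared in a gzipped SQL dump, in file order.

    The whole file is read, so a truncated or corrupt download will usually
    surface here as an EOFError or an OSError raised by the gzip module.
    """
    tables = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # match() anchors at the start of the line, so the very long
            # INSERT lines are rejected after a few characters.
            found = CREATE_TABLE_RE.match(line)
            if found:
                tables.append(found.group(1))
    return tables


if __name__ == "__main__":
    # File name taken from the job log above; adjust the path as needed.
    print(list_dump_tables("enwiki-latest-page_props.sql.gz"))
```

If the script reads the file to the end without raising an exception, the download is at least a complete gzip stream, and the printed names are candidates for the table setting of the MySQL dump format.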