Loading dataset from MySQL Dump
hakverir
I am trying to load Wikipedia's SQL dumps into Dataiku after downloading them. Almost every attempt fails with this error:
"Tried format mysql_dump but configuration is not OK: Invalid format: Missing 'table' argument in MySQL Dump format"
but occasionally, for no apparent reason, it works. The most recent attempt gave this error:
"Tried format mysql_dump but configuration is not OK:"
When I went back to open the dataset after this error, the data was there. However, I cannot analyze or operate on it, because every attempt fails with errors like this:
[2022/12/19-10:03:01.673] [FRT-34-FlowRunnable] [INFO] [dku] - getCompression filename=**enwiki-latest-page_props.sql.gz**
[2022/12/19-10:03:01.675] [FRT-34-FlowRunnable] [INFO] [dku.format.mysql] - MySQL dump, starting to process one stream
[2022/12/19-10:03:07.582] [FRT-34-FlowRunnable] [ERROR] [dku.pipeline] - Parallel stream worker failed
java.io.IOException
    at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
    at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
    at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
    at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.584] [FRT-34-FlowRunnable] [INFO] [dku.pipeline] - done running
[2022/12/19-10:03:07.585] [FRT-34-FlowRunnable] [INFO] [dku.flow.activity] - Run thread failed for activity compute_WikipediaPages_Properties_NP
java.io.IOException
    at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
    at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
    at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
    at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.584] [FRT-35-FlowRunnable] [INFO] [dku.flow.stream] - Parallel streamer done
[2022/12/19-10:03:07.586] [FRT-35-FlowRunnable] [INFO] [dku.flow.activity] - Run thread done for activity compute_WikipediaPages_Properties_NP
[2022/12/19-10:03:07.738] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - activity is finished
[2022/12/19-10:03:07.739] [ActivityExecutor-29] [ERROR] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Activity failed
java.io.IOException
    at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
    at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
    at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
    at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
    at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.739] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Executing default post-activity lifecycle hook
[2022/12/19-10:03:07.741] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Removing samples for WIKIPEDIASYNC.WikipediaPages_Properties
[2022/12/19-10:03:07.743] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Done post-activity tasks
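As a first diagnostic outside Dataiku, a small Python sketch can check two things: whether the downloaded .sql.gz file decompresses all the way to the end (a truncated or corrupt download is one possible, unconfirmed, cause of a bare java.io.IOException), and which table names its CREATE TABLE and INSERT INTO statements declare, since the first error complains about a missing 'table' argument. This is not Dataiku code and not an official fix; it assumes a gzip-compressed, mysqldump-style file with backquoted table names, and the helper name `inspect_dump` is hypothetical.

```python
import gzip
import re
import sys
import zlib
from collections import Counter

# Table name at the start of statements such as
#   CREATE TABLE `page_props` (
#   INSERT INTO `page_props` VALUES (...),(...);
TABLE_RE = re.compile(rb"^(CREATE TABLE|INSERT INTO)\s+`([^`]+)`")


def inspect_dump(path):
    """Hypothetical helper: scan a gzip-compressed MySQL dump line by line.

    Returns (created_tables, insert_counts, complete), where `complete` is
    False if the gzip stream ends early or is corrupt (e.g. a truncated
    download). A missing file still raises FileNotFoundError.
    """
    created = []
    inserts = Counter()
    complete = True
    with gzip.open(path, "rb") as fh:
        try:
            for line in fh:
                m = TABLE_RE.match(line)
                if m is None:
                    continue
                name = m.group(2).decode("utf-8", "replace")
                if m.group(1) == b"CREATE TABLE":
                    created.append(name)
                else:
                    inserts[name] += 1
        except (EOFError, gzip.BadGzipFile, zlib.error):
            # EOFError: stream ended before the end-of-stream marker.
            # BadGzipFile / zlib.error: header or compressed data is corrupt.
            complete = False
    return created, dict(inserts), complete


if __name__ == "__main__":
    tables, counts, ok = inspect_dump(sys.argv[1])
    print("CREATE TABLE statements for:", tables)
    print("INSERT statements per table:", counts)
    print("gzip stream read to the end:", ok)
```

Saved under a hypothetical name such as inspect_dump.py, it would be run as `python inspect_dump.py enwiki-latest-page_props.sql.gz`. Reading line by line keeps memory use bounded even though Wikipedia dumps pack many rows into each INSERT line; if the last value printed is False, re-downloading the file would be a reasonable first step before adjusting any Dataiku settings.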
Does anyone know why this happens and how I can solve it?
Operating system used: Windows