
Loading dataset from MySQL Dump

hakverir
Level 1

I am trying to load Wikipedia's SQL dumps into Dataiku after downloading them. The import almost always fails with this error:

"Tried format mysql_dump but configuration is not OK: Invalid format: Missing 'table' argument in MySQL Dump format"

Occasionally it works unexpectedly, though. Most recently, it gave this error instead:

"Tried format mysql_dump but configuration is not OK:"

When I reopened the dataset after this error, the data was there. However, I cannot analyze or process the data, because the job fails with errors like this:

[2022/12/19-10:03:01.673] [FRT-34-FlowRunnable] [INFO] [dku] - getCompression filename=enwiki-latest-page_props.sql.gz
[2022/12/19-10:03:01.675] [FRT-34-FlowRunnable] [INFO] [dku.format.mysql] - MySQL dump, starting to process one stream
[2022/12/19-10:03:07.582] [FRT-34-FlowRunnable] [ERROR] [dku.pipeline] - Parallel stream worker failed
java.io.IOException
	at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
	at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
	at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
	at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.584] [FRT-34-FlowRunnable] [INFO] [dku.pipeline] - done running
[2022/12/19-10:03:07.585] [FRT-34-FlowRunnable] [INFO] [dku.flow.activity] - Run thread failed for activity compute_WikipediaPages_Properties_NP
java.io.IOException
	at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
	at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
	at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
	at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.584] [FRT-35-FlowRunnable] [INFO] [dku.flow.stream] - Parallel streamer done
[2022/12/19-10:03:07.586] [FRT-35-FlowRunnable] [INFO] [dku.flow.activity] - Run thread done for activity compute_WikipediaPages_Properties_NP
[2022/12/19-10:03:07.738] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - activity is finished
[2022/12/19-10:03:07.739] [ActivityExecutor-29] [ERROR] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Activity failed
java.io.IOException
	at com.dataiku.dip.input.formats.MySQLDumpFormatExtractor.doExtractStream(MySQLDumpFormatExtractor.java:165)
	at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.extractSimple(ArchiveCapableFormatExtractor.java:154)
	at com.dataiku.dip.input.formats.ArchiveCapableFormatExtractor.run(ArchiveCapableFormatExtractor.java:59)
	at com.dataiku.dip.dataflow.exec.stream.ParallelStreamSlaveRunnable.run(ParallelStreamSlaveRunnable.java:61)
	at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2022/12/19-10:03:07.739] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Executing default post-activity lifecycle hook
[2022/12/19-10:03:07.741] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Removing samples for WIKIPEDIASYNC.WikipediaPages_Properties
[2022/12/19-10:03:07.743] [ActivityExecutor-29] [INFO] [dku.flow.activity] running compute_WikipediaPages_Properties_NP - Done post-activity tasks

Does anyone know why this happens and how I can solve it?


Operating system used: Windows

1 Reply
JordanB
Dataiker

Hi @hakverir,

The MySQL dump format support in Dataiku is very limited. I recommend loading the dump into a MySQL database first and then connecting Dataiku to that database.
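
As a rough illustration of the "load the dump into MySQL" step, here is a minimal Python sketch that decompresses a .sql.gz file and pipes it into the standard `mysql` command-line client. It is only a sketch under assumptions: the `mysql` client is assumed to be on the PATH, the target database is assumed to exist already, the password is assumed to come from the MySQL client option file, and the function names (`build_mysql_command`, `stream_gzip`, `load_dump`) are hypothetical helpers, not part of Dataiku or MySQL.

```python
import gzip
import shutil
import subprocess
from typing import BinaryIO, List


def build_mysql_command(database: str, user: str, host: str = "localhost") -> List[str]:
    """Build the argument list for the standard `mysql` command-line client."""
    # Assumption: the password is supplied by the MySQL client option file,
    # so it does not appear on the command line.
    return ["mysql", f"--host={host}", f"--user={user}", database]


def stream_gzip(dump_path: str, sink: BinaryIO, chunk_size: int = 1 << 20) -> None:
    """Decompress a gzip file into `sink` in chunks, without loading it all into memory."""
    with gzip.open(dump_path, "rb") as src:
        shutil.copyfileobj(src, sink, chunk_size)


def load_dump(dump_path: str, database: str, user: str, host: str = "localhost") -> None:
    """Pipe a gzip-compressed SQL dump into the `mysql` client's standard input."""
    cmd = build_mysql_command(database, user, host)
    # Leaving the `with` block closes stdin and waits for the client to exit.
    with subprocess.Popen(cmd, stdin=subprocess.PIPE) as proc:
        stream_gzip(dump_path, proc.stdin)
    if proc.returncode != 0:
        raise RuntimeError(f"mysql client exited with code {proc.returncode}")
```

A call might look like `load_dump("enwiki-latest-page_props.sql.gz", database="wikipedia", user="dataiku")`, where the database and user names are placeholders. Streaming in chunks matters here because Wikipedia dumps can be far larger than available memory.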
 
Thanks!
Jordan