Error while reading MySQL table during the execution of a recipe
Hello,
I have a MySQL table with around 2.8 million rows. I just created a recipe to parse a date+time column. This recipe usually works fine. But in this case, after reading around 890,000 rows, the job fails with the following error:
Failed to read data from table
Failed to read data from table, caused by: BatchUpdateException: Data truncation: Data too long for column 'text' at row 509, caused by: MysqlDataTruncation: Data truncation: Data too long for column 'text' at row 509
HTTP code: , type: com.dataiku.dip.exceptions.DataStoreIOException
Investigating the log a bit, I noticed that a column that is a "text" column in MySQL gets transformed into a "varchar(500)" column. I am not sure this is the source of the error (as the problem occurs while reading the data, not writing it).
I have tried several things, including specifying in the original table that there should be a maximum of 1,000 characters for instance, and trying to change the "autodetect" meaning (which gives "Natural Language" for this column) to "Text". After those changes I asked DSS to "propagate the schema" from the original tables (it does not see any change anyway).
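For reference, this is the kind of check I can run directly against MySQL to see whether any values in that column actually exceed 500 characters (connection settings, table and column names below are placeholders, and pymysql is just an example driver):

    import pymysql  # any MySQL driver would do; pymysql is just an example

    # Placeholder connection settings and identifiers
    conn = pymysql.connect(host="mysql-host", user="user",
                           password="password", database="mydb")
    try:
        with conn.cursor() as cur:
            # Longest value (in characters) stored in the problematic column
            cur.execute("SELECT MAX(CHAR_LENGTH(`my_text_column`)) FROM `my_table`")
            print("Longest value:", cur.fetchone()[0])
    finally:
        conn.close()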
Do you have any idea of the source of this error?
Best,
Frédéric
Answers
-
Hello,
I have the exact same problem: a column recorded as string (a long URL) became varchar(500) in a left join job, so I am currently investigating that.
It seems like everything that is recorded as "string" becomes varchar(500).
Now I have looked around a little on the internet, and it seems the following applies to join jobs (quoting from another site):
-> Just go with the TEXT type and skip reading all these answers below. In the end, that's what most of them suggest.
Of course, if you need indexing or uniqueness, go for VARCHAR, since TEXT cannot be indexed. That is probably the default behavior for Dataiku, then, in order to avoid problems with joins. So now, looking at length (quoting again from another site):
Lowest common denominator max URL length among popular web browsers: 2,083 (Internet Explorer)
http://dev.mysql.com/doc/refman/5.0/en/char.html
Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.
So ...
MySQL < 5.0.3: use TEXT
or
MySQL >= 5.0.3: use VARCHAR(2083)
So I am testing with VARCHAR(2083).
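For the record, the change I am testing on the MySQL side is roughly the following; the table and column names are placeholders for mine, and pymysql is just the driver I happen to use:

    import pymysql  # example MySQL driver

    conn = pymysql.connect(host="mysql-host", user="user",
                           password="password", database="mydb")
    try:
        with conn.cursor() as cur:
            # Widen the URL column to the 2083-character "browser maximum"
            cur.execute("ALTER TABLE `my_output_table` "
                        "MODIFY COLUMN `referer` VARCHAR(2083)")
        conn.commit()
    finally:
        conn.close()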
-
Hello,
A little update about the attempt with LONG VARCHAR (65,000+).
Well, basically it still fails: Dataiku still tries to convert / truncate into varchar(500), and it seems someone has the same problem here: https://community.dataiku.com/t5/Using-Dataiku-DSS/Dataiku-recipes-don-t-keep-the-data-types-in-output-datasets/m-p/2013/dataiku-recipes-dont-keep-the-data-types-in-output-datasets
I am still not sure how to solve that problem ...
Edit:
For now, I have exported to CSV in an internal managed dataset instead of MySQL, even though it's a bad idea, especially with that much data stored in it.
-
Hi,
You need to go to the settings of the output dataset, in the "Schema" tab. There, you can select your field that is too long and set the max length. Save the schema and re-run your recipe.
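If you would rather script that change than click through the UI, something along these lines from a notebook inside DSS should amount to the same thing; this is only a sketch, the dataset and column names are placeholders, and it assumes the read_schema()/write_schema() calls of the dataiku package behave as documented:

    import dataiku  # run inside DSS, in the same project as the dataset

    # Placeholder dataset and column names
    ds = dataiku.Dataset("my_output_dataset")
    schema = ds.read_schema()           # list of columns: {'name': ..., 'type': ..., 'maxLength': ...}
    for col in schema:
        if col["name"] == "referer":    # the column that overflows varchar(500)
            col["maxLength"] = 2083     # raise the length used to size the SQL column
    ds.write_schema(schema)             # save the schema, then re-run the recipe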
-
Well, I can see "string" as the column type, but I can't specify the length?
Maybe it's a type name that has to be set? But which one? In the job I see this:
"originalType": "MEDIUMTEXT", "meaning": "Text", "name": "referer", "type": "string", "maxLength": 500
But I am unable to find the maxLength option to edit.
=> Edit: finally found it:
- Go to Selected columns
- Near the name of the table there is a settings icon; click on it
- Select the column name that has the too-long values; leave it as string / auto-detect, but tick the checkbox
- Click DETAILS on the right, and there you see Max Length: 500
Well now, the issue is the total storage for each row, which is way too large.
The error is the following: Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
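To put rough numbers on that error: with a 4-byte character set such as utf8mb4, every VARCHAR(500) column can claim about 2,002 bytes of the 65,535-byte row limit (500 characters × 4 bytes + a 2-byte length prefix), so around 33 such columns are already too many, whereas a TEXT column only counts a handful of bytes against the row because its content is stored off-row. A back-of-the-envelope check (the column count is a placeholder, not my real table):

    # Rough estimate of the MySQL row-size budget (65,535 bytes shared by all columns)
    MAX_ROW_BYTES = 65535
    BYTES_PER_CHAR = 4        # utf8mb4 worst case
    LENGTH_PREFIX = 2         # a VARCHAR needing more than 255 bytes uses a 2-byte prefix

    n_varchar_500 = 40        # placeholder: number of varchar(500) columns in the output table

    varchar_cost = n_varchar_500 * (500 * BYTES_PER_CHAR + LENGTH_PREFIX)
    print(varchar_cost, "bytes ->", "too large" if varchar_cost > MAX_ROW_BYTES else "fits")

    # TEXT/BLOB columns only contribute roughly 9-12 bytes each to the row size,
    # since their contents are stored separately, which is why the error suggests
    # switching some columns to TEXT or BLOB.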