Join us, on May 27th, for an introduction to the new Dataiku Academy Learn more

Error while reading MySQL table during the execution of a recipe

Level 1
Error while reading MySQL table during the execution of a recipe

Hello,



I have a MySQL table with around 2,8 millions rows. I just created a recipe to parse a date+time column. This recipe usually works fine. But in that case, after reading around 890 000 rows, the job fails with the following error:



Failed to read data from table



Failed to read data from table, caused by: BatchUpdateException: Data truncation: Data too long for column 'text' at row 509, caused by: MysqlDataTruncation: Data truncation: Data too long for column 'text' at row 509



HTTP code: , type: com.dataiku.dip.exceptions.DataStoreIOException



Investigating a bit the log, I remarked that a column that is a "text" column in MySQL is transformed into a "varchar(500)" column. I am not sure that this is the source of the error (as it is a problem while reading the data and not writing it).



I have tried several things, including, in the original table to specify that there should be 1000 caracters max for instance, trying to change the "autodetect" (that gives "Natural Language" for this column) to "text". After those changes I asked DSS to "propagate the schema" from the original tables (it does no see any change anyway).



Have you any idea of the source of this error? 



Best,



Frédéric 

0 Kudos
4 Replies
Level 2

Hello,

I have the exact same problem, string recorded column (long URL) became varchar(500) in a left join job, so I am currenty investigating about that.

Seems like everything that is recorded as "string" becomes varchar(500).

 

Now i looked a little on internet, and it seems the following is applying to join jobs (quoting from another site) :

-> Just go with the TEXT type and skip reading all these answers below. In the end, that's what most of them suggest. 🙂 Of course, if You need indexing or uniqueness, go for VARCHAR, since TEXT cannot be indexed.

 

This means that's probably the default behavior for Dataiku ?
In order to avoid problems about joins.

 

So now, looking at lenght (quoting again another site) :

  1. Lowest common denominator max URL length among popular web browsers: 2,083 (Internet Explorer)

  2. http://dev.mysql.com/doc/refman/5.0/en/char.html
    Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

  3. So ...
    < MySQL 5.0.3 use TEXT
    or
    >= MySQL 5.0.3 use VARCHAR(2083)

 

Testing with VARCHAR(2083) so.

 

Level 2

Hello,

Little update about the try with LONG VARCHAR (65000+).

Well basically it still fails, Dataiku still try to convert / truncate into varchar(500), and it seems someone has the same problem here around : https://community.dataiku.com/t5/Using-Dataiku-DSS/Dataiku-recipes-don-t-keep-the-data-types-in-outp...

I am still not sure about how to solve that problem ...

 

Edit :
For now, exported to CSV in intenal managed dataset instead of mysql, even though it's a bad idea especially with that much data stored in.

0 Kudos
Dataiker
Dataiker

Hi,

You need to go to the settings of the output dataset, in the "Schema" tab. There, you can select your field that is too long and set the max length. Save the schema and re-run your recipe.

Level 2

Well i can see "string" as column type, but can't specify the lenght ?
Maybe it's a type name that has to be set ? But which one ?

 

In the job i see that :

            "originalType": "MEDIUMTEXT",
            "meaning": "Text",
            "name": "referer",
            "type": "string",
            "maxLength": 500

 

But i am unable to find the maxlenght option to edit.

 

=> Edit finally found out :

- Go to Selected columns
- Near the name of the table u have a settings icon, click on it
- Select the column name that has the too long values let it be as string / auto detect, but tick the checkbox
- Click DETAILS on the right and there u see the Max Lenght : 500

 

Well now, it's about global storage that is way too long for each row.
Error is the following :

Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs

0 Kudos
Labels (2)