Error while reading MySQL table during the execution of a recipe

Options
fclavert
fclavert Registered Posts: 8 ✭✭✭✭

Hello,

I have a MySQL table with around 2,8 millions rows. I just created a recipe to parse a date+time column. This recipe usually works fine. But in that case, after reading around 890 000 rows, the job fails with the following error:

Failed to read data from table

Failed to read data from table, caused by: BatchUpdateException: Data truncation: Data too long for column 'text' at row 509, caused by: MysqlDataTruncation: Data truncation: Data too long for column 'text' at row 509

HTTP code: , type: com.dataiku.dip.exceptions.DataStoreIOException

Investigating a bit the log, I remarked that a column that is a "text" column in MySQL is transformed into a "varchar(500)" column. I am not sure that this is the source of the error (as it is a problem while reading the data and not writing it).

I have tried several things, including, in the original table to specify that there should be 1000 caracters max for instance, trying to change the "autodetect" (that gives "Natural Language" for this column) to "text". After those changes I asked DSS to "propagate the schema" from the original tables (it does no see any change anyway).

Have you any idea of the source of this error?

Best,

Frédéric

Tagged:

Answers

  • Feldunoob
    Feldunoob Registered Posts: 11 ✭✭✭✭
    Options

    Hello,

    I have the exact same problem, string recorded column (long URL) became varchar(500) in a left join job, so I am currenty investigating about that.

    Seems like everything that is recorded as "string" becomes varchar(500).

    Now i looked a little on internet, and it seems the following is applying to join jobs (quoting from another site) :

    -> Just go with the TEXT type and skip reading all these answers below. In the end, that's what most of them suggest. Of course, if You need indexing or uniqueness, go for VARCHAR, since TEXT cannot be indexed.

    This means that's probably the default behavior for Dataiku ?
    In order to avoid problems about joins.

    So now, looking at lenght (quoting again another site) :

    1. Lowest common denominator max URL length among popular web browsers: 2,083 (Internet Explorer)

    2. http://dev.mysql.com/doc/refman/5.0/en/char.html
      Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

    3. So ...
      < MySQL 5.0.3 use TEXT
      or
      >= MySQL 5.0.3 use VARCHAR(2083)

    Testing with VARCHAR(2083) so.

  • Feldunoob
    Feldunoob Registered Posts: 11 ✭✭✭✭
    Options

    Hello,

    Little update about the try with LONG VARCHAR (65000+).

    Well basically it still fails, Dataiku still try to convert / truncate into varchar(500), and it seems someone has the same problem here around : https://community.dataiku.com/t5/Using-Dataiku-DSS/Dataiku-recipes-don-t-keep-the-data-types-in-output-datasets/m-p/2013/dataiku-recipes-dont-keep-the-data-types-in-output-datasets

    I am still not sure about how to solve that problem ...

    Edit :
    For now, exported to CSV in intenal managed dataset instead of mysql, even though it's a bad idea especially with that much data stored in.

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Options

    Hi,

    You need to go to the settings of the output dataset, in the "Schema" tab. There, you can select your field that is too long and set the max length. Save the schema and re-run your recipe.

  • Feldunoob
    Feldunoob Registered Posts: 11 ✭✭✭✭
    edited July 17
    Options

    Well i can see "string" as column type, but can't specify the lenght ?
    Maybe it's a type name that has to be set ? But which one ?

    In the job i see that :

                "originalType": "MEDIUMTEXT",
                "meaning": "Text",
                "name": "referer",
                "type": "string",
                "maxLength": 500

    But i am unable to find the maxlenght option to edit.

    => Edit finally found out :

    - Go to Selected columns
    - Near the name of the table u have a settings icon, click on it
    - Select the column name that has the too long values let it be as string / auto detect, but tick the checkbox
    - Click DETAILS on the right and there u see the Max Lenght : 500

    Well now, it's about global storage that is way too long for each row.
    Error is the following :

    Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs

Setup Info
    Tags
      Help me…