Dataset Schema Query

Hi Folks,

I am working on plugins that generate datasets with a varying number of columns. Because of this, I cannot define the output schema manually, so I am currently using the automatic setting.

This, however, leads Dataiku to set the default length of many varchar columns to 65535, which is huge when I need to deal with large volumes of data. I am looking for a way to limit the maximum size of columns dynamically.


Any leads would be appreciated.







The default length is set according to the maximum for your database, which in your case is 65535.

To change this, you can use the UI, or get_schema() and set_schema() once the dataset is created and the initial schema is set. To update maxLength in the schema via the API, you can use:

import dataiku

client = dataiku.api_client()
project = client.get_project('PREDICTION_CARS')
input_dataset = project.get_dataset('PREDICTION_CARS_1')

# Fetch the current schema (a dict with a 'columns' list)
schema = input_dataset.get_schema()

try:
    for col in schema['columns']:
        # Cap the length of every string column
        if col['type'] == 'string':
            col['maxLength'] = 3000
    # Mark the schema as user-modified
    schema['userModified'] = True
    # Save the modified schema back to the dataset
    input_dataset.set_schema(schema)
    print('final new schema')
    print(schema)
except Exception as e:
    print(e)
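
If different columns need different limits, the capping step can be pulled out into a small helper that works on the schema dict alone. This is only a sketch: cap_string_lengths and its per_column argument are hypothetical names of mine, not part of the Dataiku API, and it assumes the schema has the same shape as above (a 'columns' list whose entries carry 'name', 'type' and optionally 'maxLength').

```python
import copy


def cap_string_lengths(schema, default_max=3000, per_column=None):
    """Return a copy of `schema` with maxLength set on every string column.

    Hypothetical helper: `per_column` maps a column name to its own limit;
    string columns not listed there get `default_max`.
    """
    per_column = per_column or {}
    updated = copy.deepcopy(schema)  # leave the caller's dict untouched
    for col in updated.get('columns', []):
        if col.get('type') == 'string':
            col['maxLength'] = per_column.get(col.get('name'), default_max)
    updated['userModified'] = True
    return updated


# Made-up schema for illustration, mimicking the dict shape used above
example = {
    'columns': [
        {'name': 'brand', 'type': 'string', 'maxLength': 65535},
        {'name': 'model', 'type': 'string', 'maxLength': 65535},
        {'name': 'price', 'type': 'double'},
    ],
    'userModified': False,
}

capped = cap_string_lengths(example, default_max=3000, per_column={'brand': 50})
for col in capped['columns']:
    print(col['name'], col.get('maxLength'))
```

The result could then be passed to set_schema() on the dataset, as in the snippet above. Since the plugin output has a varying number of columns, the per_column mapping would have to be built at run time from whatever your plugin knows about its output.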


Let me know if this works for you. 


Thanks @AlexT, will try this out.
