Imported MySQL dataset - schema incomplete

Solved!
cuezumo
Level 2

Hi folks,

I'm importing data from a MongoDB connection into Dataiku using MongoDB's BI Connector. So far this has worked fine and I've never had any problems. However, the schema of the resulting MySQL dataset is incomplete: I'm missing one important column.

There are not many settings I can change or experiment with, and in MongoDB itself the schema is complete. Does anybody have a hint on how to solve this?

Thanks!

cuezumo

2 Replies
AlexT
Dataiker

Hi,

The schema is determined by sampling the collection: only the fields present in the first rows returned from the collection are detected. DSS does not scan the whole collection to determine the schema.

 

To get around this behavior you have a few options:

1) Filter on whether the column exists. If this is an important column, perhaps you only want rows that contain it; you can later remove the filter if you actually want all rows. Once the filter is added and you run Schema > Check data, you should be able to pull in the new schema.

You can use the $exists operator (https://www.mongodb.com/docs/manual/reference/operator/query/exists/):

In the example below, key3 would be the column you want to filter for.

[Screenshot: dataset filter settings with an $exists filter on key3]

If you remove the filter later you will see a schema mismatch warning, which you can ignore.
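For reference, a minimal version of such a filter, with key3 standing in for your missing column, would look like `{ "key3": { "$exists": true } }` in MongoDB query syntax. This restricts the sampled rows to documents that actually contain key3, so schema detection picks the column up.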

2) Another workaround is to add the column manually from the dataset's schema interface (using "Add column").

[Screenshot: dataset schema tab with the "Add column" action]

3) You could also do it via code, using the `write_schema()` method on a dataset, for example:

import dataiku

dataset = dataiku.Dataset("dataset_name")  # replace with your dataset name
dataset.write_schema([{"name": "column1", "type": "string"}, {"name": "column2", "type": "string"}, ...])  # replace ... with your other columns

You will see a warning in the Schema tab if you check the consistency, so use that dataset and the modified schema with caution (see https://doc.dataiku.com/dss/latest/python-api/datasets-reference.html#dataiku.Dataset.write_schema).
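If you want to confirm the change took effect, here is a minimal sketch using the dataset's `read_schema()` method ("dataset_name" is the same placeholder as above):

import dataiku

dataset = dataiku.Dataset("dataset_name")  # replace with your dataset name
for column in dataset.read_schema():
    print(column["name"], column["type"])  # the missing column should now be listed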

Let me know if that helps. 

cuezumo
Level 2
Author

Big, big thanks, @AlexT, that worked perfectly!

Thanks for the quick help.

Best, cuezumo
