DSS does not display correct types for my dataset
I checked the schema tab, and the types are correct there. But in the explore view, some types are wrong. For instance:
- DSS says “integer”, when it should be string or double
- I have a column containing US-state abbreviations. DSS says it has “Country meaning” and thus a few lines are red.
How do I tell it those are strings instead?
Best Answer
I agree this is a subtle part of the studio. There are two distinct concepts akin to a type. You can see both when editing a preparation script:
Each column has a “Meaning” and a “Stored as” entry:
- “Stored as” shows the storage type. This is the type from the dataset's schema and is used to store the dataset on disk, on the SQL server, etc. It is part of the dataset metadata.
- The “Meaning” is mostly informational. It is not stored; instead, it is auto-detected each time you explore a dataset. DSS simply reports what it thinks this column most likely is.
The meaning affects the default processors suggested by DSS (whatever the meaning, all processors remain available), the red/green gauge (the share of lines that match the detected meaning), and some filtering options on charts.
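To make the gauge idea concrete, here is a minimal sketch of how such a detection could work. This is not DSS's actual code: the validators, the tiny list of ISO country codes, and the 0.5 threshold are all assumptions chosen for illustration. It shows one plausible reason a column of US-state abbreviations ends up with a “Country” meaning: several state codes happen to be valid country codes too, and the rest show up as invalid (red).

```python
# Illustrative sketch only; names, threshold, and code list are hypothetical.
ISO_COUNTRY_CODES = {"CA", "DE", "IN", "AL", "GA", "US", "FR"}  # small sample


def is_integer(value):
    """Return True if the string parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def is_country(value):
    """Return True if the string is one of the sample country codes."""
    return value.upper() in ISO_COUNTRY_CODES


VALIDATORS = {"Integer": is_integer, "Country": is_country}


def gauge(values, validator):
    """Fraction of non-empty values accepted by the validator (the 'green' part)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return 0.0
    return sum(1 for v in non_empty if validator(v)) / len(non_empty)


def detect_meaning(values, threshold=0.5):
    """Pick the best-scoring meaning, falling back to 'Text' below the threshold."""
    scores = {name: gauge(values, check) for name, check in VALIDATORS.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else "Text"), scores


states = ["CA", "DE", "IN", "NY", "TX", "GA", "AL", "CA"]
meaning, scores = detect_meaning(states)
print(meaning, scores)
```

In this toy sample, "NY" and "TX" are the values that fail the country check, which corresponds to the few red lines described in the question.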
The meaning is also used to infer a storage type when DSS is asked to create a schema automatically. This is the only case where you actually need to click “Meaning→Change”, which adds a “change meaning” processor and simply overrides the auto-detection.
Meanings are richer than storage types: they can detect that a column might be an IP, a country, a gender, an email, a JSON object, etc.
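As a rough sketch of why overriding the meaning matters in that case, consider the following. The mapping table and function are hypothetical and do not reflect the exact rules DSS applies; the point is only that the storage type is derived from the effective meaning, so forcing a meaning changes the resulting schema.

```python
# Hypothetical mapping from meanings to storage types; an assumption for illustration.
MEANING_TO_STORAGE = {"Integer": "bigint", "Decimal": "double", "Boolean": "boolean"}


def infer_schema(detected, overrides=None):
    """Build a column -> storage type mapping from detected (or forced) meanings."""
    overrides = overrides or {}
    schema = {}
    for column, meaning in detected.items():
        # A forced meaning takes precedence over the auto-detected one.
        effective = overrides.get(column, meaning)
        # Any meaning without a dedicated storage type is stored as a string.
        schema[column] = MEANING_TO_STORAGE.get(effective, "string")
    return schema


detected = {"zip_code": "Integer", "state": "Country", "price": "Decimal"}
auto = infer_schema(detected)
forced = infer_schema(detected, overrides={"zip_code": "Text"})
print(auto)
print(forced)
```

Here a zip code column detected as an integer would be stored as a number unless its meaning is forced to text, which is the kind of situation where changing the meaning is needed.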
See also http://doc.dataiku.com/dss/latest/preparation/data_prep_and_schemas.html