Reduce duration for Detecting Column Type
sbr
Registered Posts: 5 ✭✭✭✭
Each time I want to explore a dataset, Dataiku autodetects the type of each column. With 1,500 columns and a 30,000-row sample, this takes quite a long time, and it doesn't make a lot of sense to me.
If the dataset already has a schema, couldn't Dataiku use that schema and only redetect columns on user action?
Is there a way to prevent Dataiku from autodetecting columns?
Answers
Hi,
That makes sense, because for each column we try to find the best preprocessing based on the type and the statistics of the feature.
Did you do any feature engineering steps, such as creating dummy variables? If so, you shouldn't: DSS will do it for you, and you don't want to store the whole dummy-encoded dataframe on disk.
Matt
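To illustrate the point about dummy variables, here is a minimal sketch in plain pandas (not DSS code) of how one-hot encoding inflates the column count, which is one way a dataset can end up with well over a thousand columns. The toy dataframe and its column names (`color`, `size`, `price`) are made up for the example and are not taken from the original poster's data.

```python
import pandas as pd

# Hypothetical toy data: two categorical columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# One-hot encode the categorical columns: each distinct value becomes a column.
encoded = pd.get_dummies(df, columns=["color", "size"])

n_raw = df.shape[1]          # columns before encoding
n_encoded = encoded.shape[1]  # columns after encoding
print(f"columns before: {n_raw}, after: {n_encoded}")
```

With three distinct values in each categorical column, the three original columns become seven. On real data with high-cardinality categories, the growth is much larger, and every extra stored column is one more column to detect when exploring the dataset.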