Efficient Data Cleaning Techniques in Dataiku?
Hi all,
How do you handle missing values and outliers in Dataiku? Any plugins or workflows you'd recommend for efficient data cleaning?
Thanks for your tips!
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
Hi @Miasm1,
1) For missing values, you can handle them as part of feature handling when designing a model: https://doc.dataiku.com/dss/latest/machine-learning/features-handling/index.html#features-handling
See also: https://knowledge.dataiku.com/latest/ml-analytics/model-design/concept-feature-handling.html#handle-missing-values
2) For outliers in clustering models, see: https://doc.dataiku.com/dss/latest/machine-learning/unsupervised/settings.html#outliers-detection
To clip outlier values, you can either use code (https://developer.dataiku.com/latest/tutorials/plugins/recipes-clipping-dataset/index.html) or a Prepare recipe with the number clipping processor: https://doc.dataiku.com/dss/latest/preparation/processors/number-clipping.html
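If you go the code route, here is a minimal sketch of the two steps in plain Python. It is not Dataiku's own implementation: the median fill and the 1.5 x IQR clipping bounds are assumptions chosen for illustration, and the function names `impute_median` and `clip_iqr` are hypothetical. In a Python recipe you would apply the same logic to the columns of the DataFrame you read from your input dataset.

```python
import math
import statistics


def impute_median(values):
    """Replace None/NaN entries with the median of the observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        # Nothing to compute a median from; return the input unchanged.
        return list(values)
    fill = statistics.median(observed)
    return [fill if v is None or math.isnan(v) else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Toy column with two missing entries and one extreme value.
raw = [10.0, 12.0, None, 11.0, 13.0, float("nan"), 500.0]
filled = impute_median(raw)   # missing entries become the median
cleaned = clip_iqr(filled)    # the extreme value is pulled to the upper bound
print(cleaned)
```

Note that the median is computed before clipping, so a large outlier does not distort the fill value the way a mean would.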
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,984 Neuron
@AlexT
It's a bot.