I have an imbalanced dataset (most of instances are in Class-1, and only 1% of instances are labeled with Class-2). When i run my classification model (decision tree, logist regress) using ROC AUC as an accuracy measure, i receive an excellent accuracy score but very low Precision. This happens due to the imbalanced training set. How do you handle it in DataIku environment? I am new to dataiku, and still learning my ways around it.
Is there an option for SMOTE supervised filter?
Note that when doing a prediction with DSS visual ML, in most algorithms, DSS will use scikit-learn's capabilities for rebalancing by setting appropriate instance weights.
In addition to this, you can use our built-in class rebalancing sampling capability: https://doc.dataiku.com/dss/latest/explore/sampling.html#class-rebalancing-approximate-number-of-rec...
However, note that such rebalancing means your performance metrics may not be representative anymore of a real-world distribution.
Alternatively, you can implement SMOTE as a preliminary Python or R recipe.
Hope it helps,