How Class rebalance works

Koji · January 2024

When Class rebalance is selected as the Sampling method under Train / Test set, undersampling appears to be performed, but what exactly does it do? Does it undersample the whole dataset while preserving the class proportions, or does it simply undersample?

Tsuyoshi · January 2024

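The documentation states the following, so it should simply undersample, within the percentage you specify, so that both classes end up with roughly the same number of rows.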

This method does not oversample, only undersample (so some rare modalities may remain under-represented). In all cases, rebalancing is approximative.

https://doc.dataiku.com/dss/latest/explore/sampling.html#class-rebalancing-approximate-ratio
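As a rough illustration of what "undersample only" means, here is a minimal sketch in plain Python. It is not Dataiku's implementation: the function name `undersample_to_balance` and the `label_of` parameter are made up for this example, and it balances the classes exactly, whereas the documentation says DSS rebalancing is only approximate.

```python
import random
from collections import Counter, defaultdict


def undersample_to_balance(rows, label_of, seed=0):
    """Randomly drop rows from the larger classes so that every class
    keeps as many rows as the rarest class. Nothing is oversampled.

    Hypothetical helper for illustration only, not a Dataiku API.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    if not by_class:
        return []

    # The rarest class sets the per-class target size.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        # Sample without replacement, so no row is ever duplicated.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# 90 negative rows and 10 positive rows.
rows = [{"id": i, "label": "neg"} for i in range(90)]
rows += [{"id": i, "label": "pos"} for i in range(90, 100)]

balanced = undersample_to_balance(rows, lambda r: r["label"])
counts = Counter(r["label"] for r in balanced)
```

In this example the 90/10 split becomes 10/10, so 80 of the original 100 rows are discarded. That loss of data is why undersampling is a poor fit for small datasets.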

For that reason, as the documentation quoted below notes, when the training dataset is small, the recommendation is to use the class weights weighting strategy rather than Class rebalancing.

Class weights can be substituted by a “Class rebalancing” sampling strategy settable in Settings: Train / Test set, which is recommended for larger datasets. For smaller datasets, i.e. when preprocessed data fits in memory, chosing the “class weights” weighting strategy is the recommended option.

https://doc.dataiku.com/dss/latest/machine-learning/supervised/settings.html#setting-weighting-strategy
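To show the idea behind class weights, here is a small sketch that keeps every row and gives the rare class a larger weight instead of dropping rows. The formula used, n / (k * n_c), is the common inverse-frequency heuristic and is an assumption for this example; the exact weights DSS computes may differ. The function name `inverse_frequency_weights` is also made up here.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to how often
    the class occurs: weight_c = n / (k * n_c), where n is the number of
    rows, k the number of classes and n_c the row count of class c.

    Assumed heuristic for illustration, not necessarily what DSS uses.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Same 90/10 imbalance as a typical rare-event dataset.
labels = ["neg"] * 90 + ["pos"] * 10
weights = inverse_frequency_weights(labels)

# Total weight carried by each class (row count times class weight).
total_per_class = {cls: labels.count(cls) * w for cls, w in weights.items()}
```

With these weights each class contributes the same total weight during training, and all 100 rows are still used.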
