
Dae-Jin Lee
Assistant Professor
This study explores dataset complexity estimation and its relationship with classification performance. Widely used measures such as k-Disagreeing Neighbors (kDN) are limited by their reliance on a fixed number of neighbors (k) and by their discrete output, which can lead to coarse estimations. To address these issues, we introduce Dynamic Disagreeing Neighbors (DDN), a measure that incorporates dynamic, density-aware neighborhoods and distance-based weighting to provide a smoother, more flexible complexity estimation. We conduct a large-scale empirical study on 65 binary datasets, comparing DDN against kDN and other state-of-the-art measures, and analyzing how the k parameter and class imbalance affect the alignment with classifier performance. Our results show that DDN provides a more robust and stable correspondence with performance. Furthermore, we demonstrate that, in a predictive modeling task, a regression model using DDN as a feature estimates dataset performance significantly more accurately than models using other complexity measures. These findings not only deepen our understanding of complexity estimation but also pave the way for more informed classifier selection and data preprocessing strategies.
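As background for the baseline measure discussed above, the following is a minimal sketch of kDN: for each instance, the fraction of its k nearest neighbors that carry a different class label. The function name `kdn` and the toy data are illustrative, not from the paper; DDN itself (with its density-aware neighborhoods and distance weighting) is defined in the full text.

```python
import numpy as np

def kdn(X, y, k=3):
    """k-Disagreeing Neighbors: for each point, the fraction of its
    k nearest neighbors (excluding the point itself) whose class
    label differs from the point's own label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all instances.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbors
    # Fraction of disagreeing labels among those neighbors.
    return (y[nn] != y[:, None]).mean(axis=1)

# Toy 1-D example: a class-1 point embedded in a class-0 cluster
# receives the maximum complexity score of 1.0.
X = [[0.0], [0.1], [0.2], [1.0], [1.1], [0.15]]
y = [0, 0, 0, 1, 1, 1]
scores = kdn(X, y, k=3)
```

Because the output is a count over a fixed k, the scores are confined to the discrete grid {0, 1/k, ..., 1}; this is the coarseness that motivates DDN's continuous, distance-weighted alternative.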
