abstract |
The present invention proposes an integrated method for classifying high-dimensional, imbalanced data, characterized in that preprocessing strategies are grouped into two classes according to the order in which dimensionality reduction and sampling are applied. For reproducibility of the experimental conclusions, several standard data sets from data mining and machine learning are used as experimental data. Wrapper feature selection methods and oversampling methods are added to the preprocessing methods considered. The influence of the preprocessing methods on classification performance for high-dimensional imbalanced data is studied from two aspects, the number of attributes and the degree of imbalance. With this more complete preprocessing experiment strategy, a different conclusion is obtained: before classifying high-dimensional imbalanced data, reducing the features first and then rebalancing the data yields better average AUC. The method is highly automated, and the effects of high dimensionality and imbalance on classification are mitigated by using different combinations of preprocessing strategies. |
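
The two preprocessing orderings compared in the abstract can be illustrated with a minimal sketch. It is not the patented procedure, and it makes several assumptions: the data are synthetic, feature selection is a simple filter on class-mean gaps (swapped in for the wrapper method so the example stays standard-library only), rebalancing is random oversampling, and the classifier is a nearest-centroid scorer. All function names (`select_features`, `oversample`, `run`, and so on) are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_features(X, y, k):
    """Assumed filter selection: keep the k features with the largest class-mean gap."""
    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    gaps = [abs(mean(pos, j) - mean(neg, j)) for j in range(len(X[0]))]
    return sorted(range(len(gaps)), key=lambda j: -gaps[j])[:k]

def project(X, idx):
    """Keep only the selected feature columns."""
    return [[x[j] for j in idx] for x in X]

def oversample(X, y, rng):
    """Random oversampling: duplicate minority rows until both classes are equal in size."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    extra = [rng.choice(minority) for _ in range(abs(len(pos) - len(neg)))]
    return X + extra, y + [label] * len(extra)

def centroid_scores(X_train, y_train, X_test):
    """Score = squared distance to negative centroid minus that to positive centroid."""
    def centroid(rows):
        return [sum(c) / len(rows) for c in zip(*rows)]
    cp = centroid([x for x, t in zip(X_train, y_train) if t == 1])
    cn = centroid([x for x, t in zip(X_train, y_train) if t == 0])
    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [d2(x, cn) - d2(x, cp) for x in X_test]

def make_data(n, n_features, n_informative, minority_ratio, rng):
    """Synthetic data: only the first n_informative features differ between classes."""
    X, y = [], []
    for _ in range(n):
        t = 1 if rng.random() < minority_ratio else 0
        X.append([rng.gauss(t if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(t)
    return X, y

def run(order, X_tr, y_tr, X_te, y_te, k, rng):
    """Apply the two preprocessing steps in the given order, then report test AUC."""
    if order == "reduce_then_balance":
        idx = select_features(X_tr, y_tr, k)
        Xb, yb = oversample(project(X_tr, idx), y_tr, rng)
    else:  # "balance_then_reduce"
        X_over, yb = oversample(X_tr, y_tr, rng)
        idx = select_features(X_over, yb, k)
        Xb = project(X_over, idx)
    return auc(centroid_scores(Xb, yb, project(X_te, idx)), y_te)

rng = random.Random(0)
X_tr, y_tr = make_data(300, 50, 5, 0.1, rng)
X_te, y_te = make_data(300, 50, 5, 0.1, rng)
for order in ("reduce_then_balance", "balance_then_reduce"):
    print(order, round(run(order, X_tr, y_tr, X_te, y_te, 5, rng), 3))
```

A single synthetic run like this says nothing about which ordering is better in general; the abstract's conclusion rests on average AUC over several standard data sets, so a faithful comparison would repeat `run` across data sets, attribute counts, and imbalance degrees.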