Estimation of the complexity of constructing a logical classification tree for an arbitrary case in conditions of strong class separation of the initial training sample

DOI: 10.31673/2412-4338.2020.035566

Authors

  • І. Ф. Повхан, (Povkhan I. F.) Uzhhorod National University, Uzhhorod

Abstract

The paper offers an estimation of the complexity of the constructed logical tree structure for classifying an arbitrary case in the conditions of a strong class division of the initial training sample. The principal solution to this question is of a defining nature, regarding the assessment of the structural complexity of classification models (in the form of tree-like structures of LCT/ACT) of discrete objects for a wide range of applied classification and recognition problems in terms of developing promising schemes and methods for their final optimization (minimization) of post-pruning structure. The presented research is relevant not only for constructions (structures) of logical classification trees, but also allows us to extend the scheme of complexity estimation to the General case of algorithmic structures (ACT models) of classification trees (the concept of algorithm trees and trees of generalized features - TGF). Is investigated the actual question of the concept of decision trees (tree recognition) – evaluation of the maximum complexity of the General scheme of constructing a logical tree based classification procedure of stepwise selection of sets of elementary features (they can be diverse sets and combinations) that for given initial training sample (array of discrete information) builds a tree structure (classification model), from a set of elementary features (basic attributes) are estimated at each stage of the scheme of the model in this sample for the case of strong separation of classes.
Modern information systems and technologies based on mathematical approaches (models) of pattern recognition (structures of logical and algorithmic classification trees) are widely used in socio-economic, environmental and other systems of primary analysis and processing of large amounts of information, and this is due to the fact that this approach allows you to eliminate a set of existing disadvantages of well-known classical methods, schemes and achieve a fundamentally new result. The research is devoted to the problems of classification tree models (decision trees), and offers an assessment of the complexity of logical tree structures (classification tree models), which consist of selected and ranked sets of elementary features (individual features and their combinations) built on the basis of the General concept of branched feature selection. This method, when forming the current vertex of the logical tree (node), provides the selection of the most informative (qualitative) elementary features from the source set. This approach allows you to significantly reduce the size and complexity of the tree (the total number of branches and tiers of the structure) and improve the quality of its subsequent instrumental analysis (the final decomposition of the model).

Keywords: logical classification tree, training sample, pattern recognition, classification, discrete attribute, classification scheme.

References
1. Povhan I. (2019) Generation of elementary signs in the general scheme of the recognition system based on the logical tree. Electronics and information technologies. Lviv, 2019, Vol. 12. P. 20-29.
2. Povhan I. (2020) Question of the optimality criterion of a regular logical tree based on the concept of similarity. Electronics and information technologies. Lviv, 2020, Vol. 13. P. 19-27.
3. Povkhan, I.F. (2018) The problem of functional evaluation of the training sample in the problems of recognition of discrete objects. Scientific notes of Taurida national University. Series: technical Sciences, Volume 29(68) №.6, 217-222.
4. Vasilenko, Y.A., Vasilenko, E.J., Povkhan, I.F., Kovacs, M.J., Nickovic, O.D. (2004) Minimization of logic tree structures in pattern recognition problems. Scientific and technical journal “European Journal of Enterprise Technologies”, 3[9], 12-16.
5. Alpaydin E. (2010). Introduction to Machine Learning. London: The MIT Press, 400 p.
6. Painsky A., Rosset S. (2017). Cross-validated variable selection in tree-based methods improves predictive performance. IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 39, № 11. P. 2142-2153.
7. Srikant R., Agrawal R. (1997) Mining generalized association rules. Future Generation Computer Systems, Vol.13, №2, 61-180.
8. Vasilenko Y.A., Vasilenko E.Y., Povkhan I.F, Vashchuk F.G. (2004) Conceptual basis of pattern recognition systems based on the method of branched feature selection. Scientific and technical journal “European Journal of Enterprise Technologies”, №7[1], 13-15.
9. Vasilenko Y.A., Vashchuk F.G, Povkhan I.F. (2011) The problem of estimating the complexity of the logic trees, recognition, and a general method of optimization. Scientific and technical journal “European Journal of Enterprise Technologies”, 6/4(54), 24-28.
10. Vasilenko Y. A., Povkhan I.F., Vashchuk F.G. (2012) General estimation of tree logical structures minimization. Scientific and technical journal “European Journal of Enterprise Technologies”, 1/4 (55), 29-33.
11. Povkhan I. (2019) General scheme for constructing the most complex logical tree of classification in pattern recognition of discrete objects. Collection of scientific papers "electronics and information technology", Lviv, Issue 11, 112-117.
12. Laver V.O., Povkhan I.F. (2019) Algorithms for constructing logical classification trees in pattern recognition problems. Scientific notes of Tauride national University. Series: technical Sciences, Volume 30(69) No. 4, 2019, 100-106.
13. Povkhan I.F. (2020) Question of general complexity of the procedure for constructing a binary logical classification tree for an arbitrary case. Scientific journal: Telecommunications and information technologies, No.2 (67), 2020, 100-111.
14. Vtogoff P.E. (2009) Incremental Induction of Decision Trees. Machine Learning, № 4, 61−186.
15. Whitley D. (2001) An overview of evolutionary algorithms: practical issues and common pitfalls. Information and Software Technology, Vol.43, №14, P. 817-831.
16. Povhan I. (2016) Designing of recognition system of discrete objects, IEEE First International Conference on Data Stream Mining & Processing (DSMP), Lviv, Ukraine. Lviv, pp. 226-231. 17. Kotsiantis S.B. (2007) Supervised Machine Learning: A Review of Classification Techniques, Informatica, No. 31, pp. 249-268.
18. Subbotin S.A. (2019) Construction of decision trees for the case of low-information features, Radio Electronics, Computer Science, Control, No. 1, pp. 121-130. 19. Deng H., Runger G., Tuv E. (2011) Bias of importance measures for multi-valued attributes and solutions, Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN), pp. 293-300.
20. Subbotin S. A. (2014). Methods and characteristics of locality-preserving transformations in the problems of computational intelligence. Radio Electronics, Computer Science, Control. № 1. P. 120-128.
21. Subbotin S. A. (2013). Methods of sampling based on exhaustive and evolutionary search. Automatic Control and Computer Sciences. Vol. 47, № 3. P. 113-121.
22. De Mántaras R. L. (1991). A distance-based attribute selection measure for decision tree induction. Machine learning. Vol. 6, № 1. P. 81-92.
23. Miyakawa M. (1989). Criteria for selecting a variable in the construction of efficient decision trees. IEEE Transactions on Computers. Vol. 38, № 1. P. 130-141.
24. Povkhan I.F. (2019). Features of synthesis of generalized features in the construction of recognition systems using the logical tree method. Information technologies and computer modeling ІТКМ-2019 : materials of the international scientific and practical conference, Ivano- Frankivsk, May 20–25, 2019. Ivano-Frankivsk, 2019. P. 169-174.

Published

2021-04-14

Issue

Section

Articles