ITPM 2024: pp. 54 - 66
Authors:
- Oleksandr Oriekhov
- Tetyana Farionova
- Liubava Chernova
- Lyudmila Chernova
- Mykhailo Vorona
Affiliation (all authors): Admiral Makarov National University of Shipbuilding, Heroes avenue, 9, Mykolaiv, 54007, Ukraine.
Abstract
This paper applies regression models and equations to estimating the size of Data Science and
Machine Learning Java applications. Application size estimation is one of the key tasks at the
early stages of project planning for the successful implementation of software development
projects. The estimated size is used to predict software development effort with parametric
models such as COCOMO and COCOMO II. The aim of the study is to increase the reliability and
accuracy of size estimation of Data Science and Machine Learning Java applications at the early
stage of software project planning using class diagram metrics by building a nonlinear
regression model. The object of research is the process of size estimation for open-source Data
Science and Machine Learning Java applications. The subject of the study is the regression
equations and nonlinear regression models used to estimate the software size. To achieve this
goal, we analyzed and compared the existing mathematical regression models and equations for
Java application size estimation on a sample of code metrics from open-source Data Science and
Machine Learning Java applications.
We demonstrate the need to build a three-factor nonlinear regression model for estimating
the software size of Data Science and Machine Learning Java applications on the basis of the
decimal logarithm normalizing transformation, using three software code metrics: the total
number of classes, the total number of visible methods, and the average number of fields per
class. The obtained nonlinear regression model is compared with the existing models by
regression model quality criteria: the coefficient of determination, the mean magnitude of
relative error (MMRE), and the percentage of predictions with a relative error within 0.25
(PRED(0.25)). The comparison confirms that the obtained nonlinear regression model increases
the accuracy of software size estimation on the given sample.
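As a rough illustration of the quantities named in the abstract, the following Python sketch shows one way a three-factor model built on a decimal logarithm transformation could be evaluated and scored. The model form (a linear relation between the log10 of the size and the log10 of each metric), the coefficient values, and all sample numbers are assumptions made for illustration only. They are not taken from the paper. The function names `predict_size`, `mmre`, `pred`, and `r_squared` are likewise hypothetical.

```python
import math


def predict_size(coeffs, classes, visible_methods, avg_fields):
    """Estimate size from three class-diagram metrics.

    Assumed model form (an illustration, not the paper's fitted equation):
        log10(size) = b0 + b1*log10(classes)
                         + b2*log10(visible_methods)
                         + b3*log10(avg_fields)
    The estimate is obtained by inverting the log10 transformation.
    """
    b0, b1, b2, b3 = coeffs
    log_size = (b0
                + b1 * math.log10(classes)
                + b2 * math.log10(visible_methods)
                + b3 * math.log10(avg_fields))
    return 10 ** log_size


def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pred(actual, predicted, level=0.25):
    """Share of predictions whose relative error does not exceed `level`."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical coefficients and made-up sizes, for demonstration only.
    coeffs = (1.0, 1.0, 0.5, 0.0)
    print("estimate:", predict_size(coeffs, classes=100,
                                    visible_methods=100, avg_fields=3.0))

    actual = [1000, 2000, 4000, 8000]
    predicted = [1100, 1500, 4200, 12000]
    print("MMRE:", mmre(actual, predicted))
    print("PRED(0.25):", pred(actual, predicted))
    print("R^2:", r_squared(actual, predicted))
```

In this sketch a lower MMRE and a higher PRED(0.25) and coefficient of determination indicate a better model, which is the direction of comparison the abstract describes.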
Keywords
Software size estimation, nonlinear regression model, normalizing transformation, Java, Data
Science, Machine Learning, non-Gaussian data, decimal logarithm normalizing transformation