ITPM 2025: pp. 1 – 10
Authors:
1. Oleksandr Oriekhov
2. Tetyana Farionova
3. Liubava Chernova
4. Mykhailo Vorona
1 – 4. Admiral Makarov National University of Shipbuilding, Ukraine, Heroes avenue, 9, Mykolaiv, 54007, Ukraine.
Abstract
The research proposes a five-factor nonlinear regression model for estimating the size of JAVA applications at the early stages of project planning, for subsequent use in parametric models of software development effort estimation. Accurate effort estimation is necessary in project planning to support risk assessment, identify potential planning gaps, and improve the efficiency of the development process, resource allocation, and cost management. JAVA is one of the most widely used programming languages in the world and is actively used in a wide variety of software projects. The aim of the research is to improve the accuracy and reliability of JAVA application size estimation at the early stages of software project planning. To achieve this goal, existing equations and models for estimating the KLOC of JAVA applications were reviewed and compared. A dataset of code metrics for 571 open-source JAVA applications was collected using the CK static code analysis tool and split into learning and validation samples for model construction and validation. The five-factor regression model is constructed on the basis of a multivariate Box-Cox normalizing function, using the total number of actual classes, the total number of interfaces, and the per-class averages of VMQ, TFQ, and CBO. The model is built iteratively by detecting and removing anomalies from the sample. The constructed model is compared with existing models using standard regression quality criteria: the coefficient of determination R², the mean magnitude of relative error (MMRE), and PRED(0.25). The estimates of R², MMRE, and PRED(0.25) at the final iteration on the learning sample are 0.9759, 0.1276, and 0.9008, respectively, and the estimates on the initial learning and validation samples also satisfy the thresholds, which indicates good accuracy and reliability of the constructed regression model.
A prediction interval was constructed for the regression model and compared with that of the four-factor nonlinear regression model. The study confirms that the accuracy and reliability of KLOC estimation for JAVA applications have been improved.
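The three quality criteria named in the abstract can be illustrated with a minimal Python sketch. It uses the usual textbook definitions of MMRE, PRED(l), and R²; the function names and the sample KLOC values are hypothetical and are not taken from the paper, and the paper may compute R² on normalized rather than original data.

```python
from typing import Sequence


def relative_errors(actual: Sequence[float], predicted: Sequence[float]) -> list[float]:
    """Magnitude of relative error, |y - y_hat| / |y|, for each observation."""
    return [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean magnitude of relative error (lower is better)."""
    errors = relative_errors(actual, predicted)
    return sum(errors) / len(errors)


def pred(actual: Sequence[float], predicted: Sequence[float], level: float = 0.25) -> float:
    """Share of observations whose relative error does not exceed `level`."""
    errors = relative_errors(actual, predicted)
    return sum(1 for e in errors if e <= level) / len(errors)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical KLOC values, for illustration only (not data from the study).
actual_kloc = [10.0, 20.0, 40.0, 80.0]
predicted_kloc = [11.0, 18.0, 60.0, 80.0]

print(f"MMRE       = {mmre(actual_kloc, predicted_kloc):.4f}")
print(f"PRED(0.25) = {pred(actual_kloc, predicted_kloc):.4f}")
print(f"R^2        = {r_squared(actual_kloc, predicted_kloc):.4f}")
```

Under these definitions a better model has a smaller MMRE and larger PRED(0.25) and R², which is the direction in which the values reported in the abstract should be read.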
Keywords
Software project management, application size estimation, nonlinear regression model, JAVA application, normalizing function, non-Gaussian data, multivariate Box-Cox normalizing function, multicollinearity, code metrics.
References
[1] S. McConnell, Software Estimation: Demystifying the Black Art, Microsoft Press, Redmond,
Washington, USA, 2006.
[2] S. W. Munialo, A Review of Agile Software Effort Estimation Methods, volume 5 of International
Journal of Computer Applications Technology and Research, Association of Technology and
Science, 2016, pp. 612-618. doi:10.7753/IJCATR0509.1009.
[3] The Standish Group, Chaos report 2015, 2015. URL:
https://standishgroup.com/sample_research_files/CHAOSReport2015-Final.pdf
[4] TIOBE, TIOBE Index, 2024. URL: https://www.tiobe.com/tiobe-index/.
[5] H. B. K. Tan, Y. Zhao, H. Zhang, Estimating LOC for information systems from their conceptual
data models, in: Proceedings of 28th. International Conference on Software Engineering, 2006,
pp. 321-330. doi:10.1145/1134285.1134331.
[6] H. B. K. Tan, Y. Zhao, H. Zhang, Conceptual Data Model-Based Software Size Estimation for
Information Systems, volume 19 of ACM Transactions on Software Engineering and
Methodology, 2009. doi:10.1145/1571629.1571630.
[7] N. V. Prykhodko, S. B. Prykhodko, A non-linear regression model for estimation of the size of
JAVA enterprise information systems software, volume 85 of Modeling and Information
Technologies, 2018, pp. 81-88. URL: http://nbuv.gov.ua/UJRN/Mtit_2018_85_14.
[8] L. M. Makarova, N.V. Prykhodko, O. O. Kudin, Constructing the non-linear regression model
for size estimation of WEB-applications implemented in JAVA, volume 69 of Herald (Kherson
National Technical University), 2019, pp. 145-153.
[9] S. B. Prykhodko, N. V. Prykhodko, T. G. Smykodub, Four-factor nonlinear regression model to
estimate the size of open source JAVA-based applications, volume 70 of Scientific Notes of
Taurida National V.I. Vernadsky University. Series: Technical Sciences, 2020, pp. 157-162.
doi:10.32838/2663-5941/2020.2-1/25.
[10] O. Oriekhov, T. Farionova, Mathematical models for the size estimating of JAVA applications,
volume 89 of Herald (KNTU), 2024, pp. 196-203, doi:10.35546/kntu2078-4481.2024.2.28.
[11] O. Oriekhov, T. Farionova, L. Chernova, Three-factor nonlinear regression model of estimating
the size of JAVA-software, in: Proceedings of the 12th. Information Control Systems & Technologies,
Odesa, Ukraine, 2024. URL: https://ceur-ws.org/Vol-3790/paper44.pdf.
[12] O. Oriekhov, The four-factor nonlinear regression model for early JAVA-applications size
estimation, in: N. Aksak, D. Antonov, ICST-2024: Advances in Information Control Systems and
Technologies, Liha Press, Lviv, Ukraine, 2024, pp. 360-379. doi:10.36059/978-966-397-422-4.
[13] D. Port, M. Korte, Comparative studies of the model evaluation criterions MMRE and PRED in
software cost estimation research, Proceedings of the 2nd. ACM-IEEE International Symposium
on Empirical Software Engineering and Measurement, ACM, New York, 2008, pp. 51–60.
doi:10.1145/1414004.1414015.
[14] R. Subramanyam, M. Krishnan, Empirical Analysis of CK Metrics for Object-Oriented Design
Complexity: Implications for Software Defects, volume 29 of IEEE Transactions on Software
Engineering, 2003, pp. 297-310. doi:10.1109/TSE.2003.1191795.
[15] S. Prykhodko, N. Prykhodko, Mathematical Modeling of Non-Gaussian Dependent Random
Variables by nonlinear Regression Models Based on the Multivariate Normalizing
Transformations, in S. Shkarlet, A. Morozov, A. Palagin, volume 1265 of Mathematical Modeling
and Simulation of Systems (MODS’2020). Advances in Intelligent Systems and Computing, 2021,
pp. 166-174. doi:10.1007/978-3-030-58124-4_16
[16] K. V. Mardia, Measures of multivariate skewness and kurtosis with applications, volume 57 of
Biometrika, 1970, pp. 519–530. doi:10.1093/biomet/57.3.519.
[17] I. Olkin, A. R. Sampson, Multivariate Analysis: Overview, in N. J. Smelser, P. B. Baltes,
International encyclopedia of social & behavioral sciences, 1st. ed., Elsevier, Pergamon, 2001,
pp. 10240–10247.