Comprehensive Overview of Optimization Techniques in Machine Learning Training
Abstract
This article offers a comprehensive overview of optimization techniques used in training machine learning (ML) models. Machine learning, a subset of artificial intelligence, applies statistical methods that enable systems to learn and improve from experience without being explicitly programmed. The article delineates the significance of optimization in ML, emphasizing its role in adjusting model parameters to minimize a loss function, thereby enabling efficient model training and improved generalization. The discussion encompasses several families of optimization methods, including Gradient Descent Variants, Adaptive Learning Rate Methods, Second-Order Optimization Methods, Regularization Methods, Constraint-based Methods, and Bayesian Optimization. Each section elucidates the principles, applications, and benefits of these techniques, highlighting their relevance in addressing challenges such as overfitting, scalability, and computational efficiency. The article aims to guide researchers, practitioners, and enthusiasts in navigating the complex landscape of optimization techniques tailored to diverse machine learning algorithms and applications.
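As a minimal, illustrative sketch (not drawn from the article itself), the following Python snippet shows the core idea the abstract describes: plain gradient descent adjusting the parameters of a simple linear model to minimize a mean squared error loss. The synthetic data, learning rate, and iteration count are arbitrary assumptions chosen only for demonstration.

```python
import numpy as np

# Hypothetical toy example: fit y = w * x + b by minimizing mean squared
# error with plain (batch) gradient descent. All values are illustrative.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)  # synthetic data

w, b = 0.0, 0.0          # model parameters to be learned
learning_rate = 0.1      # step size (a key hyperparameter)

for step in range(200):
    y_pred = w * x + b
    error = y_pred - y
    loss = np.mean(error ** 2)          # mean squared error loss
    grad_w = 2.0 * np.mean(error * x)   # dL/dw
    grad_b = 2.0 * np.mean(error)       # dL/db
    w -= learning_rate * grad_w         # gradient descent update
    b -= learning_rate * grad_b

print(f"learned w={w:.3f}, b={b:.3f}, final loss={loss:.5f}")
```

The same update rule underlies the stochastic and adaptive variants the article surveys; they differ mainly in how the gradient is estimated (full batch vs. mini-batch) and how the step size is set per parameter.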
DOI: https://doi.org/10.59247/csol.v2i1.69