Accuracy-Time Efficient Hyperparameter Optimization Using Actor-Critic-based Reinforcement Learning and Early Stopping in OpenAI Gym Environment

Jan 13, 2026

—

Authors: Albert Budi Christian; Chih-Yu Lin; Yu-Chee Tseng; Lan-Da Van; Wan-Hsun Hu; Chia-Hsuan Yu

Publication Date: December 13, 2022 (Conference held: 24–26 November 2022)

Conference: 2022 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS), Bali, Indonesia

Abstract: This paper presents an accuracy-time efficient hyperparameter optimization (HPO) framework using advantage actor-critic (A2C)-based reinforcement learning combined with an early stopping mechanism in an OpenAI Gym environment. The proposed approach improves hyperparameter selection for machine learning models including XGBoost, support vector classifier, and random forest while reducing computational cost. Experiments conducted on ten standard datasets demonstrate that the framework improves average accuracy by 0.77% over default random forest hyperparameters, while early stopping reduces computation cost by an average of 64%, achieving an effective balance between model performance and training efficiency.
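The abstract does not spell out the environment interface or the stopping rule, so the following is only a minimal sketch of the general idea, not the authors' implementation. It mimics the classic OpenAI Gym reset/step convention (four-value step return) without importing Gym, and it assumes a patience-based early-stopping rule: the episode ends after a fixed number of consecutive trials with no improvement. The class ToyHPOEnv, the score function, the single "depth" hyperparameter, and the random policy standing in for a trained A2C actor are all hypothetical.

```python
import random


class ToyHPOEnv:
    """Gym-style episode: each action proposes one hyperparameter value."""

    def __init__(self, patience=3, max_steps=50):
        self.patience = patience    # assumed rule: stop after this many stale trials
        self.max_steps = max_steps  # hard budget on trials per episode

    def _score(self, depth):
        # Hypothetical stand-in for validation accuracy; peaks at depth == 8.
        return 1.0 - ((depth - 8) / 16.0) ** 2

    def reset(self):
        self.steps = 0
        self.stale = 0
        self.best_score = 0.0
        self.best_depth = None
        return self.best_score  # observation: best score seen so far

    def step(self, depth):
        self.steps += 1
        improvement = self._score(depth) - self.best_score
        if improvement > 0:
            self.best_score += improvement
            self.best_depth = depth
            self.stale = 0
        else:
            self.stale += 1
        reward = max(improvement, 0.0)  # reward only genuine improvements
        # Early stopping: end the episode once progress has stalled.
        done = self.stale >= self.patience or self.steps >= self.max_steps
        return self.best_score, reward, done, {"best_depth": self.best_depth}


def random_policy(rng, obs):
    # Placeholder for an A2C actor: ignores the observation, samples a depth.
    return rng.randint(1, 16)


def run_episode(env, policy, seed=0):
    rng = random.Random(seed)
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(policy(rng, obs))
    return info["best_depth"], obs, env.steps
```

Calling run_episode(ToyHPOEnv(), random_policy) returns the best depth found, its score, and the number of trials actually used. In a real setup the score function would train and validate a model such as a random forest, and a learned actor-critic policy would replace random_policy; the trial count relative to max_steps gives a rough sense of the computation saved by stopping early.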

