2023 | Vol. 35 | 689–697
Article title

An Enhancement of Reinforcement Learning by Scheduling with Learning Effects

Title variants
Publication languages
EN
Abstracts
EN
This paper presents results which reveal that approaches developed for scheduling problems with learning effects can be used successfully to improve the quality of machine learning methods. This is illustrated by modelling some aspects of Q-learning agents as scheduling problems with a learning effect, and by constructing sequencing and dispatching algorithms that take the existence of learning into account. Applying them to determine the sequence of tasks processed by Q-learning agents can visibly speed up their convergence to an optimal strategy. Furthermore, we show that dispatching tasks according to the longest processing time (LPT) algorithm for parallel computing can be replaced by a more efficient procedure if agents can learn. The numerical analysis reveals that our approach is efficient, robust, and depends only marginally on the learning model and on an accurate approximation of task processing times.
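The interplay the abstract describes, LPT dispatching on parallel machines whose effective speeds improve as they complete tasks, can be illustrated with a minimal Python sketch. This is an illustration under assumptions, not the authors' procedure: it uses the standard position-based log-linear learning curve, in which a job with nominal time p placed in position r on a machine actually takes p * r**a for some a < 0 (the model surveyed in [1]); the function name lpt_with_learning and the value a = -0.3 are invented for this example.

    import heapq

    def lpt_with_learning(proc_times, m, a=-0.3):
        # Dispatch jobs to m machines, longest nominal time first (LPT).
        # Each heap entry is (current load, next position r); the actual
        # time of a job placed in position r is p * r**a (learning, a < 0).
        machines = [(0.0, 1) for _ in range(m)]
        heapq.heapify(machines)                      # min-heap by load
        for p in sorted(proc_times, reverse=True):   # LPT order
            load, r = heapq.heappop(machines)        # least-loaded machine
            heapq.heappush(machines, (load + p * r ** a, r + 1))
        return max(load for load, _ in machines)     # makespan

    jobs = [7.0, 5.0, 4.0, 4.0, 3.0, 2.0, 2.0, 1.0]
    print(lpt_with_learning(jobs, m=3))              # with learning
    print(lpt_with_learning(jobs, m=3, a=0.0))       # a = 0: plain LPT

Running the same instance with a = 0 recovers plain LPT; comparing the two makespans shows why an ordering based on nominal times alone leaves room for improvement once learning shortens later positions, which is the gap the paper's dispatching procedure targets.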
Publisher

Year
Volume
Pages
689–697
Physical description
Bibliography: 10 items; formulas, graphs.
Authors
  • General Tadeusz Kościuszko Military University of Land Forces, Czajkowskiego 109, 51-147 Wrocław, Poland, rudek.radoslaw@gmail.com
Bibliography
  • [1] D. Biskup, “A state-of-the-art review on scheduling with learning effects,” European Journal of Operational Research, vol. 188, pp. 315–329, 2008.
  • [2] R. Rudek, “Scheduling on parallel processors with varying processing times,” Computers & Operations Research, vol. 81, pp. 90–101, 2017.
  • [3] J. Xu, C.-C. Wu, Y. Yin, C. Zhao, Y.-T. Chiou, and W.-C. Lin, “An order scheduling problem with position-based learning effect,” Computers & Operations Research, vol. 74, pp. 175–186, 2016.
  • [4] C. Zhao, J. Fang, T. Cheng, and M. Ji, “A note on the time complexity of machine scheduling with DeJong’s learning effect,” Computers & Industrial Engineering, vol. 112, pp. 447–449, 2017.
  • [5] J. Pei, X. Liu, P. M. Pardalos, A. Migdalas, and S. Yang, “Serial-batching scheduling with time-dependent setup time and effects of deterioration and learning on a single-machine,” Journal of Global Optimization, vol. 67, pp. 251–262, 2017.
  • [6] R. Rudek, “A fast neighborhood search scheme for identical parallel machine scheduling problems under general learning curves,” Applied Soft Computing, vol. 113, art. no. 108023, 2021.
  • [7] C.-H. Wu, W.-C. Lee, P.-J. Lai, and J.-Y. Wang, “Some single-machine scheduling problems with elapsed-time-based and position-based learning and forgetting effects,” Discrete Optimization, vol. 19, pp. 1–11, 2016.
  • [8] M. I. Jordan and T. M. Mitchell, “Machine Learning: Trends, Perspectives, and Prospects,” Science, vol. 349, pp. 255–260, 2015.
  • [9] I. Grondman, L. Buşoniu, G. Lopes, and R. Babuška, “A survey of actor-critic reinforcement learning: standard and natural policy gradients,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 42, pp. 1291–1307, 2012.
  • [10] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
  • [11] S. Whiteson and P. Stone, “Adaptive job routing and scheduling,” Engineering Applications of Artificial Intelligence, vol. 17, pp. 855–869, 2004.
  • [12] M. Pinedo, Scheduling: Theory, Algorithms, and Systems (5th ed.). New York: Springer, 2016.
  • [13] Y. Wang, X. Li, and R. Ruiz, “An exact algorithm for the shortest path problem with position-based learning effects,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, pp. 3037–3049, 2017.
  • [14] C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, pp. 279–292, 1992.
  • [15] W. Y. Kwon, I. H. Suh, and S. Lee, “SSPQL: stochastic shortest path-based Q-learning,” International Journal of Control, Automation, and Systems, vol. 9, pp. 328–338, 2011.
  • [16] A. Konar, I. G. Chakraborty, S. J. Singh, L. C. Jain, and A. K. Nagar, “A deterministic improved Q-learning for path planning of a mobile robot,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 43, pp. 1141–1153, 2013.
  • [17] Z. S. Givi, M. Y. Jaber, and W. P. Neumann, “Modelling worker reliability with learning and fatigue,” Applied Mathematical Modelling, vol. 39, pp. 5186–5199, 2015.
  • [18] C. H. Glock and M. Y. Jaber, “Learning effects and the phenomenon of moving bottlenecks in a two-stage production system,” Applied Mathematical Modelling, vol. 37, pp. 8617–8628, 2013.
  • [19] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan, “Optimization and approximation in deterministic sequencing and scheduling: a survey,” Annals of Discrete Mathematics, vol. 5, pp. 287–326, 1979.
  • [20] R. Graham, “Bounds for certain multiprocessing anomalies,” Bell System Technical Journal, vol. 45, pp. 1563–1581, 1966.
  • [21] W. Li and J. Yuan, “LPT online strategy for parallel-machine scheduling with kind release times,” Optimization Letters, vol. 10, pp. 159–168, 2016.
Notes
1. This work was supported by the Polish National Science Centre under grant no. DEC-2020/37/B/HS4/03235.
2. Thematic Tracks Regular Papers
3. The record was compiled with funds from the Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) programme, module: Popularisation of Science and Promotion of Sport (2024).
Document type
Identifiers
YADDA identifier
bwmeta1.element.baztech-71db3716-736b-4eff-9256-3fbd2d61e021