Article title

Reinforcement Learning Algorithms for Online Single-Machine Scheduling

Conference
Federated Conference on Computer Science and Information Systems (15; 06–09.09.2020; Sofia, Bulgaria)
Publication languages
EN
Abstracts
EN
Online scheduling has been an attractive field of research for over three decades. Some recent developments suggest that Reinforcement Learning (RL) techniques have the potential to deal with online scheduling issues effectively. Driven by an industrial application, in this paper we apply four of the most important RL techniques, namely Q-learning, Sarsa, Watkins's Q(λ), and Sarsa(λ), to the online single-machine scheduling problem. Our main goal is to provide insights into how such techniques perform. The numerical results show that Watkins's Q(λ) performs best in minimizing the total tardiness of the scheduling process.
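To make the setting of the abstract concrete, below is a minimal, self-contained sketch of one of the four techniques mentioned, tabular one-step Q-learning, applied to a toy online single-machine instance. This is not the authors' implementation and is not taken from the paper: the instance generator, the coarse state encoding, the action set of dispatching rules (EDD, SPT, FIFO), and all parameter values are illustrative assumptions. The reward is the negative tardiness of each completed job, so maximizing return corresponds to minimizing total tardiness.

```python
import random

def make_jobs(n=30, seed=0):
    """Hypothetical toy instance: (release, processing, due) times for n jobs."""
    rng = random.Random(seed)
    jobs = []
    for _ in range(n):
        release = rng.randint(0, 50)
        proc = rng.randint(1, 10)
        due = release + proc + rng.randint(0, 15)
        jobs.append((release, proc, due))
    return sorted(jobs)                              # sorted by release time

ACTIONS = ["EDD", "SPT", "FIFO"]                     # dispatching rules the agent chooses among

def pick(queue, rule):
    """Select the next job from the queue according to a dispatching rule."""
    if rule == "EDD":                                # earliest due date
        return min(queue, key=lambda j: j[2])
    if rule == "SPT":                                # shortest processing time
        return min(queue, key=lambda j: j[1])
    return queue[0]                                  # FIFO

def state(queue, now):
    """Coarse state: capped queue length and whether a queued job is already late."""
    return (min(len(queue), 5), any(j[2] < now for j in queue))

def episode(jobs, Q, alpha=0.1, gamma=0.95, eps=0.1, learn=True):
    """Simulate one scheduling run; return the total tardiness incurred."""
    pending, queue, now, total_tard = list(jobs), [], 0, 0
    while pending or queue:
        while pending and pending[0][0] <= now:      # release jobs that have arrived
            queue.append(pending.pop(0))
        if not queue:                                # machine idles until the next arrival
            now = pending[0][0]
            continue
        s = state(queue, now)
        if learn and random.random() < eps:          # epsilon-greedy exploration
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
        job = pick(queue, ACTIONS[a])
        queue.remove(job)
        now += job[1]                                # process the chosen job to completion
        tardiness = max(0, now - job[2])
        total_tard += tardiness
        if learn:                                    # one-step Q-learning update
            while pending and pending[0][0] <= now:  # jobs arriving during processing
                queue.append(pending.pop(0))
            s2 = state(queue, now)
            target = -tardiness                      # reward = negative tardiness
            if queue or pending:                     # bootstrap unless the run has ended
                target += gamma * max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    return total_tard

if __name__ == "__main__":
    random.seed(1)
    jobs, Q = make_jobs(), {}
    for _ in range(2000):                            # train on the toy instance
        episode(jobs, Q)
    print("greedy total tardiness:", episode(jobs, Q, learn=False))
```

In this sketch, Sarsa would replace the max over next-state action values with the value of the action actually taken, and Watkins's Q(λ) or Sarsa(λ) would additionally propagate credit backwards through eligibility traces; the decision points, state/action encoding, and reward would stay the same.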
Pages
277–283
Physical description
Bibliography: 26 items, illustrations, tables, charts, formulas.
Authors
author
  • LINKS Foundation - Advanced Computing and Applications, 10138 Torino, Italy
  • Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
  • Department of Information Engineering, University of Brescia, 25123 Brescia, Italy
  • Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
  • LINKS Foundation - Advanced Computing and Applications, 10138 Torino, Italy
Bibliography
  • 1. P. Brucker, Scheduling Algorithms, 5th ed. Springer Publishing Company, Incorporated, 2010.
  • 2. S. C. Graves, “A review of production scheduling,” Operations Research, vol. 29, no. 4, pp. 646–675, 1981.
  • 3. Y. Li, S. Carabelli, E. Fadda, D. Manerba, R. Tadei, and O. Terzo, “Machine learning and optimization for production rescheduling in industry 4.0,” The International Journal of Advanced Manufacturing Technology, pp. 1–19, 2020.
  • 4. T. Gabel and M. Riedmiller, “Adaptive reactive job-shop scheduling with reinforcement learning agents,” International Journal of Information Technology and Intelligent Computing, vol. 24, no. 4, pp. 14–18, 2008.
  • 5. H. Sharma and S. Jain, “Online learning algorithms for dynamic scheduling problems,” in 2011 Second International Conference on Emerging Applications of Information Technology, 2011, pp. 31–34.
  • 6. T. Zhang, S. Xie, and O. Rose, “Real-time job shop scheduling based on simulation and Markov decision processes,” in 2017 Winter Simulation Conference (WSC). IEEE, 2017, pp. 3899–3907.
  • 7. P. Castrogiovanni, E. Fadda, G. Perboli, and A. Rizzo, “Smartphone data classification technique for detecting the usage of public or private transportation modes,” IEEE Access, vol. 8, pp. 58377–58391, 2020.
  • 8. E. Fadda, P. Plebani, and M. Vitali, “Optimizing monitorability of multi-cloud applications,” 06 2016, pp. 411–426.
  • 9. E. Fadda, G. Perboli, and G. Squillero, “Adaptive batteries exploiting on-line steady-state evolution strategy,” in Applications of Evolutionary Computation, G. Squillero and K. Sim, Eds. Cham: Springer International Publishing, 2017, pp. 329–341.
  • 10. E. Fadda, D. Manerba, R. Tadei, P. Camurati, and G. Cabodi, “KPIs for Optimal Location of charging stations for Electric Vehicles: the Biella case-study,” in Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 18. IEEE, 2019, pp. 123–126. [Online]. Available: http://dx.doi.org/10.15439/2019F171
  • 11. E. Fadda, D. Manerba, G. Cabodi, P. Camurati, and R. Tadei, “Comparative analysis of models and performance indicators for optimal service facility location,” Transportation Research Part E: Logistics and Transportation Review (submitted), 2020.
  • 12. R. Giusti, C. Iorfida, Y. Li, D. Manerba, S. Musso, G. Perboli, R. Tadei, and S. Yuan, “Sustainable and de-stressed international supply-chains through the synchro-net approach,” Sustainability, vol. 11, p. 1083, 02 2019.
  • 13. K. Takadama and H. Fujita, “Toward guidelines for modeling learning agents in multiagent-based simulation: Implications from Q-learning and Sarsa agents,” in International Workshop on Multi-Agent Systems and Agent-Based Simulation. Springer, 2004, pp. 159–172.
  • 14. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint https://arxiv.org/abs/1312.5602, 2013.
  • 15. R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
  • 16. C. J. C. H. Watkins, Learning from delayed rewards. Ph.D. thesis, King’s College, Cambridge, 1989.
  • 17. A. Kaban, Z. Othman, and D. Rohmah, “Comparison of dispatching rules in job-shop scheduling problem using simulation: a case study,” International Journal of Simulation Modelling, vol. 11, no. 3, pp. 129–140, 2012.
  • 18. H. Suwa and H. Sandoh, Online scheduling in manufacturing: A cumulative delay approach. Springer Science & Business Media, 2012.
  • 19. R. L. Graham, “Bounds for certain multiprocessing anomalies,” Bell System Technical Journal, vol. 45, no. 9, pp. 1563–1581, 1966.
  • 20. S. S. Panwalkar and W. Iskander, “A survey of scheduling rules,” Operations Research, vol. 25, no. 1, pp. 45–61, 1977.
  • 21. J. R. Correa and M. R. Wagner, “LP-based online scheduling: from single to parallel machines,” Mathematical Programming, vol. 119, no. 1, pp. 109–136, 2009.
  • 22. X. Lu, R. Sitters, and L. Stougie, “A class of on-line scheduling algorithms to minimize total completion time,” Operations Research Letters, vol. 31, no. 3, pp. 232–236, 2003.
  • 23. S. Xie, T. Zhang, and O. Rose, “Online single machine scheduling based on simulation and reinforcement learning,” in Simulation in Produktion und Logistik 2019, 2019.
  • 24. S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, “Convergence results for single-step on-policy reinforcement-learning algorithms,” Machine learning, vol. 38, no. 3, pp. 287–308, 2000.
  • 25. V. François-Lavet, R. Fonteneau, and D. Ernst, “How to discount deep reinforcement learning: Towards new dynamic strategies,” arXiv preprint https://arxiv.org/abs/1512.02011, 2015.
  • 26. V. Cerone, E. Fadda, and D. Regruto, “A robust optimization approach to kernel-based nonparametric error-in-variables identification in the presence of bounded noise,” in 2017 American Control Conference (ACC). IEEE, May 2017. [Online]. Available: https://doi.org/10.23919
Notes
1. This research was partially supported by the Plastic and Rubber 4.0 (P&R4.0) research project, POR FESR 2014-2020 - Action I.1b.2.2, funded by Piedmont Region (Italy), Contract No. 319-31. The authors acknowledge all the project partners for their contribution.
2. Track 1: Artificial Intelligence
3. Technical Session: 13th International Workshop on Computational Optimization
4. Record prepared with MNiSW funds, contract No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularization of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-9ff88f2c-6539-4e75-88ea-253890f9fb74