Article title

Reinforcement Learning Algorithms for Online Single-Machine Scheduling

Conference
Federated Conference on Computer Science and Information Systems (15; 06–09.09.2020; Sofia, Bulgaria)
Publication languages
EN
Abstracts
EN
Online scheduling has been an attractive field of research for over three decades. Some recent developments suggest that Reinforcement Learning (RL) techniques have the potential to deal with online scheduling issues effectively. Driven by an industrial application, in this paper we apply four of the most important RL techniques, namely Q-learning, Sarsa, Watkins's Q(λ), and Sarsa(λ), to the online single-machine scheduling problem. Our main goal is to provide insights into how such techniques perform. The numerical results show that Watkins's Q(λ) performs best in minimizing the total tardiness of the scheduling process.
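To make the setting of the abstract concrete, below is a minimal, self-contained sketch of one of the four techniques mentioned, tabular one-step Q-learning, applied to a toy online single-machine instance. This is not the authors' implementation and is not taken from the paper: the instance generator, the coarse state encoding, the action set of dispatching rules (EDD, SPT, FIFO), and all parameter values are illustrative assumptions. The reward is the negative tardiness of each completed job, so maximizing return corresponds to minimizing total tardiness.

```python
import random

def make_jobs(n=30, seed=0):
    """Hypothetical toy instance: (release, processing, due) times for n jobs."""
    rng = random.Random(seed)
    jobs = []
    for _ in range(n):
        release = rng.randint(0, 50)
        proc = rng.randint(1, 10)
        due = release + proc + rng.randint(0, 15)
        jobs.append((release, proc, due))
    return sorted(jobs)                              # sorted by release time

ACTIONS = ["EDD", "SPT", "FIFO"]                     # dispatching rules the agent chooses among

def pick(queue, rule):
    """Select the next job from the queue according to a dispatching rule."""
    if rule == "EDD":                                # earliest due date
        return min(queue, key=lambda j: j[2])
    if rule == "SPT":                                # shortest processing time
        return min(queue, key=lambda j: j[1])
    return queue[0]                                  # FIFO

def state(queue, now):
    """Coarse state: capped queue length and whether a queued job is already late."""
    return (min(len(queue), 5), any(j[2] < now for j in queue))

def episode(jobs, Q, alpha=0.1, gamma=0.95, eps=0.1, learn=True):
    """Simulate one scheduling run; return the total tardiness incurred."""
    pending, queue, now, total_tard = list(jobs), [], 0, 0
    while pending or queue:
        while pending and pending[0][0] <= now:      # release jobs that have arrived
            queue.append(pending.pop(0))
        if not queue:                                # machine idles until the next arrival
            now = pending[0][0]
            continue
        s = state(queue, now)
        if learn and random.random() < eps:          # epsilon-greedy exploration
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
        job = pick(queue, ACTIONS[a])
        queue.remove(job)
        now += job[1]                                # process the chosen job to completion
        tardiness = max(0, now - job[2])
        total_tard += tardiness
        if learn:                                    # one-step Q-learning update
            while pending and pending[0][0] <= now:  # jobs arriving during processing
                queue.append(pending.pop(0))
            s2 = state(queue, now)
            target = -tardiness                      # reward = negative tardiness
            if queue or pending:                     # bootstrap unless the run has ended
                target += gamma * max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    return total_tard

if __name__ == "__main__":
    random.seed(1)
    jobs, Q = make_jobs(), {}
    for _ in range(2000):                            # train on the toy instance
        episode(jobs, Q)
    print("greedy total tardiness:", episode(jobs, Q, learn=False))
```

In this sketch, Sarsa would replace the max over next-state action values with the value of the action actually taken, and Watkins's Q(λ) or Sarsa(λ) would additionally propagate credit backwards through eligibility traces; the decision points, state/action encoding, and reward would stay the same.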
Pages
277–283
Physical description
Bibliography: 26 items, illustrations, tables, charts, formulas.
Authors
author
  • LINKS Foundation - Advanced Computing and Applications, 10138 Torino, Italy
  • Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
  • Department of Information Engineering, University of Brescia, 25123 Brescia, Italy
  • Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy
  • LINKS Foundation - Advanced Computing and Applications, 10138 Torino, Italy
Bibliography
  • 1. P. Brucker, Scheduling Algorithms, 5th ed. Springer Publishing Company, Incorporated, 2010.
  • 2. S. C. Graves, “A review of production scheduling,” Operations Research, vol. 29, no. 4, pp. 646–675, 1981.
  • 3. Y. Li, S. Carabelli, E. Fadda, D. Manerba, R. Tadei, and O. Terzo, “Machine learning and optimization for production rescheduling in industry 4.0,” The International Journal of Advanced Manufacturing Technology, pp. 1–19, 2020.
  • 4. T. Gabel and M. Riedmiller, “Adaptive reactive job-shop scheduling with reinforcement learning agents,” International Journal of Information Technology and Intelligent Computing, vol. 24, no. 4, pp. 14–18, 2008.
  • 5. H. Sharma and S. Jain, “Online learning algorithms for dynamic scheduling problems,” in 2011 Second International Conference on Emerging Applications of Information Technology, 2011, pp. 31–34.
  • 6. T. Zhang, S. Xie, and O. Rose, “Real-time job shop scheduling based on simulation and Markov decision processes,” in 2017 Winter Simulation Conference (WSC). IEEE, 2017, pp. 3899–3907.
  • 7. P. Castrogiovanni, E. Fadda, G. Perboli, and A. Rizzo, “Smartphone data classification technique for detecting the usage of public or private transportation modes,” IEEE Access, vol. 8, pp. 58377–58391, 2020.
  • 8. E. Fadda, P. Plebani, and M. Vitali, “Optimizing monitorability of multi-cloud applications,” 06 2016, pp. 411–426.
  • 9. E. Fadda, G. Perboli, and G. Squillero, “Adaptive batteries exploiting on-line steady-state evolution strategy,” in Applications of Evolutionary Computation, G. Squillero and K. Sim, Eds. Cham: Springer International Publishing, 2017, pp. 329–341.
  • 10. E. Fadda, D. Manerba, R. Tadei, P. Camurati, and G. Cabodi, “KPIs for Optimal Location of charging stations for Electric Vehicles: the Biella case-study,” in Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. Maciaszek, and M. Paprzycki, Eds., vol. 18. IEEE, 2019, pp. 123–126. [Online]. Available: http://dx.doi.org/10.15439/2019F171
  • 11. E. Fadda, D. Manerba, G. Cabodi, P. Camurati, and R. Tadei, “Comparative analysis of models and performance indicators for optimal service facility location,” Transportation Research Part E: Logistics and Transportation Review (submitted), 2020.
  • 12. R. Giusti, C. Iorfida, Y. Li, D. Manerba, S. Musso, G. Perboli, R. Tadei, and S. Yuan, “Sustainable and de-stressed international supply-chains through the synchro-net approach,” Sustainability, vol. 11, p. 1083, 02 2019.
  • 13. K. Takadama and H. Fujita, “Toward guidelines for modeling learning agents in multiagent-based simulation: Implications from Q-learning and Sarsa agents,” in International Workshop on Multi-Agent Systems and Agent-Based Simulation. Springer, 2004, pp. 159–172.
  • 14. V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing Atari with deep reinforcement learning,” arXiv preprint https://arxiv.org/abs/1312.5602, 2013.
  • 15. R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
  • 16. C. J. C. H. Watkins, Learning from delayed rewards. Ph.D. thesis, King’s College, Cambridge, 1989.
  • 17. A. Kaban, Z. Othman, and D. Rohmah, “Comparison of dispatching rules in job-shop scheduling problem using simulation: a case study,” International Journal of Simulation Modelling, vol. 11, no. 3, pp. 129–140, 2012.
  • 18. H. Suwa and H. Sandoh, Online scheduling in manufacturing: A cumulative delay approach. Springer Science & Business Media, 2012.
  • 19. R. L. Graham, “Bounds for certain multiprocessing anomalies,” Bell System Technical Journal, vol. 45, no. 9, pp. 1563–1581, 1966.
  • 20. S. S. Panwalkar and W. Iskander, “A survey of scheduling rules,” Operations Research, vol. 25, no. 1, pp. 45–61, 1977.
  • 21. J. R. Correa and M. R. Wagner, “LP-based online scheduling: from single to parallel machines,” Mathematical Programming, vol. 119, no. 1, pp. 109–136, 2009.
  • 22. X. Lu, R. Sitters, and L. Stougie, “A class of on-line scheduling algorithms to minimize total completion time,” Operations Research Letters, vol. 31, no. 3, pp. 232–236, 2003.
  • 23. S. Xie, T. Zhang, and O. Rose, “Online single machine scheduling based on simulation and reinforcement learning,” in Simulation in Produktion und Logistik 2019, 2019.
  • 24. S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, “Convergence results for single-step on-policy reinforcement-learning algorithms,” Machine learning, vol. 38, no. 3, pp. 287–308, 2000.
  • 25. V. François-Lavet, R. Fonteneau, and D. Ernst, “How to discount deep reinforcement learning: Towards new dynamic strategies,” arXiv preprint https://arxiv.org/abs/1512.02011, 2015.
  • 26. V. Cerone, E. Fadda, and D. Regruto, “A robust optimization approach to kernel-based nonparametric error-in-variables identification in the presence of bounded noise,” in 2017 American Control Conference (ACC). IEEE, May 2017. [Online]. Available: https://doi.org/10.23919
Notes
1. This research was partially supported by the Plastic and Rubber 4.0 (P&R4.0) research project, POR FESR 2014-2020 - Action I.1b.2.2, funded by Piedmont Region (Italy), Contract No. 319-31. The authors acknowledge all the project partners for their contribution.
2. Track 1: Artificial Intelligence
3. Technical Session: 13th International Workshop on Computational Optimization
4. Record prepared with MNiSW funds, contract No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularization of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-9ff88f2c-6539-4e75-88ea-253890f9fb74