An approximate dynamic programming approach for semi-cooperative multi-agent resource management

Boukhtouta, A.; Berger, J.; George, A.; Powell, W. B.

Artykuł - szczegóły

Tytuł artykułu

An approximate dynamic programming approach for semi-cooperative multi-agent resource management

Autorzy

Boukhtouta A. , Berger J. , George A. , Powell W. B.

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

Complex problems involving multiple agents exhibit varying degrees of cooperation. The levels of cooperation might reflect both differences in information as well as differences in goals. In this research, we develop a general mathematical model for distributed, semicooperative planning and suggest a solution strategy which involves decomposing the system into subproblems, each of which is specified at a certain period in time and controlled by an agent. The agents communicate marginal values of resources to each other, possibly with distortion. We design experiments to demonstrate the benefits of communication between the agents and show that, with communication, the solution quality approaches that of the ideal situation where the entire problem is controlled by a single agent.

Słowa kluczowe

approximate dynamic programming agent semi-cooperative multi-agent resource management

Wydawca

University of Social Sciences

Czasopismo

Journal of Artificial Intelligence and Soft Computing Research

Rocznik

2012

Tom

Vol. 2, No. 3

Strony

201--214

Opis fizyczny

Bibliogr. 71 poz., rys.

Twórcy

autor

Boukhtouta A.

abdeslem.boukhtouta@drdc-rddc.gc.ca

Defence Research and Development Canada-Valcartier 2459, Pie-XI Blvd North, Quebec, QC, G3J 1X5, Canada

autor

Berger J.

jean.berger@drdc-rddc.gc.ca@drdc-rddc.gc.ca

Defence Research and Development Canada-Valcartier 2459, Pie-XI Blvd North, Quebec, QC, G3J 1X5, Canada

autor

George A.

Department of Operations Research and Financial Engineering Princeton University, Princeton, NJ 08544

autor

Powell W. B.

powell@princeton.edu

Department of Operations Research and Financial Engineering Princeton University, Princeton, NJ 08544

Bibliografia

[1] J. Marschak, Elements for a theory of teams, Management Sci., vol. 1, pp. 127137, 1955.
[2] R. Radner, Team decision problems, Ann. Math. Stat., vol. 33, pp. 857881, 1962.
[3] R. Beckers, O. E. Holland, and J. L. Deneubourg, From local actions to global tasks: Stigmergy in collective robotics, in Artificial Life IV, R. Brooks and P. Maes, Eds. Cambridge, MA: MIT Press, 1994, pp. 181189.
[4] R. Arkin, Cooperation without communication: Multi-agent schemabased robot navigation, Journal of Robotic Systems, vol. 9, no. 3, pp. 351364, 1992.
[5] L. Steels, A case study in the behavior-oriented design of autonomous agents, in Proc. Simulation of Adaptive Behavior, 1994.
[6] P. Stone and M. Veloso, Multiagent systems: A survey from a machine learning perspective, Autonomous Robots, vol. 8, pp. 345383, 2000.
[7] M.Wiering, B. Krose, and F. C. A. Groen, Learning in multi-agent systems, University of Amsterdam, Amsterdam, The Netherlands, Tech. Rep., 2000.
[8] P. Davidsson, L. Henesey, L. Ramstedt, J. Tornquist, and F. Wernstedt, An analysis of agentbased approaches to transport logistics, Transportation Research, vol. C13, pp. 255271, 2005.
[9] M. J. Mataric, Using communication to reduce locality in multirobot learning, in Proceedings of the National Conference on Artificial Intelligence, 1997, pp. 643648.
[10] R. Salustowicz, M. Wiering, and J. Schmidhuber, Learning team strategies: Soccer case studies, Machine Learning, vol. 33, no. 2-3, pp. 263282, 1998.
[11] M. A. Wiering, R. P. Salustowicz, and J. Schmidhuber, Cmac models learn to play soccer, in Proceedings of the 8th International Conference on Artificial Neural Networks (ICANN98), M. B. L. Niklasson and T. Ziemke, Eds., vol. 1. Springer-Verlag, London, 1998, pp. 443448.
[12] M. Dorigo and G. Di Caro, The ant colony optimization metaheuristic, in New Ideas in Optimization, D. Corne, M. Dorigo, and F. Glover, Eds. London: McGraw-Hill, 1999, pp. 1132.
[13] J. Schneider, W.-K. Wong, A. Moore, and M. Riedmiller, Distributed value functions, in Proc. 16th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, 1999, pp. 371378.
[14] R. Crites and A. Barto, Elevator group control using multiple reinforcement learning agents, Machine Learning, vol. 33, pp. 235262, 1994.
[15] M. Littman and J. Boyan, A distributed reinforcement learning scheme for network routing, in Proceedings of the InternationalWorkshop on Applications of Neural Networks to Telecommunications, J. Alspector, R. Goodman, and T. X. Brown, Eds., 1993, pp. 4551.
[16] Y. Ye and J. K. Tsotsos, On the collaborative object search team: A formulation, in DAI Meets Machine Learning, Lecture Notes in Artificial Intelligence, G. Weiss, Ed. Springer-Verlag, 1997, pp. 94116.
[17] N. Carver, Z. Cvetanovic, and V. R. Lesser, Sophisticated cooperation in FA/C distributed problem solving systems, in Proceedings of the Ninth National Conference on Artificial Intelligence, 1991.
[18] G. Zlotkin and J. S. Rosenschein, Mechanisms for automated negotiation in state oriented domains, Journal of Artificial Intelligence Research, vol. 5, pp. 163238, 1996.
[19] Y. U. Cao, A. S. Fukunaga, and A. B. Kahng, Cooperative mobile robotics: Antecedents and directions, Autonomous Robots, vol. 4, pp. 727, 1997.
[20] S. Sen, M. Sekaran, and J. Hale, Learning to coordinate without sharing information, in Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle,WA, 1994, pp. 426431.
[21] J. Mayer, Stochastic linear programming algorithms: A comparison based on a model management system. Springer, 1998.
[22] M. J. Shaw and R. Sikora, A distributed problemsolving approach to rule induction: Learning in distributed artificial intelligence systems, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep., November 1990.
[23] S. S. Sian, Adaptation based on cooperative learning in multi-agent systems, in Decentralized A.I.2: Proceedings of the 2nd EuropeanWorkshop on Modelling Autonomous Agents in a Multi-Agent World, Y. Demazeau and J. Muller, Eds. Amsterdam: North-Holland, 1991, pp. 257272.
[24] S. D. Whitehead, A complexity analysis of cooperative mechanisms in reinforcement learning. In Proceedings of the Ninth National Conference on Artificial Intelligence, 1991, pp. 607613.
[25] M. Tan, Cost-sensitive reinforcement learning for adaptive classification and control. in AAAI, 1991, pp. 774780.
[26] G. Wei, Learning to coordinate actions in multiagent systems, in Proceedings of the International Joint Conference on Artificial Intelligence, August 1993, pp. 311316.
[27] R. H. Crites and A. G. Barto, Improving elevator performance using reinforcement learning, in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds., vol. 8. The MIT Press, 1996, pp. 10171023.
[28] R. G. Smith, The contract net protocol: Highlevel communication and control in a distributed problem solver, IEEE Transactions on Computers, vol. C-29, no. 12, pp. 11041113, 1980.
[29] M. Flint, E. Fernandez, and M. Polycarpou, Stochastic models of a cooperative autonomous uav search problem, Military Operations Research, vol. 8, no. 4, pp. 1332, 2003.
[30] M.-W. Jang, S. Reddy, P. Tosic, L. Chen, and G. Agha, An actorbased simulation for studying uav coordination, Proceedings of the 15th European Simulation Symposium, 2003.
[31] Y. Dongyong, T. Qiong, F. Luping, Z. Xiao, and J. Jingping, Cooperative multi-agent transport based on coevolution, August 2004.
[32] P. B. Sujit and D. Ghose, Search using multiple uavs with flight time constraints, IEEE Transactions on Aerospace and Electronic Systems, vol. 40, no. 2, pp. 407463, April 2004.
[33] E. H. Durfee, V. R. Lesser, and D. D. Corkill, Distributed artificial intelligence, in Cooperation Through Communication in a Distributed Problem Solving-Network, M. N. Huhns, Ed. Los Altos, CA: Morgan Kauffman, 1987, p. 2958.
[34] R. Pope, S. Conry, and R. Meyer, Distributing the planning process in a dynamic environment, in Proceedings of the Eleventh International Workshop on Distributed Artificial Intelligence, Glen Arbor, MI, 1992, p. 317331.
[35] D. D. Corkill, A framework for organizational self-design in distributed problem solving networks. Ph.D. dissertation, University of Massachusetts, Amherst, MA., 1982.
[36] L. Gasser and M. N. Huhns, Distributed artificial intelligence. San Mateo, CA, USA: Morgan Kaufmann Publishers Inc., 1989, vol. 2.
[37] M. Tan and R. Weihmayer, Integrating agentoriented programming and planning for cooperative problem-solving, in Proceedings of the AAAI-92sWorkshop on Cooperation among Heterogeneous Intelligent Agents, San Jose, MI, 1992.
[38] M. P. Wellman, A market-oriented programming environment and its application to distributed multicommodity flow problems, Journal of Intelligence Research, vol. 1, pp. 123, 1993.
[39] G. Dantzig and P. Wolfe, Decomposition principle for linear programs, Operations Research, vol. 8, pp. 101111, 1960.
[40] W. J. Baumol and T. Fabiani, Decomposition, pricing for decentralization and external economies, Management Science, vol. 11, no. 1, pp. 132, September 1964.
[41] A. A. Assad, Multicommodity network flows-A survey, Networks, vol. 8, pp. 3791, 1978.
[42] J. F. Kurose and R. Simha, A microeconomic approach to optimal resource allocation in distributed computer systems, IEEE Transactions on Computers, vol. 38, no. 5, pp. 705717, 1989.
[43] V. R. Lesser, Cooperative multiagent systems: A personal view of the state of the art, IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 1, pp. 133142, January/February 1999.
[44] H. Topaloglu and W. B. Powell, Dynamic programming approximations for stochastic, timestaged integer multicommodity flow problems, Informs Journal on Computing, vol. 18, no. 1, pp. 3142, 2006.
[45] J. A. Shapiro and W. B. Powell, A metastrategy for large-scale resource management based on informational decomposition, Informs Journal on Computing, vol. 18, no. 1, pp. 4360, 2006.
[46] W. B. Powell, J. A. Shapiro, and H. P. Simao, A representational paradigm for dynamic resource transformation problems, in Annals of Operations Research, R. F. C. Coullard and J. H. Owens, Eds. J.C. Baltzer AG, 2001, pp. 231279.
[47] J. Si, A. G. Barto, W. B. Powell, and D. Wunsch, Eds., Handbook of Learning and Approximate Dynamic Programming. New York: IEEE Press, 2004.
[48] T. Wu, W. Powell, and A. Whisman, The optimizing simulator: An intelligent analysis tool for the military airlift problem, Princeton University, Princeton, NJ, Tech. Rep., 2007.
[49] G. Godfrey and W. B. Powell, An adaptive, dynamic programming algorithm for stochastic resource allocation problems II: Multi-period travel times, Transportation Science, vol. 36, no. 1, pp. 4054, 2002.
[50] W. B. Powell and H. Topaloglu, Stochastic programming in transportation and logistics, in Handbooks in Operations Research and Management Science: Stochastic Programming, A. Ruszczynski and A. Shapiro, Eds., vol. 10. Amsterdam: Elsevier, 2003, pp. 555635.
[51] M. S. Fox, An organizational view of distributed systems, in Distributed Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1988.
[52] P. Brazdil, M. Gams, S. Sian, L. Torgo, and W. van de Velde, Learning in distributed systems and multi-agent environments, in Lecture Notes in Artificial Intelligence Vol.482: Machine Learning-EWSL-91, Y. Kodratoff, Ed., 1991, pp. 412423.
[53] C. Claus and C. Boutilier, The dynamics of reinforcement learning in cooperative multiagent systems, in Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998, pp. 746752.
[54] S. Bergman, E. Pavlov, and J. S. Rosenschein, Negotiation in stateoriented domains with incomplete information over goals, in European Conference on Artificial Intelligence, 2004, pp. 812.
[55] M. Flint and E. Fernandez, Approximate dynamic programming methods for cooperative uav search, BAE Systems, Advanced Information Technologies, Burlington, MA, Tech. Rep., 2005.
[56] R. Van Slyke and R. Wets, L-shaped linear programs with applications to optimal control and stochastic programming, SIAM Journal of Applied Mathematics, vol. 17, no. 4, pp. 638663, 1969.
[57] J. Birge, Decomposition and partitioning techniques for multistage stochastic linear programs, Operations Research, vol. 33, no. 5, pp. 9891007, 1985.
[58] J. Higle and S. Sen, Stochastic decomposition: An algorithm for two stage linear programs with recourse, Mathematics of Operations Research, vol. 16, no. 3, pp. 650669, 1991.
[59] Z.-L. Chen and W. B. Powell, A convergent cutting-plane and partialsampling algorithm for multistage stochastic linear programs with recourse, Journal of Optimization Theory and Applications, vol. 102, no. 3, pp. 497524, 1999.
[60] M. L. Puterman, Markov Decision Processes. New York: John Wiley Sons, 1994.
[61] D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
[62] R. Sutton and A. Barto, Reinforcement Learning. Cambridge, Massachusetts: The MIT Press, 1998.
[63] W. B. Powell, Approximate Dynamic Programming: Solving the curses of dimensionality. New York: John Wiley and Sons, 2007.
[64] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. New York, NY: Springer series in Statistics, 2001.
[65] J. Birge and F. Louveaux, Introduction to Stochastic Programming. New York: Springer-Verlag, 1997.
[66] A. Marar, W. B. Powell, and S. Kulkarni, Capturing expert knowledge in resource allocation problems through low-dimensional patterns, IIE Transactions, vol. 38, no. 2, pp. 159172, 2006.
[67] E. H. Durfee and V. R. Lesser, Partial global planning: A coordination framework for distributed hypothesis formation, IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, no. 5, pp. 11671183, 1991.
[68] P. Cohen and C. R. Perrault, Elements of a planbased theory of speech acts, Cognitive Science, vol. 3, no. 3, pp. 177212, 1979.
[69] G. Godfrey and W. B. Powell, An adaptive, dynamic programming algorithm for stochastic resource allocation problems I: Single period travel times, Transportation Science, vol. 36, no. 1, pp. 2139, 2002.
[70] W. B. Powell and H. Topaloglu, Fleet management, in Applications of Stochastic Programming, S. Wallace and W. Ziemba, Eds. Philadelphia: Math Programming Society - SIAM Series in Optimization, 2005, pp. 185216.
[71] H. Topaloglu and W. B. Powell, Dynamic programming approximations for stochastic, timestaged integer multicommodity flow problems, Informs Journal on Computing, vol. 18, no. 1, pp. 3142, 2006.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-e43f76a6-b793-443f-830a-43ef97dd5fa7