Efficient storage, retrieval and analysis of poker hands: An adaptive data framework

Gorawski, M.; Lorek, M.

doi:10.1515/amcs-2017-0049

Content

Full texts:

04_gorawski_lorek_efficient_storage_2017_4.pdf

Abstract

In online gambling, a poker hand is one of the most popular and fundamental units of game state and can be treated as an object comprising all the events that pertain to a single hand played. When tens of millions of poker hands are produced daily and must be stored and analysed quickly, relational databases no longer provide high scalability and stable performance. The purpose of this paper is to present an efficient way of storing and retrieving poker hands in a big data environment. We propose a new, read-optimised storage model that offers significant data access improvements over traditional database systems as well as existing Hadoop file formats such as ORC, RCFile and SequenceFile. Through index-oriented partition elimination, our file format reduces the number of file splits that need to be accessed and improves query response time by up to three orders of magnitude in comparison with other approaches. In addition, our file format supports a range of new indexing structures that facilitate fast row retrieval at the split level. Both index types operate independently of the Hive execution context and allow other big data computational frameworks such as MapReduce or Spark to benefit from the optimised data access path to the hand information. Moreover, we present a detailed analysis of our storage model and its supporting index structures, and of how they are organised in the overall data framework. We also describe in detail how predicate-based expression trees are used to build effective file-level execution plans. Our experimental tests, conducted on a production cluster holding nearly 40 billion hands spanning over 4000 partitions, show that multi-way partition pruning outperforms other existing file formats, resulting in faster query execution times and better cluster utilisation.
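
The general idea behind predicate-driven split elimination can be illustrated with a minimal Python sketch. It is not the authors' file format or index layout: the `SplitStats` record (a min/max summary of one indexed column per file split), the `Range`, `And` and `Or` node types, and the `prune` helper are all hypothetical names introduced here for illustration. The sketch evaluates a predicate expression tree against each split's summary and keeps only the splits that might contain matching rows.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class SplitStats:
    """Hypothetical index entry: min/max of one indexed column in a split."""
    split_id: int
    min_val: int
    max_val: int


@dataclass(frozen=True)
class Range:
    """Leaf predicate: lo <= column <= hi."""
    lo: int
    hi: int


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Range, And, Or]


def may_match(expr: Expr, stats: SplitStats) -> bool:
    """Return False only if the split certainly holds no matching row.

    The check is conservative: it may keep a split that turns out to be
    empty for the query, but it never discards a split that could match.
    """
    if isinstance(expr, Range):
        # Two closed intervals overlap iff each starts before the other ends.
        return expr.lo <= stats.max_val and stats.min_val <= expr.hi
    if isinstance(expr, And):
        return may_match(expr.left, stats) and may_match(expr.right, stats)
    if isinstance(expr, Or):
        return may_match(expr.left, stats) or may_match(expr.right, stats)
    raise TypeError(f"unsupported expression node: {expr!r}")


def prune(expr: Expr, splits: List[SplitStats]) -> List[int]:
    """Return the ids of the splits that still have to be read."""
    return [s.split_id for s in splits if may_match(expr, s)]


splits = [
    SplitStats(0, 1, 100),
    SplitStats(1, 101, 200),
    SplitStats(2, 201, 300),
]
# (50 <= x <= 60) OR (250 <= x <= 400 AND 0 <= x <= 280)
query = Or(Range(50, 60), And(Range(250, 400), Range(0, 280)))
print(prune(query, splits))  # [0, 2]
```

In this toy example the middle split is skipped because neither branch of the `Or` node can be satisfied by values between 101 and 200. A real system would consult such summaries before scheduling any read tasks, which is where the savings in split accesses would come from.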

Keywords

big data, storage model design, data architecture, data access path optimization

data set, data architecture, data sharing, area optimization

Publisher

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Journal

International Journal of Applied Mathematics and Computer Science

Year

2017

Volume

Vol. 27, no. 4

Pages

713–726

Physical description

Bibliography: 22 items, figures, tables, charts.

Contributors

author

Gorawski M.

marcin.gorawski@polsl.pl

Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland

author

Lorek M.

michal.lorek@polsl.pl

Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland

Bibliography

[1] Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Silberschatz, A. and Rasin, A. (2009). HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads, Proceedings of the VLDB Endowment 2(1): 922–933, DOI: 10.14778/1687627.1687731.
[2] Alamoudi, A., Grover, R., Carey, M.J. and Borkar, V. (2015). External data access and indexing in AsterixDB, Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia, pp. 3–12, DOI: 10.1145/2806416.2806428.
[3] Ambekar, G., Chikane, T., Sheth, S., Sable, A. and Ghag, K. (2015). Anticipation of winning probability in poker using data mining, International Conference on Computer, Communication and Control, Indore, India, pp. 1–6, DOI: 10.1109/IC4.2015.7375593.
[4] Delaney, K. (2009). Microsoft SQL Server 2008 Internals, Microsoft Press, Redmond, WA.
[5] Hadoop (2014). Apache Hadoop, http://hadoop.apache.org.
[6] HDFS (2016). HDFS architecture, https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html.
[7] Hive (2014). Apache Hive, http://hive.apache.org.
[8] Jiang, D., Ooi, B.C., Shi, L. and Wu, S. (2010). The performance of MapReduce: An in-depth study, Proceedings of the VLDB Endowment 3(1–2): 472–483, DOI: 10.14778/1920841.1920903.
[9] Mealing, R. and Shapiro, J. (2015). Opponent modelling by expectation-maximisation and sequence prediction in simplified poker, IEEE Transactions on Computational Intelligence and AI in Games PP(99): 472–483, DOI: 10.1109/TCIAIG.2015.2491611.
[10] Miltersen, P.B. and Sørensen, T.B. (2007). A near-optimal, Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, Honolulu, HI, USA, pp. 1168–1175, DOI: 10.1145/1329125.1329357.
[11] Mullins, C.S. (2000). DB2 Developer’s Guide, Fourth Edition, Sams, Indianapolis, IN.
[12] MySQL (2016). MySQL internals manual: Writing a custom storage engine, http://dev.mysql.com/doc/internals/en/custom-engine.html.
[13] ORC (2016). Apache ORC, http://orc.apache.org/docs.
[14] PostgreSQL (2016). PostgreSQL documentation: Database page layout, https://www.postgresql.org/docs/9.1/static/storage-page-layout.html.
[15] RCFile (2016). Apache Hive, http://hive.apache.org/javadocs/r2.2.0/api/org/apache/hadoop/hive/ql/io/RCFile.html.
[16] Richter, S., Quiané-Ruiz, J., Schuh, S. and Dittrich, J. (2014). Towards zero-overhead static and adaptive indexing in Hadoop, The VLDB Journal 23(3): 469–494, DOI: 10.1007/s00778-013-0332-z.
[17] Shvachko, K., Kuang, H., Radia, S. and Chansler, R. (2010). The Hadoop distributed file system, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10, DOI: 10.1109/MSST.2010.5496972.
[18] Teófilo, L.F. and Reis, L.P. (2011). Identifying player’s strategies in no limit Texas Hold’em poker through the analysis of individual moves, EPIA Conference on Artificial Intelligence, Lisbon, Portugal, pp. 70–83.
[19] Teófilo, L.F., Reis, L.P. and Cardoso, H.L. (2013). Estimating the probability of winning for Texas Hold’em poker agents, IEEE/WIC/ACM International Conference on Intelligent Agent Technology, Washington, DC, USA, pp. 369–374, DOI: 10.1109/WI-IAT.2013.134.
[20] Teófilo, L.F., Reis, L.P. and Cardoso, H.L. (2014). A profitable online no-limit poker playing agent, Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Washington, DC, USA, Vol. 03, pp. 286–293, DOI: 10.1109/WI-IAT.2014.179.
[21] Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Zhang, N., Anthony, S., Liu, H. and Murthy, R. (2010). Hive—a petabyte scale data warehouse using Hadoop, Data Engineering (ICDE), 2010 IEEE 26th International Conference on, Long Beach, CA, USA, pp. 996–1005, DOI: 10.1109/ICDE.2010.5447738.
[22] YARN (2016). Apache Hadoop YARN, http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html.

Notes

Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (2017 tasks).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-ea6bf9c8-ca70-4963-a6d5-3d10a60ab884