Performance analysis of write operations in IDENTITY and UUID ordered tables

Penar, Maciej

doi:10.7862/re.2020.6

Article details

Title variants

Analiza wydajności operacji zapisu dla tabel uporządkowanych atrybutami IDENTITY oraz UUID

Publication languages

Abstracts

Database design includes a decision about physical storage. This decision is often overlooked because 1) it cannot be expressed in standard SQL, so each database system has its own way of specifying physical storage, and 2) it is often made implicitly. This is a dangerous situation, as many databases implement tables as B+ trees, which store the data physically sorted by some ordering attribute. The choice of the ordering attribute strongly affects both read and write operations. Attributes with the IDENTITY/AUTO_INCREMENT constraint are commonly chosen as ordering attributes because they are easy to use and monotonic. In some cases, however, ordering tables by attributes whose values are drawn from a uniform distribution leads to better performance in terms of transactions per second. Such cases include situations where the data fits entirely in memory or where the set of physical pages being accessed can be limited. In the end, we cannot say that either monotonic or random attributes are strictly superior; both have their pros and cons. In this article we (1) briefly describe the data structures used in contemporary database systems, (2) discuss the advantages and disadvantages of the two types commonly used as clustering attributes, GUID and IDENTITY, (3) present a performance analysis of write operations that compares both data types using a B+ tree as primary storage, and (4) evaluate the efficiency of bulk load operations using heap files and B+ trees.

Designing a database requires a decision about the physical structure that stores the data. The impact of this decision is often underestimated because 1) the SQL standard does not specify this constraint, so each database vendor implements it in its own way, and 2) the choice of structure is made implicitly. The default structures are usually B+ trees, which are sorted structures. The choice of this particular table implementation affects the performance of both read and write operations. Because it is common practice to use IDENTITY/AUTO_INCREMENT attributes as primary keys, the physical order of the table is determined by the values of these attributes. In some cases, however, it is worth using attributes with random values to increase database throughput (measured as the number of transactions per second). Such cases include situations where the data fits in main memory or where we can limit the set of physical pages that the database will access. In the general case, neither monotonic nor random attributes are better than their competitors. In this article we (1) describe the structures used in contemporary databases, (2) describe the advantages and disadvantages of the two most commonly used types, GUID and IDENTITY, (3) present a performance analysis of write operations comparing both types in tables implemented as B+ trees, and (4) analyze the performance of bulk load operations in both heap files and B+ trees.
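The contrast between monotonic and uniformly distributed ordering keys described in the abstracts can be illustrated with a small simulation. The following is only a toy sketch, not the benchmark used in the article: it models the leaf level of a B+ tree as a list of sorted pages with a naive split-in-half policy, and it ignores buffering, logging, and any split optimizations a real engine may apply. The names `simulate_inserts`, `identity_keys`, `random_keys`, and the value of `PAGE_CAPACITY` are assumptions introduced for illustration, and random 128-bit integers stand in for randomly generated UUID values.

```python
import bisect
import random

PAGE_CAPACITY = 64  # hypothetical keys per leaf page, chosen for illustration


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into toy sorted leaf pages.

    Returns (splits, non_tail_inserts, page_count).
    """
    pages = [[]]      # leaf pages, kept in key order
    separators = []   # separators[i] is the smallest key of pages[i + 1]
    splits = 0
    non_tail_inserts = 0

    for key in keys:
        # Locate the page whose key range covers this key.
        idx = bisect.bisect_right(separators, key)
        if idx != len(pages) - 1:
            non_tail_inserts += 1  # the write lands before the last page
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) > capacity:   # overflow: split the page in half
            mid = len(page) // 2
            pages[idx] = page[:mid]
            pages.insert(idx + 1, page[mid:])
            separators.insert(idx, page[mid])
            splits += 1

    return splits, non_tail_inserts, len(pages)


def identity_keys(n):
    """Monotonic keys, as an IDENTITY/AUTO_INCREMENT column would produce."""
    return range(1, n + 1)


def random_keys(n, seed=0):
    """Uniform 128-bit integers standing in for randomly generated UUIDs."""
    rng = random.Random(seed)
    return [rng.getrandbits(128) for _ in range(n)]


if __name__ == "__main__":
    n = 10_000
    for name, keys in (("IDENTITY", identity_keys(n)), ("random", random_keys(n))):
        splits, non_tail, page_count = simulate_inserts(keys)
        print(f"{name}: splits={splits} non_tail_inserts={non_tail} pages={page_count}")
```

In this sketch every monotonic insert lands in the last page, so the set of pages being written stays small, whereas random keys spread writes across the whole key range. Which pattern yields more transactions per second in a real system depends on factors the toy model leaves out, such as whether the touched pages fit in memory.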

Keywords

database design; logical model; heap files; B+ tree; insert performance

database design; logical model; ordering; heap files; B+ tree; UUID; GUID; IDENTITY; sequences; insert performance; bulk loading

Publisher

Oficyna Wydawnicza Politechniki Rzeszowskiej

Journal

Zeszyty Naukowe Politechniki Rzeszowskiej. Elektrotechnika

Year

2020

Volume

issue 38 [301], no. 1-2

Pages

81-95

Physical description

Bibliography: 13 items, figures, tables, charts.

Contributors

author

Penar Maciej

mpenar@kia.prz.edu.pl

Rzeszów University of Technology, The Faculty of Electrical and Computer Engineering, Aleja Powstańców Warszawy 12, 35-959 Rzeszów

https://orcid.org/0000-0002-4481-807X

Bibliography

[1] Ullman D.J., Widom J.: A First Course In Database Systems, Helion Publisher, pages 110-129, 1997
[2] Leach P., Mealling M., Salz R.: RFC 4122: A Universally Unique Identifier (UUID) URN Namespace, https://tools.ietf.org/html/rfc4122 (Access: 9 September 2018)
[3] Nilsson J.: The Cost of GUIDs as Primary Keys, http://www.informit.com/articles/article.aspx?p=25862 (Access: 9 September 2018)
[4] Clayton R.: Do you really need a UUID/GUID?, https://rclayton.silvrback.com/do-you-really-need-a-uuid-guid (Access: 9 September 2018)
[5] Ricken U.: GUID vs INT/IDENTITY als Clustered Key, https://www.db-berater.de/2015/04/guid-vs-intidentity-als-clustered-key-2/ (Access: 9 September 2018)
[6] Penn J.: Taking It Further: GUIDs vs INTs as Primary Keys, https://scifisql.com/2017/05/07/guids-vs-ints-as-primary-keys/ (Access: 9 September 2018)
[7] Boicea A., Bucur I., Radulescu F., Truica C.A.: Performance Evaluation for CRUD Operations in Asynchronously Replicated Document Oriented Database, 20th International Conference on Control Systems and Computer Science, Bucharest, 2015
[8] Li Y., Manoharan S.: A performance comparison of SQL and NoSQL databases, IEEE Pacific RIM Conference on Communications, Computers, and Signal Processing - Proceedings, 2013
[9] Elmasri R., Navathe S.: Fundamentals of Database Systems, Helion Publisher, pages 449 & 288-501, 2005
[10] Bača M., Grd P.: Analysis of B-tree data structure and its usage in computer forensics, Central European Conference on Information and Intelligent Systems, 2010
[11] Jhingran A., Khedkar P.: Analysis of Recovery in a Database System Using a Write-ahead Log Protocol, Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, 1992
[12] Brown D.P., Richards A.: Managing access to data in a multi-temperature database, US Patent US9015146B2, 2015-04-21
[13] Marquardt A.: Generating Globally Unique Identifiers for Use with MongoDB, https://www.mongodb.com/blog/post/generating-globally-unique-identifiers-for-use-with-mongodb (Access: 9 September 2018)

Notes

Record prepared with funds from MNiSW, agreement No. 461252, under the programme "Social Responsibility of Science", module: Popularisation of science and promotion of sport (2021).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-cdbf6461-0192-4f6a-a833-c0e88d86808e