Extraction of scores and average from Algerian high-school degree transcripts

Kefali, Abderrahmane; Drabsia, Soumia; Sari, Toufik; Chaoui, Mohammed; Ferkous, Chokri

doi:10.7494/csci.2020.21.1.3400

Abstract

A system for extracting the scores and the average from Algerian high-school degree transcripts is proposed. The system works in several stages and relies on localizing the tables that contain this information. After preprocessing, the system locates the tables using ruling-line information together with text information. As a result, the localization approach works even when some ruling lines are missing, erased, or discontinuous. The localized tables are then segmented into columns, and the columns into information cells. Finally, the cells are labeled based on prior knowledge of the table structure, which identifies the scores and the average. Experiments were conducted on a local dataset to evaluate the performance of the system and to compare it with three public systems at three levels; the results show the effectiveness of the system.
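The segmentation and labeling stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not detail: it assumes a clean binary image with every ruling line present, finds the lines with plain projection profiles, and ignores the text-based recovery of missing lines. All function names, the `min_fill` threshold, and the column names `subject` and `score` are hypothetical.

```python
def ruling_runs(profile, length, min_fill=0.8):
    """Return (start, end) index runs whose ink count covers at least min_fill of length."""
    runs, start = [], None
    for i, count in enumerate(profile + [0]):  # the trailing 0 closes a run at the border
        is_line = count >= min_fill * length
        if is_line and start is None:
            start = i
        elif not is_line and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs


def segment_cells(image, min_fill=0.8):
    """Split a binary table image (1 = ink) into rows of cell boxes (top, bottom, left, right)."""
    height, width = len(image), len(image[0])
    col_profile = [sum(row[x] for row in image) for x in range(width)]
    row_profile = [sum(row) for row in image]
    vertical = ruling_runs(col_profile, height, min_fill)
    horizontal = ruling_runs(row_profile, width, min_fill)
    # Each cell is the area enclosed by two consecutive lines in each direction.
    return [
        [(top[1] + 1, bottom[0] - 1, left[1] + 1, right[0] - 1)
         for left, right in zip(vertical, vertical[1:])]
        for top, bottom in zip(horizontal, horizontal[1:])
    ]


def label_cells(cells, column_names):
    """Attach the assumed column names to each row of cells, left to right."""
    return [dict(zip(column_names, row)) for row in cells]


# Synthetic 7x9 grid: ruling lines on rows 0, 3, 6 and columns 0, 4, 8.
image = [[1 if y % 3 == 0 or x % 4 == 0 else 0 for x in range(9)] for y in range(7)]
cells = segment_cells(image)
labeled = label_cells(cells, ["subject", "score"])
```

Here the prior knowledge of the table structure is reduced to a fixed left-to-right column order; a real transcript layout would need a richer model, and scanned pages would need the preprocessing and line-repair steps the abstract mentions.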

Keywords

localization of areas of interest; table localization and recognition; table understanding; document analysis and recognition; digital archiving; physical and logical structure

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2020

Volume

Vol. 21 (1)

Pages

59–96

Physical description

Bibliography: 47 items, figures, tables.

Creators

author

Kefali Abderrahmane

kefali.abderrahmane@univ-guelma.dz

Université 8 Mai 1945 Guelma, Département d’Informatique, BP 401, Guelma 24000, Algeria
Badji Mokhtar Annaba University, LabGED Laboratory, B.P. 12, Annaba 23000, Algeria

author

Drabsia Soumia

miyasoumia5@gmail.com

Université 8 Mai 1945 Guelma, Département d’Informatique, BP 401, Guelma 24000, Algeria

author

Sari Toufik

sari@labged.net

Badji Mokhtar Annaba University, LabGED Laboratory, B.P. 12, Annaba 23000, Algeria
Badji Mokhtar Annaba University, Computer Science Department, B.P. 12, Annaba 23000, Algeria

author

Chaoui Mohammed

chaoui.mohammed@univ-guelma.dz

Université 8 Mai 1945 Guelma, Département d’Informatique, BP 401, Guelma 24000, Algeria

author

Ferkous Chokri

ferkous.chokri@univ-guelma.dz

Université 8 Mai 1945 Guelma, LabSTIC, Département d’Informatique, BP 401, Guelma 24000, Algeria

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-89d9ef9d-220b-41ff-bea8-3cd57a986e36