A Document Object Model for Solving the Problem of Identification and Structurization of Documentary Flows of Rosfinmonitoring

Abstract

The paper addresses the construction of an information retrieval system based on an algorithm for automated classification of full-text documents and recognition of their structure. It describes the selected approaches and the two algorithms developed from them, one for identifying the document type and one for recognizing the document's logical structure, both intended to support subsequent semantic processing. It introduces a multistage method for automatically recognizing the logical structure of a document and building a model of it. The method was evaluated experimentally on a corpus of Rosfinmonitoring reporting documents.
