A Document Object Model for Solving the Problem of Identification and Structurization of Documentary Flows of Rosfinmonitoring


The paper considers the construction of an information retrieval system based on an algorithm for the automated classification of full-text documents and the recognition of their structure. It describes the selected approaches, together with the two algorithms developed on their basis: one for identifying the document type and one for recognizing the document's logical structure, both intended to support subsequent semantic processing. It introduces a multistage method for automatically recognizing the logical structure of a document and building a model of it. The method has been evaluated experimentally on a corpus of Rosfinmonitoring reporting documents.

