2008, Volume 13, Special issue, p. 93-101

Hmelnov A.E., Shigarov A.O.

A method for tables extraction from a plain text

Table extraction is one of the problems of document analysis. Existing approaches to it are usually tied to particular media and formats. This paper considers a heuristic method for extracting plain-text tables from unformatted and formatted documents. The method relies on certain properties of statistical tables and can also be applied to tables of similar structure. In addition, a model of table structure is proposed that allows the contents of the extracted tables to be transformed automatically into relational tables.

Keywords: Document analysis and processing, information extraction, table extraction

Hmelnov Alexey Evgenievich
PhD, Associate Professor
Position: Head of Laboratory
Office: Matrosov Institute for System Dynamics and Control Theory of Siberian Branch of Russian Academy of Sciences
Address: 664033, Russia, Irkutsk, 134 Lermontov str.
Phone Office: (3952) 45-30-71
SPIN-code: 8041-3667

Shigarov Alexei Olegovich
Position: Senior Research Scientist
Office: Institute for System Dynamics and Control Theory, Siberian Branch of RAS
Address: 664033, Russia, Irkutsk, 134 Lermontov str.
Phone Office: (3952) 45-31-02

Bibliography link:
Hmelnov A.E., Shigarov A.O. A method for tables extraction from a plain text // Computational technologies. 2008. V. 13. Special issue 1. P. 93-101
