Publication detail

HTML Document Analysis for Information Extraction

Original title

HTML Document Analysis for Information Extraction

English title

HTML Document Analysis for Information Extraction

Language

en

Original abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, HTML primarily describes the appearance of documents and provides no facilities for describing the structure of the data they contain. In this paper, we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the appearance and structure of HTML documents.

English abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, HTML primarily describes the appearance of documents and provides no facilities for describing the structure of the data they contain. In this paper, we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the appearance and structure of HTML documents.

BibTeX


@inproceedings{BUT10014,
  author="Radek {Burget}",
  title="HTML Document Analysis for Information Extraction",
  annote="Today's World Wide Web contains a vast amount of
information stored in HTML documents. However, HTML primarily
describes the appearance of documents and provides no facilities
for describing the structure of the data they contain. In this
paper, we propose a model of a Web site that describes the logical
structure of the contained data. Furthermore, we propose methods for
creating such a model by analyzing the appearance and structure of
HTML documents.",
  address="Faculty of Information Technology BUT",
  booktitle="Proceedings of 8th EEICT conference",
  chapter="10014",
  institution="Faculty of Information Technology BUT",
  year="2002",
  month="april",
  pages="426--430",
  publisher="Faculty of Information Technology BUT",
  type="conference paper"
}