Publication detail

Hierarchies in HTML Documents: Linking Text to Concepts

BURGET, R.

Type

conference paper

Language

en

Original Abstract

For the Semantic Web to be established successfully, tools are needed for linking the large amounts of data currently available in HTML documents to Semantic Web ontologies. Because HTML code is enormously variable, defining direct bindings between patterns of HTML code and concepts is very limiting. We propose an approach based on modeling the visual part of the rendered document and describing the key characteristics of the data presentation in a general way. As a next step, we propose a way of using this model to locate instances of the concepts in the document by means of approximate tree matching algorithms and regular expressions.

Keywords

HTML, Information extraction, Ontology, Logical document structure

RIV year

2004

Released

30.08.2004

Publisher

IEEE Computer Society

Location

Zaragoza

ISBN

0-7695-2195-9

Book

15th International Workshop on Database and Expert Systems Applications

Pages from

186

Pages to

190

Pages count

5

BibTeX


@inproceedings{BUT17352,
  author="Radek {Burget}",
  title="Hierarchies in HTML Documents: Linking Text to Concepts",
  annote="For the successful setting of the Semantic Web, it is necessary to provide tools for linking the large amounts of data that are currently available in HTML documents to the Semantic Web ontologies. Due to the enormous variability of the HTML code, it is very limiting to define direct bindings between patterns of the HTML code and the concepts. We propose an approach based on modeling the visual part of the rendered document and describing the key characteristics of the data presentation in a general way. As a next step, we propose the way for using this model for locating the instances of the concepts in the document using the approximate tree matching algorithms and regular expressions.",
  address="Zaragoza",
  booktitle="15th International Workshop on Database and Expert Systems Applications",
  chapter="17352",
  institution="IEEE Computer Society",
  year="2004",
  month="august",
  pages="186--190",
  publisher="IEEE Computer Society",
  type="conference paper"
}