Publication detail

Analyzing Logical Structure of a Web Site

BURGET, R.

Original Title

Analyzing Logical Structure of a Web Site

English Title

Analyzing Logical Structure of a Web Site

Type

conference paper

Language

en

Original Abstract

Today's World Wide Web consists mainly of documents written in Hypertext Markup Language (HTML). This language was developed for describing the look of documents and their references to other documents, and it therefore has very poor facilities for describing the semantics and structure of the contained data. Moreover, some of these facilities are often not used by the authors of the documents, or are not used in an appropriate way. In our work, we attempt to analyze the look and structure of a Web site as represented by the facilities of HTML and to create a logical model that represents the data relations the same way a human user would see them. We propose a tree representation of a Web site and algorithms for the analysis of the most important HTML constructions: section headings, lists, tables and links.

English abstract

Today's World Wide Web consists mainly of documents written in Hypertext Markup Language (HTML). This language was developed for describing the look of documents and their references to other documents, and it therefore has very poor facilities for describing the semantics and structure of the contained data. Moreover, some of these facilities are often not used by the authors of the documents, or are not used in an appropriate way. In our work, we attempt to analyze the look and structure of a Web site as represented by the facilities of HTML and to create a logical model that represents the data relations the same way a human user would see them. We propose a tree representation of a Web site and algorithms for the analysis of the most important HTML constructions: section headings, lists, tables and links.

Keywords

HTML analysis, Semi-structured data, Information extraction

RIV year

2002

Released

04.04.2002

Location

Ostrava

ISBN

80-85988-70-4

Book

Proceedings of 5th International Conference ISM '02 - Information Systems Modelling

Pages from

29

Pages to

35

Pages count

7

BibTeX


@inproceedings{BUT10013,
  author="Radek {Burget}",
  title="Analyzing Logical Structure of a Web Site",
  annote="Today's World Wide Web consists mainly of documents written in
Hypertext Markup Language (HTML). This language was developed for describing
the look of documents and their references to other documents, and it
therefore has very poor facilities for describing the semantics and structure
of the contained data. Moreover, some of these facilities are often not used
by the authors of the documents, or are not used in an appropriate way. In our
work, we attempt to analyze the look and structure of a Web site as
represented by the facilities of HTML and to create a logical model that
represents the data relations the same way a human user would see them.
We propose a tree representation of a Web site and algorithms for the
analysis of the most important HTML constructions: section headings, lists,
tables and links.",
  booktitle="Proceedings of 5th International Conference ISM '02 - Information Systems Modelling",
  chapter="10013",
  year="2002",
  month="april",
  pages="29--35",
  type="conference paper"
}