Publication detail

Automatic annotation of online articles based on visual feature classification

BURGET, R.; BURGETOVÁ, I.

Type

journal article in Scopus

Language

en

Original Abstract

When traditional data mining methods are applied to World Wide Web documents, a typical problem is that an ordinary web page contains various kinds of information in addition to its main content. This additional information, such as navigation, advertisements, or copyright notices, negatively influences the results of data mining methods, for example content classification. In this paper, we present a method for detecting interesting areas in a web page. The method is inspired by the approach a human reader is assumed to take to this task. First, basic visual blocks are detected in the page; subsequently, the purpose of these blocks is guessed based on their visual appearance. We describe a page segmentation method used for visual block detection, propose a way of classifying the blocks based on their visual features, and finally provide an experimental evaluation of the method on real-world data.

Keywords

automatic annotation, online articles, page segmentation, document preprocessing, visual features, visual analysis, data mining, classification

RIV year

2011

Released

01.07.2011

Publisher

Not specified

Location

Not specified

Pages from

338

Pages to

360

Pages count

23

BibTeX


@article{BUT76405,
  author="Radek {Burget} and Ivana {Burgetová}",
  title="Automatic annotation of online articles based on visual feature classification",
  annote="When applying the traditional data mining methods to World Wide Web documents,
the typical problem is that a normal web page contains a variety of information
of different kinds in addition to its main content. This additional information
such as navigation, advertisement or copyright notices negatively influences the
results of the data mining methods as for example the content classification. In
this paper, we present a method of interesting area detection in a web page. This
method is inspired by an assumed human reader approach to this task. First, basic
visual blocks are detected in the page and subsequently, the purpose of these
blocks is guessed based on their visual appearance. We describe a page
segmentation method used for the visual block detection, we propose a way of the
block classification based on the visual features and finally, we provide an
experimental evaluation of the method on real-world data.",
  chapter="76405",
  doi="10.1504/IJIIDS.2011.041322",
  howpublished="print",
  number="4",
  volume="5",
  year="2011",
  month="july",
  pages="338--360",
  type="journal article in Scopus"
}