Integrating Content and Structure Learning: A Model of Hypertext Zoning and Sounding

Title: Integrating Content and Structure Learning: A Model of Hypertext Zoning and Sounding
Authors: Alexander Mehler and Ulli Waltinger
Pub/Conf: Modeling, Learning, and Processing of Text Technological Data Structures. Mehler A, Lobin H, Lüngen H, Kühnberger K-U, Storrer A, Witt A (Eds); Studies in Computational Intelligence, Berlin/New York: Springer

Abstract:
The bag-of-words model is accepted as the first choice when it comes to representing the content of web documents. It benefits from a low time complexity, but this comes at the cost of ignoring document structure. Obviously, there is a trade-off between the range of document modeling and its computational complexity. In this chapter, we present a model of content and structure learning that tackles this trade-off with a focus on delimiting documents as instances of webgenres. We present and evaluate a two-level algorithm of hypertext zoning that integrates the genre-related classification of web documents with their segmentation. In addition, we present an algorithm of hypertext sounding with respect to the thematic demarcation of web documents.

BibTeX:

@incollection{DBLP:series/sci/MehlerW12,
  author    = {Alexander Mehler and
               Ulli Waltinger},
  title     = {Integrating Content and Structure Learning: A Model of Hypertext
               Zoning and Sounding},
  booktitle = {Modeling, Learning, and Processing of Text Technological
               Data Structures},
  year      = {2012},
  pages     = {299--329},
  ee        = {http://dx.doi.org/10.1007/978-3-642-22613-7_15},
  crossref  = {DBLP:series/sci/2012-370},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
