
Integrating Content and Structure Learning: A Model of Hypertext Zoning and Sounding
Authors: Alexander Mehler and Ulli Waltinger
Pub/Conf: Modeling, Learning, and Processing of Text Technological Data Structures. Mehler A, Kühnberger K-U, Lobin H, Lüngen H, Storrer A, Witt A (Eds.); Studies in Computational Intelligence, Berlin/New York: Springer
Abstract:
The bag-of-words model is accepted as the first choice when it comes to representing the content of web documents. It benefits from a low time complexity, but this comes at the cost of ignoring document structure. Obviously, there is a trade-off between the range of document modeling and its computational complexity. In this chapter, we present a model of content and structure learning that tackles this trade-off with a focus on delimiting documents as instances of webgenres. We present and evaluate a two-level algorithm of hypertext zoning that integrates the genre-related classification of web documents with their segmentation. In addition, we present an algorithm of hypertext sounding with respect to the thematic demarcation of web documents.
BibTeX:
@incollection{DBLP:series/sci/MehlerW12,
  author    = {Alexander Mehler and Ulli Waltinger},
  title     = {Integrating Content and Structure Learning: A Model of Hypertext Zoning and Sounding},
  booktitle = {Modeling, Learning, and Processing of Text Technological Data Structures},
  year      = {2012},
  pages     = {299-329},
  ee        = {http://dx.doi.org/10.1007/978-3-642-22613-7_15},
  crossref  = {DBLP:series/sci/2012-370},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}