Focused Crawling of the Deep Web Using Service Class Descriptions (open access)

Dynamic Web data sources, sometimes known collectively as the Deep Web, increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, such as entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed those of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deep Web. To address these challenges, we present DynaBot, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DynaBot has three unique characteristics. First, DynaBot utilizes a service class model of the Web implemented through the construction of service class descriptions (SCDs). Second, DynaBot employs a modular, self-tuning system architecture for focused crawling of the Deep Web using service class descriptions. Third, DynaBot incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis. Our experimental results demonstrate the effectiveness of the service class discovery, probing, and …
Date: June 21, 2004
Creator: Rocco, D.; Liu, L. & Critchlow, T.
Object Type: Article
System: The UNT Digital Library
The Alvin Sun (Alvin, Tex.), Vol. 114, No. 50, Ed. 1 Monday, June 21, 2004 (open access)

Weekly newspaper from Alvin, Texas, that includes local, state, and national news along with advertising.
Date: June 21, 2004
Creator: Schwind, Jim & Looby, Edward
Object Type: Newspaper
System: The Portal to Texas History