Indexing the Web
Indexes of information on the WWW are used by searching services, so
the two are inextricably connected. This page on indexing is barely
begun.
Other terms related to indexing: keywords, catalog, directory,
table of contents, taxonomy, hierarchy, classify, organize.
There are many ways of building an index.
There are many kinds of indices.
Given an index, there are many ways to search through it.
Centralized Indexing/Searching
In a central index, everything that is searchable is indexed in one place.
This makes searching easier, but it does not scale to the whole Web.
Gathering the information for a centralized index is also a challenge.
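As a rough illustration of a central index, here is a minimal sketch of
an in-memory inverted index. The class name, the tokenizer, and the
sample documents are all assumptions made for this example; they do not
describe any particular searching service.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CentralIndex:
    """Hypothetical central index: one map from word to document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index every word of one document in the single shared map."""
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


# Made-up sample documents, for illustration only.
index = CentralIndex()
index.add("a.html", "Indexing the Web with a central catalog")
index.add("b.html", "Distributed searching across several servers")
index.add("c.html", "A catalog of Web servers")
print(sorted(index.search("web catalog")))  # ['a.html', 'c.html']
```

Searching is a simple set intersection because everything lives in one
place. The same property is what limits scaling: every document on
every server has to pass through `add` on this one index.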
Distributed Indexing/Searching
These systems distribute the index and the searching across several
servers. The relationship between the servers is usually
hierarchical, but this is not always the case. Hierarchical
structures run into trouble near the top: the top server must either
contain all the index information provided at lower levels, or work
with imperfect information when forwarding searches to the appropriate
lower-level servers.
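The trade-off at the top of a hierarchy can be sketched as follows. In
this hypothetical design (the `IndexNode` class and the sample data are
assumptions for illustration), leaves hold the actual postings and each
parent consults only a word summary of its children to decide where to
forward a query.

```python
class IndexNode:
    """A node in a hypothetical index hierarchy.

    Leaves hold word -> document postings. Interior nodes hold no
    postings of their own here and rely on a summary of each child.
    """

    def __init__(self, name, children=None, documents=None):
        self.name = name
        self.children = children or []
        self.postings = {}
        for doc_id, text in (documents or {}).items():
            for word in text.lower().split():
                self.postings.setdefault(word, set()).add(doc_id)

    def vocabulary(self):
        """All words indexed at or below this node.

        Recomputed on every call to keep the sketch short; a real
        server would more likely cache a summary sent up by each child.
        """
        words = set(self.postings)
        for child in self.children:
            words |= child.vocabulary()
        return words

    def search(self, word, visited=None):
        """Return matching ids, forwarding only to promising children."""
        word = word.lower()
        if visited is not None:
            visited.append(self.name)
        hits = set(self.postings.get(word, set()))
        for child in self.children:
            if word in child.vocabulary():
                hits |= child.search(word, visited)
        return hits


# Made-up sample data, for illustration only.
east = IndexNode("east", documents={"e1": "harvest gatherer tools",
                                    "e2": "whois directory"})
west = IndexNode("west", documents={"w1": "archie ftp archive",
                                    "w2": "directory services"})
root = IndexNode("root", children=[east, west])

visited = []
print(sorted(root.search("ftp", visited)), visited)
# ['w1'] ['root', 'west']
```

The root never stores a document id, but its view of each child still
grows with the full vocabulary beneath it. Shrinking that summary saves
space at the top at the price of forwarding some queries to servers
that cannot answer them, or missing servers that can.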
A non-hierarchical relationship between indexes is a challenge because
no index has all the information. Forwarding queries may rely on
knowing the relationship between indexes and knowing enough about the
queries to follow the best linked index.
Alternatively, the query may be forwarded to all linked indexes and
then continue along the path that provides the best match so far. This
still requires each index to know at least some of the information in
the neighboring indexes so that a partial match may succeed. However,
no formalized relationship between indexes is required.
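One way to picture the "best match so far" approach is a best-first
walk over linked indexes. This is only a sketch under assumptions: the
`PeerIndex` class, the `hints` field (words an index has learned about
its neighbors), and the scoring rule are invented for the example and
are not taken from any of the systems listed here.

```python
import heapq


class PeerIndex:
    """Hypothetical index with local documents and partial neighbor hints."""

    def __init__(self, name, documents, hints=()):
        self.name = name
        self.documents = {doc: set(text.lower().split())
                          for doc, text in documents.items()}
        # Words this index has heard about from neighboring indexes.
        self.hints = set(hints)
        self.links = []

    def score(self, query_words):
        """Count the query words this index knows, locally or via hints."""
        known = set(self.hints)
        for words in self.documents.values():
            known |= words
        return len(query_words & known)

    def local_hits(self, query_words):
        """Documents held here that contain every query word."""
        return {doc for doc, words in self.documents.items()
                if query_words <= words}


def best_first_search(start, query, max_hops=10):
    """Follow links, always continuing from the best-scoring index seen."""
    query_words = set(query.lower().split())
    counter = 0  # tie-breaker so the heap never compares PeerIndex objects
    # heapq is a min-heap, so scores are negated to pop the best first.
    frontier = [(-start.score(query_words), counter, start)]
    seen = {start.name}
    path = []
    while frontier and len(path) < max_hops:
        _, _, peer = heapq.heappop(frontier)
        path.append(peer.name)
        hits = peer.local_hits(query_words)
        if hits:
            return hits, path
        for neighbor in peer.links:  # forward to all linked indexes
            if neighbor.name not in seen:
                seen.add(neighbor.name)
                counter += 1
                heapq.heappush(
                    frontier,
                    (-neighbor.score(query_words), counter, neighbor))
    return set(), path


# Made-up peers, for illustration only.
a = PeerIndex("a", {"a1": "gopher menus"})
b = PeerIndex("b", {"b1": "library catalog"}, hints={"archive"})
c = PeerIndex("c", {"c1": "weather maps"})
d = PeerIndex("d", {"d1": "ftp archive listing"})
a.links, b.links, c.links, d.links = [b, c], [a, d], [a], [b]

print(best_first_search(a, "ftp archive"))  # ({'d1'}, ['a', 'b', 'd'])
```

Index `b` holds no matching document, but its hint about "archive"
gives it a partial match, so the query continues through `b` to `d`
and never visits `c`. Without any hints every neighbor scores zero and
the walk degrades into visiting indexes in link order.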
- X.500
- WHOIS++
- SOLO
- Z39.50 Resources and a copy.
- Harvest, a
system that provides a set of customizable tools for gathering
information from diverse repositories, building topic-specific content
indexes, flexibly searching the indexes, widely replicating them, and
caching objects as they are retrieved across the Internet.
Also see:
- Lost in Hyperspace? Free Text Searches in the Web by Christian Neuss,
Stefanie Höfling
- Publishing Information on the Internet with Anonymous FTP
describes how to make a range of
information available via anonymous FTP for use by automatic archive
indexing tools.
- ALIWEB
(Archie-Like Indexing for the Web)
- Tom McArthur,
"Worlds of Reference: lexicography, learning and language from the
clay tablets to the computer",
Cambridge University Press, Cambridge.
Hardback 1986: ISBN 0 521 30637 X.
Paperback 1988: ISBN 0 521 31403 8.
- "Indexing Books" by Nancy Mulvany (University of
Chicago Press; ISBN 0-226-55014-1)
- Resource Location, by Van Snyder - describes a recursive faceted
classification system.
- Service Location Protocol by John Veizades
- An evolving science: taxonomical data modeling: applying biology
to data processing. (DBA Shoptalk) (Column), Database Programming &
Design, July 1993 v6 n7 p25
- Genetic algorithms and database indexing: finding the best set of
indexes. (data base schema) (Tutorial)
Dr. Dobb's Journal, April 1993 v18 n4 p30
- Internet resource discovery services. (Technical)
Computer, Sept 1993 v26 n9 p8
Author: Katia Obraczka, Peter B. Danzig, Shih-Hao Li
- Indexmaker
is a perl script whose function is to produce an index for a virtual
document consisting of a number of HTML files in a single directory.
- ffwindex is
part of the FFW
(Freetext search For Web) package.
- htmlgobble copies pages to local files.
- Iconovex AnchorPage
uses semantic and syntactical analysis to "read" HTML documents and
extract the significant concepts from them.
- SWISH
(Simple Web Indexing System for Humans)
Daniel LaLiberte
(liberte@ncsa.uiuc.edu)
Last modified: Fri May 31 11:07:49 CDT 1996