
Archive site

In web archiving, an archive site is a website that stores past webpages, or information about them, for anyone to view.

Common techniques

Two common techniques are (1) using a web crawler and (2) accepting user submissions.

  1. Web crawler: Because a crawler does not depend on an active community for content, the service can build a larger database faster, which usually helps the community grow as well. However, website developers and system administrators can block these robots from accessing certain web pages by using a robots.txt file.
  2. User submissions: Although such services can be difficult to start because submission rates may be low at first, this approach can yield some of the best results. A crawler can obtain only the information that people have bothered to post to the Internet. People may not have posted material because they did not think anyone would be interested in it, because they lacked a suitable medium, and so on. If they see that someone wants their information, they may be more inclined to submit it.
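To illustrate how a robots.txt file can keep a crawler away from certain pages, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the crawler name "ExampleArchiveBot", the helper allowed_urls, and the example.org URLs are all hypothetical and chosen only for illustration; this is not how any particular archive site implements its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site administrator might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt text permits user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical candidate pages for an archive crawler.
candidates = [
    "https://example.org/index.html",
    "https://example.org/private/notes.html",
]

permitted = allowed_urls(ROBOTS_TXT, "ExampleArchiveBot", candidates)
print(permitted)
```

A well-behaved crawler would run a check like this before requesting each page and skip any URL that is not permitted. Note that robots.txt is a convention that relies on the crawler's cooperation, not an access control mechanism.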

Examples

Google Groups

On February 12, 2001, Google acquired the Usenet discussion group archives from Deja.com and turned them into its Google Groups service [1]. The service lets users search old discussions with Google's search technology while still allowing them to post to the newsgroups.

Internet Archive

The Internet Archive is building a compendium of websites and digital media. Since 1996, the Archive has used a web crawler to build up its database. It is one of the best-known archive sites.

TextFiles.com

TextFiles.com is a large library of old text files maintained by Jason Scott Sadofsky. Its mission is to archive the old documents that floated around the bulletin board systems (BBSes) of his youth and to document other people's experiences on those BBSes.

PANDORA Archive

PANDORA (Pandora Archive), founded in 1996 by the National Library of Australia, stands for Preserving and Accessing Networked Documentary Resources of Australia, a name that encapsulates its mission. It provides a long-term catalog of selected online publications and websites that are authored by Australians or concern Australian topics. The catalog is built with PANDAS (PANDORA Digital Archiving System).

Nextpoint

Nextpoint offers an automated, cloud-based software-as-a-service (SaaS) product for marketing, compliance, and litigation-related needs, including electronic discovery.


This article is based on one or more articles in Wikipedia, with modifications and additional content by SOURCES editors. This article is covered by a Creative Commons Attribution-Sharealike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). The remainder of the content of this website, except where otherwise indicated, is copyright SOURCES and may not be reproduced without written permission. (For information use the Contact form.)
