Alexandria Library Local History / Special Collections is launching a new community web archiving program – the Alexandria Community Web Archives. As part of our ongoing mission to document the history and culture of Alexandria, we will now be capturing our community’s footprint on the world wide web!
Web archiving is the process of collecting portions of the internet in order to preserve and provide access to these websites for future use. The goal of web archiving is to capture a “snapshot in time” of parts of the web and (as best as possible) to recreate the experience that a user would have had if they had visited those sites on the live web on the day that they were archived.
A web archive is a group of archived websites that are often organized by theme, event, subject area, or web address. Some examples of web archives you might have used before are the Internet Archive’s Wayback Machine, Library of Congress' Web Archives, or the DC Public Library’s People’s Web Archives.
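Each capture in a web archive is addressed by the page’s original URL plus a timestamp. The Wayback Machine, for example, exposes its snapshots at URLs of the form https://web.archive.org/web/&lt;date&gt;/&lt;original URL&gt;. A minimal Python sketch of that scheme (the page and date below are only illustrations):

```python
# Build a Wayback Machine snapshot URL for a given page and date.
# The "web/<YYYYMMDDhhmmss>/<url>" path is the Wayback Machine's public
# URL scheme; the example page below is just an illustration.

def wayback_url(page: str, timestamp: str) -> str:
    """Return the Wayback Machine address for `page` as captured near `timestamp`.

    `timestamp` is a 4-14 digit date string, e.g. "20200101" for
    January 1, 2020; the Wayback Machine redirects to the closest
    capture it actually holds.
    """
    return f"https://web.archive.org/web/{timestamp}/{page}"

print(wayback_url("https://www.alexlibraryva.org/", "20200101"))
# https://web.archive.org/web/20200101/https://www.alexlibraryva.org/
```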
You can browse all of our web archives collections at https://archive-it.org/home/alexlibraryva. There are tons of potential uses for web archives! Web archives can serve basic information-seeking needs, just as you might consult other archival materials for research or the live web for reference. Web archives can also be useful datasets for research about the ways that communities grow and evolve over time.
Alexandria Library has partnered with Community Webs to launch this new community web archiving program. Community Webs, an initiative of Archive-It and the Internet Archive, aims to build the capacity of public libraries, community archives, and other cultural heritage organizations to collaborate with their communities on archives of web-published primary sources documenting local history and underrepresented voices. Community Webs pursues this mission by providing professional training, technology services, networking opportunities, and support for scholarly research.
The project was launched in 2017 and since then, more than 150 public libraries and other cultural heritage organizations have joined. To learn more about Community Webs and their other partner institutions, check out their website at https://communitywebs.archive-it.org/.
Frequently Asked Questions about the Alexandria Community Web Archives. This section will be regularly updated with common questions from the community. You can send any questions that you have to LHSC@alexlibraryva.org.
1. What is web archiving?
2. Why are you doing this?
3. How does the Library select websites to capture?
4. How are the websites archived?
5. What tools does the Library’s web archive use?
6. Can I suggest a website?
7. How do I view the web archives?
Questions for website owners:
1. Will you include my website?
2. What does it mean to grant or deny permission to allow the Library to capture my site?
3. What if I don’t want you to collect my website?
4. How often and for how long will you collect my site?
5. What should I do if your crawler causes problems on my site?
6. Will all of my site be harvested?
7. Do we need to contact you if our URL changes?
This glossary includes definitions of key concepts and technology as they relate to web archiving and the Alexandria Community Web Archives.
1. Collection: A group of web archives related by a common theme or subject matter.
2. Crawl or Capture: The process of downloading all of the code, images, documents, and other files essential to reproduce a website, in order to preserve the original form of the content. Web archiving “crawls” are conducted by a “crawler.”
3. Crawler: Software that explores the internet and collects data about its contents.
4. Heritrix: Internet Archive’s open-source, extensible, web-scale, and archival-quality web crawler software. Archive-It uses Heritrix and Umbra in its standard crawls.
5. Resource: Any document in the archives that is represented by a URL.
6. Robots.txt: Files that a site owner can add to their site to keep crawlers from accessing all or parts of the website.
7. Seed: An item in Archive-It with a unique ID number. The Seed URL tells the crawler where to go on the web during a crawl; seed URLs also act as the starting or entry point as well as the access point for content in the archive.
8. Umbra: A browser-based technology that Archive-It uses to navigate the web during the crawl process more closely to how human viewers would experience it. Archive-It uses Heritrix and Umbra in its standard crawls.
9. URL: URL stands for Uniform Resource Locator. This is the location of a resource on the internet; the web address that usually appears at the top of your browser.
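To make the crawler and robots.txt entries above concrete, here is a small Python sketch using the standard library’s urllib.robotparser. The rules and URLs are made up for illustration; a real crawler downloads a site’s /robots.txt file rather than parsing a hard-coded string.

```python
# Sketch: how a crawler consults robots.txt rules before fetching a page.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block every crawler ("*") from /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The crawler checks each URL against the rules before requesting it.
print(parser.can_fetch("*", "https://example.org/news/"))      # True
print(parser.can_fetch("*", "https://example.org/private/x"))  # False
```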
1. What is web archiving?
Web archiving is the process of collecting portions of the internet in order to preserve and provide access to these collections for future use. The goal of web archiving is to capture a “snapshot in time” of parts of the web and (as best as possible) to recreate the experience that a user would have had if they had visited those sites on the live web on the day that they were archived.
A web archive is a group of archived websites, often organized by theme, event, subject area, or web address. Some examples of web archives you might have used before are the Internet Archive’s Wayback Machine, the Library of Congress’ Web Archives, and the DC Public Library’s People’s Web Archives.
2. Why are you doing this?
The simple answer is that web content is at risk. Few would disagree that the internet plays a significant role in our lives; because of this, it has become a major site of documentation of life as it has been lived since the 1990s. Yet a typical web page lasts only around 100 days before changing, moving, or disappearing completely, and content posted to social media changes just as rapidly.
Our mission at the Local History / Special Collections branch is to document the history and culture of Alexandria. We want to capture a rich and diverse portrait of our community, and we are excited to add web-published resources in service of that goal.
3. How does the Library select websites to capture?
With the help of suggestions from the community, Local History / Special Collections staff select websites based upon the criteria detailed in our website evaluation rubric: community value, research value, institutional value, relevance to other Local History / Special Collections collections, temporality, and ephemerality.
4. How are the websites archived?
Once a website is selected, we reach out to connect with the site owner. Using Archive-It web crawling software, we capture snapshots-in-time of the selected web content. After adding some descriptive information, newly archived material is made publicly available through our web archives portal at https://archive-it.org/home/alexlibraryva.
5. What tools does the Library’s web archive use?
We use Archive-It, a web archiving service supported by the Internet Archive. Through Archive-It we are able to use both the Archive-It Standard crawler and Brozzler. The Standard crawler incorporates two technologies, the Heritrix web crawler and Umbra, while Brozzler is Archive-It’s newest crawling technology, built to improve the capture of dynamic and multimedia web content.
6. Can I suggest a website?
Yes! Submit your suggestions using our online form.
7. How do I view the web archives?
You can browse our web archives collections at https://archive-it.org/home/alexlibraryva.
Questions for website owners:
1. Will you include my website?
If you would like to be included in the Alexandria Community Web Archives, you can suggest your website using our online form.
2. What does it mean to grant or deny permission to allow the Library to capture my site?
It is generally accepted that web archiving falls within the boundaries of the fair use doctrine of copyright law, meaning that libraries, archives, and other cultural heritage institutions may capture web content that is publicly available, i.e. content that does not require a subscription or password to access.
Websites that are archived for inclusion in the Alexandria Community Web Archives will be preserved and made freely available to the public in perpetuity.
The copyright status of your site remains with you. US copyright law protects the rights of creators of published and unpublished original works. The copyright holder has exclusive rights to reproduce the work, prepare derivative works, distribute and/or sell copies of the original work, and perform or display the work.
3. What if I don’t want you to collect my website?
We strive for responsible and ethical stewardship of both the digital and physical collections in our care. Requests to remove websites from our public access platform are considered on a case-by-case basis; however, our general policy is to remove materials from public access if that is the site owner’s wish. To submit a take-down request, please email LHSC@alexlibraryva.org.
4. How often and for how long will you collect my site?
That depends on the site! After the initial capture, websites may be crawled again at scheduled intervals, e.g. quarterly, semi-annually, or annually. The frequency of additional captures depends on a number of variables, including how frequently the site is updated, the content and structure of the site, its relevance as a historical resource, and other factors.
5. What should I do if your crawler causes problems on my site?
The Library and its staff always try to crawl sites politely and to minimize server impact. However, if you experience problems or have any questions, please contact us at LHSC@alexlibraryva.org.
6. Will all of my site be harvested?
The Library does not collect password-protected content as part of our community web archiving program. However, we do attempt to collect as much of a website as possible in order to provide an accurate snapshot for future researchers. Because of this we generally bypass robots.txt exclusions. Please contact us if you have any questions about this policy.
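For context, a robots.txt exclusion is just a short text file served at the root of a website. An illustrative example (the directory name is hypothetical) that asks all crawlers to skip one part of a site:

```text
User-agent: *
Disallow: /drafts/
```

A crawler that honors these rules would skip everything under /drafts/; as noted above, our crawls generally bypass such exclusions to keep captures complete.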
7. Do we need to contact you if our URL changes?
Please do! We appreciate any updates that site owners would like to provide. You can email us anytime at LHSC@alexlibraryva.org.