Alexandria Library Local History / Special Collections is launching a new community web archiving program – the Alexandria Community Web Archives. As part of our ongoing mission to document the history and culture of Alexandria, we will now be capturing our community’s footprint on the world wide web!
Web archiving is the process of collecting portions of the internet in order to preserve and provide access to these websites for future use. The goal of web archiving is to capture a “snapshot in time” of parts of the web and (as best as possible) to recreate the experience that a user would have had if they had visited those sites on the live web on the day that they were archived.
A web archive is a group of archived websites that are often organized by theme, event, subject area, or web address. Some examples of web archives you might have used before are the Internet Archive’s Wayback Machine, Library of Congress' Web Archives, or the DC Public Library’s People’s Web Archives.
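Each capture in a web archive is addressed by the page’s original URL plus a timestamp. The Wayback Machine, for example, exposes its snapshots at URLs of the form https://web.archive.org/web/&lt;date&gt;/&lt;original URL&gt;. A minimal Python sketch of that scheme (the page and date below are only illustrations):

```python
# Build a Wayback Machine snapshot URL for a given page and date.
# The "web/<YYYYMMDDhhmmss>/<url>" path is the Wayback Machine's public
# URL scheme; the example page below is just an illustration.

def wayback_url(page: str, timestamp: str) -> str:
    """Return the Wayback Machine address for `page` as captured near `timestamp`.

    `timestamp` is a 4-14 digit date string, e.g. "20200101" for
    January 1, 2020; the Wayback Machine redirects to the closest
    capture it actually holds.
    """
    return f"https://web.archive.org/web/{timestamp}/{page}"

print(wayback_url("https://www.alexlibraryva.org/", "20200101"))
# https://web.archive.org/web/20200101/https://www.alexlibraryva.org/
```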
You can browse all of our web archives collections at https://archive-it.org/home/alexlibraryva. There are tons of potential uses for web archives! Web archives can serve basic information-seeking needs, just as you might consult other archival materials for research or the live web for reference. Web archives can also be useful datasets for research about the ways that communities grow and evolve over time.
Alexandria Library has partnered with Community Webs to launch this new community web archiving program. Community Webs, an initiative of Archive-It and the Internet Archive, aims to build the capacity of public libraries, community archives, and other cultural heritage organizations to collaborate with their communities on archives of web-published primary sources documenting local history and underrepresented voices. Community Webs pursues this mission by providing professional training, technology services, networking opportunities, and support for scholarly research.
The project was launched in 2017 and since then, more than 150 public libraries and other cultural heritage organizations have joined. To learn more about Community Webs and their other partner institutions, check out their website at https://communitywebs.archive-it.org/.
Frequently Asked Questions about the Alexandria Community Web Archives. This section will be regularly updated with common questions from the community. You can send any questions that you have to LHSC@alexlibraryva.org.
1. What is web archiving?
2. Why are you doing this?
3. How does the Library select websites to capture?
4. How are the websites archived?
5. What tools does the Library’s web archive use?
6. Can I suggest a website?
7. How do I view the web archives?
Questions for website owners:
1. Will you include my website?
2. What does it mean to grant or deny permission to allow the Library to capture my site?
3. What if I don’t want you to collect my website?
4. How often and for how long will you collect my site?
5. What should I do if your crawler causes problems on my site?
6. Will all of my site be harvested?
7. Do we need to contact you if our URL changes?
This glossary includes definitions of key concepts and technology as they relate to web archiving and the Alexandria Community Web Archives.
1. Collection: A group of web archives related by a common theme or subject matter.
2. Crawl or Capture: The process of downloading all of the code, images, documents, and other files essential to reproduce a website, in order to preserve the original form of the content. Web archiving “crawls” are conducted by a “crawler.”
3. Crawler: Software that explores the internet and collects data about its contents.
4. Heritrix: Internet Archive’s open-source, extensible, web-scale, and archival-quality web crawler software. Archive-It uses Heritrix and Umbra in its standard crawls.
5. Resource: Any document in the archives that is represented by a URL.
6. Robots.txt: Files that a site owner can add to their site to keep crawlers from accessing all or parts of the website.
7. Seed: An item in Archive-It with a unique ID number. The Seed URL tells the crawler where to go on the web during a crawl; seed URLs also act as the starting or entry point as well as the access point for content in the archive.
8. Umbra: A browser-based technology that Archive-It uses to navigate the web during the crawl process more closely to how human viewers would experience it. Archive-It uses Heritrix and Umbra in its standard crawls.
9. URL: URL stands for Uniform Resource Locator. This is the location of a resource on the internet; the web address that usually appears at the top of your browser.
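To make the crawler and robots.txt entries above concrete, here is a small Python sketch using the standard library’s urllib.robotparser. The rules and URLs are made up for illustration; a real crawler downloads a site’s /robots.txt file rather than parsing a hard-coded string.

```python
# Sketch: how a crawler consults robots.txt rules before fetching a page.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block every crawler ("*") from /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The crawler checks each URL against the rules before requesting it.
print(parser.can_fetch("*", "https://example.org/news/"))      # True
print(parser.can_fetch("*", "https://example.org/private/x"))  # False
```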
1. What is web archiving?
Web archiving is the process of collecting portions of the internet in order to preserve and provide access to these collections for future use. The goal of web archiving is to capture a “snapshot in time” of parts of the web and (as best as possible) to recreate the experience that a user would have had if they had visited those sites on the live web on the day that they were archived.
A web archive is a group of archived websites, often organized by theme, event, subject area, or web address. Some examples of web archives you might have used before are the Internet Archive’s Wayback Machine, the Library of Congress’ Web Archives, and the DC Public Library’s People’s Web Archives.
2. Why are you doing this?
The simple answer is that web content is at risk. Few would disagree that the internet plays a significant role in our lives; because of this, it has become a major site of documentation of life as it has been lived since the 1990s. Yet a typical web page lasts only around 100 days before changing, moving, or disappearing completely, and content posted to social media changes just as rapidly.
Our mission at the Local History / Special Collections branch is to document the history and culture of Alexandria. We want to capture a rich and diverse portrait of our community, and we are excited to add web-published resources in service of that goal.
3. How does the Library select websites to capture?
With the help of suggestions from the community, Local History / Special Collections staff select websites based upon the criteria detailed in our website evaluation rubric: community value, research value, institutional value, relevance to other Local History / Special Collections collections, temporality, and ephemerality.
4. How are the websites archived?
Once a website is selected, we reach out to connect with the site owner. Using Archive-It web crawling software, we capture snapshots-in-time of the selected web content. After adding some descriptive information, newly archived material is made publicly available through our web archives portal at https://archive-it.org/home/alexlibraryva.
5. What tools does the Library’s web archive use?
We use Archive-It, a web archiving service supported by the Internet Archive. Through Archive-It we are able to use both the Archive-It Standard crawler and Brozzler. The Standard crawler incorporates two technologies, the Heritrix web crawler and Umbra, while Brozzler is Archive-It’s newest crawling technology, built to improve the capture of dynamic and multimedia web content.
6. Can I suggest a website?
Yes! Submit your suggestions using our online form.
7. How do I view the web archives?
You can browse our web archives collections at https://archive-it.org/home/alexlibraryva.
Questions for website owners:
1. Will you include my website?
If you would like to be included in the Alexandria Community Web Archives, you can suggest your website using our online form.
2. What does it mean to grant or deny permission to allow the Library to capture my site?
It is generally accepted that web archiving falls within the boundaries of the fair use doctrine of copyright law, meaning that libraries, archives, and other cultural heritage institutions may capture web content that is publicly available, i.e. content that does not require a subscription or password to access.
Websites that are archived for inclusion in the Alexandria Community Web Archives will be preserved and made freely available to the public in perpetuity.
The copyright status of your site remains with you. US copyright law protects the rights of creators of published and unpublished original works. The copyright holder has exclusive rights to reproduce the work, prepare derivative works, distribute and/or sell copies of the original work, and perform or display the work.
3. What if I don’t want you to collect my website?
We strive for responsible and ethical stewardship of both the digital and physical collections in our care. Requests to remove websites from our public access platform are considered on a case-by-case basis; however, our general policy is to remove materials from public access if that is the site owner’s wish. To submit a take-down request, please email LHSC@alexlibraryva.org.
4. How often and for how long will you collect my site?
That depends on the site! After the initial capture, websites may be crawled again at scheduled intervals, e.g. quarterly, semi-annually, or annually. The frequency of additional captures depends on a number of variables, including how frequently the site is updated, the content and structure of the site, its relevance as a historical resource, and other factors.
5. What should I do if your crawler causes problems on my site?
The Library and its staff always try to crawl sites politely and to minimize server impact. However, if you experience problems or have any questions, please contact us at LHSC@alexlibraryva.org.
6. Will all of my site be harvested?
The Library does not collect password-protected content as part of our community web archiving program. However, we do attempt to collect as much of a website as possible in order to provide an accurate snapshot for future researchers. Because of this we generally bypass robots.txt exclusions. Please contact us if you have any questions about this policy.
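For context, a robots.txt exclusion is just a short text file served at the root of a website. An illustrative example (the directory name is hypothetical) that asks all crawlers to skip one part of a site:

```text
User-agent: *
Disallow: /drafts/
```

A crawler that honors these rules would skip everything under /drafts/; as noted above, our crawls generally bypass such exclusions to keep captures complete.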
7. Do we need to contact you if our URL changes?
Please do! We appreciate any updates that site owners would like to provide. You can email us anytime at LHSC@alexlibraryva.org.