Working with Web Archives


Ed Summers
mith.umd.edu
edsu@umd.edu



slides: http://bit.ly/webarchives-intro

How much of the web is in the Internet Archive?

273,000,000,000 [2] / 1,000,000,000,000 [1] = 0.273 ???


1. Alpert, J. and Hajaj, N. (2008). We knew the web was big... Google.
2. Goel, V. (2016). Defining Web pages, Web sites and Web captures. Internet Archive.
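
The division above can be checked in a few lines. This is only a sketch of the arithmetic: the two figures come from the sources cited on this slide, were published eight years apart, and count different things (unique URLs seen by Google versus web pages held in the Wayback Machine), which is one reason the result carries question marks. The variable names are illustrative.

```python
# Figures as quoted on the slide; both are rough, and they count different things.
archived_pages = 273_000_000_000    # web pages reported in the Wayback Machine (Goel, 2016)
known_urls = 1_000_000_000_000      # unique URLs Google reported having seen (Alpert and Hajaj, 2008)

ratio = archived_pages / known_urls
print(f"{ratio:.3f}")   # 0.273
print(f"{ratio:.1%}")   # 27.3%
```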

Internet Archive: Exercise (15 mins)

  1. Look up a URL at wayback.archive.org. What do you notice when examining what is in (or not in) the archive?
  2. Try to add a page to the Internet Archive using the Save Page Now function.
  3. (Optional) Install the Wayback Machine extension for Google Chrome and use it to look up or archive a webpage.
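
The lookup in step 1 can also be done from a script. The sketch below assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available and the JSON shape it is documented to return (archived_snapshots, closest, available, timestamp, url). The function names are made up for this example, and real responses may carry more fields than are used here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL asking whether page_url has an archived snapshot."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp  # e.g. "20160101"; the API returns the closest capture
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (timestamp, snapshot_url) from a parsed response, or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, timestamp=None):
    """Query the live API (needs network access)."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling lookup("example.com") should return a timestamp and a web.archive.org URL when a capture exists, and None when the archive has nothing for that address. A None result is itself worth noting when thinking about what is not in the archive.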

Webrecorder: Exercise (15 mins)

  1. Create a Webrecorder account.
  2. Create a collection and add web content to it.
  3. Download the Webrecorder Player.
  4. Download your collection and view it in the player.
  5. (Optional) See if you can open your WARC file and look at what it contains.
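
For step 5, a WARC file is a sequence of records, each with a version line, a block of named headers, a blank line, and a payload whose size is given by the Content-Length header. The sketch below reads that structure with the Python standard library only. It is a simplified reader written for this exercise, not a full implementation of the WARC format: it ignores folded header lines and does no validation. The helper names are illustrative. For real work, the warcio library from the Webrecorder project is the more robust choice.

```python
import gzip


def open_warc(path):
    """Open a WARC file as a binary stream, un-gzipping .warc.gz files on the fly."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == b"\x1f\x8b":            # gzip magic number
        return gzip.open(path, "rb")    # also handles one-gzip-member-per-record files
    return open(path, "rb")


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if not line.strip():
            continue                    # blank lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break                   # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        payload = stream.read(int(headers.get("Content-Length", 0)))
        yield headers, payload


# Example use (the filename is a placeholder for your downloaded collection):
# for headers, payload in iter_warc_records(open_warc("my-collection.warc.gz")):
#     print(headers.get("WARC-Type"), headers.get("WARC-Target-URI"), len(payload))
```

Printing the WARC-Type and WARC-Target-URI of each record gives a quick inventory of what the recording session captured: requests, responses, and metadata records for each URL.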

Hypothesis: Exercise (15 mins)

  1. Create a Hypothesis account.
  2. Install the Chrome extension or the bookmarklet.
  3. Annotate the Palantir/New Orleans article.
  4. Find the URL for sharing your annotation.
  5. (Optional) Comment on someone else's annotation.
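
Public annotations, and the links for sharing them, can also be listed from a script. The sketch below assumes the Hypothesis search endpoint at https://api.hypothes.is/api/search and a response with a "rows" list whose items carry a "links" object with "incontext" and "html" entries. The function names are made up for this example, and page_url stands for whatever page you annotated.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the public Hypothesis search API.
SEARCH_ENDPOINT = "https://api.hypothes.is/api/search"


def search_url(page_url, limit=20):
    """Build the query URL for public annotations on page_url."""
    return SEARCH_ENDPOINT + "?" + urlencode({"uri": page_url, "limit": limit})


def share_links(response):
    """Return (user, link) pairs, preferring the in-context link for each annotation."""
    pairs = []
    for row in response.get("rows", []):
        links = row.get("links", {})
        link = links.get("incontext") or links.get("html")
        if link:
            pairs.append((row.get("user"), link))
    return pairs


def public_annotations(page_url, limit=20):
    """Query the live API (needs network access); only public annotations are returned."""
    with urlopen(search_url(page_url, limit), timeout=30) as resp:
        return share_links(json.load(resp))
```

The in-context link opens the annotated page with the annotation highlighted, which is usually the URL you want to share in step 4.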

Thanks!

Get in touch if you have questions.

edsu@umd.edu / @edsu