Capturing websites is tricky, particularly if the website uses Flash
technology, or if the information on the site is fed from a database. It
may be that the foundation's website contains static content that is not
changed very often, which should help. But then there is also the issue
of the crawling technology that is deployed to conduct the captures;
typically organizations have IT defenses against unauthorized crawling,
and you will have to deal with those programs before you can capture
everything in a crawl.
It so happens that the Society of American Archivists has just posted a
case study from the University of Michigan regarding website capture:
http://www2.archivists.org/sites/all/files/FinalCase13.pdf
The University of Michigan contracted with the University of California
for its website capture services, but recently a couple of companies
have surfaced that offer website capture as a commercial service. The
two that I know of are both based in the U.K.; I'm sure others are
starting to appear in other parts of the world, including the U.S. One
is Cloud Testing Limited; here is their website:
http://www.website-archive.com/. Another is Hanzo Archives; here is
their website: http://www.hanzoarchives.com/.
Nobody that I know of has completely overcome all the technological
barriers to accurate website captures, but there is some good progress
being made.
Elizabeth W. Adkins, CRM, CA
The opinions expressed above are my own, and do not reflect those of my
employer.
From: Christine Martin <[log in to unmask]>
To: [log in to unmask]
Date: 04/12/2011 09:33 AM
Subject: archiving web sites
One of the organizations I work for (a private foundation in Chicago) is
trying to decide whether (and how) to digitally preserve (or "archive")
its web site.
The foundation does not take money or handle financial transactions over
its web site. The web site contains primarily publications, e.g., news
releases, annual reports, newsletters, and the like.
My question is: What software or procedures do you use to preserve your
organization's web site as it changes over time? Our web developer has
suggested that we use wget (a web crawler) to capture our web site as it
appears to the public and then use Subversion (a version-control system)
to track any aspects of the web site that have changed. In this way, we
store only the base web site plus incremental changes, as opposed to
storing multiple copies of portions of the web site that have not
changed.
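For readers curious what that workflow might look like in practice, here
is a minimal sketch. The site URL and directory names are placeholders,
and the wget flags shown are one reasonable polite-crawl configuration,
not a prescription; a real capture schedule would also need a Subversion
repository set up beforehand.

```shell
#!/bin/sh
# Sketch of the proposed wget + Subversion capture workflow.
# SITE is a hypothetical placeholder; substitute the real domain.
SITE="http://www.example.org/"

capture_site() {
    # --mirror: recursive crawl with timestamping, so repeat runs
    #   re-fetch only pages that have changed on the server
    # --convert-links: rewrite links so the local copy browses offline
    # --page-requisites: also fetch CSS, images, and scripts
    # --wait=1: pause between requests, to stay polite and avoid
    #   tripping anti-crawler defenses
    wget --mirror --convert-links --page-requisites --wait=1 \
         --directory-prefix=site-capture "$SITE"
}

commit_capture() {
    # Subversion stores only deltas between revisions, so unchanged
    # files add no storage; each commit is a dated snapshot that can
    # be checked out later to see the site as it appeared that day.
    cd site-capture &&
    svn add --force --quiet . &&
    svn commit -m "Site capture $(date +%F)"
}
```

Run `capture_site` on whatever schedule matches how often the site
changes (weekly or monthly for a largely static publications site), then
`commit_capture` to record the snapshot.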
Have any of you done (or attempted) anything similar? If so, I would love
to hear what you did and how it went. This is fairly new territory to me,
and any words of advice, warning, or encouragement would be most welcome.
Thank you.
Sincerely,
Christine Martin
Contract records manager
Des Plaines, IL
224-636-2457 (cellular)
List archives at http://lists.ufl.edu/archives/recmgmt-l.html
Contact [log in to unmask] for assistance
To unsubscribe from this list, click the link below. If not already
present, place UNSUBSCRIBE RECMGMT-L or UNSUB RECMGMT-L in the body of
the message.
mailto:[log in to unmask]