RECMGMT-L Archives

Records Management

RECMGMT-L@LISTSERV.IGGURU.US

Subject:
From: Elizabeth W Adkins <[log in to unmask]>
Reply To: Records Management Program <[log in to unmask]>
Date: Tue, 12 Apr 2011 10:56:42 -0400
Content-Type: text/plain
Parts/Attachments: text/plain (113 lines)

Capturing websites is tricky, particularly if the website uses Flash 
technology or if the information on the site is fed from a database.  It 
may be that the foundation's website contains static content that is not 
changed very often, which should help.  But there is also the issue of 
the crawling technology deployed to conduct the captures: organizations 
typically have IT defenses against unauthorized crawling, and you will 
have to deal with those defenses before you can capture everything in a 
crawl.
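
To give a rough sense of what a crawl runs into, here is a minimal sketch 
using Python's standard robotparser module. The site address and user-agent 
name are placeholders of my own, not anyone's actual crawler, and real IT 
defenses (rate limiting, blocking unfamiliar agents) go well beyond a 
robots.txt file.

    # Minimal sketch: check a site's robots.txt before attempting a crawl.
    # The URL and user-agent below are placeholders.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.example.org/robots.txt")
    rp.read()

    # Ask whether a hypothetical capture crawler may fetch a given page.
    if rp.can_fetch("FoundationArchiveBot",
                    "https://www.example.org/annual-reports/"):
        print("robots.txt permits crawling this path")
    else:
        print("robots.txt blocks this path -- coordinate with IT first")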

It so happens that the Society of American Archivists has just posted a 
case study from the University of Michigan regarding website capture:  
http://www2.archivists.org/sites/all/files/FinalCase13.pdf

The University of Michigan contracted with the University of California 
for its website capture services, but recently a couple of companies have 
surfaced that offer website capture as a service.  The two that I know of 
are both based in the U.K.; I'm sure others are starting to appear in 
other parts of the world, including the U.S.  One company is called Cloud 
Testing Limited; here is their website:  http://www.website-archive.com/. 
Another is called Hanzo Archives; here is their website:  
http://www.hanzoarchives.com/.

Nobody that I know of has completely overcome all the technological 
barriers to accurate website captures, but there is some good progress 
being made.

Elizabeth W. Adkins, CRM, CA

The opinions expressed above are my own, and do not reflect those of my 
employer.



From: Christine Martin <[log in to unmask]>
To: [log in to unmask]
Date: 04/12/2011 09:33 AM
Subject: archiving web sites

One of the organizations I work for (a private foundation in Chicago) is
trying to decide whether (and how) to digitally preserve (or "archive") 
its web site.

The foundation does not take money or handle financial transactions over 
its web site.  The web site contains primarily publications, e.g., news
releases, annual reports, newsletters, and the like.

My question is:  What software or procedures do you use to preserve your
organization's web site as it changes over time?  Our web developer has
suggested that we use Wget (a web crawler) to capture our web site as it
appears to the public and then use Subversion (version-control software, 
as I understand it) to catalog any aspects of the web site that have
changed.  In this way, we store only the base web site plus incremental
changes, as opposed to storing multiple copies of portions of the web site
that have not changed.
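
In case it helps, here is a rough sketch of how I understand the suggested
workflow, written as a small Python wrapper around wget and svn. The site
address, working-copy path, and commit message are placeholders of mine, it
assumes a Subversion working copy has already been checked out, and we have
not actually run it.

    # Sketch of the proposed capture-and-version workflow (untested).
    # Assumes wget and svn are installed and that WORKING_COPY is an
    # existing Subversion working copy; URL and path are placeholders.
    import subprocess
    from datetime import date

    SITE_URL = "https://www.example.org/"        # placeholder site address
    WORKING_COPY = "/archives/website-capture"   # placeholder checkout path

    # 1. Capture the public site into the working copy.
    subprocess.run(
        ["wget", "--mirror", "--page-requisites", "--convert-links",
         "--no-parent", "--directory-prefix=" + WORKING_COPY, SITE_URL],
        check=True,
    )

    # 2. Register any new files, then commit; Subversion records only the
    #    changes since the previous capture.
    subprocess.run(["svn", "add", "--force", "."], cwd=WORKING_COPY, check=True)
    subprocess.run(
        ["svn", "commit", "-m", "Website capture " + date.today().isoformat()],
        cwd=WORKING_COPY,
        check=True,
    )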

 

Have any of you done (or attempted) anything similar?  If so, I would love
to hear what you did and how it went.  This is fairly new territory to me,
and any words of advice, warning, or encouragement would be most welcome.

 

Thank you.

 

Sincerely,

 

Christine Martin

Contract records manager

Des Plaines, IL 

224-636-2457 (cellular)

 

 


List archives at http://lists.ufl.edu/archives/recmgmt-l.html
Contact [log in to unmask] for assistance
To unsubscribe from this list, click the below link. If not already 
present, place UNSUBSCRIBE RECMGMT-L or UNSUB RECMGMT-L in the body of the 
message.
mailto:[log in to unmask]


