RECMGMT-L Archives

Records Management

RECMGMT-L@LISTSERV.IGGURU.US

Subject:
From:
Peter Kurilecz <[log in to unmask]>
Reply To:
Records Management Program <[log in to unmask]>
Date:
Thu, 27 Oct 2011 16:38:50 -0400
Open Source Tool Speeds Up Web Archive Scoping « The Signal: Digital
Preservation

The State Library of North Carolina <http://statelibrary.ncdcr.gov/>, in
collaboration with the North Carolina State Archives
<http://www.archives.ncdcr.gov/>, has been archiving North Carolina state
agency web sites <http://webarchives.ncdcr.gov/> since 2005 and social media
since 2009. Since then, we have crawled over 82 million documents and
archived 6 terabytes of data.

With each quarterly crawl we are capturing over 5,000 hosts, many of which
are out-of-scope. As a government agency it is our responsibility to ensure
that we are not archiving any sites that are inappropriate due to their
content, copyright status, or simply because they are not related to North
Carolina government. Managing the crawl budget is also a priority.
Performing a crawl analysis allows us to prevent out-of-scope hosts from
being crawled, and at the same time we often encounter new seeds that should
be actively captured. A seed is any URL that we want to capture in our
crawl.
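
To make the scoping step concrete, here is a minimal sketch of how crawled
hosts might be sorted into existing seeds, possible new seeds, and
out-of-scope hosts. It is not the tool described in the article. The
suffix list, the function names, and the three report categories are all
assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical scope rules: these suffixes are illustrative only and are
# not the State Library's actual scoping list.
IN_SCOPE_SUFFIXES = ("nc.gov", "ncdcr.gov")


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def in_scope(host, suffixes=IN_SCOPE_SUFFIXES):
    """True when host equals a suffix or is a subdomain of one."""
    return any(host == s or host.endswith("." + s) for s in suffixes)


def triage(crawled_urls, seeds):
    """Group crawled hosts into seeds, candidate seeds, and out-of-scope hosts."""
    seed_hosts = {host_of(u) for u in seeds}
    report = {"seed": set(), "candidate_seed": set(), "out_of_scope": set()}
    for url in crawled_urls:
        host = host_of(url)
        if not host:
            continue  # skip entries that do not parse to a host
        if host in seed_hosts:
            report["seed"].add(host)
        elif in_scope(host):
            report["candidate_seed"].add(host)  # in scope but not yet a seed
        else:
            report["out_of_scope"].add(host)  # review, then exclude from crawl
    return report
```

A suffix match alone cannot judge content or copyright status, so in
practice the out-of-scope and candidate lists would still need human review.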



http://bit.ly/w33eOj

Source:
http://blogs.loc.gov/digitalpreservation/2011/10/open-source-tool-speeds-up-web-archive-scoping/



-- 
Peter Kurilecz CRM CA
[log in to unmask]
Richmond, Va
http://twitter.com/RAINbyte
http://tech.groups.yahoo.com/group/RAINbyte/
http://paper.li/RAINbyte/rainbyte
Information not relevant for my reply has been deleted to reduce the
electronic footprint and to save the sanity of digest subscribers

List archives at http://lists.ufl.edu/archives/recmgmt-l.html
Contact [log in to unmask] for assistance
To unsubscribe from this list, click the below link. If not already present, place UNSUBSCRIBE RECMGMT-L or UNSUB RECMGMT-L in the body of the message.
mailto:[log in to unmask]
