RECMGMT-L Archives

Records Management

RECMGMT-L@LISTSERV.IGGURU.US

Subject:
From:
Larry Medina <[log in to unmask]>
Reply To:
Records Management Program <[log in to unmask]>
Date:
Mon, 5 Jun 2006 12:49:53 -0700
On 6/5/06, Michelle VanAllen <[log in to unmask]> wrote:
>
> We are scanning some documents which will be OCR'd to create searchable
> PDFs. The original paper documents are mostly typed correspondence and
> various forms. I have done some research which suggests that 300 dpi is
> better when using OCR. It seems to suggest that the only disadvantage is
> the larger space requirement and that may not be specific. I am
> interested to know if anyone else on the list has an opinion and/or
> experience regarding the differences between 200 and 300 dpi for this
> type of application.
>

Other factors include the actual quality of the source documents you're
scanning... if they're clean, uncreased, unstapled, sans serif font, black
print on white paper, you may be able to get by with something less than
300 dpi, but you should run some tests on a selection of images.  Take a
batch that represents the common content of the source documents, scan
it at 150, 200, 300, and 400 dpi, OCR every version, and then compare
the error rates.
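
One way to put a number on "error rates" is a character error rate: the
edit distance between a hand-checked transcription and the OCR output,
divided by the length of the transcription. The sketch below is only an
illustration of that idea. The sample strings and the names
edit_distance, char_error_rate, and ocr_by_dpi are made up for the
example; in a real test the text would come from your own OCR software
and your own proofread pages.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance divided by the length of the hand-checked text."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(truth, ocr) / len(truth)


# Hypothetical data: one proofread line and made-up OCR results per dpi.
truth = "Invoice 1068 for Bill O'Grady"
ocr_by_dpi = {
    150: "lnvoice I06B for BiII 0'Grady",
    200: "Invoice 1068 for Bi11 O'Grady",
    300: "Invoice 1068 for Bill O'Grady",
}

error_rates = {dpi: char_error_rate(truth, text)
               for dpi, text in ocr_by_dpi.items()}
for dpi, cer in sorted(error_rates.items()):
    print(f"{dpi} dpi: {cer:.1%} character error rate")
```

Running the same comparison over a whole representative batch, rather
than one line, gives a fairer picture of where the error rate levels off.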

Look critically at these characters: h, b, 6, G, Q, g, q, 8, B, &, 0, O, D,
a, o, n, m, 1, I, l to see how well the OCR picks up the content correctly.
(BTW, the last two are capital "eye" and lower case "ell".)
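
The character check above can be tallied automatically once you have a
proofread line and its OCR output side by side. This is a rough sketch,
not a full comparison tool: it assumes the OCR made only substitutions,
so both strings are the same length, and the names WATCH_LIST and
confusions as well as the sample strings are invented for the example.

```python
from collections import Counter

# The characters singled out above as easy for OCR to confuse.
WATCH_LIST = set("hb6GQgq8B&0ODaonm1Il")


def confusions(truth: str, ocr: str) -> Counter:
    """Count (expected, got) pairs where a watch-list character was misread.

    Assumes equal-length strings (substitution errors only); inserted or
    dropped characters would need an alignment step first.
    """
    if len(truth) != len(ocr):
        raise ValueError("strings must be the same length for this check")
    return Counter(
        (t, o) for t, o in zip(truth, ocr) if t != o and t in WATCH_LIST
    )


sample_truth = "Bill owes 1068 to Olga"  # hypothetical proofread text
sample_ocr = "BiII owes I06B to 0lga"    # hypothetical OCR output

result = confusions(sample_truth, sample_ocr)
for (expected, got), count in result.most_common():
    print(f"expected {expected!r}, got {got!r}: {count}")
```

A tally like this shows which look-alike pairs a given resolution
struggles with, which a single overall error rate would hide.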

Also, you may want to try viewing the resultant images on a variety of
systems the users of the repository will be using... try to zoom and pan the
images and see how they look on their screens and also how long it takes to
load the images.  The size of the scan file is of some concern, especially
if you have a massive volume of documents to scan and OCR, but the quality
of the final product is more important than the cost to store it or the time
to transmit the images.  Depending on the display resolution a monitor is
set at or the type of communications trunk you have (mainly an issue in
remote facilities), you may want to make some adjustments to how files are
stored.
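
For a feel of how file size grows with resolution, the raw pixel
arithmetic is easy to work out. The sketch below assumes a letter-size
page (8.5 x 11 inches) scanned as 1-bit black and white; both are
assumptions for illustration, and the function name uncompressed_bytes
is made up. Compressed files will be much smaller than these figures,
so the useful part is the ratio between resolutions.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bits_per_pixel: int = 1) -> int:
    """Raw image size in bytes before any compression is applied."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole byte


# Assumed page: 8.5 x 11 inches, 1-bit black and white.
sizes = {dpi: uncompressed_bytes(8.5, 11, dpi) for dpi in (150, 200, 300, 400)}
for dpi, size in sizes.items():
    ratio = size / sizes[200]
    print(f"{dpi} dpi: {size:,} bytes raw ({ratio:.2f}x the 200 dpi size)")
```

Because pixel count grows with the square of the resolution, going from
200 to 300 dpi multiplies the raw size by 2.25, not 1.5.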

Larry

-- 
Larry Medina
Danville, CA
RIM Professional since 1972

List archives at http://lists.ufl.edu/archives/recmgmt-l.html
Contact [log in to unmask] for assistance
