RECMGMT-L Archives

Records Management

RECMGMT-L@LISTSERV.IGGURU.US

Subject:
From:
Larry Medina <[log in to unmask]>
Reply To:
Records Management Program <[log in to unmask]>
Date:
Wed, 21 Mar 2007 11:49:59 -0700
On 3/21/07, Alexander Fazekas-Paul <[log in to unmask]> wrote:
>
> Does anyone have any info or statistics on OCR (Optical Character
> Recognition) accuracy? I am looking for vendor-neutral information/research
> on the topic.
>
> I understand that there may be variables based on original content,
> scanned vs. native electronic files, and on what hardware and software has
> been used, etc...
>
> We are implementing ECM at our organization, and a question has been posed
> as to how accurate OCR is, or how OCR accuracy is validated.
>

Shout out to my fellow Pacific Region buddy-

Just some thoughts here... all OCR isn't alike.  It depends a great deal on
the source materials.  Factors include the age, relative condition and
material of the originals, their color, how much they've been handled,
whether they're punched or stapled, handwritten or machine text, the text
size, color and font, carbons, NCR, etc.   And while I realize some of these
factors may sound as though they would impact "throughput" more than
recognition, you'd be surprised how much the color, thickness, or condition
of paper impacts the accuracy of OCR.

Your best bet is to take a representative sample of the source documents and
actually run them through whatever application and whatever hardware you
intend to use, and then develop some statistics on your own specific
documents.  Get some of your best and worst, determine what percentage of
the whole these represent and then see if you can estimate the overall
accuracy.
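
As a rough illustration of that estimate, here is a minimal Python sketch.  It
assumes you have hand-keyed (or hand-corrected) reference text for each sample
page to compare against the OCR output, and that you can put an approximate
figure on what share of the whole collection each quality group represents.
The function names, the group shares, and the sample strings are all made up
for illustration; they don't come from any particular OCR product.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, ocr: str) -> float:
    """Character accuracy of OCR output against reference text,
    as 1 - (edit distance / reference length), floored at 0."""
    if not truth:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(truth, ocr) / len(truth))


def estimate_overall(groups):
    """Weighted estimate of accuracy across the whole collection.

    groups: list of (share, pages), where share is the fraction of the
    collection that the group represents and pages is a list of
    (reference_text, ocr_text) pairs sampled from that group.
    """
    total_share = sum(share for share, _ in groups)
    estimate = 0.0
    for share, pages in groups:
        errors = sum(edit_distance(truth, ocr) for truth, ocr in pages)
        chars = sum(len(truth) for truth, _ in pages)
        if chars == 0:
            raise ValueError("each group needs some reference text")
        group_accuracy = max(0.0, 1.0 - errors / chars)
        estimate += (share / total_share) * group_accuracy
    return estimate


# Hypothetical sample: 70% of the collection is clean machine text,
# 30% is faded carbon copies.
sample_groups = [
    (0.7, [("clean text", "clean text")]),
    (0.3, [("faded carbon", "fadcd carbon")]),
]
overall = estimate_overall(sample_groups)
```

Character-level accuracy is only one way to score the results; if what matters
to you is search and retrieval, you may want to count wrong words instead.
Either way, the estimate is only as good as the sample is representative.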

You may find that some of them need to be OCRed at a higher dpi resolution
than others to get a greater level of accuracy.  You may also find, as we
did in some cases, that making an "intermediate copy" of the originals, and
adjusting your copier to improve the quality of the text, results in much
better accuracy on some originals.

Larry

-- 
Larry Medina
Danville, CA
RIM Professional since 1972

List archives at http://lists.ufl.edu/archives/recmgmt-l.html