RECMGMT-L Archives

Records Management

RECMGMT-L@LISTSERV.IGGURU.US

Subject:
From:
Gerry McFatridge <[log in to unmask]>
Reply To:
Records Management Program <[log in to unmask]>
Date:
Wed, 21 Mar 2007 15:25:43 -0400
My real world experience is that OCR has an accuracy rate of 0-100%
<grin>

Not very helpful, I know, but there are so many variables that I think
you would have to do some real-world testing of your own to see what
success you have with the documents you will be OCR'ing.

As for validating scanned files, we just occasionally do a quick
visual scan (no pun intended) of the OCR output text file to see whether
there are lots of mistakes.

Some of the docs we scan are just printed copies of digital files. We
could grab the OCR output file and do a text compare against the original
Word doc, but our experience has been that those files are almost always
clean laser-printed copies and they OCR very well anyway.

When we convert digital files natively, our results appear to be 99.999%
accurate (based on the text compare method mentioned above). It seems
the only items that don't OCR properly are some non-alphanumeric
characters, such as bullet point symbols. Fonts used within the
original digital file don't seem to have an impact on the OCR
accuracy (unlike with scanned docs, where the fonts can have a big impact
on the OCR accuracy).
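
For anyone who wants to try the text compare idea, here is a minimal
sketch in Python, not the tool we actually use. It assumes you already
have the original text and the OCR output as plain strings. The function
names char_accuracy and normalize and the sample strings are made up for
illustration. It uses difflib from the standard library, which gives a
heuristic match rather than a true edit distance, so treat the number as
a rough indicator only.

```python
import difflib


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not count as an error."""
    return " ".join(text.split())


def char_accuracy(reference: str, ocr_text: str) -> float:
    """Return the fraction of reference characters matched, in order, by the OCR text."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    # autojunk=False keeps difflib from discarding frequent characters in long texts.
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Hypothetical sample: the OCR engine misread the letter "o" as a zero.
reference = normalize("Records are retained for 7 years.")
ocr_text = normalize("Rec0rds are  retained for 7 years.")
print(f"Character accuracy: {char_accuracy(reference, ocr_text):.2%}")
```

Running this over a sample of real documents, rather than one string,
would give a per-document accuracy figure to compare across document types.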

Docs we scan from some of our older files (more than 30 years old or so)
tend to not OCR quite so well. Those files tend to be worn, faded,
carbon copies, onionskin, etc. If the scanned image is pretty clean they
will OCR pretty well but frequently many of those items just won't scan
well to begin with.


Sorry I couldn't give you more specific data but someone else may be
able to supply leads to some academic studies/research on the topic.

Gerry



-----Original Message-----
From: Records Management Program [mailto:[log in to unmask]] On
Behalf Of Alexander Fazekas-Paul
Sent: Wednesday, March 21, 2007 1:58 PM
To: [log in to unmask]
Subject: OCR accuracy statistics

Does anyone have any info or statistics on OCR (Optical Character
Recognition) accuracy? I am looking for vendor-neutral
information/research on the topic.

I understand that there may be variables based on original content,
scanned vs. native electronic files, and on what hardware and software
has been used, etc.

We are implementing ECM at our organization, and a question has been
posed as to how accurate OCR is, and how OCR accuracy is validated.

Thanks in advance for any replies.

Alex Fazekas-Paul
In not so sunny, kinda rainy today San Diego.

 
 

List archives at http://lists.ufl.edu/archives/recmgmt-l.html
Contact [log in to unmask] for assistance
