RECMGMT-L Archives

Records Management

RECMGMT-L@LISTSERV.IGGURU.US

Sender: Records Management Program <[log in to unmask]>
Subject:
From: Glenn Sanders <[log in to unmask]>
Date: Wed, 27 Oct 2004 11:41:01 +1000
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Reply-To: Records Management Program <[log in to unmask]>
Parts/Attachments: text/plain (57 lines)

Sharie

I wrote an article recently for the RMAA's Informaa Quarterly which
discussed the military / espionage origin of a lot of this sort of
software and showed that the logical extension of being able to
automatically classify documents is being able to predict the content of
documents as yet unwritten (subject to the uncertainty principle, because
of Quantum). A copy is at http://members.ozemail.com.au/~sanders/articles.html - it's the one called 'a unified theory of stuff'.

But more seriously, I see no reason why this software shouldn't be
surprisingly effective, or at least be a surprisingly effective tool if
used properly. Over ten years ago I saw commercial software retrieving
relevant documents by terms not present in the text or metadata. At home I
use an email filtering program (Mailwasher) which uses Bayesian
probability techniques to identify spam. It groups the emails into several
categories, and it's pretty accurate after a training period. Obviously
the fewer categories in your classification, the better the results.
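
To make the Bayesian idea concrete, here is a minimal sketch of a naive Bayes text classifier in Python. It is not how Mailwasher or any commercial product works internally; the class name, the two categories and the toy training messages are all invented for illustration. It only shows the general technique: count words per category during a training period, then pick the category under which a new message is most probable.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocab = set()                        # every word seen in training

    def train(self, text, category):
        words = text.lower().split()
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # Start from the prior: how common is this category overall?
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical training period with made-up messages.
nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("win money now claim prize", "spam")
nb.train("meeting agenda for records committee", "ham")
nb.train("retention schedule review meeting", "ham")

print(nb.classify("buy cheap pills"))             # spam
print(nb.classify("retention schedule meeting"))  # ham
```

With more categories, the same training material is spread more thinly across them, which is one reason a short classification scheme tends to give better results.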

I think the secret is in my proviso above: "if used properly". It's
similar to using OCR, or even people. No software will ever be able to get
it right 100% of the time. Therefore you have to realise that if you want
100% accuracy, you will have to check the program's output on 100% of the
documents processed (not the 2% the salesman implies by quoting 98%
accuracy). The task for you is to figure out, on a risk basis, which
documents you really need to check, and how thoroughly.
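
A rough sketch of that arithmetic and of one possible risk-based checking rule, in Python. The batch size, the "high"/"low" risk labels, the confidence scores and the 0.90 floor are all assumptions made up for the example; a real product may not expose a confidence score at all.

```python
def expected_errors(num_documents, accuracy):
    """Expected number of misclassified documents at a quoted accuracy."""
    return round(num_documents * (1 - accuracy))


def select_for_review(documents, confidence_floor):
    """Pick documents for a manual check.

    documents: list of (doc_id, risk, confidence) tuples, where risk is a
    hypothetical "high"/"low" label and confidence is a hypothetical score
    from 0.0 to 1.0 reported by the classifier.
    Rule: check every high-risk document, plus any document the classifier
    was unsure about.
    """
    return [
        doc_id
        for doc_id, risk, confidence in documents
        if risk == "high" or confidence < confidence_floor
    ]


# 98% accuracy on 10,000 documents still leaves about 200 wrong,
# and nothing in the accuracy figure says which 200.
print(expected_errors(10_000, 0.98))  # 200

batch = [
    ("D1", "high", 0.99),
    ("D2", "low", 0.97),
    ("D3", "low", 0.62),
    ("D4", "high", 0.55),
]
print(select_for_review(batch, 0.90))  # ['D1', 'D3', 'D4']
```

Anything left out of the review list (D2 here) can still be wrong; the rule only decides where the residual errors are tolerable.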

Automatic indexing will be a large part of all our futures - who is going
to be redundant?

Cheers (or not if you are worried by this)

Glenn

Glenn Sanders MRMA
[log in to unmask]
[log in to unmask]
Australia

These views are mine alone. They may or may not be those of any
previous or present employers or clients. I don't know. If I'd asked
and they'd agreed, I would have signed it "Bloggs and Co and
Glenn". Or whatever. But I haven't, so I didn't.


----------------------------------------------------------------------------------------
This e-mail may contain confidential or privileged information.   If you
have received it in error, please notify the sender immediately via return
e-mail and then delete the original e-mail. EnergyAustralia has collected
your business contact details for dealing with you in your business
capacity. More information about how we handle your personal information,
including your right of access is contained at http://www.energy.com.au.

----------------------------------------------------------------------------------------

List archives at http://lists.ufl.edu/archives/recmgmt-l.html
Contact [log in to unmask] for assistance
