Creating an Index for our Documents - Imaging Millions of Paper Records


manshack_one

#1
Our company has millions of paper records. We're currently going through the process of beginning to image the documents and store them electronically. So far the effort is only going to encompass one or two main departments. As time goes on though each new department will be added so they can reap the benefits of an electronic archive versus paper. My issue is trying to come up with a good taxonomy or indexing scheme that can be applied across all departments. Basically I need help creating categories. Most of what I'm searching on is more of the "philosophy" of categorizing documents. For example, is it best to keep the documents separated by the departments or base the categories on a workflow type system? What happens when one department serves multiple departments? How do you handle documents that cross over the categories? If I could get some kind of guidance or pointed in the right direction I'd appreciate it. There's a ton of info on this forum but I don't seem able to find exactly what I'm looking for. Thanks in advance.
 

Wes Bucey

Prophet of Profit
#2
The problem you pose is essentially the same problem folks have with paper documents - some of them fit in more than one category. When hard copies were filed in rooms or entire buildings filled with file cabinets, the only practical solution was to assign a unique file number to each document and then create multiple indexes describing all the Associated Documents (a specific term) which have some association with an individual document.

Computers automatically assign a unique identifying number to a document in a relational database. It is up to the original author or the cataloger of legacy documents to assign Associated Documents to "relate" to each document. (It ain't easy, McGee!;))

These Associated Document identifiers are essentially little pieces of metadata attached to the computer file for each document. You can get an idea of how this works by opening File > Properties on any PDF file and noting the different fields on the Description tab. In the Keywords field, both key words and alphanumeric file identifiers can be entered, and retrieval can then be set to list ALL Associated Documents.
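A minimal Python sketch of keyword-based retrieval, assuming each file's keywords field holds a semicolon-separated list of associated document IDs. The catalogue contents, the ID format and the `find_associated` helper are all hypothetical; a real document management system would read these properties from the files or from its own database.

```python
# Hypothetical catalogue: document ID -> keywords field copied from the file properties.
catalogue = {
    "DOC-0001": "customer profile; DOC-0002; DOC-0003",
    "DOC-0002": "credit report; DOC-0001",
    "DOC-0003": "purchase order; DOC-0001",
    "DOC-0004": "engineering drawing",
}

def parse_keywords(field):
    # Split on semicolons, trim whitespace and drop empty entries.
    return [k.strip() for k in field.split(";") if k.strip()]

def find_associated(doc_id, catalogue):
    """IDs named in doc_id's own keywords, plus documents whose keywords name doc_id."""
    own = {k for k in parse_keywords(catalogue.get(doc_id, "")) if k in catalogue}
    back = {other for other, field in catalogue.items()
            if other != doc_id and doc_id in parse_keywords(field)}
    return sorted(own | back)

print(find_associated("DOC-0001", catalogue))  # ['DOC-0002', 'DOC-0003']
```

Checking both directions means a link still turns up even if only one of the two files was tagged.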

Quite frankly, it is a truly gargantuan task to identify and tag Associated Documents in a massive legacy pile like yours. It will require folks who are familiar with the specific documents to check files and folders for those Associated Documents.

Note that often a single document may have associations with dozens or hundreds of other documents. Take, for example, a Customer Profile. It may have Associated Documents from dozens of categories like this small list:

  1. credit report
  2. individual purchase orders
  3. payment record
  4. engineering drawings
  5. complaints
  6. on-time delivery record
  7. inspection reports for each product it purchases
  8. shipping preferences
  9. allocation of machine time to production for this customer
  10. product recalls
  11. product revisions
  12. subcontractors and all their Associated Documents
Then each item may have its own Associated Documents:

  1. inspection reports may have instrument calibration data
  2. production machine allocation may have repair, maintenance, replacement data and schedules
  3. subcontractors may have delivery schedules, pricing, payment data
  4. engineering drawings may have internal shop drawings unique to each machine
  5. inspection reports may have operator training data and competence evaluations, chemical and physical analysis of materials
  6. shipping preferences may include contracts with specific shippers, packaging suppliers
  7. etc. etc.
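Following those chains by hand gets out of hand quickly. As a sketch (the association map and helper name are hypothetical), a breadth-first search with a depth limit shows how far a retrieval would reach from one starting document:

```python
from collections import deque

# Hypothetical association map echoing the lists above: document -> directly associated documents.
associations = {
    "customer profile": ["inspection report", "shipping preferences", "subcontractor"],
    "inspection report": ["calibration data", "operator training"],
    "shipping preferences": ["shipper contract"],
    "subcontractor": ["delivery schedule", "payment data"],
}

def related_within(start, graph, max_depth):
    """Breadth-first search: documents reachable from start in at most max_depth hops, with their hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        depth = seen[doc]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(doc, []):
            if nxt not in seen:
                seen[nxt] = depth + 1
                queue.append(nxt)
    del seen[start]  # the starting document is not its own association
    return seen

print(sorted(related_within("customer profile", associations, 1)))
```

A depth limit of one or two hops keeps a search on a heavily linked record like a Customer Profile from returning half the archive.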
The beauty of this computerization is that once a document is scanned and receives a unique identifier, it never needs to be duplicated, merely tagged to each of its Associated Documents. It's a two-way street: the main document gets tagged with all its Associated Documents, but each Associated Document is, itself, a main document, and so gets tagged with all the documents it is associated with. For example, an engineering drawing can be tagged to a customer, but also to an inspection report, a subcontractor, an allocated machine, a purchase order, a chemical and physical analysis of material, a production schedule, etc., etc.
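In a relational database, that two-way street usually comes down to storing each association once and querying it from either end. A sketch using Python's built-in `sqlite3` module; the table layout, titles and helper names are assumptions for illustration, not any particular product's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (
    id    INTEGER PRIMARY KEY,   -- SQLite assigns the unique ID automatically
    title TEXT NOT NULL
);
CREATE TABLE association (
    a INTEGER NOT NULL REFERENCES document(id),
    b INTEGER NOT NULL REFERENCES document(id),
    PRIMARY KEY (a, b),
    CHECK (a < b)                -- each pair stored once, smaller ID first
);
""")

def add_document(title):
    # The scanned image is stored once; only its ID is reused in associations.
    return conn.execute("INSERT INTO document (title) VALUES (?)", (title,)).lastrowid

def associate(x, y):
    # Normalise the pair so one row serves both directions.
    a, b = sorted((x, y))
    conn.execute("INSERT OR IGNORE INTO association (a, b) VALUES (?, ?)", (a, b))

def associated_titles(doc_id):
    # Look the document up on either side of the pair and return the other side.
    rows = conn.execute("""
        SELECT d.title FROM association s
        JOIN document d ON d.id = CASE WHEN s.a = ? THEN s.b ELSE s.a END
        WHERE s.a = ? OR s.b = ?
        ORDER BY d.title
    """, (doc_id, doc_id, doc_id)).fetchall()
    return [r[0] for r in rows]

drawing = add_document("Engineering Drawing XYZ")
customer = add_document("Customer Profile ACME")
inspection = add_document("Inspection Report XYZ")
associate(customer, drawing)
associate(drawing, inspection)
print(associated_titles(drawing))  # ['Customer Profile ACME', 'Inspection Report XYZ']
```

Tagging the customer to the drawing is enough for the drawing to show up from the customer side and vice versa.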

In the long run, an organization dealing with both new and legacy documents FIRST needs to decide which of the possible associations are pertinent enough to merit tagging on each document. The point being:
"Just because you can ultimately associate each of one million documents to each of the other 999,999 documents in some way does NOT mean that you have to. Only the relationships which are meaningful to the orderly operation of the organization need to be formally associated with each other."
So, if documents weren't associated as paper hard copies, odds are they don't need to be associated as computer files, BUT it is worth considering, as each document is scanned, whether more associations might ultimately benefit the organization.
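One way to enforce that decision is a short, agreed list of document-type pairs that merit a link, checked before any tag is created. The document types and the policy below are hypothetical:

```python
# Hypothetical policy agreed by the departments: document-type pairs worth linking.
MEANINGFUL_PAIRS = {
    frozenset({"customer profile", "purchase order"}),
    frozenset({"engineering drawing", "inspection report"}),
    frozenset({"inspection report", "calibration data"}),
}

def should_link(type_a, type_b):
    """True if this pair of document types merits a formal association (order-independent)."""
    return frozenset({type_a, type_b}) in MEANINGFUL_PAIRS

def filter_candidates(candidates, doc_types):
    """Keep only candidate (id_a, id_b) links whose document types are on the agreed list."""
    return [(a, b) for a, b in candidates if should_link(doc_types[a], doc_types[b])]

doc_types = {1: "customer profile", 2: "purchase order", 3: "credit report", 4: "engineering drawing"}
print(filter_candidates([(1, 2), (3, 4), (2, 1)], doc_types))  # [(1, 2), (2, 1)]
```

Using a `frozenset` for each pair means the rule does not care which document is considered the "main" one.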
 

Ronen E

Problem Solver
Moderator
#3
Alternative approach:

Assuming most of your documents are largely legible text (or at least contain some meaningful text that can be relied on for retrieval), use OCR to attach true text to each document image, then use an indexing/search engine so anyone can retrieve anything. It may not be 100% foolproof, but I reckon the savings in time and work will be significant.
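At its core, that kind of search engine builds an inverted index from the OCR text: a map from each word to the documents that contain it. A minimal sketch, assuming the OCR step has already produced plain text for each document (the document IDs and text are made up):

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lower-case alphanumeric word tokens; punctuation from OCR output is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(ocr_text_by_doc):
    """Inverted index: word -> set of document IDs containing that word."""
    index = defaultdict(set)
    for doc_id, text in ocr_text_by_doc.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """IDs of documents containing every word in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))

ocr_text = {
    "scan-001": "Purchase Order 4471 for widget housings",
    "scan-002": "Inspection report: widget housings, lot 12",
    "scan-003": "Credit report for ACME Ltd",
}
index = build_index(ocr_text)
print(search(index, "widget housings"))  # ['scan-001', 'scan-002']
```

A production engine adds ranking and tolerance for OCR misreads, but the lookup principle is the same.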

Cheers,
Ronen.
 

harry

Trusted Information Resource
#4
You may want to look at the guidance provided by ISO 15489 - Records Management Standard
 

manshack_one

#5
Thanks for the responses. Right now the folders containing the documents are organized by a document checklist. Each folder is associated with a person's unique identifier, so there is already some organization. My issue is that there is a lot of grouping of documents inside each folder that I want to see broken out into individual parts and labeled separately. To that end I would like to create individual document identifiers (like the ones at the bottom of a form) so that when scanning, all I have to do is type the folder ID, insert a barcoded separator sheet representing each document ID, and let it go. All of the other levels of organization (by department, function, etc.) then just become a map in a table. Hopefully that makes sense.
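A sketch of how the scan software might split one folder's batch at the separator sheets, assuming the scanner reports, for each page, the decoded barcode text when the page is a separator sheet. The `Page` structure, the folder ID and the document type codes are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Page:
    number: int
    barcode: Optional[str] = None  # decoded text if this page is a separator sheet

@dataclass
class IndexedDocument:
    folder_id: str
    doc_type_id: str
    pages: List[int] = field(default_factory=list)

def split_batch(folder_id, pages):
    """Split one folder's scan batch into documents at each separator sheet."""
    docs = []
    current = None
    for page in pages:
        if page.barcode is not None:
            # Separator sheet: start a new document; the sheet itself is not kept.
            current = IndexedDocument(folder_id, page.barcode)
            docs.append(current)
        elif current is None:
            raise ValueError(f"page {page.number} precedes the first separator sheet")
        else:
            current.pages.append(page.number)
    return docs

batch = [Page(1, "PAY-W2"), Page(2), Page(3, "HR-I9"), Page(4), Page(5)]
for doc in split_batch("F-0042", batch):
    print(doc.folder_id, doc.doc_type_id, doc.pages)
```

The operator keys in only the folder ID; every document in the batch inherits it, and the document type comes from the barcode.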

Individually the folders are already organized and could be scanned and indexed as is. However, that system only works for an individual department. One of my tasks is to create a system that works for all departments, so I feel that my top-level categories have to become much broader. An example would be wanting to view an employee's W-2. The top level would be "Employees", next "Employment" (as opposed to "Hiring" and "Discharge"), next something like "Payroll", and then you get a list of documents that match that category. Choose W-2, hit search, and you get a list of all the W-2s for that individual. This makes sense to me because the scan operator only needs 2 or 3 pieces of information to get the document in: employee ID, document ID and perhaps a date.
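The "map in a table" could be as simple as a lookup from document type to category path, so the operator keys in only employee ID, document type and date, and the hierarchy is derived from the table afterwards. A sketch with hypothetical document types and records:

```python
# Hypothetical mapping table: document type -> category path.
CATEGORY_MAP = {
    "W-2": ("Employees", "Employment", "Payroll"),
    "Timesheet": ("Employees", "Employment", "Payroll"),
    "Application": ("Employees", "Hiring"),
    "Exit interview": ("Employees", "Discharge"),
}

# What the scan operator keyed in for each document.
records = [
    {"employee_id": "E1001", "doc_type": "W-2", "date": "2010-12-31"},
    {"employee_id": "E1001", "doc_type": "W-2", "date": "2011-12-31"},
    {"employee_id": "E1001", "doc_type": "Application", "date": "2009-03-02"},
    {"employee_id": "E2002", "doc_type": "W-2", "date": "2011-12-31"},
]

def doc_types_under(path):
    """Document types whose category path starts with the chosen path."""
    n = len(path)
    return sorted(t for t, p in CATEGORY_MAP.items() if p[:n] == tuple(path))

def find(employee_id, doc_type):
    """Dates of every matching document for one employee."""
    return sorted(r["date"] for r in records
                  if r["employee_id"] == employee_id and r["doc_type"] == doc_type)

print(doc_types_under(("Employees", "Employment", "Payroll")))  # ['Timesheet', 'W-2']
print(find("E1001", "W-2"))  # ['2010-12-31', '2011-12-31']
```

Because the hierarchy lives only in `CATEGORY_MAP`, it can be reorganized later without rescanning or re-keying anything.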

The majority of our forms and records contain too much handwritten scribble to be effectively OCR'd. I'm going to work with the individual departments to also begin creating barcodes that can be read so in the future I won't need separator sheets for the scanners. For right now we are only dealing with paper. I haven't opened the Pandora's box of electronic documents (Word, Excel, PDF, etc.) just yet.

Thanks again for the advice and insight. It's very much appreciated.
 

Ninja

Looking for Reality
Trusted Information Resource
#6
As a tool to consider, you might look at a fairly new product called "Docubin". Access privileges, OCR and thumbnails are already built into it. There are many software options; this one just caught my eye lately as very effective.

I am not associated with Docubin or its provider, and in fact have not used it yet myself. Take this only as an idea among many.
 

Neil V.

#7
Just to be difficult, what I would want to know first is the thought process that went into the decision to maintain these records in electronic format. What is the payoff of digitizing vs. leaving as is? What does the retention schedule look like? Does it really make sense to keep millions of records forever and ever?

Assuming there is a good reason (like you'll go to jail if you don't)...

Here's a link that may help in classification efforts:

http://www.archivists.org/saagroups/recmgmt/resources/FunctionsThesaurus2010.pdf

The thesaurus is prefaced with a description of the shift away from an organizational (departmental) classification system and towards a functional (process) classification system. This has pretty much been considered 'best practice' for the past 20 to 30 years.

An example of this could be the function of purchasing. Perhaps the administrative department issues purchase orders for office supplies, the materials department issues purchase orders for raw materials, and the HR department issues POs for training. Instead of each person or department that generated a record being responsible for maintaining it in their own file structure, these records would be classified under the term "Debits - POs" or something similar and filed away together.
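The same idea expressed as a crosswalk: each record keeps its originating department as metadata, but is filed under a function term. The department names, record types and function terms below are hypothetical:

```python
# Hypothetical crosswalk: (originating department, record type) -> function term.
FUNCTION_OF = {
    ("Administration", "Purchase order"): "Purchasing",
    ("Materials", "Purchase order"): "Purchasing",
    ("HR", "Purchase order"): "Purchasing",
    ("HR", "Training record"): "Training",
}

def classify(department, record_type):
    """Function term for a record; unmapped combinations are flagged for review."""
    return FUNCTION_OF.get((department, record_type), "Unclassified")

def group_by_function(records):
    """File record IDs together by function rather than by department."""
    groups = {}
    for rec in records:
        groups.setdefault(classify(rec["dept"], rec["type"]), []).append(rec["id"])
    return groups

records = [
    {"id": "PO-101", "dept": "Administration", "type": "Purchase order"},
    {"id": "PO-102", "dept": "Materials", "type": "Purchase order"},
    {"id": "PO-103", "dept": "HR", "type": "Purchase order"},
    {"id": "TR-201", "dept": "HR", "type": "Training record"},
    {"id": "MM-301", "dept": "Sales", "type": "Meeting minutes"},
]
print(group_by_function(records))
```

The "Unclassified" bucket makes gaps in the crosswalk visible instead of letting records fall back into departmental silos.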

To see what else is out there (assuming there is something other than Elsmar!?), you might try looking in the fields of archives, information management, and/or library science. They all have strong interests in how information gets organized, maintained, retrieved, and preserved.

Here are a few other example thesauri:
http://www.archives.gov/federal-register/cfr/thesaurus.html
http://www.ilo.org//thesaurus/defaulten.asp
http://www.egov.vic.gov.au/victoriaonlinethesaurus/index.htm

Thanks,
Neil
 
#8
I have done this successfully several times, and I'm going to recommend an excellent book I read after I implemented my first corporate indexing system in a high-volume manufacturing facility (it may help if you have time for reading):

http://www.amazon.co.uk/Manufacturi...s-Information/dp/0471132691#reader_0471132691

To that, I would like to add search engine optimisation (SEO) principles for indexing purposes.

So instead of trying to categorise everything to begin with, you can have a system that works for everyone in the business by basing the main search index on user-friendly titles, as in the following records for scanning:

1- Risk Analysis for Product XYZ - Record Title - Risk Analysis XYZ

2- Essential Requirements List for Product XYZ - Record Title - Essential Requirements XYZ

3- Purchase Order for Product XYZ - Record Title - Purchase Order XYZ

It may seem simplistic, but simple is best: it optimises everyone's search results, because each record title is specific and widely recognised.
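A sketch of that naming convention together with a plain word-match search over titles, assuming titles follow "<record type> <product>" as in the examples above; the helper names and the extra "Purchase Order ABC" title are hypothetical:

```python
def record_title(record_type, product):
    # Plain-language title: what the record is, then which product it belongs to.
    return f"{record_type} {product}"

def title_search(titles, query):
    """Titles containing every word of the query, ignoring case."""
    words = query.lower().split()
    return [t for t in titles if all(w in t.lower().split() for w in words)]

titles = [
    record_title("Risk Analysis", "XYZ"),
    record_title("Essential Requirements", "XYZ"),
    record_title("Purchase Order", "XYZ"),
    record_title("Purchase Order", "ABC"),
]
print(title_search(titles, "purchase order"))  # ['Purchase Order XYZ', 'Purchase Order ABC']
```

Anyone in the company can find a record with the words they would naturally use, without knowing a file number.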

Once you have stored a few hundred or a few thousand titles, that is the time to try to categorise them into named folders.

This is easily done by analysing how many records share each specific title and grouping them accordingly.
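A sketch of that analysis, assuming titles follow a "<record type> <product code>" convention with a one-word product code. Record types that occur fewer than `min_count` times go to a catch-all folder; the threshold and the folder names are hypothetical:

```python
from collections import Counter, defaultdict

def split_title(title):
    # Assumes "<record type> <product code>" with a one-word product code.
    record_type, _, product = title.rpartition(" ")
    return record_type, product

def propose_folders(titles, min_count=2):
    """Group titles into folders named after record types that occur often enough."""
    counts = Counter(split_title(t)[0] for t in titles)
    folders = defaultdict(list)
    for t in titles:
        rtype, _ = split_title(t)
        folders[rtype if counts[rtype] >= min_count else "Miscellaneous"].append(t)
    return dict(folders)

titles = [
    "Risk Analysis XYZ", "Risk Analysis ABC",
    "Purchase Order XYZ", "Purchase Order ABC", "Purchase Order QRS",
    "Essential Requirements XYZ",
]
for folder, members in propose_folders(titles).items():
    print(folder, members)
```

Swapping `split_title(t)[0]` for `split_title(t)[1]` would give the by-product grouping instead.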

If that doesn't work, it may be an idea to file them by product, so that each product has its own set of records.

I am against filing by department because records can overlap and end up duplicated.

Product folders are unique to a product and nothing overlaps.

Works orders have unique identifiers too, BUT they aren't as user-friendly in searches for everyone in the company.

I hope I have given you some useful ideas.
 

Ronen E

Problem Solver
Moderator
#9
I don't understand why people hang on so tightly to folders and categories. Why can't everything be poured into one big data warehouse and mined using crawlers and a strong search engine?
 

Wes Bucey

Prophet of Profit
#10
Yes to plain-language record titles! Ever since we were freed from the eight-character limit on file names, I have advocated plain-language, descriptive titles instead of serial numbers; after all, every relational database already assigns a unique identifier to each file.


I happened to recall a paragraph from promotional material for Google Docs. It seems pertinent to this thread:

"If you are looking to convert your paper files to digital PDFs, you will need a scanner. The scanner you purchase should be able to directly work with your storage engine, without any additional steps on your part. It is much too time consuming to scan in images and then upload them manually. Because Google Docs is so popular, many name brand scanners are now supporting direct input into the Google Docs system. The scanner software will create the PDF, convert the PDF to a searchable format using OCR (optical character recognition) and then upload the file into Google Docs in one simple process, saving you enormous amounts of time and making the process more enjoyable. "
 