How do you manage your project? Data Conversion and Digitization

Boboy

Starting to get Involved
#1
We'll be working on a project at a government learning institution, and I was hired to manage it. There are large volumes of material to be converted and digitized into text: the documents will be scanned and run through OCR (optical character recognition). The resulting text must then be turned into useful information, sorted, and perhaps made part of a database. From what I've heard, the source material is physical research documents (master's theses and PhD dissertations). Does anyone have experience with this? Can you share your journey or give some tips for us newcomers to the field?
 

QAMTY

#2
Hi, look for a company that has scanners and good OCR software; they usually have special software for this task.
I hired a company that scanned about 100,000 documents, including books, manuals, letters, brochures, etc.
They were OCRed and converted to PDF format.
Two important issues. First, define the appropriate resolution: a very crisp document scanned at 3000 dpi may come to 15 MB, while the same document scanned at 200 dpi may be 300 KB, and most of the time 300 dpi is OK.
Consider that heavy PDFs will take longer to open and read.
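To make the resolution trade-off concrete, here is a rough sketch of how raw image size grows with dpi. It assumes a letter-size page (8.5 x 11 inches) and 24-bit color, neither of which comes from the thread, and the function name is only illustrative. Real PDFs are compressed, so actual files are far smaller, but the proportions still apply: pixel count grows with the square of the dpi.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Raw (uncompressed) size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: US letter, 8.5 x 11 inches, scanned in 24-bit color.
for dpi in (200, 300, 600):
    size_mb = uncompressed_scan_bytes(8.5, 11, dpi) / 1_000_000
    print(f"{dpi} dpi: {size_mb:.1f} MB before compression")
```

Going from 200 to 300 dpi multiplies the raw data by 2.25, and doubling the dpi multiplies it by 4, which is why it pays to settle on a resolution before scanning starts.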
The other point: to organize the data for easy searching, it is important to define a good structure.
For example: manuals (type, region, specialty, other), pictures (type, region, zone), etc.
They normally sell special software for managing the data; it works better than browsing the documents with only Windows Explorer and a PDF reader.
Another issue: how will you scan and OCR additional documents once the company has left? Remember that they should be scanned the same way as the earlier documents. There is also document management software out there that you can use to manage the PDFs as well.
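One way to keep later scans consistent with the first batch is to generate file locations from the agreed structure instead of typing them by hand. The sketch below is only an illustration: the category levels follow the manuals example above, while the function names, the sample values, and the lowercase-with-hyphens naming rule are assumptions.

```python
import re
from pathlib import PurePosixPath


def slug(value):
    """Lowercase a label and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def archive_path(category, doc_type, region, specialty, doc_id):
    """Build a folder path and file name from the agreed category levels."""
    parts = [slug(p) for p in (category, doc_type, region, specialty)]
    return PurePosixPath(*parts) / f"{slug(doc_id)}.pdf"


# Hypothetical sample values, only to show the resulting layout.
print(archive_path("Manuals", "Maintenance", "North Region", "Electrical", "MAN 0042"))
```

Because every path comes from the same function, documents scanned a year later land in the same structure as the original batch.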

Regards
 

Boboy

Starting to get Involved
#3
Thank you, QAMTY, for your reply. Very useful. I think I am on the right track by being proactive. I will prepare some questions and request that they be answered, because I want to do my best for the company. For my part, no research is wasted. Who knows, I might stumble on something too. I'll give more info once my questions are answered. Btw, I would appreciate it if anyone could suggest the "right" questions to ask, something like a gap analysis but fit for this project: migration from paper-based to paper-free. Again, thanks a lot.
 

howste

Thaumaturge
Super Moderator
#4
I don't claim to be an expert but I've done a fair amount of scanning and OCR over the years. If you're converting to text then you need to scan at the best resolution for OCR. Searches in a database could be useless if there are key words that aren't recognized correctly. If you're also planning to save the images to PDF and space is a factor, high resolution images can always be reduced after the OCR for better storage.
 

QAMTY

#5
For searching, most of the software uses metadata: key data attached to every document that helps the software find it. It doesn't use the OCRed text.
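A minimal sketch of what metadata-based searching looks like, assuming each scanned file carries a small record of fields. The field names, the `search` helper, and the sample records are all made up for illustration; a real document management package will have its own schema.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    file_name: str
    title: str
    author: str
    year: int
    keywords: set = field(default_factory=set)


def search(records, keyword=None, **exact):
    """Return records matching every exact field and, if given, the keyword."""
    matches = []
    for rec in records:
        if any(getattr(rec, name) != value for name, value in exact.items()):
            continue
        if keyword is not None and keyword.lower() not in {k.lower() for k in rec.keywords}:
            continue
        matches.append(rec)
    return matches


# Made-up sample records, not data from the project.
records = [
    DocumentRecord("t-0001.pdf", "Rice Yield Study", "A. Cruz", 2010, {"agriculture", "rice"}),
    DocumentRecord("t-0002.pdf", "Soil Survey Methods", "B. Reyes", 2010, {"agriculture", "soil"}),
    DocumentRecord("t-0003.pdf", "Coastal Erosion", "C. Santos", 2012, {"geology"}),
]

for rec in search(records, keyword="Agriculture", year=2010):
    print(rec.file_name, rec.title)
```

The search never touches the OCRed text, so its quality depends entirely on how carefully the metadata was entered at scan time.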
Regards
 

Boboy

Starting to get Involved
#6
Hi Howste. Thanks for your input from experience.

While formulating some questions, I noticed some confusing terminology that should be defined, because books and physical research materials (theses and dissertations) can easily be interpreted as "documents". I might call them product documents, and I might call the other type management documents. Please suggest better or more appropriate terms.

My examples....

Management Documents: written procedures, manuals, blank forms, current memos, notices, etc.

Records: filled-out forms, archived procedures, memos, notices

Product Documents: books, physical research (master's and PhD)

Appreciate your help.
 

howste

Thaumaturge
Super Moderator
#7
I don't see a problem with the terms you've used, as long as they work for the intended users of the system you're setting up. Alternatives to your Management Documents category that I've heard are "Policies and Procedures," "Instruction Documents" or "Command Media." Product Documents could be "Publications," "Scientific & Technical Information," or "Research Results." Overall I don't think the terms you use are important as long as they adequately describe what they are so people can find what they need.
 

QAMTY

#8
Take into consideration that some documents may be handwritten, so the OCR output may not be fully intelligible.

In fact, expect that you may have problems with the reliability of the text recognition during the conversion.


Before making a decision, prepare a comparison table of suppliers. Take some "difficult documents" (handwritten pages, pictures, high-density text pages, etc.), ask each supplier to run tests on them, and compare the results; you may notice important differences among them.
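To compare the suppliers' test results with a number instead of by eye, one common measure is the character error rate: the edit distance between a hand-typed reference transcription and the OCR output, divided by the length of the reference. The sketch below is a plain Levenshtein implementation; the sample strings and vendor labels are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Invented sample: one reference line and two hypothetical vendor outputs.
reference = "quality management"
outputs = {"vendor A": "quality managernent", "vendor B": "qua1ity management"}
for name, text in outputs.items():
    print(f"{name}: CER = {character_error_rate(reference, text):.3f}")
```

Running the same set of difficult pages through each supplier and averaging this rate gives one column for the comparison table.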

Important:
- How quickly a heavy document opens
- Ease of navigation
- Organization structure
- Features of the SW to be used, for both scanning and browsing
- Cost of the SW, upgrades, etc.
- SW manufacturer
- SW support
- Whether the SW uses the latest technologies
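The criteria above can be folded into a simple weighted score per supplier. This is only a sketch: the weights, the 1-to-5 scale, and the vendor scores are placeholders to be replaced with your own judgment, and the criterion names are shortened versions of the list items.

```python
# Placeholder weights (sum to 1.0); adjust to your own priorities.
WEIGHTS = {
    "open_speed": 0.20,
    "navigation": 0.15,
    "organization": 0.20,
    "features": 0.15,
    "cost": 0.15,
    "support": 0.15,
}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-criterion scores (for example 1 to 5) into one number."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical scores for two suppliers, on a 1 (poor) to 5 (excellent) scale.
suppliers = {
    "vendor A": {"open_speed": 4, "navigation": 3, "organization": 5,
                 "features": 4, "cost": 2, "support": 3},
    "vendor B": {"open_speed": 3, "navigation": 4, "organization": 3,
                 "features": 3, "cost": 5, "support": 4},
}

for name, scores in suppliers.items():
    print(f"{name}: {weighted_score(scores):.2f}")
```

The number does not make the decision for you, but it forces the team to agree on which criteria matter most before looking at the quotes.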


Regards
 

Boboy

Starting to get Involved
#9
Please excuse my ignorance, but what is SW?

QAMTY, I want you to know how much I value your support.
 