I have many documents that I want to scan. Every document will have about 10 different metadata tags that I want to be able to search by.
So I am thinking of getting a huge scanner and scanning everything in, but then how do I label everything? I guess I will turn the scans into PDF files and put them in a MySQL DB? What is the best way to do this? I also want to make a GUI to search through this database. I do not want to OCR all the documents; I just want to attach about 10 keywords to every document.
Please suggest a system or procedure for doing this. I want this to be searchable, probably from multiple computers.
What kind of programming is required?
This project has several aspects that can be addressed separately:
Scanning. Can the sheets be separated and fed through a sheet feeder? If yes, go for a document scanner like the Fujitsu fi-6140 or similar. It works great, up to 3000 pages a day. Still a lot of work, mind you.
If not, go for a camera setup. Look at http://diybookscanner.org/ and similar professional setups.
Expect to invest a minute per 10 to 100 pages, depending on the system.
OCR. Works fine on printed text. Go for PDF with the text layered over the picture, so you don't have to proofread. That means you see the scanned picture in the PDF, with the OCRed text superimposed on it. If this document gets printed, it is in effect a photocopy, but you can still copy and paste the text from it.
Data storage and retrieval. The solution for this depends very much on the plans you have for your data.
How many people should access it?
If alone, a file system solution might be ok.
If many, think about a digital library system like DSpace or Greenstone Digital Library.
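
For the single-user case, the "PDF files plus about 10 keywords each" idea from the question can be sketched with a small keyword catalog. This is only a rough sketch under some assumptions: it uses SQLite from the Python standard library instead of the MySQL mentioned in the question, it stores only the path to each PDF rather than the file itself, and the table names, function names and example paths are all made up for illustration.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the catalog. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            pdf_path TEXT NOT NULL UNIQUE   -- location of the scanned PDF
        );
        CREATE TABLE IF NOT EXISTS keywords (
            doc_id INTEGER NOT NULL REFERENCES documents(id),
            keyword TEXT NOT NULL,
            PRIMARY KEY (doc_id, keyword)
        );
        CREATE INDEX IF NOT EXISTS idx_keyword ON keywords(keyword);
    """)
    return conn


def add_document(conn, pdf_path, keywords):
    """Register one scanned PDF and attach its keywords (stored lowercase)."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO documents (pdf_path) VALUES (?)", (pdf_path,)
        )
        doc_id = cur.lastrowid
        conn.executemany(
            "INSERT OR IGNORE INTO keywords (doc_id, keyword) VALUES (?, ?)",
            [(doc_id, kw.strip().lower()) for kw in keywords],
        )
    return doc_id


def search(conn, *keywords):
    """Return the paths of documents tagged with ALL of the given keywords."""
    wanted = sorted({kw.strip().lower() for kw in keywords})
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    rows = conn.execute(
        f"""SELECT d.pdf_path FROM documents d
            JOIN keywords k ON k.doc_id = d.id
            WHERE k.keyword IN ({placeholders})
            GROUP BY d.id
            HAVING COUNT(DISTINCT k.keyword) = ?
            ORDER BY d.pdf_path""",
        (*wanted, len(wanted)),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_catalog()
    add_document(conn, "scans/invoice_001.pdf", ["invoice", "2011", "Acme"])
    add_document(conn, "scans/letter_002.pdf", ["letter", "2011"])
    print(search(conn, "2011", "acme"))
```

A SQLite file like this is meant for one machine. For searching from multiple computers, the same two-table layout should carry over to a server database such as MySQL with minor changes to the SQL, or one of the digital library systems above can take over this job entirely.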