Google Offers OCR for Incoming Docs
When you open up Google Docs and choose Upload, you’ll get a screen to select the file you want to upload, with an option for OCR like the one you see here. I didn’t have anything to OCR handy, so I went to Google Books and grabbed Analytical Psychology by Carl Gustav Jung. I set it to upload (it’s about 9MB) and Google Docs chugged away on it for several moments. After I had waited a while, Google Docs told me “Unable to Convert Document.” Well, phooey. So I went back to Google Books and tried again, this time with Damon Runyon’s Rhymes of the Firing Line. That one was a lot smaller, a little under 2MB.
It uploaded fine, but the OCR only got the disclaimer from Google Books and the title page, because apparently there’s a limit to how much of a PDF document Google will OCR. >facepalm< So I tried a third time, with a magazine article from the Rotarian. (If you search Google Books for the word psychedelic in magazine content available in full text, this is the earliest result.) Google Docs processed it very quickly, but apparently didn’t like the two-column format, as the OCR was very poor. Here’s a sample:
Thc organization of power in competitive national units has reached iu; logical conclusion in the confron-lation of two grcat uppnscd blocs immobilized in thc grip of the cold war. Advance in thc tischnical of weaponry has given us weapons so powc rful than they cannon-we hope-be used: meanwhile na-lions are spcnding so much on amwmcnls that there is not enough lo mccl more than a fraction of other und more important psychosocial needs, Increasing emphasis on material products has lcd to wasteful ovcrcxploilatiun of Nature and a tllrcnlcncd shortage of natural rcsollrccs.
I guess what I’m getting at is that the Google Docs OCR is, as it stands, a bit on the fickle side: there are some size limits, and apparently some layouts work better than others. But I am still excited about this. If it evolves to be a little less finicky and have fewer limits, it’ll be an incredibly powerful tool for organizing PDF content.