Create macOS app for OCR

Vijay_Yukthi · May 29, 2021, 8:25am

Hi all,

I’m new to the Swift language and Xcode. I am looking for an OCR app (on macOS Big Sur) that converts PNG image files to searchable PDFs, for image files in multiple languages (Chinese, Japanese, Korean and Arabic).

I have checked the feasibility in AppleScript, but I can’t find a solution, either on my own or in the forums. The image files are highly confidential content, so I would prefer not to use open-source code such as Python, Node.js, etc.

In the meantime, I have found the Vision framework, which can be used from Swift in Xcode. I don’t have much experience with it. The videos and blogs I have referred to are mostly about iOS applications.

But my requirement is for macOS: an app that converts images to searchable PDFs for multiple languages and works offline.

Could anyone please guide me on how to create a text recognition app for macOS using the Vision framework (Swift) in Xcode, or any other way?

Your help is much appreciated.

Thanks
Vijay

Browser: Safari 537.36
Operating System: macOS 10.14

Mark_FX · May 29, 2021, 3:03pm

OCR these days is done with AI or machine learning frameworks and libraries. The popular ones are Google’s TensorFlow, scikit-learn (better known as SKLearn), and Keras, but there are many more, and most of them require you to use Python almost exclusively.

I used SKLearn myself a couple of years ago for a numerical data analysis app written in Swift, but I had to create a Python command line tool to talk to the SKLearn library and then call that command line tool from the Swift code, so it was not an ideal solution.
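
The "Swift app calls a command line tool" arrangement described above can be sketched roughly as follows. This is only an illustration using Foundation's `Process`; the helper name `runTool` is hypothetical, and the tool path and arguments are whatever your own command line tool expects.

```swift
import Foundation

// Hypothetical helper: runs a command line tool and returns what it wrote to stdout.
func runTool(_ path: String, arguments: [String]) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: path)
    process.arguments = arguments

    // Capture the tool's standard output through a pipe.
    let pipe = Pipe()
    process.standardOutput = pipe

    try process.run()
    // Read until the tool closes its output, then wait for it to exit.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()

    return String(decoding: data, as: UTF8.self)
}
```

For example, `try runTool("/bin/echo", arguments: ["hello"])` returns the text that `echo` printed. Note that a sandboxed app may not be allowed to launch arbitrary tools this way.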

Apple’s CoreML framework has matured a lot since its first release, which required you to load already trained models created with the Python-based ML libraries. I’ve since replaced the SKLearn library with CoreML, although I’m using numerical regressor models created with the CreateML classes, not text recognition models. CoreML does have features for text recognition, and you can now at least create your own models with CreateML, rather than having to load already trained models from elsewhere.
I’ve not used the Vision framework, as my projects were dealing with large numerical datasets, but my understanding is that it’s popular for image recognition and could possibly also be used for text recognition.
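
For reference, text recognition with the Vision framework on macOS can be sketched as below. This is a minimal sketch, not a complete app: the function name `recognizeText` and the file path are hypothetical, and it assumes macOS 10.15 or later, where `VNRecognizeTextRequest` is available. Recognition runs on the device, so it works offline. Which languages are supported depends on the macOS version, so do not assume that every language you need is available; Vision can report the supported list at run time.

```swift
import Foundation
import Vision

// Hypothetical helper: returns one string per line of text that Vision detects.
func recognizeText(at url: URL, languages: [String] = []) throws -> [String] {
    var lines: [String] = []

    // The completion handler runs before perform(_:) returns, because perform is synchronous.
    let request = VNRecognizeTextRequest { finished, _ in
        guard let observations = finished.results as? [VNRecognizedTextObservation] else { return }
        // Keep the best candidate string for each detected line.
        lines = observations.compactMap { $0.topCandidates(1).first?.string }
    }
    request.recognitionLevel = .accurate
    if !languages.isEmpty {
        // Language codes such as "zh-Hans"; must be ones the installed OS supports.
        request.recognitionLanguages = languages
    }

    let handler = VNImageRequestHandler(url: url, options: [:])
    try handler.perform([request])
    return lines
}

// Usage (the path is a placeholder):
do {
    let url = URL(fileURLWithPath: "/path/to/scan.png")
    for line in try recognizeText(at: url, languages: ["zh-Hans"]) {
        print(line)
    }
} catch {
    print("Text recognition failed: \(error)")
}
```

This only extracts the text. Producing a searchable PDF would additionally require placing that text into a PDF page on top of the image, which is not shown here.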

So you’ve landed in the wrong forum for character or word recognition software.
As far as I know, AppleScript cannot work with Apple’s Vision or CoreML frameworks, as they require Swift code.
I could be wrong, but I don’t think you can use Objective-C either.
And AppleScript could only work with the Python-based ML libraries through a Python command line tool, via AppleScript’s “do shell script” command, so this would require you to learn both Python and AppleScript.
So your best bet would be to Google “CoreML Text Recognition Example”, as I seem to remember coming across many such examples during my own CoreML learning period.

Regards Mark

akim · May 31, 2021, 5:44am

I created a script that calls the screencapture binary, to allow the user to draw a rectangle around text embedded in an image, to capture an image as a png. It then calls the tesseract-ocr binary to recognize optical characters in the image, and pastes the textual result to the clipboard. The user can then paste the results into any visible application.
I downloaded the tesseract binary from https://github.com/tesseract-ocr/tessdoc#introduction and installed it at /usr/local/bin/tesseract on the local drive.

set TempFolderPSX to POSIX path of (path to temporary items folder)
set TimeStamp to (current date) as «class isot» as string

#	Screenshots are saved as .png files
set ScreenCaptureFilePSX to TempFolderPSX & TimeStamp & ".png"
set screencaptureSh to "/usr/sbin/screencapture -i " & quoted form of ScreenCaptureFilePSX
do shell script screencaptureSh

#	screencapture similar to key command shift 4.  screencapture [options] [file] ; 
#	option i = interactive. refer to https://ss64.com/osx/screencapture.html
#	option -i      Capture screen interactively, by selection or window.  The control key will cause the screen shot to go to the clipboard.  The space key will toggle between mouse selection and window selection modes.  The escape key will cancel the interactive screen shot.
#	option   -x      Do not play sounds.

do shell script "open " & quoted form of ScreenCaptureFilePSX
set TargetFolderPSX to TempFolderPSX & "Tesseract/"

#	Create directory at TargetFolderPSX with command "mkdir -p " and tag -p, so that no error will result if directory already exists and allows making parent directories if needed. -p formally indicates parent mode, but it appears that -p might better indicate permissive mode.
do shell script "mkdir -p " & quoted form of TargetFolderPSX

set TargetFilePSX to TargetFolderPSX & TimeStamp

#	Tesseract github - https://github.com/tesseract-ocr/tessdoc#introduction			
#	Tesseract is an open source text recognizer (OCR) Engine

#	Ensure that tesseract binary exists
try
	tell application "System Events" to exists alias (POSIX file "/usr/local/bin/tesseract")
on error
	display dialog " Tesseract cannot be found." with title "Tesseract Engine Error" giving up after 2
	return -- abort script if Tesseract binary cannot be found 
end try

#	Tesseract OCR accepts  input from ScreenCaptureFilePSX and outputs to TargetFilePSX and appends a ".txt" extension
do shell script "/usr/local/bin/tesseract" & space & ¬
	quoted form of ScreenCaptureFilePSX & space & ¬
	quoted form of TargetFilePSX

#	Read back the text file that Tesseract wrote, as UTF-8 so that non-Latin scripts are preserved
set TargetFileTxt to TargetFilePSX & ".txt"
set OCRecognizedText to (read (POSIX file TargetFileTxt) as «class utf8»)

#	remove paragraphs and linefeeds from ocr'd text   
set ParagraphRemovedText to my ReplaceLineFeedReturnsInString:OCRecognizedText
set the clipboard to ParagraphRemovedText
my NotifyText:"Paste" Title:"OCR" Subtitle:"Paste: ⌘V"

on NotifyText:_text Title:_title Subtitle:_subtitle
	display notification ¬
		_text with title ¬
		"♦️" & _title & "♦️" subtitle ¬
		_subtitle
end NotifyText:Title:Subtitle:

on ReplaceLineFeedReturnsInString:SourceString
	set AppleScript's text item delimiters to {return & linefeed, return, linefeed, character id 8233, character id 8232}
	set SourceList to text items of SourceString
	set AppleScript's text item delimiters to {space}
	set RefinedSourceString to SourceList as text
	set AppleScript's text item delimiters to ""
	RefinedSourceString
end ReplaceLineFeedReturnsInString:
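
If the same clean-up were needed in a Swift app, the line-break flattening done by the ReplaceLineFeedReturnsInString handler above could be sketched like this. The function name `flattenLineBreaks` is hypothetical; it replaces the same five separators with a single space each.

```swift
import Foundation

// Hypothetical Swift counterpart of the ReplaceLineFeedReturnsInString handler.
func flattenLineBreaks(_ text: String) -> String {
    // CR+LF comes first so that the pair becomes one space rather than two.
    let separators = ["\r\n", "\r", "\n", "\u{2029}", "\u{2028}"]
    var result = text
    for separator in separators {
        result = result.replacingOccurrences(of: separator, with: " ")
    }
    return result
}
```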

Mark_FX · May 31, 2021, 11:25am

I wasn’t familiar with the Tesseract project. It looks very interesting, and certainly useful for your project requirements.
So well done on finding a way to use AppleScript for OCR.
I thought you might have to use one of the popular Python ML libraries through a Python command line tool, via a “do shell script” command.

Regards Mark

akim · May 31, 2021, 3:46pm

I would be interested in any ideas on methods to improve the script.

Vijay_Yukthi · June 1, 2021, 9:13am

Thanks akim,

Your code was really very useful, and it achieved the results with shell and AppleScript using the Tesseract engine. This was amazing.

My actual requirement is to convert scanned multi-language PNG files into searchable PDFs. Also, the image contents are confidential, so the client has ruled out Tesseract and other open-source libraries.

The tool they want must work completely offline, without any third-party library support. We have tried using AppleScript to access the user interface of Acrobat DC, but it won’t work. We are not able to access the Acrobat DC toolbars and their actions.

Could you please share your thoughts: is it possible to achieve text recognition in PNG files by having AppleScript invoke the menus and toolbars of Acrobat DC?

Any ideas are much appreciated!

Vijay_Yukthi · June 1, 2021, 9:40am

Thanks Mark,

I’m really new to creating an OCR app for macOS using Swift, and to Xcode as well. The blogs and videos I have referred to about text recognition using Swift with the ML frameworks are mostly about iOS applications. Could you please help me with how to start creating a text recognition app for macOS using Swift?

Your help is much appreciated.

Mark_FX · June 1, 2021, 3:53pm

@Vijay_Yukthi

As previously stated, my experience of Apple’s CoreML framework was using CreateML for numeric analytical regressor models, like Linear Regression, Logistic Regression, Random Forest Regression, and even Gradient Boosting Regression, so this type of CoreML coding has absolutely no relevance to text recognition.

The CoreML framework has many different parts to it, and image and text recognition are in different areas to the numerical data modelling parts, so I could not help with the text recognition parts of the framework, because I haven’t used them myself.
But I spent about a week searching and reading online resources to help me with my project, and you will probably have to do the same. These machine learning projects are a steep learning curve, and there really aren’t any shortcuts to the learning.
And as also previously stated, I haven’t used the Vision framework, because I had no need for it in my own project.

If you don’t know the Swift language, then the starting point for you would be learning the basics of coding in Swift, because, as far as I know, the CoreML and Vision frameworks require you to use Swift.
And you will find endless examples of Swift coding tutorials online.

If a lot of the online examples of text recognition are written for iOS, then you will have to convert the code over to the macOS framework types, as the majority of the iOS framework types have macOS equivalents.
For example, iOS framework types have the UI prefix and the macOS framework types have an NS prefix, like UIView and NSView, but the code for both is very similar, and Xcode will help with your Swift syntax and conversions.
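
As one small illustration of that UI-to-NS conversion, here is how loading an image and getting its `CGImage` (the type most image-processing examples end up needing) might look on macOS. The function name `loadCGImage` is hypothetical; the iOS form is shown in a comment for comparison.

```swift
import AppKit

// iOS examples typically do this:
//     let cgImage = UIImage(contentsOfFile: path)?.cgImage
//
// On macOS, NSImage has no cgImage property, so the conversion is a method call instead.
func loadCGImage(atPath path: String) -> CGImage? {
    guard let image = NSImage(contentsOfFile: path) else { return nil }
    var rect = CGRect(origin: .zero, size: image.size)
    return image.cgImage(forProposedRect: &rect, context: nil, hints: nil)
}
```

The rest of an iOS example that works on the `CGImage` can then often be reused with little change.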

As I stated before, this is the wrong forum for Xcode, Swift and CoreML, as these forums are exclusively for AppleScript and AppleScriptObjC coders. When you get going with your project, you will have to find a good Swift developer forum to help with problems and issues.
But I know the resources you’re looking for are out there on the web, as I came across many examples of text recognition with CoreML when I was researching my numerical data modelling and prediction project. I paid them little attention, as they were not a part of the framework I needed to learn.

In short, you have to roll your sleeves up and get stuck into the online tutorials and examples. I can tell you one thing for sure: it will take a good deal of time and effort, and there really aren’t any shortcuts. With machine learning on Apple’s platforms being fairly new, there won’t be loads of tutorials, but there will be enough to get you going, and from that point you have to learn by trial and error to fill in the gaps, as I did myself.

Good Luck with it.

Regards Mark