Image (PNG) to Text Through AppleScript

Hi,

I am trying to extract the text content from non-editable images (TIFF, JPG, and PNG) via AppleScript.

In macOS Monterey or later, we can select text in an image manually in Preview.

I would like to automate this. In my searching I found the code below on a forum, but that code does not select the text.

Can anyone help me extract the text content from PNG/TIFF/JPG images?

Thanks
Asuvath

Asuvathdhaman. It was my understanding that Optical Character Recognition (OCR) software was required to get text from a PNG or other bitmap image. If I understand correctly, you were able to accomplish this with Preview alone. Could you provide a copy of, or a link to, a PNG image that you extracted text from with Preview?

If I did misunderstand your post, I don’t believe it is possible to do what you want with AppleScript alone. A forum thread that contains an AppleScript implementation of OCR using the Tesseract utility can be found here. Unfortunately, this is not easily done and may not be worth the effort.

The Vision Framework can be co-opted for that, using AppleScript or JavaScript with Objective-C. And it’s probably possible with tesseract, too, though no longer necessary.

Thanks chrillek. I did a Google search and found the script included below. In preliminary testing on my Ventura computer, it seems to work well.

use framework "AppKit" -- I assume this is required for NSImage
use framework "Foundation"
use framework "Vision"
use scripting additions

set theFile to POSIX path of (choose file)
set theText to getImageText(theFile)

on getImageText(imagePath)
	-- Get image content
	set theImage to current application's NSImage's alloc()'s initWithContentsOfFile:imagePath
	
	-- Set up request handler using image's raw data
	set requestHandler to current application's VNImageRequestHandler's alloc()'s initWithData:(theImage's TIFFRepresentation()) options:(current application's NSDictionary's alloc()'s init())
	
	-- Initialize text request
	set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
	
	-- Perform the request and get the results
	requestHandler's performRequests:(current application's NSArray's arrayWithObject:(theRequest)) |error|:(missing value)
	set theResults to theRequest's results()
	
	-- Obtain and return the string values of the results
	set theText to {}
	repeat with observation in theResults
		copy ((first item in (observation's topCandidates:1))'s |string|() as text) to end of theText
	end repeat
	return theText
end getImageText

The above script is from here

In some cases, the framework returns the strings out of natural reading order (i.e., left to right in Latin scripts). Then you'd have to check the bounding boxes and sort the strings by ascending x and descending y coordinates.
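A rough sketch of that reordering, assuming each observation's boundingBox() bridges to an AppleScript record of the form {origin:{x:..., y:...}, size:...} (this bridging behavior should be verified on your system):

```applescript
-- Hypothetical sketch: pair each recognized string with its bounding-box
-- origin, then sort top-to-bottom, left-to-right. Vision's normalized
-- coordinates put the origin at the bottom left, so "top of the page"
-- means a larger y value (hence descending y, ascending x).
set theItems to {}
repeat with observation in theResults
	set theBox to observation's boundingBox() -- assumed to bridge to a record
	set theString to ((first item in (observation's topCandidates:1))'s |string|() as text)
	set end of theItems to {x:(x of origin of theBox), y:(y of origin of theBox), txt:theString}
end repeat
-- Simple insertion sort on the collected records: descending y, then ascending x
repeat with i from 2 to (count theItems)
	set anItem to item i of theItems
	set j to i - 1
	repeat while j > 0 and ((y of item j of theItems < y of anItem) or ¬
		(y of item j of theItems = y of anItem and x of item j of theItems > x of anItem))
		set item (j + 1) of theItems to item j of theItems
		set j to j - 1
	end repeat
	set item (j + 1) of theItems to anItem
end repeat
```

The sorted strings can then be read from the txt fields of theItems in order.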

I reread the OP’s original post, and I’m not sure if the handler contained above will do what he wants. If he wants to get text from an image on the clipboard (which appears to be the case), the script contained below will do that. A few comments:

  • The script returns a list which is easily coerced to text after setting the desired text item delimiters.
  • An error is thrown if the clipboard does not contain an image, and error correction needs to be added for that.
  • I have essentially no expertise with the Vision framework, and there may be a better way to do this. Note should also be made of the constraints noted by chrillek.
use framework "AppKit"
use framework "Foundation"
use framework "Vision"
use scripting additions

set theText to getImageText()

on getImageText()
	set thePasteboard to current application's NSPasteboard's generalPasteboard()
	set imageData to thePasteboard's dataForType:(current application's NSPasteboardTypeTIFF)
	set requestHandler to current application's VNImageRequestHandler's alloc()'s initWithData:imageData options:(current application's NSDictionary's alloc()'s init())
	set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
	requestHandler's performRequests:(current application's NSArray's arrayWithObject:(theRequest)) |error|:(missing value)
	set theResults to theRequest's results()
	set theText to {}
	repeat with observation in theResults
		copy ((first item in (observation's topCandidates:1))'s |string|() as text) to end of theText
	end repeat
	return theText
end getImageText
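For completeness, the coercion mentioned in the first point might look like this (joining with linefeeds; pick any delimiter you like):

```applescript
-- Join the list returned by getImageText() into a single string,
-- saving and restoring the text item delimiters around the coercion
set savedTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
set theString to theText as text
set AppleScript's text item delimiters to savedTID
```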

Thanks @peavine and @chrillek .

Sorry, I don't know Objective-C, so I am not able to understand the code.

My actual task is to count the total number of words used in each image and produce a report (either Excel or CSV).

My plan is to store the extracted content in an array and then write the data in Excel or CSV format from that array.

  1. I do not know how to store the data in an array in Objective-C.
  2. How do I create a CSV or Excel file from Objective-C?
  3. We need to process multiple PNG files in one run.

Also, I am unable to display the result through a "display alert" command in AppleScript, but I can run the script and see the result in the Description panel.
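My rough idea was something like this (untested, and the output path is only an example):

```applescript
-- Rough idea: join the OCR result list, count its words, and append
-- one "file,word count" row to a CSV file. The path is a placeholder.
set savedTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to space
set theString to theText as text -- theText is the list returned by getImageText
set AppleScript's text item delimiters to savedTID
set wordCount to count of (words of theString)
set csvRow to "image.png," & wordCount
set fileRef to open for access (POSIX file "/Users/me/Desktop/report.csv") with write permission
write (csvRow & linefeed) to fileRef starting at eof
close access fileRef
```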

Can you please help me to resolve my issue?

Thanks
Asuvath

Hi @Fredrik71 ,

Thank you so much for your response.

I think this is exactly what I wanted.

Thanks a lot.

Thanks
Asuvath

Hi @Fredrik71 ,

Hope you are doing well.

After a long time, I have an enhancement request for this tool.

I can extract English content perfectly with this script.

Now the requirement is non-English languages (Chinese, Korean, Japanese, Arabic, etc.).

Can you please help me get this done?

Thanks
Asuvath

Fredrik71 has answered the OP’s question, but I thought I would update my script FWIW. I tested Fredrik71’s script and my script with a simple image containing English-language text, and the scripts produce essentially identical results: both return a string, and both join text fragments with a space.

As written, my script automatically detects the language found in the image, and this generally worked as expected in limited testing with English, French, Japanese, and Chinese languages. If that’s not the case, you can specify the desired languages, but you should be aware of the following, which is from the Vision documentation:

If not otherwise specified, Vision biases its results toward English. To alter its default behavior, provide an array of supported languages in the request’s recognitionLanguages property. The order in which you provide the languages dictates their relative importance. To recognize traditional and simplified Chinese, specify zh-Hant and zh-Hans as the first elements in the request’s recognitionLanguages property. English is the only other language that you can pair with Chinese.
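Read concretely, that guidance translates to a single call on the request object (shown here in the same AppleScriptObjC style as the scripts in this thread):

```applescript
-- Per the Vision documentation: traditional and simplified Chinese
-- listed first, with English as the only language that may accompany them
theRequest's setRecognitionLanguages:{"zh-Hant", "zh-Hans", "en"}
```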

use framework "Foundation"
use framework "Vision"
use scripting additions

set theFile to (choose file of type {"public.image"})
set theText to getText(theFile)

on getText(theFile)
	set theFile to current application's |NSURL|'s fileURLWithPath:(POSIX path of theFile)
	set requestHandler to current application's VNImageRequestHandler's alloc()'s initWithURL:theFile options:(missing value)
	set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
	theRequest's setAutomaticallyDetectsLanguage:true -- test this first
	-- theRequest's setRecognitionLanguages:{"en", "fr"} -- if the above doesn't work
	theRequest's setUsesLanguageCorrection:false -- language correction if desired but not Chinese
	requestHandler's performRequests:(current application's NSArray's arrayWithObject:(theRequest)) |error|:(missing value)
	set theResults to theRequest's results()
	set theArray to current application's NSMutableArray's new()
	repeat with aResult in theResults
		(theArray's addObject:(((aResult's topCandidates:1)'s objectAtIndex:0)'s |string|()))
	end repeat
	return (theArray's componentsJoinedByString:space) as text -- return a string
end getText

Thanks a lot, @Fredrik71.

Thank you very much @peavine. I hope this will solve my request; I will check and confirm.

Thank you so much @Fredrik71 and @peavine. It is working fine for Chinese and Japanese.

I tried Arabic with this script, but it does not seem to be supported. I am also not sure which code is the Arabic language code; can you please advise?

Thanks
Asuvath

Asuvathdhaman. The Arabic language codes are “ar” and “ara”. However, if set to these codes, my script does not perform OCR with Arabic language examples. I checked (with the supportedRecognitionLanguagesAndReturnError method), and the following languages are supported on my Sonoma computer:

{“en-US”, “fr-FR”, “it-IT”, “de-DE”, “es-ES”, “pt-BR”, “zh-Hans”, “zh-Hant”, “yue-Hans”, “yue-Hant”, “ko-KR”, “ja-JP”, “ru-RU”, “uk-UA”, “th-TH”, “vi-VT”}

So, as far as I can ascertain, OCR of Arabic is not supported.
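For anyone who wants to run the same check, a minimal sketch (the list returned depends on the macOS version and the request's revision):

```applescript
use framework "Vision"
use scripting additions

-- Query the OCR languages supported by the current text-recognition request
set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
set theLanguages to (theRequest's supportedRecognitionLanguagesAndReturnError:(missing value)) as list
return theLanguages
```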

Thanks @peavine . I have noted this information.

Thanks
Asuvath

Hi @peavine and @Fredrik71 ,

Sometimes we can't extract the content of an image completely; we get only a little of the text.

Can you please advise on this?

I have attached a few images here for your reference.

Thanks
Asuvath
Spanish

Asuvathdhaman. The image is degraded and I suspect that’s why only small portions of the image could be read. I’ve included a similar paragraph below, and it was read as expected with my script set to automatic mode. The new image was also read with automatic mode disabled and language-specifier mode enabled with the Spanish language code “es-ES”.

Thanks @peavine , I am clear now, I will try this suggestion.

Thanks
Asuvath

Thanks @Fredrik71 . Yes, you are right, it may be due to a resolution issue.

Sorry if I rushed you for this information.

Thanks

Hi @peavine and @Fredrik71 ,

Hope you are doing well.

I tried many languages (Japanese, Chinese, and Korean), and all work well.

I am now trying Turkish. The characters below are not extracted, although all other characters are extracted correctly. Do you have any idea how to get these characters too?

Ş – Latin capital letter S with cedilla
ş – Latin small letter s with cedilla
Ğ – Latin capital letter G with breve
ğ – Latin small letter g with breve
ç – Latin small letter c with cedilla
İ – Latin capital letter I with dot above

Thanks
Asuvath