Image (PNG) to Text Through AppleScript

The languages supported on your computer can be determined by running the script included below. The Turkish language (codes tr and tur) is not a supported language on my computer. A few comments:

  • You may want to toggle the usesLanguageCorrection property (set with setUsesLanguageCorrection:) to see if this makes a difference.
  • The VNRecognizeTextRequest class does have a customWords property, but I don’t think that’s what you want.
  • I did a quick Google search and couldn’t find any way to add languages for use when performing OCR with VNRecognizeTextRequest. There are commercial apps that will OCR Turkish, though.

use framework "Foundation"
use framework "Vision"

set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
set supportedLanguageCodes to (theRequest's supportedRecognitionLanguagesAndReturnError:(missing value)) as list
--> {"en-US", "fr-FR", "it-IT", "de-DE", "es-ES", "pt-BR", "zh-Hans", "zh-Hant", "yue-Hans", "yue-Hant", "ko-KR", "ja-JP", "ru-RU", "uk-UA", "th-TH", "vi-VT"}
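
A small check along these lines can test for one language code instead of reading the whole list. This is only a sketch: isLanguageSupported is a hypothetical handler name, and it assumes the codes have the form shown above (a bare language code, or a language code followed by a hyphen and a region or script).

```applescript
use framework "Foundation"
use framework "Vision"
use scripting additions

-- Hypothetical helper: true if languageCode (for example "tr") appears in the
-- supported list, either on its own or as the prefix of a code such as "tr-TR"
on isLanguageSupported(languageCode)
	set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
	set supportedCodes to theRequest's supportedRecognitionLanguagesAndReturnError:(missing value)
	-- The call returns missing value if the list could not be obtained
	if supportedCodes is missing value then return false
	repeat with aCode in (supportedCodes as list)
		set codeText to aCode as text
		if codeText is languageCode or codeText begins with (languageCode & "-") then return true
	end repeat
	return false
end isLanguageSupported

set turkishSupported to isLanguageSupported("tr")
```

With the list shown above, isLanguageSupported("tr") returns false.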

Thanks, @peavine.

use framework "Foundation"
use framework "Vision"
use scripting additions

set theFile to (choose file of type {"public.image"})
set theText to getText(theFile)

on getText(theFile)
	set theFile to current application's |NSURL|'s fileURLWithPath:(POSIX path of theFile)
	set requestHandler to current application's VNImageRequestHandler's alloc()'s initWithURL:theFile options:(missing value)
	set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
	--theRequest's setAutomaticallyDetectsLanguage:true -- test this first
	theRequest's setRecognitionLanguages:{"tr", "tur"} -- if the above doesn't work
	theRequest's setUsesLanguageCorrection:true -- language correction if desired but not Chinese
	requestHandler's performRequests:(current application's NSArray's arrayWithObject:(theRequest)) |error|:(missing value)
	set theResults to theRequest's results()
	set theArray to current application's NSMutableArray's new()
	repeat with aResult in theResults
		(theArray's addObject:(((aResult's topCandidates:1)'s objectAtIndex:0)'s |string|()))
	end repeat
	return (theArray's componentsJoinedByString:space) as text -- return a string
end getText

The above code (changed as you suggested) extracts the content, but the special characters I mentioned are still missing.

I hope I have applied your suggestion correctly; please correct me if anything is wrong.

Thanks

Asuvathdhaman, Turkish is not a supported language and, as far as I know, setting the recognition languages to tr and tur does nothing. I tested a sample of Turkish text, and the settings shown below returned the best results, although they are far from perfect.

use framework "Foundation"
use framework "Vision"
use scripting additions

set theFile to (choose file of type {"public.image"})
set theText to getText(theFile)

on getText(theFile)
	set theFile to current application's |NSURL|'s fileURLWithPath:(POSIX path of theFile)
	set requestHandler to current application's VNImageRequestHandler's alloc()'s initWithURL:theFile options:(missing value)
	set theRequest to current application's VNRecognizeTextRequest's alloc()'s init()
	theRequest's setAutomaticallyDetectsLanguage:true
	theRequest's setUsesLanguageCorrection:false
	requestHandler's performRequests:(current application's NSArray's arrayWithObject:(theRequest)) |error|:(missing value)
	set theResults to theRequest's results()
	set theArray to current application's NSMutableArray's new()
	repeat with aResult in theResults
		(theArray's addObject:(((aResult's topCandidates:1)'s objectAtIndex:0)'s |string|()))
	end repeat
	return (theArray's componentsJoinedByString:space) as text -- return a string
end getText

FWIW, I’ve included a screenshot below of a Turkish language image example (top) and the results returned by the above script (below). The characters returned are generally OK, but the special characters (especially the cedilla) are often missing. I don’t know if this is a limitation of the OCR code or because the Turkish language is not supported.

Thanks, @peavine.

Yes, I got the same result. I still have the problem with the characters below, as mentioned earlier.

Ş – Latin capital letter S with cedilla
ş – Latin small letter s with cedilla
Ğ – Latin capital letter G with breve
ğ – Latin small letter g with breve
ç – Latin small letter c with cedilla
İ – Latin capital letter I with dot above
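
To compare a script's output against the source image, a check like the following lists which of the characters above actually survived OCR. It is a sketch in plain AppleScript: foundSpecialCharacters is a hypothetical handler name, and theText is assumed to be the string returned by getText.

```applescript
-- Hypothetical helper: returns the characters from the list above that occur in theText
on foundSpecialCharacters(theText)
	set specialCharacters to {"Ş", "ş", "Ğ", "ğ", "ç", "İ"}
	set foundCharacters to {}
	-- Consider both case and diacriticals so that "ş" does not match "Ş" or plain "s"
	considering case and diacriticals
		repeat with aCharacter in specialCharacters
			if theText contains (contents of aCharacter) then set end of foundCharacters to (contents of aCharacter)
		end repeat
	end considering
	return foundCharacters
end foundSpecialCharacters

-- Sample string for illustration only; in practice pass the result of getText
set foundCharacters to foundSpecialCharacters("Teşekkür ederim")
```

Any character that is visible in the image but absent from the returned list was dropped or altered during recognition.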

Thanks
Asuvath
