Capitalize the first letter of the first word in a sentence

In a thread in the main AppleScript forum, someone asked for a script that uses sed to capitalize the first letter of the first word of every sentence. This didn’t work as expected, apparently because the version of sed included with macOS does not support uppercasing in replacements (the \U and \u case-conversion escapes are GNU sed extensions).

There are many ways to accomplish the goal, and I’ve included two ASObjC suggestions below. The reported timing results are for strings that contain 33 and 1025 paragraphs, respectively.

SCRIPT ONE (25 and 380 milliseconds)

use framework "Foundation"
use scripting additions

set theString to "this is a sentence. this is a Sentence. this is a sentence.
this is a sentence. this is a Sentence. this is a sentence.
this is a sentence. this is a Sentence. this is a sentence."

set capitalizedString to getCapitalizedString(theString)

on getCapitalizedString(theString)
	set theString to current application's NSString's stringWithString:theString
	set theParagraphs to (theString's componentsSeparatedByString:linefeed)
	repeat with aParagraph in theParagraphs
		set theSentences to (aParagraph's componentsSeparatedByString:". ")
		repeat with aSentence in theSentences
			set theWords to (aSentence's componentsSeparatedByString:" ")'s mutableCopy()
			set firstWord to (theWords's objectAtIndex:0)'s capitalizedString()
			(theWords's replaceObjectAtIndex:0 withObject:firstWord)
			set contents of aSentence to (theWords's componentsJoinedByString:(" "))
		end repeat
		set contents of aParagraph to (theSentences's componentsJoinedByString:". ")
	end repeat
	return ((theParagraphs's componentsJoinedByString:linefeed) as text)
end getCapitalizedString

SCRIPT TWO (20 and 280 milliseconds)

use framework "Foundation"
use scripting additions

set theString to "This is a sentence. this is another sentence.
yet another sentence. a final sentence. The end."

set {capitalizedString, capitalizedCount, capitalizedWords} to getCapitalizedString(theString)

on getCapitalizedString(theString)
	set theString to (current application's NSMutableString's stringWithString:theString)
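	set thePattern to "(\\.\\s+[:lower:]\\w*)" -- intended to match a period, whitespace, then a word starting with a lowercase letter; the first sentence of the text has no preceding period, so it is never matched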
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	set regexResults to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theRanges to (regexResults's valueForKey:"range")
	set theCount to theRanges's |count|()
	set theWords to current application's NSMutableArray's new()
	set trimCharacters to (current application's NSCharacterSet's characterSetWithCharactersInString:(linefeed & space & "."))
	repeat with aRange in theRanges
		set theWord to (theString's substringWithRange:aRange)
		set capitalizedWord to theWord's capitalizedString()
		(theString's replaceCharactersInRange:(aRange) withString:capitalizedWord)
		(theWords's addObject:(theWord's stringByTrimmingCharactersInSet:trimCharacters))
	end repeat
	return {theString as text, theCount, theWords as list}
end getCapitalizedString

The thread mentioned above can be found here

I’m not an English native. How do you prepare “Dr.”, “Mr.”, “Mt.”, “St.” etc. phrases?

Piyomaru. That’s a good question, and I’ll have to give that some thought. In most instances the script would work as desired. For example “Dr. smith” would become “Dr. Smith”. However, the following string would return an incorrect result:

I live on Main St. in the city of Prescott → I live on Main St. In the city of Prescott

One option is to prompt the user before capitalizing a word. Another is to include an exception list in the script: if the word before the period is in the list, no replacement is made.

It can be done with the regex-enhancing handler I posted here this afternoon (the regexReplace() handler assumed below). But, as with any text-editing code, you have to have some idea beforehand of what you’re likely to meet and what you want to achieve, and you may need more than one run or different editing code to perform different edits.

set txt to "i’m not an English native. how do you prepare “dr.”, “mr.”, “mt.”, “st.” etc. phrases? i live on Main st. in the city of Prescott."
set txt to regexReplace(txt, "(?<=^|(?<!\\s\\w{1,3})[.!?]\\s{1,10})(\\w)", "$U1")
set txt to regexReplace(txt, "\\b([dms])(?=(?:rs?|t)\\.)", "$U1")
--> "I’m not an English native. How do you prepare “Dr.”, “Mr.”, “Mt.”, “St.” etc. phrases? I live on Main St. in the city of Prescott."

Thanks Nigel. I tested your script and it worked great. I haven’t worked my way through the script yet, but am I correct that it ignores words ending with a period (e.g. St. and Dr.) based on their length (three characters or fewer)?

Hi peavine.

Yes. That’s right. The first call to the handler capitalises each first “word” character after either the beginning of the text or (brace yourself!) a full stop, exclamation mark, or question mark followed by up to ten spaces, the punctuation in turn NOT following a word of three characters or fewer preceded by a space. The maximum of ten spaces just mentioned is set because the indefinitely repeating “*” and “+” operators aren’t allowed inside look-behinds. Ten is a generous margin and a definite limit.

The second call, of course, capitalises the first letter of any word beginning with a lower-case “d”, “m”, or “s” and followed by either “r”, “rs”, or “t” and a full stop. It would also capitalise abbreviations such as “srs.” or “dt.”, but the gamble here is that these are unlikely to occur or to be required to remain lower-case if they do… 🥴
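
For anyone who wants to experiment with these two patterns without the regexReplace() handler, here is a minimal sketch. NSRegularExpression replacement templates have no case-conversion token, so the hypothetical uppercaseMatches() handler below (the name is an assumption for illustration, not part of the thread's code) finds each match and uppercases capture group 1 directly. It assumes NSRegularExpression accepts both patterns as written, and it has not been timed.

```applescript
use framework "Foundation"
use scripting additions

set txt to "i live on Main st. in the city. it is nice."
-- First pass: sentence starts, skipping those that follow a word of three characters or fewer
set txt to uppercaseMatches(txt, "(?<=^|(?<!\\s\\w{1,3})[.!?]\\s{1,10})(\\w)")
-- Second pass: the first letter of dr., mr., mrs., mt., st. and similar
set txt to uppercaseMatches(txt, "\\b([dms])(?=(?:rs?|t)\\.)")

-- Hypothetical helper: uppercases capture group 1 of every match of thePattern
on uppercaseMatches(theText, thePattern)
	set theString to current application's NSMutableString's stringWithString:theText
	set theRegex to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
	if theRegex is missing value then error "The pattern could not be compiled."
	set theMatches to theRegex's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	-- Work from the last match to the first so that earlier ranges stay valid
	-- even if uppercasing changes the length of a character
	set reversedMatches to theMatches's reverseObjectEnumerator()'s allObjects()
	repeat with aMatch in reversedMatches
		set aRange to (aMatch's rangeAtIndex:1)
		set upperText to ((theString's substringWithRange:aRange)'s uppercaseString())
		(theString's replaceCharactersInRange:aRange withString:upperText)
	end repeat
	return theString as text
end uppercaseMatches
```

One consequence of the length test is worth keeping in mind: a short real word at the end of a sentence is treated like an abbreviation. In “this is sentence one. this is sentence two.”, “one” has three letters, so the “this” that follows it is left in lower case.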

Thanks Nigel. There’s a lot of stuff to consider when writing a script like this.

I rewrote my first script to prompt the user before capitalizing a string. I think limiting replacements (i.e. capitalizations) to those that follow 4 or more characters and a period might be a better choice.

use framework "Foundation"
use scripting additions

set theString to "this is sentence one. this is sentence two.
peavine lives on Main St. in Prescott.
this is sentence three. This is sentence four."

set capitalizedString to getCapitalizedString(theString)

on getCapitalizedString(theString)
	set theString to current application's NSString's stringWithString:theString
	set theParagraphs to (theString's componentsSeparatedByString:linefeed)
	set priorSentence to current application's NSString's stringWithString:"[Prior sentence not found]"
	repeat with aParagraph in theParagraphs
		set theSentences to (aParagraph's componentsSeparatedByString:". ")
		repeat with aSentence in theSentences
			set theWords to (aSentence's componentsSeparatedByString:" ")'s mutableCopy()
			set firstWord to (theWords's firstObject())
			set firstCapitalizedWord to firstWord's capitalizedString()
			if (firstWord's isEqualToString:firstCapitalizedWord) as boolean is false then
				set buttonReturned to button returned of (display alert "Do you want to capitalize the word " & quote & (firstWord as text) & quote & "? The prior and containing sentences or sentence fragments are:" message (priorSentence as text) & linefeed & (aSentence as text) buttons {"Cancel", "Skip", "Capitalize"} cancel button 1 default button 3)
				if buttonReturned is "Capitalize" then
					(theWords's replaceObjectAtIndex:0 withObject:firstCapitalizedWord)
					set contents of aSentence to (theWords's componentsJoinedByString:(" "))
				end if
			end if
			set priorSentence to (current application's NSString's stringWithString:aSentence)
		end repeat
		set contents of aParagraph to (theSentences's componentsJoinedByString:". ")
	end repeat
	return ((theParagraphs's componentsJoinedByString:linefeed) as text)
end getCapitalizedString
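
The alternative mentioned above, capitalizing only after a word of four or more characters followed by a period, could be sketched along the lines of Script One. The handler name capitalizeAfterLongWords and its minimumLength parameter are assumptions made for illustration. Like Script One, the sketch only treats a period followed by a space as a sentence boundary, and to keep it short it treats the whole text as a single paragraph, so linefeeds are not handled.

```applescript
use framework "Foundation"
use scripting additions

set theString to "this is a sentence. peavine lives on Main St. in Prescott. the end."
set capitalizedString to capitalizeAfterLongWords(theString, 4)

-- Hypothetical handler: capitalizes the first word of a sentence only when the
-- word before the preceding period has at least minimumLength characters
on capitalizeAfterLongWords(theString, minimumLength)
	set theString to current application's NSString's stringWithString:theString
	set theSentences to (theString's componentsSeparatedByString:". ")'s mutableCopy()
	set priorLength to minimumLength -- the start of the text counts as a sentence start
	repeat with i from 0 to ((theSentences's |count|()) - 1)
		set aSentence to (theSentences's objectAtIndex:i)
		set theWords to (aSentence's componentsSeparatedByString:" ")'s mutableCopy()
		if priorLength >= minimumLength then
			set firstWord to (theWords's firstObject())'s capitalizedString()
			(theWords's replaceObjectAtIndex:0 withObject:firstWord)
			(theSentences's replaceObjectAtIndex:i withObject:(theWords's componentsJoinedByString:" "))
		end if
		-- Remember how long the last word of this fragment is for the next pass
		set priorLength to ((theWords's lastObject())'s |length|()) as integer
	end repeat
	return (theSentences's componentsJoinedByString:". ") as text
end capitalizeAfterLongWords
```

The same caveat applies as with any length-based rule: a sentence that ends in a short real word (“one.”, “it.”) is mistaken for an abbreviation, so the sentence after it is not capitalized.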

The following is a revision of my first script in post 1. It implements an exception list of abbreviations that do not constitute the end of a sentence. It does not need to include abbreviations where the word following the abbreviation would normally be capitalized. This approach is far from perfect–for example Dr. could be an abbreviation for doctor or drive. In the unlikely event anyone uses this script, it’s probably best to include questionable abbreviations in the exception list.

use framework "Foundation"
use scripting additions

set theString to "this is sentence one. this is sentence two.
peavine lives on Main St. in Prescott.
peavine used to live on Forest cir. in Prescott."

set capitalizedString to getCapitalizedString(theString)

on getCapitalizedString(theString)
	set exceptionList to {"st.", "cir."} -- use lowercase only and include ending period
	set exceptionList to current application's NSArray's arrayWithArray:exceptionList
	set theString to current application's NSString's stringWithString:theString
	set theParagraphs to (theString's componentsSeparatedByString:linefeed)
	set lastWord to current application's NSString's stringWithString:""
	repeat with aParagraph in theParagraphs
		set theSentences to (aParagraph's componentsSeparatedByString:". ")
		repeat with aSentence in theSentences
			set theWords to (aSentence's componentsSeparatedByString:" ")'s mutableCopy()
			set firstWord to theWords's firstObject()'s capitalizedString()
			if (exceptionList's containsObject:lastWord) is false then
				(theWords's replaceObjectAtIndex:0 withObject:firstWord)
				set contents of aSentence to (theWords's componentsJoinedByString:(" "))
			end if
			set lastWord to theWords's lastObject()'s lowercaseString()'s mutableCopy()
			if (lastWord's hasSuffix:".") is false then (lastWord's appendString:".")
		end repeat
		set contents of aParagraph to (theSentences's componentsJoinedByString:". ")
	end repeat
	return ((theParagraphs's componentsJoinedByString:linefeed) as text)
end getCapitalizedString