Range References with Text and Lists

This article was originally posted in 2006 and updated in August 2012. The differences between ‘string’ and ‘Unicode text’ before Mac OS 10.5 are now relegated to historical interest rather than the “must know” information they were at the time.

Most AppleScripters will already know basically how to use a range reference to extract a range of elements from a containing object. But many may not be aware of the full possibilities with AppleScript’s range reference form. (“Range reference form” simply means the syntax for a range reference.)

This article looks at the various ways of specifying a range in vanilla AppleScript, dealing firstly with text and then with lists. Many of the ideas may also apply to application objects, but should be tested with each application before use!

Some examples of range references:
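
The forms below are a minimal illustrative sketch. The sample text and list are arbitrary values chosen here, not taken from the rest of the article:

```applescript
set sampleText to "Hello world" -- arbitrary sample value

-- A list of the first five characters.
characters 1 thru 5 of sampleText
--> {"H", "e", "l", "l", "o"}

-- The same range as a single excerpt.
text 1 thru 5 of sampleText
--> "Hello"

-- A range of items from a list.
items 2 thru 4 of {1, 2, 3, 4, 5}
--> {2, 3, 4}
```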

The constituent parts are:

1. A (usually plural) keyword specifying the class of element to be returned.
2. A range phrase consisting of two boundary indicators and a keyword (usually ‘thru’) representing the intervening continuum. A reference which doesn’t have a range part with two boundary indicators isn’t a range reference.
3. A possessive link to the containing object.

The specified container and boundary objects must all exist at the time the script tries to resolve the reference.

The order in which the boundary indicators are given doesn’t make any difference. The order of the result is always the same as in the source object. As with ordinary index references, negative values can be used to index elements from the end of the container instead of from the beginning:

set myText to "This text is this way round."

words 4 thru -1 of myText
--> {"this", "way", "round"}

words -1 thru 4 of myText
--> {"this", "way", "round"}

There are more ways to specify range boundaries and I’ll mention those later. The two boundaries can of course be the same or effectively refer to the same element, in which case only that element is returned.
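
As a small sketch of the coinciding-boundaries case, reusing the sample sentence from the example above: when both boundaries resolve to the same element, a list-returning reference still yields a list, just with one item in it.

```applescript
set myText to "This text is this way round."

-- Both boundaries are the third word.
words 3 thru 3 of myText
--> {"is"}

-- Index -4 also resolves to the third of the six words.
words 3 thru -4 of myText
--> {"is"}

-- With 'text', the result is a single excerpt rather than a list.
text 6 thru 6 of myText
--> "t"
```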

For those who cringe at the sight of the word ‘thru’, the alternative spelling ‘through’ is available. Most people use the former though, either because it’s less to type or (in the case of British English speakers) because it’s less extra to type.

The ideal is to extract what you want from your text or list in as few moves as possible. AppleScript’s range reference form is very flexible in this respect.

Range references with text.
The elements which can be extracted from vanilla text with a range reference are:

‘text’: a single excerpt. This is the only case where a range reference doesn’t return a list.

‘characters’: a list of individual characters.

‘words’: a list of individual ‘words’. A ‘word’ is, broadly speaking, a piece of text not containing any punctuation or white space. However, some punctuation (e.g. non-contiguous instances of apostrophes, periods, or colons) is allowed within words, and many symbols (“+”, “>”, “£”, etc.) count as words in their own right, even in direct contact with other words. So be careful what you assume when specifying ‘words’.

‘text items’: a list of segments of the range which are separated in the original by any of AppleScript’s ‘text item delimiters’ at the time the reference is used. If the range contains none of the delimiters, it’s returned intact as the sole item in the list. Delimiters are always deemed to come between ‘text items’, so if a delimiter occurs at a range boundary or immediately next to another delimiter, there’s notionally an “empty text item” on that side of it. This is manifested as an empty text (“”) in the list.

Since Mac OS 10.6, it’s been possible to have more than one delimiter in force at the same time. All the current delimiters are considered when text items are extracted from text, but only the first one is used in the reverse process of coercing a list to text.

‘paragraphs’: a list of segments of the range which are separated in the original by line feeds, carriage returns, or CRLF combinations. AppleScript does the intelligent work in deciding what’s what. Line endings at range boundaries or next to each other imply “empty paragraphs”, which, like “empty text items”, are rendered as empty texts in the list.

set myText to "We took a walk down by the river
And I fell in
Love with you"

text 7 thru 19 of myText
--> "k a walk down"

characters 7 thru 19 of myText
--> {"k", " ", "a", " ", "w", "a", "l", "k", " ", "d", "o", "w", "n"}

words 7 thru 10 of myText
--> {"the", "river", "And", "I"}

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"k", "o"} -- Two simultaneous delimiters, of which the first is "k".
set textItems to text items 1 thru 4 of myText
set textItemsAsText to textItems as text
set AppleScript's text item delimiters to astid
textItems
--> {"We t", "", "", " a wal"}
-- Notice the "empty text item" between the two instances of "o" in "took" and another between the second "o" and the "k".
textItemsAsText
--> "We tkkk a wal"
-- Only the first current delimiter ("k") is inserted between the items in a list-to-text coercion.

paragraphs 1 thru 2 of myText
--> {"We took a walk down by the river", "And I fell in"}

It’s still quite common to see even experienced scripters using something like this to derive a single extract from a text:

(characters 7 thru 19 of myText) as text

This is a two-stage process: first create a list containing thirteen one-character texts and then create a text containing the same characters. Since AppleScript’s first current text item delimiter is inserted between all adjacent list items in the coercion, there’s a possibility that the end result will contain a lot more than just the original characters if the delimiter’s not explicitly set to “” first.

But this:

text 7 thru 19 of myText

… is a single action which delivers the required extract directly. It doesn’t waste time and memory on an intermediate list and numerous items, and offers no opportunity for the delimiters to influence the result.
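
The difference is easiest to see with a delimiter other than "" in force. The following is a sketch only: the delimiter "-" and the variable names viaList and direct are chosen here for illustration.

```applescript
set myText to "We took a walk down by the river"

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "-" -- illustrative non-empty delimiter

-- Two stages: a list of seven characters, then a list-to-text coercion.
set viaList to (characters 1 thru 7 of myText) as text
-- One stage: the excerpt is taken directly.
set direct to text 1 thru 7 of myText

set AppleScript's text item delimiters to astid -- restore the previous delimiters

viaList
--> "W-e- -t-o-o-k"
direct
--> "We took"
```

The delimiter is inserted between every pair of characters in the coerced version, while the direct excerpt is unaffected.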

In the examples so far, the boundary indicators have been the indices of the first and last of the required elements (except in the case of ‘text’, where they’re the indices of the first and last characters of the required excerpt). But it’s also possible to specify different element types as range boundaries:

set myText to "The birds were all singing a-quiver
And we shot them
A glance or two"

-- Text from the eighth character up to and including the fifth 'word'.
text 8 thru word 5 of myText
--> "ds were all singing"

-- Words from the third one to the end of the second paragraph.
words 3 thru paragraph 2 of myText
--> {"were", "all", "singing", "a", "quiver", "And", "we", "shot", "them"}
-- Notice the effect of the hyphen on the definition of 'word'.

To use a different element type as the first boundary in the range, you need either to parenthesise it or to employ an alternative syntax using the keywords ‘from’ and ‘to’.

set myText to "I later on was unfaithful
And enjoyed my little game"

-- Characters from the beginning of the third word to the seventeenth character.
characters (word 3) thru 17 of myText
-- Or:
characters from word 3 to 17 of myText
--> {"o", "n", " ", "w", "a", "s", " ", "u", "n"}

-- Words from the 16th character to the end of the second paragraph.
words from character 16 to paragraph 2 of myText
--> {"unfaithful", "And", "enjoyed", "my", "little", "game"}

The ‘from … to …’ syntax can actually be used anywhere that ‘… thru …’ can, but the compiler will change it to the ‘… thru …’ form if the first boundary’s just an index. (You could parenthesise the index to prevent this.)
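
A brief sketch of the equivalence, using part of the sample text from above. The remarks about what the compiler rewrites simply follow the description in the preceding paragraph; both lines return the same result either way.

```applescript
set myText to "I later on was unfaithful"

-- With a plain index as the first boundary, this is expected to be
-- recompiled in the 'words 2 thru 4 of myText' form.
words from 2 to 4 of myText
--> {"later", "on", "was"}

-- Parenthesising the first index is the suggested way to keep 'from … to …'.
words from (2) to 4 of myText
--> {"later", "on", "was"}
```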

Negative indices work with these “different class” boundaries too, as do AppleScript’s various ordinal forms. The ‘middle’ keyword’s best avoided though as it’s seriously bugged in this context. The keywords ‘beginning’ (or ‘front’) and ‘end’ (or ‘back’) can be used instead of the indices 1 and -1. Technically, they don’t mean the same thing, but they have the same effect.

set myText to "Till you found me out by the cemetery
And I buried my head in shame"

characters (word -5) thru (word -3) of myText
--> {"b", "u", "r", "i", "e", "d", " ", "m", "y", " ", "h", "e", "a", "d"}

text from beginning to fifth word of myText
--> "Till you found me out"

text from -5th word to -1th character of myText
--> "buried my head in shame"

The effective range includes the whole of both boundary elements, but nothing beyond them. If an element of the type to be returned overlaps a boundary, the part of it outside the boundary is left off. If one boundary element contains the other, it constitutes the entire range!

set myText to "You forgave me and asked me to dinner
And I threw up
My arms for joy"

words from character 35 to paragraph 2 of myText
--> {"ner", "And", "I", "threw", "up"}

words from character 35 to paragraph 1 of myText
--> {"You", "forgave", "me", "and", "asked", "me", "to", "dinner"}

Range References with Lists.

It’s not generally appreciated that items of a particular class can be returned from a list as easily as from text:

set myList to {1, true, "Hello", {a:"aardvark"}, 99, "banana", {1, 2, 3}, "world", 3.7}

myList's text -- or 'myList's strings' or 'myList's every Unicode text'.
--> {"Hello", "banana", "world"}

myList's integers
--> {1, 99}

myList's lists
--> {{1, 2, 3}}

Range references too can return either generic ‘items’ or particular classes from lists:

set myList to {1, true, "Hello", {a:"aardvark"}, 99, "banana", {1, 2, 3}, "world", false, 3.7}

items 2 thru 4 of myList
--> {true, "Hello", {a:"aardvark"}}

numbers 2 thru 3 of myList
--> {99, 3.7}

text from record 1 to number 3 of myList
--> {"banana", "world"}

The last example above returns any text objects which occur in the list between the record and the third number. The items to be returned don’t necessarily have to extend to the boundaries of the range. In fact, where their class isn’t used in the range definition, they don’t have to exist at all. If they don’t, an empty list is returned:

set myList to {1, true, "Hello", {a:"aardvark"}, 99, {1, 2, 3}, false, 3.7}
strings from record 1 to number 3 of myList
--> {}

That about wraps it up for range references. Although this article has dealt with them specifically as applied to text and lists, much of it may also be relevant when scripting application objects. It’ll depend on the application being scripted.

An aside on ‘text’.

The keyword ‘text’ has different meanings depending on the context in which it’s used.

In application scripting, it usually means the ‘text’ property (if one exists) of an open document or window in an application. It’s not used as such here.

In vanilla AppleScript, since the introduction of AppleScript 2.0 with Mac OS 10.5, it’s the class of all text held or manipulated within a script itself. (Previously there were several text classes, the two surviving ones at the time of the changeover being ‘string’ and ‘Unicode text’. ‘text’ was then a synonym for ‘string’. The new ‘text’ class is identical with the old ‘Unicode text’ and the terms ‘string’ and ‘Unicode text’ have been retained as synonyms for it. However, the ‘read’ and ‘write’ commands in the StandardAdditions still make the old distinction between ‘string’ (‘text’) and ‘Unicode text’ for the purpose of handling text data in files.)

As the element type to be returned in a range reference applied to text, ‘text’ means a single excerpt. It can also be used as a boundary element instead of ‘character’, but life’s confusing enough without that, so don’t do it. As the return class in a range reference applied to a list, it means multiple items of class ‘text’. In this case, it’s its own plural!
