Finding an integer in a string

Hi everybody. I’m Philip, new to this forum. I’m from the Netherlands, so excuse me if my English isn’t correct.

I have one question. I am trying to make an application in AppleScript for converting subtitle files, but I’ve run into one problem which I can’t solve. This is an example of an entry in a SubRip file:


1
00:01:03,300 --> 00:01:08,518
Op het toppunt van z'n macht
is 't Romeinse Rijk immens.

As you see, 1 indicates the number of the subtitle (1 for the first, 2 for the 2nd, etc.). To create another subtitle I have to let the AppleScript replace those integers with a break. Maybe the idea is to find an integer in a string? Is that possible?

Could you show us what the file will look like after the integer has been replaced?

A couple of ideas…
If one has a block of text, it can be broken up into a list of paragraphs with

set theParaList to paragraphs of theTextBlock

You can loop through the paragraphs, testing each to see if it contains only digits. (Repeat through the characters of the paragraph, checking the ASCII number of each.) There’s probably a better means of doing this; see if you can find an isNumber routine by searching this site.
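A minimal sketch of that digit test, assuming a hypothetical handler named isAllDigits (it is not a built-in, and it compares each character against a string of digits rather than checking ASCII numbers):

```applescript
-- Hypothetical helper: true when theText is non-empty and
-- every one of its characters is a digit from 0 to 9.
on isAllDigits(theText)
	if theText is "" then return false
	repeat with c in characters of theText
		if (contents of c) is not in "0123456789" then return false
	end repeat
	return true
end isAllDigits

-- Sample block built with explicit returns so the line ending is known:
set theTextBlock to "1" & return & "00:00:00,613 --> 00:00:03,744" & return & "Ik wil 'n man voor het volk zijn."

-- Collect the paragraphs that consist only of digits:
set numberLines to {}
repeat with p in paragraphs of theTextBlock
	if isAllDigits(contents of p) then set end of numberLines to contents of p
end repeat

numberLines --> {"1"}
```

Note that a subtitle line which happens to be just a number (say, a caption reading "1984") would also pass this test, so it is only a starting point.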

A better way to do this sort of work is to use regular expressions. You can do a search of this site for more information on that.

Hi, thanks for your reply.
The original subtitle:

1
00:00:00,613 --> 00:00:03,744
Ik wil 'n man voor het volk zijn.

2
00:00:18,354 --> 00:00:22,528
Caesar, Caesar...
-Volk van Rome.

The output subtitle:

00:00:00.15 , 00:00:03.18 , Ik wil 'n man voor het volk zijn.
00:00:18.08 , 00:00:22.13 , Caesar, Caesar... | -Volk van Rome.

Maybe there’s a better way to work around this than finding the integers. I was thinking about replacing line breaks?

Yes, you can find line breaks. Most are ASCII character 13 (a carriage return):

set x to (offset of (ASCII character 13) in myString)

will set x to the position of the first line break. Sometimes, though, line breaks are ASCII character 10 (a linefeed), or 13 followed by 10 in Windows files, but typically 13.
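As a rough illustration, assuming the string uses carriage returns (AppleScript's return constant), the offset can be used to cut off the first line. The variable names firstLine and theRest are only for this example:

```applescript
-- Sample string with a known line ending (carriage return):
set myString to "1" & return & "00:00:00,613 --> 00:00:03,744"

-- Position of the first line break; offset gives 0 when there is none:
set x to offset of return in myString --> 2

if x > 1 then
	set firstLine to text 1 thru (x - 1) of myString --> "1"
	set theRest to text (x + 1) thru -1 of myString --> "00:00:00,613 --> 00:00:03,744"
end if
```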

It took me several hours of very intense mathematical concentration to figure out how 354, 528, 613, and 744 were becoming 8, 13, 15, and 18. :wink:

They’re frames per second, of course: round (24 * (354 / 1000)) → 8
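To check that arithmetic against all four values, here is a small sketch. The handler name millisecondsToFrames is made up for this example, and the 24 frames per second rate is inferred from the sample output rather than stated by the original poster:

```applescript
-- Hypothetical helper: convert a milliseconds value (0-999) to a frame
-- number at the given frame rate, rounding to the nearest frame.
on millisecondsToFrames(ms, fps)
	return round (fps * (ms / 1000))
end millisecondsToFrames

{millisecondsToFrames(613, 24), millisecondsToFrames(744, 24), millisecondsToFrames(354, 24), millisecondsToFrames(528, 24)}
--> {15, 18, 8, 13}
```

One thing to watch: with plain rounding, values of 980 milliseconds and above come out as frame 24, which at 24 frames per second probably ought to carry over into the next second.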


"1
00:00:00,613 --> 00:00:03,744
Ik wil 'n man voor het volk zijn.

2
00:00:18,354 --> 00:00:22,528
Caesar, Caesar...
-Volk van Rome."

set str to result


"00:00:00.15 , 00:00:03.18 , Ik wil 'n man voor het volk zijn.
00:00:18.08 , 00:00:22.13 , Caesar, Caesar... | -Volk van Rome."

set test to result


set output to ""

--	It looks like two consecutive returns are used to separate one section
--	from another:
--
set AppleScript's text item delimiters to return & return

set str to str's text items
--
-->	{ "1... ", "2... " }

repeat with i from 1 to length of str
	
	set thisCaption to item i of str
	
	set timings to paragraph 2 of thisCaption --> "00:00:00,613 --> 00:00:03,744"
	
	set AppleScript's text item delimiters to " --> "
	set {startTime, endTime} to timings's text items --> { "00:00:00,613", "00:00:03,744" }
	
	set AppleScript's text item delimiters to ","
	set {startTimeSeconds, startTimeFrames} to startTime's text items --> { "00:00:00", "613" }
	set {endTimeSeconds, endTimeFrames} to endTime's text items --> { "00:00:03", "744" }
	
	set startTimeFrames to round (24 * (startTimeFrames / 1000)) --> 15
	set endTimeFrames to round (24 * (endTimeFrames / 1000)) --> 18
	
	set startTimeFrames to ("0" & startTimeFrames)'s text -2 thru -1 --> zero-pad
	set endTimeFrames to ("0" & endTimeFrames)'s text -2 thru -1
	
	set output to output & startTimeSeconds & "." & startTimeFrames & " , "
	set output to output & endTimeSeconds & "." & endTimeFrames & " , "
	
	set textCaption to thisCaption's paragraphs 3 thru -1
	--
	--> { "Ik wil 'n man voor het volk zijn." }
	--> { "Caesar, Caesar...", "-Volk van Rome." }
	
	set AppleScript's text item delimiters to " | "
	set textCaption to textCaption as string
	--
	--> "Ik wil 'n man voor het volk zijn."
	--> "Caesar, Caesar... | -Volk van Rome."
	
	set output to output & textCaption & return
	
end repeat

set AppleScript's text item delimiters to {""} -- restore

output = test --> false... er, wait a second: extra paragraph return


set output to output's text 1 thru -2

output = test --> true
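The zero-padding step in the script above may be the least obvious part, so here it is on its own. The handler name zeroPad is just for illustration, and it assumes the number has at most two digits:

```applescript
-- Hypothetical helper: pad a number from 0 to 99 to two digits.
-- "0" & n gives a string because the left operand is a string;
-- text -2 thru -1 then keeps only the last two characters.
on zeroPad(n)
	return text -2 thru -1 of ("0" & n)
end zeroPad

{zeroPad(8), zeroPad(15)} --> {"08", "15"}
```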

I’m convinced that what you’ve done for me must be genius, and I’m very grateful for it. But I think it would be wise for me to first learn some of the basics of AppleScript, because what you’ve posted looks like Greek to me. I can decode most of it, but I don’t know how to apply it in my script. I’ll use your code once I’ve learnt the basics. Thanks anyway.

In re-reading your original post, I realized that there is some confusion here. When you provided 2 strings, I thought you wanted to know how to convert the first string into the second string, but what you said was that you wanted “to create another subtitle”.

Are you:
1. Appending a new subtitle to the end of a file/string?
2. Inserting a new subtitle into the middle of a file/string?
2a. such that all of the subsequent subtitles will need to be renumbered?
3. Are you personally converting this format:

	1
	00:00:00,613 --> 00:00:03,744
	Ik wil 'n man voor het volk zijn.

into this:

	00:00:00.15 , 00:00:03.18 , Ik wil 'n man voor het volk zijn.

or is some other application doing this for you?

--	set mySubtitleFile to choose file
--	set mySubtitles to read mySubtitleFile

--	DEBUGGING:
--
set mySubtitles to "1
00:00:00,613 --> 00:00:03,744
Ik wil 'n man voor het volk zijn.

2
00:00:18,354 --> 00:00:22,528
Caesar, Caesar...
-Volk van Rome."


--	You want to insert this subtitle between the existing
--	first and second subtitle:
--
set newSubtitle to "2
00:01:03,300 --> 00:01:08,518
Op het toppunt van z'n macht
is 't Romeinse Rijk immens."

set insertionLocation to 2 -- newSubtitle will become the second subtitle


--	The format of the subtitle file seems to be this:
--
--		integer [return]
--		start time --> end time [return]
--		subtitle [return]
--		the subtitle may have several lines [return]
--		[return]
--		integer of next subtitle [return]
--		...
--
--	If the subtitle text was always exactly 1 line, then we could
--	directly manipulate the file based on unchanging index positions,
--	but because the subtitles may be multiple lines, the only thing
--	we can rely on is the fact that there are two return characters
--	between each subtitle section.
--
--	This brings up the issue of "return" characters. I don't know if
--	your file uses carriage returns (ascii 13), linefeeds (ascii 10),
--	or the Windows crlf format (asciis 13/10). AppleScript is smart
--	enough that when we say:
--
--		set theLines to every paragraph of myString
--
--	it will get every line regardless of the line ending character
--	being used. Unfortunately, we need to get every "non-empty"
--	line, so we need to know the exact line ending character.
--
set ascii13 to ASCII character 13 -- same as AppleScript's "return"
set ascii10 to ASCII character 10

if (mySubtitles contains (ascii13 & ascii10)) then
	
	set lineEnding to ascii13 & ascii10
	
else if (mySubtitles contains ascii13) then
	
	set lineEnding to ascii13
	
else
	
	set lineEnding to ascii10
	
end if

--	We want to split mySubtitles into a list of each subtitle section.
--	The best way to do this is with the text item delimiters property
--	of AppleScript. The idea is this:
--
--		set abcString to "abc"
--		set AppleScript's text item delimiters to "b"
--		set acList to text items of abcString --> { "a", "c" }
--
--	The process can be reversed, which gives us a nice way of performing
--	search and replace:
--
--		acList -- is this: { "a", "c" }
--		set AppleScript's text item delimiters to "z"
--		set azcString to acList as string --> "azc"
--
--	Working with the "text item delimiters" property can be tricky. When
--	you are debugging a script, a script error can leave the delimiters
--	in an unwanted state. There are two approaches to protecting yourself:
--
--		1. Save, error-trap, set, and restore.
--		2. Always set before use.
--
--	The first approach is best for inside of handlers:
--
--		on MyHandler()
--			set oldDelimiters to AppleScript's text item delimiters -- save
--			try
--				set AppleScript's text item delimiters to [whatever]
--				... do stuff ...
--			on error e number n
--				set AppleScript's text item delimiters to oldDelimiters -- restore
--				error e number n
--			end try
--			set AppleScript's text item delimiters to oldDelimiters -- restore
--			return [whatever]
--		end MyHandler
--
--	The second approach is easier. Just ensure that the text item delimiters
--	are set to what you need them to be before saying "get text items of...",
--	and before coercing a list to string.

--	Break mySubtitles into the separate subtitle sections:
--
set AppleScript's text item delimiters to lineEnding & lineEnding
set mySubtitles to text items of mySubtitles
--
--	mySubtitles is now a list of strings. I'm using " `r " to show return characters:
--
-->	{	"1 `r 00:00:00,613 --> 00:00:03,744 `r Ik wil 'n man voor het volk zijn.",
--		"2 `r 00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome."
--	}

set subtitleCount to length of mySubtitles --> 2, in this case

--	We set insertionLocation to 2, indicating that the index of new subtitle
--	should be 2. This means that the current subtitle at index 2 will become
--	the third, the third will become the fourth, etc.
--
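--	If insertionLocation is 1, or is 1 more than the number of the subtitles,
--	then we just need to prepend or append it. I've also handled the possibility
--	that insertionLocation may be a negative index, by converting it to the
--	equivalent positive index first so that the renumbering loop further down
--	still starts in the right place:
--
if (insertionLocation < 0) then set insertionLocation to subtitleCount + 1 + insertionLocation

if (insertionLocation = 1) then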
	
	-- prepend the new subtitle:
	--
	set beginning of mySubtitles to newSubtitle
	--
	-->	{	"1 `r 00:01:03,300 --> 00:01:08,518 `r Op het toppunt van z'n macht...",
	--		"1 `r 00:00:00,613 --> 00:00:03,744 `r Ik wil 'n man voor het volk zijn.",
	--		"2 `r 00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome."
	--	}
	
else if (insertionLocation > subtitleCount) then
	
	-- append the new subtitle:
	--
	set end of mySubtitles to newSubtitle
	--
	-->	{	"1 `r 00:00:00,613 --> 00:00:03,744 `r Ik wil 'n man voor het volk zijn.",
	--		"2 `r 00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome.",
	--		"3 `r 00:01:03,300 --> 00:01:08,518 `r Op het toppunt van z'n macht..."
	--	}
	
else -- we, however, are inserting between 1 and 2
	
	--	Get all subtitles to the left of the insertion location:
	--
	set leftSubtitles to items 1 thru (insertionLocation - 1) of mySubtitles
	--
	--> { "1 `r 00:00:00,613 --> 00:00:03,744 `r Ik wil 'n man voor het volk zijn." }
	
	--	Get all subtitles to the right, including the current "second" subtitle:
	--
	set rightSubtitles to items insertionLocation thru -1 of mySubtitles
	--
	--> { "2 `r 00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome." }
	
	--	We now just join the three parts together:
	--
	set mySubtitles to leftSubtitles & newSubtitle & rightSubtitles
	--
	--	Note: Because the leftmost value, leftSubtitles, is a list,
	--	the result of the "&" operator will be a list:
	--
	-->	{	"1 `r 00:00:00,613 --> 00:00:03,744 `r Ik wil 'n man voor het volk zijn.",
	--		"2 `r 00:01:03,300 --> 00:01:08,518 `r Op het toppunt van z'n macht... ",
	--		"2 `r 00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome."
	--	}
	
end if

set subtitleCount to subtitleCount + 1 -- we've added a subtitle

--	Now we have to handle the renumbering. All subtitles after the new subtitle
--	need to have their number incremented by 1:
--
repeat with i from (insertionLocation + 1) to subtitleCount
	
	--	Grab the subtitle out of the subtitle list:
	--
	set thisSubtitle to item i of mySubtitles
	--
	--> "2 `r 00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome."
	
	--	The subtitle number is on a line by itself:
	--
	set thisSubtitleNumber to paragraph 1 of thisSubtitle --> "2"
	
	--	Get the rest of the subtitle contents:
	--
	set thisSubtitleString to text from paragraph 2 to paragraph -1 of thisSubtitle
	--
	--> "00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome."
	
	--	The "1 + ..." will coerce thisSubtitleNumber to a number, which we
	--	then coerce back to a string:
	--
	set thisSubtitleNumber to (1 + thisSubtitleNumber) as string --> "3"
	
	--	Join them back together:
	--
	set thisSubtitle to thisSubtitleNumber & lineEnding & thisSubtitleString
	--
	--> "3 `r 00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome."
	
	--	Copy the subtitle back to the subtitle list:
	--
	set item i of mySubtitles to thisSubtitle
	
end repeat

--	The list mySubtitles should now be:
--
-->	{	"1 `r 00:00:00,613 --> 00:00:03,744 `r Ik wil 'n man voor het volk zijn.",
--		"2 `r 00:01:03,300 --> 00:01:08,518 `r Op het toppunt van z'n macht... ",
--		"3 `r 00:00:18,354 --> 00:00:22,528 `r Caesar, Caesar... `r -Volk van Rome."
--	}

--	We coerce it back to a string, with the subtitles separated by two line endings:
--
set AppleScript's text item delimiters to lineEnding & lineEnding
set mySubtitles to mySubtitles as string
--
--	Hopefully, this is what we've got:
--
--		"1
--		00:00:00,613 --> 00:00:03,744
--		Ik wil 'n man voor het volk zijn.
--
--		2
--		00:01:03,300 --> 00:01:08,518
--		Op het toppunt van z'n macht
--		is 't Romeinse Rijk immens.
--
--		3
--		00:00:18,354 --> 00:00:22,528
--		Caesar, Caesar...
--		-Volk van Rome."


set AppleScript's text item delimiters to {""} -- restore the default value


--	Save back to file (when you're done debugging):
--
--	set writableFile to open for access mySubtitleFile with write permission
--	set eof writableFile to 0 -- erase the entire file
--	write mySubtitles to writableFile
--	close access writableFile
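
The "save, error-trap, set, and restore" approach described in the comments above can be sketched as a small search-and-replace handler. The handler name replaceText and its parameter names are invented for this example; it simply combines the two delimiter tricks from the comments:

```applescript
-- Hypothetical helper: replace every occurrence of findText in sourceText
-- with replaceWith, leaving the text item delimiters as they were found.
on replaceText(findText, replaceWith, sourceText)
	set oldDelimiters to AppleScript's text item delimiters -- save
	try
		-- Split on the search string...
		set AppleScript's text item delimiters to findText
		set theItems to text items of sourceText
		-- ...and join again with the replacement string:
		set AppleScript's text item delimiters to replaceWith
		set resultText to theItems as string
	on error e number n
		set AppleScript's text item delimiters to oldDelimiters -- restore
		error e number n
	end try
	set AppleScript's text item delimiters to oldDelimiters -- restore
	return resultText
end replaceText

replaceText("b", "z", "abc") --> "azc"
```

The same handler would turn the comma in a timing into a period, for example replaceText(",", ".", "00:00:00,613").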