Need help processing lines of a large (up to 500,000 line) text file

Heya,

First post here, so apologies if I've asked my question the wrong way.

So, I have a large text file that consists of frame information from a video file. The sample file I am currently using gives about 70,000 lines in this text file; however, I would like it to cope with full-length films, and as the sample is only about 20 minutes long, I'm guessing a film could give up to about 500,000 lines.

Specifically, the file consists of a line starting with the word “sequence”, followed by the hex offset of the sequence header code in the video file. There is then a line beginning “gop”, which I can ignore, followed by multiple (usually 11) lines beginning “picture”. Each “picture” line is followed by a line beginning “splice”, which can also be ignored.
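The layout described above can be scanned in a single pass. The following is only a rough sketch, assuming the first whitespace-separated word of each line is the keyword and the second word of a “sequence” line is the offset. `list_sequences` is a made-up name, and unlike the AppleScript further down it looks at every line rather than every second one.

```shell
# list_sequences is a hypothetical helper name, not part of the existing script.
# Usage: list_sequences FILE
# Prints "<frame counter> <offset>" for each line whose first word is "sequence".
list_sequences() {
  awk '
    BEGIN { frames = 1 }                   # the AppleScript starts framecnt at 1
    $1 == "sequence" { print frames, $2 }  # counter value when this header is reached
    $1 == "picture"  { frames++ }          # one frame per "picture" line
    # "gop" and "splice" lines match no rule and are skipped
  ' "$1"
}
```

It could be called as `list_sequences "/Users/sam/Temp Files/FGoffsetinfo.txt"`. The offset is passed through unchanged as text, so any conversion is left to a later step.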

I need to create a hex file (I get the feeling that's not really what I'm supposed to call it) consisting of the number of frames (pictures) up to each sequence header code, followed by the offset of that sequence header code.
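To make the record layout concrete: judging by the Make32Bit handler in the script, each number is written as 32 bits with the least significant byte first. Below is a minimal sketch of that encoding as hex text, assuming both inputs are decimal numbers that fit in 32 bits. `le32_hex` and `record_hex` are hypothetical names.

```shell
# le32_hex and record_hex are hypothetical helper names.
# le32_hex N: print decimal N as 8 hex digits, least significant byte first.
le32_hex() {
  # Zero-pad to 8 hex digits, then swap the four byte pairs end for end.
  printf '%08X\n' "$1" | sed 's/^\(..\)\(..\)\(..\)\(..\)$/\4\3\2\1/'
}

# record_hex FRAMES OFFSET: one 8-byte record written as 16 hex digits.
record_hex() {
  printf '%s%s\n' "$(le32_hex "$1")" "$(le32_hex "$2")"
}
```

For example, `record_hex 12 4096` prints `0C00000000100000`. Piping that text through `xxd -r -p`, as the script already does, turns it into the 8 raw bytes.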

I am pretty new to programming in general, so at the moment I am using a combination of AppleScript and do shell script calls. I have code that works, but I would really like to speed the task up, as it currently takes about an hour to run through the file.

Thanks for any assistance.

D_S


to searchReplace(thisText, searchTerm, replacement)
	set AppleScript's text item delimiters to searchTerm
	set thisText to thisText's text items
	set AppleScript's text item delimiters to replacement
	set thisText to "" & thisText
	set AppleScript's text item delimiters to {""}
	return thisText
end searchReplace

on ConvertToHex(dec_num)
	-- Ask bc to print the decimal number in base 16 (uppercase hex digits)
	do shell script "echo \"obase=16; " & dec_num & "\" | bc"
end ConvertToHex

on Make32Bit(varnum)
	-- Left-pad the hex string with zeros to 8 digits (32 bits)
	repeat while length of varnum < 8
		set varnum to "0" & varnum
	end repeat
	
	-- Split into four bytes. "text m thru n" returns a plain string, so the
	-- current text item delimiters cannot sneak spaces into the result.
	set firstQ to text 1 thru 2 of varnum
	set secondQ to text 3 thru 4 of varnum
	set thirdQ to text 5 thru 6 of varnum
	set fourthQ to text 7 thru 8 of varnum
	
	-- Reverse the byte order (least significant byte first)
	return fourthQ & thirdQ & secondQ & firstQ
end Make32Bit

set cnt to 1

set framecnt to 1

repeat
	
	set reqline to cnt * 2
	
	set AppleScript's text item delimiters to {" "}
	
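	try
		-- Fetch just line number reqline from the text file
		set linex to (do shell script "sed -n '" & reqline & "{p;q;}' \"/Users/sam/Temp Files/FGoffsetinfo.txt\"")
	on error
		exit repeat
	end try
	
	-- sed prints nothing (and raises no error) once reqline is past the end of the file
	if linex is "" then exit repeat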
	
	if text item 1 of linex is "sequence" then
		-- Append the current frame count and this sequence's offset as two 32-bit values
		set seqoffset to Make32Bit(ConvertToHex(text item 2 of linex))
		set hexframecnt to Make32Bit(ConvertToHex(framecnt))
		do shell script "echo " & hexframecnt & " " & seqoffset & " | xxd -r -p >> /Users/sam/Desktop/test.hex"
	else if text item 1 of linex is "picture" then
		set framecnt to framecnt + 1
	end if
	
	-- Advance to the next line to read, whatever kind of line this was
	set cnt to cnt + 1
	
end repeat

display dialog "Done!" buttons {"OK"} default button 1