search and replace ignores some search items

hi there!
my script should search a text document for “==” and replace it with “”.
the document is 4.6 MB in size.
the script always ignores the last 7 “==” of 5212.

Here's the script:

is there a limit of file size?
best,
andreas

here's the script again:


set SearchText to {"=="}
set replaceText to {""}

tell application "Finder"
    set thePathToCollectionFile to Traktor3Folder & "collection.nml" as text
    set theCollectionFileRefNum to open for access file thePathToCollectionFile with write permission
    set theText to read theCollectionFileRefNum
    
    set theText to my searchReplaceCollectionText(SearchText, replaceText, theText)
    
    write theText to theCollectionFileRefNum starting at 0
    
    close access theCollectionFileRefNum
end tell

on searchReplaceCollectionText(SearchText, replaceText, theText)
    set theText to theText as text
    set oldTID to AppleScript's text item delimiters
    
    repeat with i from 1 to count SearchText
        set AppleScript's text item delimiters to SearchText's item i
        set theText to theText's text items
        set AppleScript's text item delimiters to replaceText's item i
        set theText to theText as text
    end repeat
    
    set AppleScript's text item delimiters to oldTID
    return theText
end searchReplaceCollectionText

You can discover how large the file is with the command “get eof”. You can also read from 0 to some point short of the end of the file, and then from that point to the end of the file. Try that.

trying to find the root of the issue, i'm confronted with new problems. i tried a 25.6 MB text file and i get:

is there a way to deal with bigger files?

any help appreciated,
andreas

Hi,

as Adam mentioned, with open for access you can read and write files sequentially.
That means, for example, read 100 lines, process them, and write them back into a new file,
then read the next 100 lines and so on.
But how to split the contents of the file at the right place depends on its structure.

thanx for the hint.
i can't really make it work. can you point me to a link with an example script?
i tried the following:


set derStartTeil to 0
	set derEndTeil to 20000
	set fertig to 0
	repeat until fertig is equal to 1
		try
			set theText to read theCollectionFileRefNum from derStartTeil to derEndTeil
			
			
			if derEndTeil is equal to end of theCollectionFileRefNum then
				set fertig to 1
			end if
	
		on error
			set fertig to 1
			--set theText to read theCollectionFileRefNum from derStartTeil until end
			
		end try
		
		set theText to my searchReplaceCollectionText(SearchText, replaceText, theText)
		
		write theText to theCollectionFileRefNum starting at derStartTeil
		
		set derStartTeil to derEndTeil
		set derEndTeil to derEndTeil + 200000

thanx,
andreas

first of all, you start with 20,000 but add 200,000 at the end of the loop, so this cannot work.
I would get the size of the file and divide it by 20,000 (div and mod), then repeat “div” result times;
if there is a remainder (the mod result), process it after the repeat loop.
You also have to consider the case where the last character of a segment is the first character of a string that should be replaced.
The search and replace handler won't work in this case.
Also, each iteration should continue at x + 1, e.g. 1 - 20000, 20001 - 40000, 40001 - 60000 etc.
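The div/mod arithmetic described above could be sketched roughly like this. The file size of 50019 is a made-up value for illustration (a real script would take it from “get eof”), and the variable names are hypothetical.

```applescript
-- Hypothetical values for illustration; in a real script fileSize would come from "get eof".
set fileSize to 50019
set chunkSize to 20000

set fullChunks to fileSize div chunkSize -- number of complete segments
set remainderBytes to fileSize mod chunkSize -- bytes left over after the last complete segment

-- Build the list of {startByte, endByte} pairs: 1 - 20000, 20001 - 40000, ...
set chunkRanges to {}
repeat with i from 1 to fullChunks
	set startByte to (i - 1) * chunkSize + 1
	set endByte to i * chunkSize
	set end of chunkRanges to {startByte, endByte}
end repeat

-- Process the remainder, if any, after the repeat loop.
if remainderBytes > 0 then
	set end of chunkRanges to {fullChunks * chunkSize + 1, fileSize}
end if

return chunkRanges
--> {{1, 20000}, {20001, 40000}, {40001, 50019}}
```

Each pair could then be passed to “read … from startByte to endByte”. The sketch only works out the ranges; it does not deal with a search string that straddles two segments.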

puhh…
i will think about it and let you know if i could make it work. is it necessary to write to a new file?
thank you for all this information.
andreas

In your case no, because the length of both strings is equal

trying to find out the size of the file:


set groesseDerListe to get eof theCollectionFileRefNum
	log groesseDerListe

the result is:
5.01979E+6

what does this mean?

5.01979E+6 = 5.01979 * 10 ^ 6 = 5019790 (Bytes)

thanx to your help adam and stefan, it's working now (under certain circumstances :/).

it runs fine if the file i want to convert is not on the system drive. if it is on the system drive, i get a “file is already open” error for:


	set theNewCollectionFileRefNum to open for access Traktor34CollectionFile with write permission

in this script:


tell application "Finder"
	
	-- create the backup folder and copy the original into it
	if exists folder "NI Traktor 3.4.x Backup" of desktop then
		delete folder "NI Traktor 3.4.x Backup" of desktop
	end if
	set backupOrdner to make new folder at desktop with properties {name:"NI Traktor 3.4.x Backup"}
	set pathTocbackupOrdner to backupOrdner as text
	move Traktor34CollectionFile to backupOrdner
	
	set thePathToCollectionFile to pathTocbackupOrdner & "collection.nml" as text
	
	set theCollectionFileRefNum to open for access file thePathToCollectionFile with write permission
	set theNewCollectionFileRefNum to open for access Traktor34CollectionFile with write permission
	--....
end tell

is there something special to take care of on the system drive?

did i ask something stupid?

No, but it’s impossible to find the cause having only those few lines.
It’s most likely not the reason, but it’s always recommended not to put
Standard Additions commands like read and write inside an application tell block.

Off-hand I don’t see anything wrong with the script, and 4.6MB should be well within any limits, but I might take a different approach anyway. Processing in chunks is OK as long as you don’t chunk across the search string (e.g. the first character ends up as the last character of one block and the second character is the first character of the next block).
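One way to avoid chunking across the search string is to hold back a trailing character that might be the start of a match and prepend it to the next block. The handler below is only a sketch: its name is made up, and it assumes a two-character search string such as “==”.

```applescript
-- Hypothetical helper. Returns {textToProcessNow, textToCarryOver}.
-- Assumes searchText is exactly two characters long, as with "==".
on holdBackDanglingCharacter(chunkText, searchText)
	set firstChar to character 1 of searchText
	-- Nothing to hold back if the block does not end with the first search character.
	if not (chunkText ends with firstChar) then return {chunkText, ""}
	-- The block consists of nothing but that one character.
	if (length of chunkText) is 1 then return {"", firstChar}
	-- Otherwise cut off the last character and carry it over to the next block.
	return {text 1 thru -2 of chunkText, firstChar}
end holdBackDanglingCharacter

holdBackDanglingCharacter("abc=", "==")
--> {"abc", "="}
```

The carried-over text would be put in front of the next block before running the search and replace on it, and whatever is still held back after the last block gets written out unchanged.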

Instead, consider:

set thePathToCollectionFile to Traktor3Folder & "collection.nml" as text
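-- The sed that ships with Mac OS X expects a backup suffix after -i; '' means "no backup file".
do shell script "sed -i '' -e 's/==//g' " & quoted form of POSIX path of thePathToCollectionFile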

(of course, you could expand this into a loop replacing multiple sets of characters - the point is to use sed to do the global search and replace).
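A loop over several search/replace pairs might look something like the sketch below. The handler name and the example pairs are made up. It assumes the sed that ships with Mac OS X (hence the empty suffix after -i) and that none of the strings contain characters that are special to sed, such as “/”, “.” or “*”.

```applescript
-- Hypothetical helper: builds one sed command with an -e expression per search/replace pair.
-- Assumes the strings contain no characters that are special to sed or to the s/// syntax.
on buildSedCommand(searchList, replaceList, posixPath)
	set sedArgs to ""
	repeat with i from 1 to count searchList
		set expression to "s/" & (item i of searchList) & "/" & (item i of replaceList) & "/g"
		set sedArgs to sedArgs & " -e " & quoted form of expression
	end repeat
	-- '' after -i tells the Mac OS X sed not to keep a backup file.
	return "sed -i ''" & sedArgs & " " & quoted form of posixPath
end buildSedCommand

-- Example pairs and path are placeholders only.
set sedCommand to buildSedCommand({"==", "<>"}, {"", "!="}, "/tmp/collection.nml")

-- Uncomment to actually modify the file:
-- do shell script sedCommand
```

Building the command as text first makes it easy to log and check before letting it loose on the real file.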

Going back to the original script, one obvious issue that hasn’t been remarked upon is that the file’s not being emptied before the ‘write’ line. Replacing 5212 instances of “==” with “” results in a text that’s 10424 characters shorter. If this is written back to the original file without emptying the file first, the last 10424 characters of the original text will still be in the file, sticking out beyond the end of the new, shorter text. If the last seven instances of “==” are in this group, it will look as though the script has ignored them, when in fact it hasn’t.
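The effect can be simulated with plain strings, no file needed. This is just an illustration with made-up contents: a 7-character “file” is overwritten from the start with the 3-character result.

```applescript
-- Made-up contents standing in for the file before and after the search and replace.
set originalContents to "a==b==c"
set newContents to "abc"

-- Writing newContents at the start only overwrites the first 3 characters;
-- everything beyond that point is still in the file.
set leftoverTail to text ((length of newContents) + 1) thru -1 of originalContents
set fileAfterOverwrite to newContents & leftoverTail

return fileAfterOverwrite
--> "abcb==c"
```

The “==” in the result comes from the old tail of the file, not from anything the handler missed.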

set theText to read theCollectionFileRefNum

set theText to my searchReplaceCollectionText(SearchText, replaceText, theText)

-- Empty the file (ie. set its length to 0) before writing back to it.
set eof theCollectionFileRefNum to 0
write theText to theCollectionFileRefNum starting at 0

As from Leopard, all text handled internally by AppleScript is UTF16 Unicode text. Without an ‘as’ parameter, the ‘read’ command still assumes that the data in the file represent the old ‘string’ text, but returns Unicode text to the script. Without its own ‘as’ parameter, the ‘write’ command still writes data to file in the form presented, which is Unicode text in Leopard and may not be what’s wanted. To ensure the end result’s the same on any system, it’s now necessary to be specific about how text is read from and written to a file:

-- If the file contains the old-style 'string' text:
set theText to read theCollectionFileRefNum as string

set theText to my searchReplaceCollectionText(SearchText, replaceText, theText)

set eof theCollectionFileRefNum to 0
write theText as string to theCollectionFileRefNum starting at 0


-- Or, if the file contains Unicode text:
set theText to read theCollectionFileRefNum as Unicode text

set theText to my searchReplaceCollectionText(SearchText, replaceText, theText)

set eof theCollectionFileRefNum to 0
write theText as Unicode text to theCollectionFileRefNum starting at 0

Stefan’s advice not to use the File Read/Write commands in application ‘tell’ blocks is good. Access references are specific to the application that opens them, so if you open an access in a Finder ‘tell’ block, you can only use that access within a Finder ‘tell’ block. It’s generally more convenient to have accesses belonging to the script itself (or, more precisely, to the application running it), which is achieved by not having them in ‘tell’ blocks.

The “file is already open” error usually means there was an error on a previous run while the file was open with write permission, and the script stopped before reaching the ‘close access’ line. A file can only have one write-permission access open at a time, so the abandoned one has to be closed before another can be opened. You can either write another script to tell the Finder to close the access (because the Finder opened it in your script), or you can use some method to quit the Finder, which will release all its currently open accesses.

You should put everything that happens while a file’s open for access into a ‘try’ block so that, in the event of an error, the script keeps going long enough to close the access.

set theCollectionFileRefNum to (open for access file thePathToCollectionFile with write permission)
try
	set theText to (read theCollectionFileRefNum as string)
	
	set theText to my searchReplaceCollectionText(SearchText, replaceText, theText)
	
	set eof theCollectionFileRefNum to 0
	write theText as string to theCollectionFileRefNum starting at 0
on error errMsg number errNum
	-- In the event of an error, close the access to the file before allowing the error to stop the script.
	close access theCollectionFileRefNum
	error errMsg number errNum
end try
close access theCollectionFileRefNum

-- Rest of script.

I must say that Camelot’s approach seems a lot less trouble, though of course you have to learn how to use ‘sed’ first. :wink:

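as i understand it, the error occurs only if the file is on the system drive. on the same drive, the Finder's “move” really moves the file instead of copying it, so the original is no longer in its old place. therefore i guess you should “duplicate” your file instead of “move”. try: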


 duplicate Traktor34CollectionFile to backupOrdner

instead of:


 move Traktor34CollectionFile to backupOrdner

brother in names!

that was the key. sorry for the late reply.