I have a simple script that reads the contents of a txt file and stores it in a variable:
tell application "Finder" to set theFile to item 1 of (get selection)
set documentContents to (read (theFile as alias))
What I’d like to do if possible is to set the first line of the text file to one variable, then set the second line (or just rest of that text in the document) to another variable.
Is this possible? Any help would be greatly appreciated :-)
When you open a file with open for access, you can read it in parts. The command keeps a read pointer, so each subsequent read command continues from the position where the previous read stopped.
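set theFile to "/etc/hosts"
try
	set fd to open for access theFile
	-- Each read continues from where the previous one stopped.
	set var1 to read fd until linefeed
	set var2 to read fd until linefeed
	close access fd
on error
	-- Do not leave the file open if something went wrong.
	close access theFile
end try
return {var1, var2}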
tell application "Finder" to set theFile to item 1 of (get selection)
set documentContents to (read (theFile as alias))
set x to count of paragraphs of documentContents
set var1 to paragraph 1 of documentContents
set var2 to paragraphs 2 thru x of documentContents
set var2 to my joinAList(var2, return)
-- Join the items of theList into one string, separated by delim.
on joinAList(theList, delim)
	set newString to ""
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set newString to theList as string
	set AppleScript's text item delimiters to oldDelims
	return newString
end joinAList
DJ, is there a way of amending your version so var2 goes to the end of the doc instead of the next linefeed?
‘until linefeed’ includes the linefeed (if there is one) in the result, so it has to be stripped from the end of var1 (and possibly var2) if you don’t want it. You could use ‘before linefeed’, but then you’d either have to strip the linefeed from the beginning of var2 instead or insert a line to read the file for 1 byte before reading into var2. It may be quicker and simpler just to read in the entire text and set var1 and var2 to ‘paragraph 1’ and ‘paragraph 2’ of it respectively.
The values of the ‘read’ command’s ‘until’, ‘before’, and ‘using delimiter’ parameters are taken as single bytes, so unless the text in the file consists entirely of single-byte characters, these parameters may not work as expected:
A linefeed is a single byte in both ASCII and UTF-8 text, and is the second of two bytes in text saved as Unicode text by the ‘write’ command, so you may be able to get away with using ‘until’ here. But you should be aware of what these parameters actually do.
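The ‘before linefeed’ variant mentioned above could look roughly like this. It is only a sketch: it assumes the file has at least two lines and uses single-byte LF line endings (ASCII or UTF-8), and the chosen file is just a placeholder.

```applescript
-- Sketch only: assumes single-byte LF line endings and at least two lines.
set theFile to choose file -- placeholder: pick any suitable text file
set fd to open for access theFile
try
	set var1 to read fd before linefeed -- line 1, without its linefeed
	read fd for 1 -- step over the linefeed that 'before' left unread
	set var2 to read fd before linefeed -- line 2, without its linefeed
	close access fd
on error errMsg number errNum
	-- Close the file, then pass the original error on.
	close access fd
	error errMsg number errNum
end try
return {var1, var2}
```

No ‘as’ parameter is given, so the text is read in the default encoding; that is fine for plain ASCII, but other text would need the appropriate ‘as’ class.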
Sure, remove the until parameter and the read command continues from the last position of the previous read till the end of the file:
set theFile to "/etc/hosts"
try
	set fd to open for access theFile
	set header to read fd until linefeed
	set content to read fd
	close access fd
on error
	close access theFile
end try
return {header, content}
set documentContents to (read (choose file of type "txt"))
set variable1 to paragraph 1 of documentContents
set paragraph2Offset to (length of variable1) + 1
if id of (character (paragraph2Offset) of documentContents) < 20 then -- check for CRLF
	set paragraph2Offset to paragraph2Offset + 1
end if
set variable2 to text paragraph2Offset thru -1 of documentContents
This is interesting (kudos to Nigel for pointing out the problems), since it can avoid reading in a big file while still finding where the paragraph ends, even when the “paragraph” is a rather long one.
I don’t know of any way to convert the clipboard to UTF-8. Since the text is split into characters, there is no easy fix using iconv either, so I figured that once I had the paragraph, I’d write it to a file and then read it back as UTF-8.
I reuse the file and the file handle here, as this is just an experiment.
It’s a swings-and-roundabouts situation. It may only have to read to the end of the paragraph, but each byte has to be checked as it comes off the disk to find out when the read should stop, so the read process itself is slower. From an efficiency point-of-view, it probably works best with a short paragraph in a very long file. Otherwise, as I said earlier, it’s faster and simpler to just read the whole file without the ‘until’ filter and let AppleScript sort out the paragraph(s) in memory:
tell (read (choose file)) to set {var1, var2} to {paragraph 1, text from paragraph 2 to -1}
I didn’t take into account the much slower way of reading the file, when testing for a value. But I saw the paragraphs version of it all.
For a flexible solution, where the paragraphs to be read are not the first ones, a combination of head and tail in a do shell script might do the trick. (This preserves the encoding for those of us who use characters outside the ASCII charset.)
To get the paragraphs 7 thru 11, one might do something like
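A head-and-tail combination along those lines might be sketched like this. The variable names and the chosen file are only placeholders, and it assumes the file has at least 11 LF-terminated lines; with fewer lines, tail would return the wrong range.

```applescript
-- Sketch only: assumes the file has at least lastLine LF-terminated lines.
set theFile to choose file -- placeholder: pick any suitable text file
set firstLine to 7
set lastLine to 11
-- head keeps lines 1 thru lastLine; tail then keeps the last (lastLine - firstLine + 1) of those.
set cmd to "head -n " & lastLine & " " & quoted form of POSIX path of theFile & " | tail -n " & (lastLine - firstLine + 1)
set theLines to do shell script cmd
return theLines
```

Note that do shell script treats the output as UTF-8 and, by default, converts the line endings to returns and drops a trailing line ending; add "without altering line endings" if the linefeeds should be kept.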
Maybe sed is better than reading in the whole file if the sentinel character is something different than linefeed, but then again, for all I know the whole file is read into a buffer by the OS behind the scenes anyway.
That would hardly be noticeable because it’s written in C underneath (read: it’s a scripting addition). When reading a file you read character by character, and every character is checked for its value and pushed into a buffer before the next character is read. Higher-level programming languages hide this from the developer and read entire lines or even entire files directly into the buffer, but underneath the same protocol has to be followed, just as in C. Reading until a character therefore takes just as long as reading to a position and makes no difference in performance whatsoever in C. It’s one of the things where C really differs from AppleScript.
For all I know, the Standard Additions use buffered I/O in regular cases, and unbuffered I/O only when reading until something. Maybe you know better. By the way, the until preposition was a great spot!
You’re spot on! The read command uses Carbon’s FSReadFork(), which eventually uses the pread() system call, which is indeed buffered. At least those are the results I get when running AppleScript 2.3.2 (Mavericks) under Xcode’s debugger. But it doesn’t change the fact that until doesn’t affect disk I/O performance, and because the returned AppleEvent is smaller it will be faster as well.
It is layer upon layer here. I am not sure if I have dreamt it, or if I indeed read it, but I seem to remember that even if you use unbuffered IO, then the OS actually may create buffers for you, so your read operations are buffered anyway. But I don’t bet on this one, and I am deep into something at the moment, but I’ll eventually look it up, and come back to this.
True, when you read from disk there is no way to physically read a single byte. The smallest unit that can physically be read from disk is the block size of the device. But as you mentioned, there are layers upon layers. The read the kernel performs on behalf of a process doesn’t know any of this and is clearly separated from it. For that reason we have UFS and we should consider this as the lowest and “physical” level.
True, we probably read one byte at a time towards that buffer, and may regard that reading as the lowest one. I wasn’t actually thinking of that buffer, but you are totally right.
Here’s how I’d do it on my system using the Satimage.osax.
# Requires the Satimage.osax AppleScript Extension { http://tinyurl.com/dc3soh }.
set _file to alias ((path to home folder as text) & "test_directory:test")
set AppleScript's text item delimiters to "¶¶¶¶"
set {var1, var2} to text items of (find text "(?m)\\A([^\\n\\r]+)[\\n\\r](.+)\\Z" in _file using "\\1¶¶¶¶\\2" with regexp and string result)