‘read’ command and BOM

Hi,

If I ‘read’ a file and see no BOM, does that mean there is no BOM?

In a plist file, is the top part that identifies it as XML called a BOM? If not, what do you call it? Also, is there a way to create a plist file that includes that identifying part without writing it in myself (from Unix as well)?

I’ve found some things on the internet. One was to use Xcode to create a project and use its Info.plist file. I know that I can copy and edit an existing plist file.

Edited: btw, I’m trying to write to a plist file in a .MacOSX folder in my home folder.

Thanks,
kel

Hello.

The BOM is just three bytes that are interpreted by whatever reads the file before you see it, unless you open the file as binary.

A tool like xxd would reveal the BOM.

I’ll get out of here, as Nigel can give a much better answer. :slight_smile:

Yes, I’m wondering whether you can get rid of that 3-byte BOM, and whether you’d want to.

Hi kel.

A BOM (byte-order mark) is a two- or three-byte code at the beginning of a text file which indicates what kind of Unicode text the file contains, if that’s what it contains. If you see two or three odd characters at the beginning of the text when reading from the beginning of the file ‘as string’, it contains a BOM, otherwise not. A BOM isn’t necessary if the software can safely make assumptions about what sort of text is in the file or is clever enough to deduce it.
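To tell which kind of BOM (if any) a file starts with, you can compare its first bytes against the common marks: EF BB BF for UTF-8, FE FF for UTF-16 big-endian and FF FE for UTF-16 little-endian. Here is a minimal sketch, assuming a POSIX shell with dd and od; `bom_kind` is a hypothetical helper name, and it deliberately ignores UTF-32 marks (a UTF-32 little-endian file would be reported as UTF-16 little-endian).

```sh
#!/bin/sh
# Report which byte-order mark, if any, a file starts with.
# bom_kind is a hypothetical helper; it only knows the UTF-8 and UTF-16 marks.
bom_kind() {
    # Up to three leading bytes as lowercase hex, e.g. "efbbbf" or "fffe68".
    lead=$(dd if="$1" bs=1 count=3 2>/dev/null | od -An -tx1 | tr -d ' \n')
    case $lead in
        efbbbf) echo "UTF-8 BOM" ;;
        feff*)  echo "UTF-16 big-endian BOM" ;;
        fffe*)  echo "UTF-16 little-endian BOM" ;;
        *)      echo "no BOM" ;;
    esac
}

# Demo with octal escapes, which every POSIX printf understands.
f=$(mktemp)
printf '\377\376h\000i\000' > "$f"; bom_kind "$f"   # UTF-16 little-endian BOM
printf 'plain text' > "$f";         bom_kind "$f"   # no BOM
rm -f "$f"
```

Reading raw bytes this way sidesteps the question of which text encoding the reader assumes, which is exactly what hides or reveals the BOM elsewhere in this thread.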

I don’t think you need to bother with a BOM in a plist file.

Hello.

If I have understood it correctly, a plist file is a binary file, not text, and hence uses no BOM. Its XML counterpart (when converted) is a text file, but the declaration at the top states what it is, so it shouldn’t need a BOM either.

And for the record, you can’t make a script run if it starts with a BOM. When a script begins with a #! (shebang) line, the kernel sees those two characters as the first two bytes of the file, which is how it knows the file is a script. If there is a BOM in front of those bytes, the kernel doesn’t recognize the file as a script, since it isn’t the kernel’s responsibility to interpret BOMs.
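You can see the problem by looking at the first two bytes yourself. This is a rough sketch, assuming a POSIX shell; `has_shebang` is a hypothetical name, and it only mimics the kernel’s check rather than invoking it.

```sh
#!/bin/sh
# Succeed if the file's first two bytes are literally "#!", which is the
# marker the kernel looks for before treating a file as a script.
# has_shebang is a hypothetical helper name.
has_shebang() {
    first_two=$(dd if="$1" bs=1 count=2 2>/dev/null)
    [ "$first_two" = '#!' ]
}

f=$(mktemp)
printf '#!/bin/sh\necho hi\n' > "$f"
has_shebang "$f" && echo "starts with #!"
# Same script with a UTF-8 BOM (EF BB BF) in front: byte 0 is no longer '#'.
printf '\357\273\277#!/bin/sh\necho hi\n' > "$f"
has_shebang "$f" || echo "BOM in front, so no #! at the start"
rm -f "$f"
```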

You don’t need a BOM when saving in text format because property lists are XML and include the encoding declaration at the beginning: <?xml version="1.0" encoding="UTF-8"?>. I’m not sure, but I think XML standards say this declaration trumps any BOM. I don’t think Apple uses a BOM when saving in text format, although the presence of one shouldn’t cause any problems, either.

It can be either. There are three types: old-style OpenStep ASCII text files, which are read-only these days, binary files, and XML text files.

Hello.

I was just thinking about that yesterday; it has been a while since I last saw an OpenStep property list file. (I don’t think plutil supports converting to that format, but it should still be able to read them.)

Anyway, since property lists are maybe a kind of metadata, I’m posting this interesting article about file attributes :smiley:

Hello.

And just for the record, the Property List Editor that ships with the Developer Tools on Snow Leopard doesn’t deal very well with ACLs.

I tried for days to insert some properties for header files into Xcode’s Info.plist file, but no matter how I opened it, with vim or with Property List Editor, it failed. With BBEdit, on the other hand, it works like a charm.
I think TextWrangler also manages to convert a binary Info.plist file on the fly and lets you edit it directly, without any fuss! :slight_smile:

Hi everybody,

So, when you use the Standard Additions ‘read’ command and see no odd characters in a text file, there is no BOM. I assume it reads the text file as ASCII. When I ‘read’ a TextEdit UTF-8 file, there is no BOM. If that file were a script whose text begins with #! (i.e. the first two bytes), the kernel would carry on with that file; otherwise, it ignores it. I guess I have to write the encoding declaration to a plist file when creating one. Apple should add something to the plist suite to ‘make’ plists, like ‘with root’ or something. I wasn’t sure whether Apple had changed the read/write commands, but they seem the same. I think I’m getting it now.

Edited: I should have written that ‘read’ reads as text if you don’t use the ‘as’ parameter. Still a little mixed up here.

Thanks a lot,
kel

Hello.

Apart from the #! byte sequence, the file containing it must be set as executable and be invoked via an exec() call (which is what the shell, or system(), ends up doing).

And I really think that BOMs don’t show up when you read the text as text with AppleScript’s read command. If you get something that renders some characters wrong, or as Chinese, then you certainly have the wrong BOM. The command-line utility iconv converts files from one encoding to another for you, adjusting BOMs as necessary.
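As a sketch of the iconv route, assuming GNU or macOS iconv, where a plain `UTF-16` source encoding reads the BOM to decide the byte order and does not copy it into the output; `utf16_to_utf8` is a hypothetical wrapper name:

```sh
#!/bin/sh
# Convert a UTF-16 file that begins with a BOM into UTF-8 without a BOM.
# iconv uses the BOM (FF FE or FE FF) to pick the byte order, then drops it.
# utf16_to_utf8 is a hypothetical helper name.
utf16_to_utf8() {
    iconv -f UTF-16 -t UTF-8 "$1"
}

src=$(mktemp)
printf '\377\376h\000i\000\n\000' > "$src"   # "hi" + newline as UTF-16LE with BOM
utf16_to_utf8 "$src" | od -An -tx1            # expect 68 69 0a, no ef bb bf
rm -f "$src"
```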

For files accessible from the BSD/Darwin/command-line side of OS X, the native text format is UTF-8 with no BOM, by the way (with Unix line feeds).

TextWrangler is a remarkably good text editor, as it will detect the byte-order mark of a file and tell you about it; this is not something you’ll find in command-line Unix text editors by default. (Although I believe vim at least supports it, and maybe emacs too, if given such a directive in the file currently being edited.)

Correct.

In the absence of an as parameter, or using “as text”, it uses your system’s default encoding, so probably MacRoman in your case.
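That is where the “odd characters” come from. The sketch below decodes the three UTF-8 BOM bytes as MacRoman and prints the result as UTF-8. It assumes an iconv that knows the `MACINTOSH` charset name (GNU iconv and macOS iconv both accept it), and that a plain ‘read’ behaves roughly like this decoding; `as_macroman` is a hypothetical helper name.

```sh
#!/bin/sh
# Decode bytes as MacRoman (iconv calls it MACINTOSH) and print them as UTF-8,
# approximating what a default-encoding read shows for a UTF-8 file.
# as_macroman is a hypothetical helper name.
as_macroman() {
    iconv -f MACINTOSH -t UTF-8
}

# The UTF-8 BOM bytes EF BB BF come out as three visible characters.
printf '\357\273\277' | as_macroman; echo    # prints Ôªø
```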

UTF-8 files rarely use a BOM, and “byte-order mark” is a misnomer for UTF-8 anyway, since UTF-8 has no byte-order variants. Because of the nature of UTF-8 encoding, it’s usually easy to spot. And most Mac apps these days also set an extended attribute for the encoding. Try xattr -l on a TextEdit file.
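One way to “spot” UTF-8 without a BOM is simply to check that the bytes are valid UTF-8: MacRoman or Latin-1 text with accented characters almost never happens to be. A minimal sketch, assuming an iconv (GNU or macOS) that rejects invalid input with a non-zero exit status; `is_valid_utf8` is a hypothetical helper name.

```sh
#!/bin/sh
# Succeed if the file is valid UTF-8, fail otherwise.
# Converting UTF-8 to UTF-8 forces iconv to decode, and so validate, every byte.
# is_valid_utf8 is a hypothetical helper name.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

f=$(mktemp)
printf 'caf\303\251\n' > "$f"      # "café" encoded as UTF-8
is_valid_utf8 "$f" && echo "valid UTF-8"
printf 'caf\351\n' > "$f"          # "café" with a single Latin-1 byte for é
is_valid_utf8 "$f" || echo "not UTF-8"
rm -f "$f"
```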

#! is a whole other issue – nothing to do with property list files.

It’s part of the first line of every property list file. Unless it’s simple, you shouldn’t be writing them with the read/write commands. Use dedicated tools, like System Events, a scripting addition, or Runner.

All you need is there now:

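-- Ask where to save the new file, then turn the result into a POSIX path
set theFile to choose file name default name "Untitled.plist" default location (path to desktop)
set theFile to POSIX path of theFile
tell application "System Events"
	-- The full POSIX path is used as the name of the new property list file
	set plistFile to make new property list file with properties {name:theFile}
	tell plistFile
		-- Add a record item so the file gets a root and some content
		make new property list item with properties {name:"member", kind:record, value:{first_name:"Jan", last_name:"Juc", age:38}}
	end tell
end tell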

Hi Shane,

That’s exactly what I was trying to do. I made the property list file, but not the property list item. Without the root, no application can read the file.

Hi McUsr,

I’ve tried TextWrangler and it looks like a great app. And it’s FREE! I love free.

Edited: I must have made some other mistake with ‘make new property list file’ earlier. With your script, it creates the encoding declaration and other info. It’s working now. Those other Unix utilities you all mentioned look very helpful too.

Edited: got it! You used the POSIX path as the name of the file.

Thanks a lot,
kel

Nope, reading with ‘as «class utf8»’ will hide the BOM for you. So you’re right: whether you see the BOM depends on how the file is read. When you read the file as text, it will show the BOM.

It’s called an XML declaration and should be present in all XML files. It indicates how the file should be processed. Another rule about the XML declaration is that it must come first in the file; the only thing the XML specification allows in front of it is a byte-order mark (optional for UTF-8, required for UTF-16). In practice, you won’t find a BOM in front of it in the XML plists Apple writes.
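If you want to check a file from the shell, you can skip an optional UTF-8 BOM and then require the first line to start with `<?xml`. This is a minimal sketch, assuming a POSIX shell with sed and a declaration on the first line that uses double quotes (as Apple’s XML plists do); `xml_encoding` is a hypothetical helper name.

```sh
#!/bin/sh
# Print the encoding named in a file's XML declaration, skipping a leading
# UTF-8 BOM if there is one. Returns 1 if the first line isn't a declaration;
# prints nothing if the declaration has no encoding attribute.
# xml_encoding is a hypothetical helper; it assumes double-quoted attributes.
xml_encoding() {
    bom=$(printf '\357\273\277')
    IFS= read -r first < "$1" || [ -n "$first" ] || return 1
    first=${first#"$bom"}                  # drop the BOM, if present
    case $first in
        '<?xml'*) printf '%s\n' "$first" |
                  sed -n 's/.*encoding="\([^"]*\)".*/\1/p' ;;
        *) return 1 ;;                     # e.g. a binary plist ("bplist00")
    esac
}

f=$(mktemp)
printf '<?xml version="1.0" encoding="UTF-8"?>\n<plist/>\n' > "$f"
xml_encoding "$f"                          # prints UTF-8
rm -f "$f"
```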

Hello

BOMs can be hard to find when you don’t expect them. Say you are testing for something in the first column of the first line of input: the output looks the way you expect, but your script didn’t work. That is because a BOM doesn’t consist of printable characters, so it takes up no space on screen or in a rendered file.
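A simple defence is to strip a UTF-8 BOM from the first line before testing it. A minimal sketch, assuming a POSIX shell; `strip_bom` is a hypothetical helper name, and it adds a trailing newline if the input’s first line lacked one.

```sh
#!/bin/sh
# Copy stdin to stdout, removing a UTF-8 BOM from the start of the first line,
# so tests like "does line 1 start with X?" behave as expected.
# strip_bom is a hypothetical helper name.
strip_bom() {
    bom=$(printf '\357\273\277')
    if IFS= read -r first || [ -n "$first" ]; then
        printf '%s\n' "${first#"$bom"}"    # first line without the BOM
        cat                                # the rest, unchanged
    fi
}

printf '\357\273\277#!/bin/sh\necho hi\n' | strip_bom | head -n 1   # prints #!/bin/sh
```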

By the way, here is a command-line tool for setting, listing, and testing the color labels of Finder files. There is no Carbon, Core Foundation, Foundation, or Cocoa in it. :slight_smile: It can be downloaded from here, and the source is included.

Edit
Arggh.

I had forgotten to implement the use of a chosen delimiter for separating files from colors/numbers; now fixed.

Edit++

Removed one more minor bug. Now it really should be stable. :slight_smile:

TIP: Open the file in an 8-bit character encoding like CP1252; if the document starts with “ï»¿”, there is a (UTF-8) BOM.

Hello.

cat suspicious.txt | od -cb | head -n 1

works from the command line.

TextWrangler and BBEdit tell you automatically what encoding the file has. You can just read it in the status bar at the bottom of the window.

By the way, I’d sure like one of these. :slight_smile:

Hi McUsr and DJ Bazzie Wazzie,

I was thinking about the invisible characters, so I need to read the file as string and check whether the first 3 characters contain invisible characters.

I’ll have to try the other utilities when I get back. Need to build a fence.

Neat time machine! :slight_smile:

Thanks a lot everybody,
kel

p.s. I ended up changing the “.bash_profile” back to what it was and the computer is still working. :slight_smile: I made this same mistake ten years ago. Lesson learned twice. :lol:

Or save your eyes and read as data…

Hi Shane,

I couldn’t remember how to deal with data, so I tried counting the text with this:

set f to choose file
set t1 to read f as string
--set a to count t1
--set t1 to (character id 0) & t1
--set b to count t1
--{a, b}
set t2 to read f as "utf8"
{count t1, count t2}

It seems to work for invisible as well as funny characters. The only problem is that you need to read the file twice.

Is there some way to look at the data and know that there’s a BOM?