Trying to convert a bunch of files to UTF-8

Right. But NSDocument doesn’t have anything to do with the encoding or reading of files. It gives you a URL, and leaves it to you.

Hello.

Actually, NSDocument’s readFromURL:ofType:error: method is usually overridden when you want to do some “special stuff”, like reading in a file using a particular encoding.
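A minimal sketch of such an override in Swift could look like the following; the UTF-8 first, Mac Roman second fallback is only an illustration, not Apple’s recommended detection strategy.

```swift
import Cocoa

// Sketch: an NSDocument subclass that reads its file with an explicit
// encoding instead of letting the default machinery decide.
class PlainTextDocument: NSDocument {
    var text = ""

    override func read(from url: URL, ofType typeName: String) throws {
        let data = try Data(contentsOf: url)
        if let utf8 = String(data: data, encoding: .utf8) {
            text = utf8                                   // decoded cleanly as UTF-8
        } else if let macRoman = String(data: data, encoding: .macOSRoman) {
            text = macRoman                               // fall back to a legacy encoding
        } else {
            throw CocoaError(.fileReadUnknownStringEncoding)
        }
    }
}
```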

The method I described some posts above, about reading in the first line and using it to guess the encoding, makes a lot of sense when you read the “Reading data with an unknown encoding” section of the String Programming Guide. It is described there as a conceptual approach; from perusing the TextEdit source code, it seems to me that CFString is what really does the grunt work.
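A rough Swift sketch of that conceptual approach, where the candidate list and the 4 KB sample size are arbitrary choices of mine, not anything from the guide:

```swift
import Foundation

// Read a small sample from the start of the file and return the first
// candidate encoding that decodes it without error, or nil if none does.
func guessEncoding(of url: URL) throws -> String.Encoding? {
    let handle = try FileHandle(forReadingFrom: url)
    defer { handle.closeFile() }
    let sample = handle.readData(ofLength: 4096)

    let candidates: [String.Encoding] = [.utf8, .utf16, .isoLatin1, .macOSRoman]
    for encoding in candidates where String(data: sample, encoding: encoding) != nil {
        return encoding
    }
    return nil
}
```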

Edit

Actually, the whole file may be read in using one of the legacy encodings: as long as no multibyte characters make the read operation fail, the file will be seen as encoded correctly. Humans, however, will see that some characters are rendered badly, which is why we have to use xattr to force the correct encoding.
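To illustrate both halves of that in Swift: the path is hypothetical, and the com.apple.TextEncoding value pairs an IANA charset name with a CFStringEncoding number, the same string an `xattr -w com.apple.TextEncoding "utf-8;134217984" file.txt` would write.

```swift
import Foundation

// A single-byte legacy encoding such as Mac Roman can decode any byte
// sequence without error, so a UTF-8 file "reads" successfully and merely
// shows mojibake.
let utf8Bytes = "café".data(using: .utf8)!
let misread = String(data: utf8Bytes, encoding: .macOSRoman)!
print(misread)   // succeeds, printing mojibake ("caf√©") instead of an error

// The manual fix: tag the file with the com.apple.TextEncoding
// extended attribute so readers that honour it pick UTF-8.
let path = "/tmp/example.txt"
do {
    try utf8Bytes.write(to: URL(fileURLWithPath: path))
    let tag = Data("utf-8;134217984".utf8)
    _ = tag.withUnsafeBytes { (buffer: UnsafeRawBufferPointer) in
        setxattr(path, "com.apple.TextEncoding", buffer.baseAddress, buffer.count, 0, 0)
    }
} catch {
    print("could not write sample file: \(error)")
}
```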

To me, an app that relies on file attributes containing arbitrary data is the buggy one. An app that ignores them completely, or uses them at most for informing the user, sounds more in line with safe programming.

So you think an app that changes the encoding and leaves the extended attributes as they are is not buggy?

Exactly; it’s not an app’s job to clean up another application’s arbitrary metadata. That’s why Spotlight uses spider processes to do that for you. Apple isn’t consistent about it either: copying a file with the cp command copies the extended attributes along with it, while the Finder doesn’t.

Extended attributes are designed to do one thing: store application-specific data and attach it to a file, like labels and tags or per-file view settings. You don’t store information about the content of the file and then use it to determine that content; the moment you do, the application becomes buggy.

Then why do NSString’s file writing methods set the encoding extended attribute? That’s Foundation, not an app.

To be able to use it with the initWithContentsOfFile:usedEncoding:error: and initWithContentsOfURL:usedEncoding:error: methods, which I would only advise using on iOS, not Mac OS.
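For what it’s worth, a small Swift sketch of that round trip: writing with the Cocoa string API should record com.apple.TextEncoding on the file, and the usedEncoding: reader consults it when reading the file back. The path is hypothetical.

```swift
import Foundation

let path = "/tmp/usedEncoding-demo.txt"
do {
    // Writing via the Cocoa API tags the file with its encoding.
    try "Olé".write(toFile: path, atomically: true, encoding: .utf8)
    // `xattr -l /tmp/usedEncoding-demo.txt` should now list com.apple.TextEncoding.

    // Reading back with usedEncoding: reports which encoding was applied.
    var encoding = String.Encoding.ascii
    let text = try String(contentsOfFile: path, usedEncoding: &encoding)
    print(text, encoding)   // "Olé" and the reported encoding (UTF-8)
} catch {
    print("demo failed: \(error)")
}
```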

We can agree to disagree. I will point out, though, that the new Yosemite API I used in the script earlier on doesn’t use extended attributes.
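Presumably that refers to NSString’s stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:, which arrived in 10.10 and works on the bytes themselves rather than on extended attributes. A rough Swift sketch, with nil options to keep it minimal:

```swift
import Foundation

// Detect the encoding of a file's bytes and return the converted string,
// or nil if detection fails (the method returns 0 in that case).
func detectAndConvert(_ url: URL) throws -> String? {
    let data = try Data(contentsOf: url)
    var converted: NSString?
    var usedLossy: ObjCBool = false
    let raw = NSString.stringEncoding(for: data,
                                      encodingOptions: nil,
                                      convertedString: &converted,
                                      usedLossyConversion: &usedLossy)
    guard raw != 0 else { return nil }
    print("detected:", String.Encoding(rawValue: raw), "lossy:", usedLossy.boolValue)
    return converted as String?
}
```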

Hello.

I want to say that text encoding seems to work very well as it is, for me at least, once I have set things up to detect the encoding automatically. It works a lot better than it did back in Tiger, anyway.

Now, I have to agree with DJ Bazzie Wazzie if we are talking about all kinds of files, but if we are only talking about files that contain natural text, then I’d agree with Shane.

This is because some compilers still don’t play well with UTF-8. And if you have lots of files to compile, it may take longer to compile UTF-8 files than MacRoman-encoded ones.

I am perfectly happy with Apple’s policy on text-file encoding, now that I have learnt I can fix things with xattr when something has gone wrong. What has made it work for me is setting up TextEdit to detect the encoding automatically, which means the text I write is treated as UTF-8 most of the time, at least. It also means that files written from within Terminal.app look right, with my language-specific accented characters, when viewed in QuickLook or FileMerge.

I prefer to convert files manually, at least until I see that the result is right, because it is the visual inspection that matters most here. That a file runs flawlessly through some conversion, with no ‘byte collisions’ or such, doesn’t by any means mean it ended up with the right characters. This is a matter of presentation more than of correct data, really. :slight_smile:

Right. All the discovery systems are based on shaky foundations – it’s amazing that something so basic as text files is so badly designed, regardless of the platform.

I agree. :slight_smile: