Encoding special characters in XML/HTML

I’m looking for an easy way to encode special characters (such as é or ©). I found this thread, http://macscripter.net/viewtopic.php?id=38279, which talks about NSAttributedString, but it doesn’t give much detail. Can anyone elaborate a little on how I might go about using this class? Otherwise, is there a way to do it with NSXMLElement or NSXMLDocument?

The reason I ask is that I’d like to replace just one element in an XML-like document. I say XML-like because I have been unable to edit it using NSXMLDocument, probably because it is not typical XML: it is a project file for Apple Motion. My workaround was to read the file as an NSString, create the new element with NSXMLElement, and get its text using XMLStringWithOptions and NSXMLNodePrettyPrint. Then I use stringByReplacingOccurrencesOfString to swap the element in. With this method I can escape characters that conflict with XML syntax (such as < and >), but it does nothing with special characters.

My understanding is that I need to replace the special characters with their respective entity codes. For example, é would be replaced by its entity code (I can’t show the entity code here because this message board interprets it as HTML, but a full list can be found at http://www.freeformatter.com/html-entities.html). I’ve confirmed that manually putting in the entity code solves the problem.

Is there a method for encoding using entity codes?

If you want a solution that works in all versions of the OS, you can use my BridgePlus script library, or just use its framework. Using the library:

use framework "Foundation"
use script "BridgePlus"

load framework
set theString to (current application's SMSForder's encodedXMLFrom:"thisé & <that>") as text
set theString to (current application's SMSForder's encodedHexFrom:"thisé & <that>") as text

If you only need to support OS X 10.11 and later, you can continue to use stringByReplacingOccurrencesOfString to replace the reserved XML characters, and then use:

set theString to (anNSString's stringByApplyingTransform:(current application's NSStringTransformToXMLHex) |reverse|:false) as text

or:

set theString to (anNSString's stringByApplyingTransform:"[^\\x20-\\x7e]-Hex/XML" |reverse|:false) as text
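Put together, the 10.11 approach might look something like the sketch below. The handler name encodeForXML is made up for illustration, and the set of reserved characters it escapes (&, < and >) is an assumption; add quotes or apostrophes if your document needs them. The ampersand is replaced first so that the entities added afterwards are not escaped a second time.

```applescript
use AppleScript version "2.5" -- OS X 10.11 or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper: escape the reserved XML characters, then turn
-- everything outside printable ASCII into hex character references.
on encodeForXML(theText)
	set anNSString to current application's NSString's stringWithString:theText
	-- Replace the ampersand first, so the entities added below are left alone.
	set anNSString to anNSString's stringByReplacingOccurrencesOfString:"&" withString:"&amp;"
	set anNSString to anNSString's stringByReplacingOccurrencesOfString:"<" withString:"&lt;"
	set anNSString to anNSString's stringByReplacingOccurrencesOfString:">" withString:"&gt;"
	-- Same ICU-style transform string as in the line above.
	return (anNSString's stringByApplyingTransform:"[^\\x20-\\x7e]-Hex/XML" |reverse|:false) as text
end encodeForXML

encodeForXML("thisé & <that>")
```

The escaped entities themselves are plain ASCII, so the final transform should leave them untouched.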

That script is amazing and it seems to work fine, but I’m still getting the same problem when I output the file. Is the problem with how I’m saving the file?

Here’s what I have:

on writeMotionFile(theFileManager, templatePath, destinationFile, thisSource)
	set thisSourceEntityEncoded to current application's SMSForder's encodedXMLFrom:thisSource
	set theTemplate to current application's NSString's stringWithContentsOfFile:templatePath encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set theTemplate to theTemplate's stringByReplacingOccurrencesOfString:"[insert text here]" withString:thisSourceEntityEncoded
	theTemplate's writeToFile:destinationFile atomically:true
end writeMotionFile

Is NSUTF8StringEncoding what I want?

The encodedXMLFrom: method only encodes the reserved XML characters (<, >, &, etc.); it leaves accented and other non-ASCII characters alone. Try using encodedHexFrom: instead.
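Applied to the handler above, the change might look roughly like this. It is only a sketch: it assumes OS X 10.10 or later (for the reference form of the error parameter) and that BridgePlus is installed. Writing with an explicit UTF-8 encoding and checking the errors are extras added here for safety, not something this thread shows to be required.

```applescript
use AppleScript version "2.4" -- OS X 10.10 or later
use framework "Foundation"
use script "BridgePlus"
use scripting additions

load framework

on writeMotionFile(theFileManager, templatePath, destinationFile, thisSource)
	-- encodedHexFrom: instead of encodedXMLFrom:, as suggested above
	set thisSourceEncoded to current application's SMSForder's encodedHexFrom:thisSource
	-- Read the template as UTF-8 and stop if it cannot be read
	set {theTemplate, readError} to current application's NSString's stringWithContentsOfFile:templatePath encoding:(current application's NSUTF8StringEncoding) |error|:(reference)
	if theTemplate is missing value then error (readError's localizedDescription() as text)
	set theTemplate to theTemplate's stringByReplacingOccurrencesOfString:"[insert text here]" withString:thisSourceEncoded
	-- Write the result back out with an explicit encoding and check the outcome
	set {didWrite, writeError} to theTemplate's writeToFile:destinationFile atomically:true encoding:(current application's NSUTF8StringEncoding) |error|:(reference)
	if not (didWrite as boolean) then error (writeError's localizedDescription() as text)
end writeMotionFile
```

The handler keeps the original parameter list so it can be swapped in without changing the calling code.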

Nice. That completely solves the problem. Thanks!

In case you are wondering, this project accomplishes what I set out to do in this post http://macscripter.net/viewtopic.php?id=43801 back in April. By manipulating both clipboard data and the original Apple Motion template files, I can automate all sorts of things in Final Cut Pro X.

Thanks for all the help.

That’s been a long haul – glad you got there in the end!