Converting hi ascii text to lo ascii

IamRob · December 10, 2023, 6:26am

I use emulators quite a bit and some of them require their text to be in hi ascii. Viewing that text on a modern platform produces hi-ascii gibberish. I wrote a fairly simple AppleScript to handle this, but it quite often crashes. Reducing the chunk size of the text mostly works, but AppleScript still crashes after handling 8 KB, 4 KB and even 2 KB chunks. I finally got tired of this and am imposing on you to correct my mistake. This is the script I am currently using. What is causing the crashing?

set para to null

tell application "TextEdit"
     tell document 1
         set cnt to count of character
         repeat with i from 1 to cnt
               set c to character i
               set var to (ascii number of c)
               if var > 127 then set var to var - 128
               set para to para & (ascii character var)
         end repeat
         set the clipboard to para as text
     end tell
end tell

beep

Nigel_Garvey · December 10, 2023, 10:27am

Hi @IamRob .

I’m not sure what you mean by “hi ascii” or whether it can be sensibly transliterated to “lo ascii” by the method you use.

With regard to your script code, three things to note are:

ASCII number and ASCII character were deprecated years ago. Their modern equivalents are id of … and character id … or string id …. These later versions have the nice ability to work on entire strings or lists of codes at once.
Since the initial value of your para variable is null, its final value after the concatenations will be a list containing null and the individual characters.
Working through the individual characters in the TextEdit document itself can take ages. It’s much faster to extract its text all at once and work from there.

Given my reservations about what you’re trying to do, the following would be more efficient.

-- Get the document text as AppleScript text.
tell application "TextEdit"
	set txt to text of document 1
end tell

-- Get a list of the text's UTF-16 character codes.
set theCodes to id of txt

-- Edit the codes in the list.
repeat with c in theCodes
	-- if (c > 127) then set c's contents to c - 128
	set c's contents to c mod 128
end repeat

set the clipboard to (string id theCodes)

However, I think you probably want to convert AppleScript UTF-16 text to strict 7-bit ASCII characters without worrying too much about the fate of diacriticals and exotic characters. In this case, you’ll need to use ASObjC to make use of the facilities offered by macOS’s Foundation framework. The final text is still UTF-16, but with all-lo-end characters.

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

-- Get the document text as AppleScript text.
tell application "TextEdit" to set txt to text of document 1
-- Then as an NSString.
set str to current application's class "NSString"'s stringWithString:(txt)
-- Get NSData representing the string as strict 7-bit ASCII.
set targetEncoding to current application's NSASCIIStringEncoding
set ASCIIData to str's dataUsingEncoding:(targetEncoding) allowLossyConversion:(true)
-- Get a new NSString derived from that data
set newString to current application's class "NSString"'s alloc()'s initWithData:(ASCIIData) encoding:(targetEncoding)
-- Convert back to AS text and place on the clipboard.
set the clipboard to (newString as text)

IamRob · December 10, 2023, 6:16pm

I apologize. I should have clarified. I am using Applescript 2.4.3 under MacOSX 10.7.

Although I have a newer MacOSX, the older computer is the one I mostly use for the emulators I am using. I don’t know what encoding is used when I drag a text document out of an emulator that only uses ascii characters $00-$FF. But yes, lo-ascii being in the range $00-7F and hi-ascii being $80-FF.

When I duplicate and try to save the document, it comes up as Encoding: Western (Mac OS Roman).
Is there a way to see what a text file’s original encoding is?

I will keep your second script suggestion for the newer computer for another day.
And your first script doesn’t work for me either. The text file that comes out of the emulator might be saved in UTF-8 encoding, if that helps.

Nigel_Garvey · December 10, 2023, 9:17pm

Hmm. I’m at a bit of a loss at the moment. AppleScript text became UTF-16 by default with Mac OS X 10.5 and the id of … and string id commands were introduced then. If you’re using AppleScript to get the text from TextEdit, it should be returned with UTF-16 encoding regardless of its encoding in the file from which TextEdit read it and my first script should produce exactly the same characters as your script, but as a single text and hopefully more efficiently and without crashing.

How does “doesn’t work for me” manifest itself? What are you hoping to see when non-ASCII characters are stripped of their hi bits?

IamRob · December 10, 2023, 11:36pm

They are source files from old assemblers. Basically it is english text.

I made some changes to my original script to get the text as a single text body. This doesn’t crash for me and is much faster.

set para to ""

tell application "TextEdit"
    set txt to text of document 1
end tell

repeat with i in txt
       set c to (ascii number of i)
       if c > 127 then set c to c - 128
       set para to para & (ascii character c)
 end repeat

set the clipboard to para as text

IamRob · December 10, 2023, 11:57pm

I will get you to confirm that I typed your script in correctly. This compiles correctly with no errors.

tell application "TextEdit"
    set txt to text of document 1
end tell

set theCodes to id of txt

repeat with c in theCodes
     set c's contents to c mod 128
end repeat

set the clipboard to (string id theCodes)

Nigel_Garvey · December 11, 2023, 1:59pm

Hi IamRob.

I can confirm that your script in post #6 is the same as my first one in post #2.

But if your modified script in post #5 is now doing what you want without crashing and without taking all day over it, it may be a good idea to stick with that rather than worry about trying to get my version to work. The essential difference between them is that mine keeps only the lo seven bits of every character code encountered, whereas yours subtracts 128 from codes > 127. With initial codes up to and including 255 (the maximum possible with 8-bit values), the results are exactly the same. With codes higher than that, your version still leaves values > 127. If the results are what you want, I should go with that. Without knowing what the “gibberish” characters represent, whether they appear as such in TextEdit or only in the AppleScript text extracted from it, and what you want in their place, I don’t think I can offer any further suggestions.
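To make the comparison concrete, here is a small shell sketch of the two approaches applied to a few sample codes. The helper names strip_mod and strip_sub are made up for illustration, and 8217 is just an arbitrary example of a code above 255.

```shell
# Two ways of clearing the high bit of a character code.
# strip_mod and strip_sub are hypothetical helper names.

# "mod 128" approach: keep only the remainder after dividing by 128.
strip_mod() {
    echo $(( $1 % 128 ))
}

# "subtract 128" approach: only touch codes above 127.
strip_sub() {
    if [ "$1" -gt 127 ]; then
        echo $(( $1 - 128 ))
    else
        echo "$1"
    fi
}

# 65 is plain "A", 193 is "A" with the high bit set,
# 255 is the largest 8-bit value, 8217 is an example code above 255.
for code in 65 193 255 8217; do
    printf '%5d  mod: %3d  sub: %4d\n' "$code" "$(strip_mod "$code")" "$(strip_sub "$code")"
done
```

For the first three codes both columns agree. For 8217 the mod version gives 25 while the subtract version gives 8089, which is still well above 127.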

IamRob · December 11, 2023, 2:11pm

Thanks for your reply Nigel.

You helped by showing me the allText way of doing it.

Just for reference, “set c to c mod 128” does work in my script instead of “set c to c - 128”, so it works exactly as you mention. Which probably means that the “id” commands maybe are not fully implemented in Applescript 2.4 yet.

Hallenstal · December 14, 2023, 9:08am

Do you really want to translate via ASCII value - 128? For example, this may mean that “Ä” becomes the NUL character.
If you have the file, and it is most probably stored as UTF-8, then you can use Terminal with the following:

iconv -f UTF-8 -t US-ASCII inputfile.txt > outputfile.txt

Or, if you simply want to subtract 128 from the “hi-ascii” bytes, you can use tr:

LC_ALL=C tr '\200-\377' '\000-\177' < inputfile.txt > outputfile.txt
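A note on the tr syntax: tr takes byte values as octal escapes, so $80-$FF is written \200-\377 and $00-$7F is \000-\177. Below is a minimal sketch of that byte-level conversion wrapped in a function, assuming the file still holds one raw byte per character as it came out of the emulator. The name strip_high_bit is just an illustration.

```shell
# strip_high_bit is a hypothetical helper name.
# It reads bytes on stdin and writes them with the high bit cleared.
# LC_ALL=C makes tr treat the input as plain bytes rather than as
# multi-byte characters in the current locale.
strip_high_bit() {
    LC_ALL=C tr '\200-\377' '\000-\177'
}

# Example: the bytes C8 C5 CC CC CF (octal 310 305 314 314 317)
# are "HELLO" with the high bit set on every letter.
printf '\310\305\314\314\317\n' | strip_high_bit
```

For a whole file, the call would be along the lines of strip_high_bit < inputfile.txt > outputfile.txt.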

BR

Hallenstal · December 14, 2023, 12:46pm

I think the reason your original script using ascii number etc. did not work is that you have characters with a code > 255 (UTF-8 is used in TextEdit). Anyway, here is a script using the shell.

tell application "TextEdit"
	set txt to text of document 1
end tell
set the clipboard to (do shell script "export LC_CTYPE=\"UTF-8\";echo " & quoted form of txt & "|iconv -f UTF-8 -t US-ASCII")
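
Before choosing between iconv and a byte-level conversion, it may help to look at what the file actually contains. The sketch below counts the bytes that have the high bit set and dumps the raw bytes in hex. The helper name count_high_bytes and the sample data are assumptions for illustration only. If nearly every letter in the file shows up as a byte of $80 or above, the file is probably raw hi-ascii rather than UTF-8, and iconv -f UTF-8 would most likely reject it as invalid input.

```shell
# count_high_bytes is a hypothetical helper name.
# It prints how many bytes in the given file have the high bit set.
count_high_bytes() {
    # Delete every byte in the range 0x00-0x7F, then count what is left.
    # The final tr removes the padding spaces that some wc versions add.
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Build a small sample file: "HI" with the high bit set, then plain "ok".
sample=$(mktemp)
printf '\310\311ok\n' > "$sample"

count_high_bytes "$sample"   # number of bytes >= 0x80 in the sample
od -An -tx1 "$sample"        # the raw bytes in hex, for a visual check

rm -f "$sample"
```

On a real file, replace the sample with the path of the document dragged out of the emulator.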