Unix files versus Mac Classic files and Grep

Hello again, folks :)
I’m back to pick your brains once again ;)

For test purposes, I have created two XML files on my desktop that are identical except that one is saved with Unix line feeds (character 10, \n) and the other with “Classic Mac” returns (character 13, \r). Each file contains the same XML:

<?xml version="1.0" encoding="utf-8"?>
<trackList>
    <track>
        <title>Do it with music</title>
        <creator>My Studios</creator>
        <artist>The singer 1</artist>
        <info>Best of studio version</info>
    </track>

    <track>
        <title>Music Rocks!</title>
        <creator>My Studios</creator>
        <artist>The singer 2</artist>
        <info>Best of studio version</info>
    </track>

    <track>
        <title>Music to my ears</title>
        <creator>My Studios</creator>
        <artist>The singer 3</artist>
        <info>Best of studio version</info>
    </track>

    <track>
        <title>Music put to music</title>
        <creator>My Studios</creator>
        <artist>The singer 4</artist>
        <info>Best of studio version</info>
    </track>

</trackList>

I created them with BBEdit, and the XML tags are only examples, but that’s not the point!

If I run this three-line script:

set thePath to (POSIX path of (path to desktop folder) as text)

-- -s: suppress error messages, -h: don't prefix matching lines with file names
set theResult to do shell script "grep -s -h title " & quoted form of thePath & "*.xml"

display dialog theResult as string

The resulting dialog shows:

        <title>Do it with music</title>
        <title>Music Rocks!</title>
        <title>Music to my ears</title>
        <title>Music put to music</title>
<?xml version="1.0" encoding="utf-8"?>
<trackList>
    <track>
        <title>Do it with music</title>
        <creator>My Studios</creator>
        <artist>The singer 1</artist>
        <info>Best of studio version</info>
    </track>

    <track>
        <title>Music Rocks!</title>
        <creator>My Studios</creator>
        <artist>The singer 2</artist>
        <info>Best of studio version</info>
    </track>

    <track>
        <title>Music to my ears</title>
        <creator>My Studios</creator>
        <artist>The singer 3</artist>
        <info>Best of studio version</info>        </track>

    <track>
        <title>Music put to music</title>
        <creator>My Studios</creator>
        <artist>The singer 4</artist>
        <info>Best of studio version</info>        </track>

</trackList>

The first four lines come from the XML file saved with Unix line feeds. The rest is more or less a copy of the whole content of the file with classic Mac returns, except that, for some reason, the closing tags of the last two tracks were moved to the end of the previous line. That last quirk is not what matters to me. What I really need is for grep to return the same four lines for both the classic Mac and the Unix files.
I really don’t see why return (\r) line endings should prevent grep from isolating the matching lines, instead of returning the whole XML content.
Sorry for such a long post about such a short script! Just trying to be clear :) Thanks in advance for your help.

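Hi. You’ll have to convert to linefeed endings: grep only treats a linefeed (\n) as the end of a line, so a file with return (\r) endings looks like one long line, and any match returns the whole file. If you want to use a wildcard to locate the files, I think you might have to call grep twice.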

set thePath to (POSIX path of (path to desktop folder) as text)

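-- grep -h '.*' prints every line of every file without file names ('.*' is quoted so the shell doesn't expand it)
-- tr turns the returns into linefeeds, then the second grep keeps only the title lines
do shell script "grep -h '.*' " & quoted form of thePath & "*.xml | tr '\\r' '\\n' | grep title"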
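A minimal way to watch this pipeline work outside AppleScript is to run it in sh on two throwaway files. This is only a sketch under assumptions: a temporary folder from mktemp, two hypothetical files unix.xml (LF endings) and mac.xml (CR endings), and a cut-down two-title version of the XML. It first counts what a plain grep returns, then runs the grep / tr / grep chain.

```shell
#!/bin/sh
# Sketch only: the temp folder and the unix.xml / mac.xml names are hypothetical.
dir=$(mktemp -d)

# Same cut-down document, once with LF (\n) endings and once with CR (\r) endings.
printf '<trackList>\n<title>Do it with music</title>\n<title>Music Rocks!</title>\n</trackList>\n' > "$dir/unix.xml"
printf '<trackList>\r<title>Do it with music</title>\r<title>Music Rocks!</title>\r</trackList>\r' > "$dir/mac.xml"

# Plain grep: mac.xml contains no \n at all, so it is one single "line"
# and comes back whole: 1 line from mac.xml + 2 lines from unix.xml.
raw_lines=$(grep -h title "$dir"/*.xml | grep -c title)

# grep twice:
#   grep -h '.*'  prints every line of every file without file-name prefixes
#                 ('.*' is quoted so the shell cannot expand it as a file pattern)
#   tr '\r' '\n'  turns each CR into an LF so mac.xml splits into real lines
#   grep title    keeps only the <title> lines
titles=$(grep -h '.*' "$dir"/*.xml | tr '\r' '\n' | grep title)
fixed_lines=$(printf '%s\n' "$titles" | grep -c title)

printf '%s\n' "$titles"
printf 'plain grep: %s lines, grep/tr/grep: %s lines\n' "$raw_lines" "$fixed_lines"
rm -r "$dir"
```

With both files in the folder, the plain grep gives 3 lines (one of them the entire Mac file), while the converted pipeline gives the 4 title lines, two from each file.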

Hi,

I tested your XML file in three cases:

  1. XML file saved with classic Mac return line breaks (“\r”)
  2. XML file saved with Unix linefeed line breaks (“\n”)
  3. XML file saved with Windows return + linefeed line breaks (“\r\n”)

For testing, I used a simpler form of the shell command, on a single file, to avoid side effects from the other options:


set xmlFile to choose file of type "xml" default location (path to desktop folder from user domain)
--> alias "HARD_DISK:Users:123:Desktop:untitled text.xml"

set theResult to do shell script "grep title " & quoted form of POSIX path of xmlFile

Result: grep works fine in cases 2) and 3), but in case 1) it does not recognize the return character as a line break.

So you have to convert the return character (“\r”) to a linefeed (“\n”) or to a return + linefeed pair (“\r\n”), as Marc Anthony said.
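The three cases can be reproduced with shell tools alone. This is a sketch under assumptions (a temporary folder, hypothetical files case1.xml to case3.xml, and a cut-down two-title document); grep -c counts the matching lines in each file. It also checks one detail worth knowing: in the Windows case grep splits on the linefeed of each \r\n pair, so every matching line it returns still ends with the \r.

```shell
#!/bin/sh
# Sketch only: the temp folder and the case*.xml names are hypothetical.
dir=$(mktemp -d)
cr=$(printf '\r')   # a single carriage-return character, used as a grep pattern

# The same two-title document with three line-ending conventions.
printf '<trackList>\r<title>Do it with music</title>\r<title>Music Rocks!</title>\r</trackList>\r' > "$dir/case1.xml"           # 1) CR
printf '<trackList>\n<title>Do it with music</title>\n<title>Music Rocks!</title>\n</trackList>\n' > "$dir/case2.xml"           # 2) LF
printf '<trackList>\r\n<title>Do it with music</title>\r\n<title>Music Rocks!</title>\r\n</trackList>\r\n' > "$dir/case3.xml"   # 3) CRLF

c1=$(grep -c title "$dir/case1.xml")   # the whole CR file is a single line
c2=$(grep -c title "$dir/case2.xml")   # one match per <title> line
c3=$(grep -c title "$dir/case3.xml")   # lines are split on the LF of each CRLF

# In case 3 the matching lines still carry the CR from each CRLF.
c3_with_cr=$(grep title "$dir/case3.xml" | grep -c "$cr")

printf 'CR: %s  LF: %s  CRLF: %s (%s still ending in CR)\n' "$c1" "$c2" "$c3" "$c3_with_cr"
# prints: CR: 1  LF: 2  CRLF: 2 (2 still ending in CR)
rm -r "$dir"
```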

NOTE:
In cases 2) and 3) the result is the same (I see you want a universal result for Mac and Windows, so in that sense it is universal):


"            <title>Do it with music</title>" & return & "           <title>Music Rocks!</title>" & return & "            <title>Music to my ears</title>" & return & "            <title>Music put to music</title>"

Hi and thanks for your answer. I rather suspected that was the case, but I couldn’t see a workaround!
I’ll have a go with your code and see how I get on. I’ll report back.

Hi and thanks for your reply. As you say, I would like a universal solution. I’m going to see if I can work something out with Marc Anthony’s code.
I’m not sure whether you were actually proposing a solution yourself here…? What your scriptlet shows is indeed the result I would expect, but in my example I would get it twice, once for each file (three times if you added a Windows version!), since the aim is not a single exclusive result but a result from each file containing that information.

I phrased that badly before. In case 1) grep does not work. In cases 2) and 3) grep works and gives the same result, but that result is not universal. For Windows, every “\r” in the result would apparently have to be replaced with “\r\n”, and for Mac every “\r” would have to be replaced with “\n”.

That is, in all three cases you need to replace the “\r” character twice: first inside the do shell script command, and then (if needed) in the result, replacing “\r” with the platform-specific ending (“\n” for Mac, “\r\n” for Windows).

So, to do the whole task in one go, the correct approach is to use the without altering line endings parameter of do shell script, as proposed by Nigel Garvey in another topic.

For XMLs written on Mac or Windows, with the result to be used on a Mac:


set thePath to (POSIX path of (path to desktop folder) as text)

-- '.*' is quoted so the shell doesn't expand it as a file pattern
do shell script "grep -s -h '.*' " & quoted form of thePath & "*.xml | tr '\\r' '\\n' | grep title" without altering line endings

For XMLs written on Mac or Windows, with the result to be used on Windows:


set thePath to (POSIX path of (path to desktop folder) as text)

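-- tr maps one character to one character, so it can't turn \r into \r\n:
-- convert to \n first, then let awk end each matching line with \r\n
do shell script "grep -s -h '.*' " & quoted form of thePath & "*.xml | tr '\\r' '\\n' | grep title | awk '{printf \"%s\\r\\n\", $0}'" without altering line endings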
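Note that tr translates one character to one character, so asking it to turn \r into \r\n does not work: the extra \n in the second set is ignored and each return is simply mapped to a return again. A small sh sketch, assuming a temporary folder and a hypothetical mac.xml with CR endings, compares that with converting to \n first and then using awk to write each matching line back out with \r\n.

```shell
#!/bin/sh
# Sketch only: the temp folder and the mac.xml / titles_crlf.txt names are hypothetical.
dir=$(mktemp -d)
cr=$(printf '\r')   # a single carriage-return character, used as a grep pattern

# Classic Mac test file: two <title> lines, CR line endings only.
printf '<trackList>\r<title>Do it with music</title>\r<title>Music Rocks!</title>\r</trackList>\r' > "$dir/mac.xml"

# tr '\r' '\r\n': CR is mapped to the first character of the second set (CR again)
# and the extra \n is ignored, so grep still sees the whole file as one line.
noop_hits=$(grep -h '.*' "$dir"/*.xml | tr '\r' '\r\n' | grep -c title)

# Working version: CR -> LF so grep can split the lines, then awk re-terminates
# each matching line with CR+LF for use on Windows.
grep -h '.*' "$dir"/*.xml | tr '\r' '\n' | grep title \
  | awk '{ printf "%s\r\n", $0 }' > "$dir/titles_crlf.txt"

title_lines=$(grep -c title "$dir/titles_crlf.txt")   # lines in the converted output
crlf_lines=$(grep -c "$cr" "$dir/titles_crlf.txt")    # of those, lines ending in CR

printf 'tr only: %s matching line(s); tr + awk: %s line(s), %s ending in CR+LF\n' \
  "$noop_hits" "$title_lines" "$crlf_lines"
rm -r "$dir"
```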

If you don’t want to worry about line-breaks, you should treat the files as XML rather than text. For example:

use AppleScript version "2.5" -- macOS 10.11 or later
use framework "Foundation"
use scripting additions

set thePath to POSIX path of (path to desktop as text) & "Song info.xml"
set theURL to current application's NSURL's fileURLWithPath:thePath
set {theXMLDoc, theError} to current application's NSXMLDocument's alloc()'s initWithContentsOfURL:theURL options:0 |error|:(reference)
if theXMLDoc = missing value then error (theError's localizedDescription() as text)
-- parse for info
set {theMatches, theError} to (theXMLDoc's nodesForXPath:"//title" |error|:(reference))
if theMatches = missing value then error (theError's localizedDescription() as text)
set theResult to (theMatches's valueForKey:"XMLString") as list
(* -- or probably more useful:
set theResult to (theMatches's valueForKey:"stringValue") as list
*)

@KniazidisR

set thePath to (POSIX path of (path to desktop folder) as text)

do shell script "grep -s -h '.*' " & quoted form of thePath & "*.xml | tr '\\r' '\\n' | grep title" without altering line endings

This works perfectly for me. Thanks very much for your time and explanations :)

Hi Shane, and thanks very much for your most interesting answer. Your answers always give me food for thought :) I am certainly going to play around with this. I will have to start from a list of “identified” XML files, but I don’t think that will be a problem.
Time for testing… :)