Sed do shell script

I’m stumped. Can’t get the result of one sed into the other.


set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the story.
123 def"
set cmd to "echo " & t & " | sed -n '/123/ =' | \\
sed 's/\\n/,/g'" -- ???
(do shell script cmd)

It’s supposed to end up with: “2,4”

I tried the curly brackets and everything I could think of according to the tutorial at Grymoire.

Hi kel1.


set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the story.
123 def"
set cmd to "echo " & t & " | sed -n '/123/ =' |
sed -n '1 h; 1 !H ; $ { g ; s/\\n/,/g ; p ; }'"
(do shell script cmd)

Hi Nigel,

It works. Now, off to find out why.

One thing that’s bothering me is why my script didn’t work. Oh, I think I just got it. It has something to do with input/output. That’s why the $.

Thanks a lot,

Hi kel1.

If it doesn’t make sense, here’s a break-down:


set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the story.
123 def"
set cmd to "echo " & t & " | sed -n '/123/ =' |
sed -n '# This sed command handles the linefeed-separated line numbers output by the previous one.
# sed handles its input a line at a time. We need it to edit the whole text at once, so gather all the lines into the hold space and edit and print them as a single unit at the end.
# Put the first line into the hold space.
1 h
# Append every line which is not the first to the hold space with a linefeed.
1 !H
# At the last line, additionally ...
$ {
	# ... get the assembled text back into the pattern space ...
	g
	# ... replace all the linefeeds with commas ...
	s/\\n/,/g
	# ... and print the edited text.
	p
}'"
(do shell script cmd)
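
For trying the same pipeline directly in Terminal, without the AppleScript quoting, here is a minimal sketch. The function name matching_line_numbers is made up for illustration; the two sed commands are the ones from the break-down above.

```shell
# Hypothetical helper: print the numbers of the stdin lines containing "123",
# joined with commas.
matching_line_numbers() {
    # First sed: print only the line numbers of matching lines, one per line.
    sed -n '/123/ =' |
    # Second sed: collect every line in the hold space, then edit and print
    # the whole lot once the last line has been read.
    sed -n '1 h; 1 !H; $ { g; s/\n/,/g; p; }'
}

printf '%s\n' 'I begin the essay with a thought' 'abc 123 xyz' \
    'about the end of the story.' '123 def' | matching_line_numbers
```

With the sample text this should print 2,4.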

Wow, that’s much easier to see.

Thanks,
kel

Just another way to skin this cat:

do shell script "awk '/123/{if (length(result) != 0){printf result\",\"}result=NR}END {printf result}' <<< " & t

The same code, laid out less compactly:


do shell script "awk '/123/{
	if (length(result) != 0)
		printf result\",\"
	result=NR
}END {printf result}' <<< " & t

What it does: the code between the first pair of curly braces is only executed when the line contains ‘123’. The line number (NR) is stored in the variable named result. The code between the curly braces after the END keyword is executed once all the given data has been processed. It’s needed because at least one item is still left in our buffer, and it has to be printed without a comma at the end.
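
The same logic as a plain shell function, as a sketch only. The function name is hypothetical, and it swaps in printf with an explicit "%s" format so that the stored value is never treated as a format string; otherwise it follows the description above.

```shell
# Hypothetical helper: comma-separated numbers of the stdin lines containing "123".
matching_line_numbers_awk() {
    awk '/123/ {
        if (result != "") printf "%s,", result   # flush the previous match, with its comma
        result = NR                               # remember this match for later
    }
    END { printf "%s", result }'                  # the last match is printed without a comma
}

printf '%s\n' 'I begin the essay with a thought' 'abc 123 xyz' \
    'about the end of the story.' '123 def' | matching_line_numbers_awk
```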

It’s not a worse or better solution than Nigel’s, just another solution.

edit:
A fun version using osascript:

do shell script "echo \"return every paragraph of \\\"$(awk '/123/{print NR}' <<< " & t & ")\"\\\" | osascript"

Note: there is a space after each comma in the result.

There are as many roads to Rome as there are opinions about sed.

Well, I’d first say that Nigel’s solution is the fastest in comparison to awk. But when speed really matters, awk may in many situations relieve you of forking subshell after subshell to execute commands in, and thereby end up faster, since you can do so much with it! :slight_smile:

Here is my take, which is different, because I use the tools that are easier to use. I use tr and echo to create the final result.

set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the story.
123 def"
set cmd to "a=$(echo " & t & " | sed -n '/123/ =') ; echo $a |tr ' ' ',' "

(do shell script cmd)
--> "2,4"
 

Here is another variation, that uses nl for line numbering.

set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the story.
123 def"
set cmd to "a=$(echo " & t & " |nl -b a  |sed -nE '/123/ s/^( *)([[:digit:]]+)(.*)/\\2/p') ; echo $a |tr ' ' ',' "

(do shell script cmd)

Here is a purer sed version, which I believe will only work with a pair of matches.

set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the story.
123 def"
set cmd to "echo " & t & " |nl -b a |sed -nEe '/123/ s/^[^0-9]+([0-9]+).*/\\1/p' | sed -e 'N'   -ne 's/\\n/,/p'"

(do shell script cmd)

A final variation, before I finally get buried into the boring book again!

I have tried to make it as easy as possible while reducing the number of system calls (exec). I don’t think it comes close to Nigel’s for speed, but it is far easier to write on the fly! (Unless you are Nigel. :slight_smile: )

set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the story.
123 def"
set cmd to " echo $(grep -n 123  <<<" & t & " |sed -n 's/\\([0-9]*\\):.*/\\1/p') |tr ' ' ','"

(do shell script cmd)

What can I say, it’s a boring day! :slight_smile:

Hello.

Here is yet another solution to kel’s original post, more in line with it. It turns out that an ‘N’ in front of his last statement suffices.

In the “MacMahon” document, it is stated that all output is handed over to the next statement. (I haven’t dared to try coalescing it into curly brackets, but operate with two statements on the command line.)
I rely on that statement, and on the fact that “N” doesn’t give any output, so the “N” statement will “block” further statements until there are no more lines to append to the pattern space. Then kel’s last statement replaces all the newlines but the last one in the pattern space, and the script is done.

set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the story.
123 def"
set cmd to "echo " & t & " | sed -n '/123/ =' | \\
sed -e 'N' -e 's/\\n/,/g'" -- ???
(do shell script cmd)
--> "2,4"

Edit

My understanding above was wrong. Come to think of it, n/N starts a new instruction cycle, and that is why it “blocks” or slurps the input. It is not that there isn’t any data to pass on to the next sed commands. There, I said it! :slight_smile:

Yeah. These two only work with the exact text provided. If you put a “123” in the third line as well, only “2,3” is returned.
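
A small sketch of that pairing behaviour, using line numbers fed in directly rather than found by a first sed. The function name join_pairs is made up for illustration. With an even number of input lines the result is the same in the sed implementations I know of; with an odd number, what happens to the unpaired last line depends on the implementation (some drop it, some print it), so the example sticks to four lines.

```shell
# Hypothetical helper: a single N per cycle joins the input lines two at a time.
join_pairs() {
    sed -e 'N' -e 's/\n/,/g'
}

# Four lines in, two joined pairs out, each pair on its own line.
printf '2\n3\n4\n5\n' | join_pairs
```

Here the output should be 2,3 on one line and 4,5 on the next, not a single 2,3,4,5.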

n and N don’t actually start a new cycle, but draw the next waiting line into the rest of the current cycle. That line does not then get a cycle of its own, but changes the current line number. You can have more than one n or N in the same cycle.

Edit: e.g. a cycle within the cycle:

set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the 123story.
123  def
123"
set cmd to "echo " & t & " | sed -n '/123/ =' |
sed  ':top
$ !N
s/\\n/,/
t top'"
(do shell script cmd)
--> "2,3,4,5"

... or perhaps better written:

set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the 123story.
123  def
123"
set cmd to "echo " & t & " | sed -n '/123/ =' |
sed  ':top
$ !{
	N
	s/\\n/,/
	t top
}'"
(do shell script cmd)
--> "2,3,4,5"

... or even:

set t to quoted form of "I begin the essay with a thought
abc 123 xyz
about the end of the 123story.
123  def
123"
set cmd to "echo " & t & " | sed -n '/123/ =' |
sed  ':top
$ !{
	N
	b top
}
s/\\n/,/g'"
(do shell script cmd)
--> "2,3,4,5"
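
The last variant can also be tried on its own in Terminal. This is only a sketch: the function name join_lines is hypothetical, and the line numbers are typed in instead of coming from a first sed.

```shell
# Hypothetical helper: join every line of stdin with commas, in one cycle.
join_lines() {
    sed ':top
$ !{
	N
	b top
}
s/\n/,/g'
}

# Any number of lines works, including just one.
printf '2\n3\n4\n5\n' | join_lines
printf '7\n' | join_lines
```

Because the loop keeps pulling lines into the same cycle until the last one, the first call should give 2,3,4,5 and the second simply 7.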

Hello.

I just figured it out, and this reaches Monty Pythonesque heights (I got it, I got it. I don’t got it.).

Yes. The “n” operator gets the next line of input without starting a new cycle, and N appends it to the pattern space before retrieving a new line of input. (I have to think some more about this, as to why one of the earlier lines didn’t trigger then.)
This is what I came up with. It doesn’t find the lines, but it creates a comma-separated list of variable length.

Anyway, here is a one-liner that transforms the lines of a list into comma-separated items.

sed  -e ':s;H;N'  -e 's/\n/,/g ;t s'

As I understand it, I always copy the pattern space over to the hold space, which I believe is empty to start with. Then N appends the next line to the pattern space and the substitution is performed. The t command tests the substitution and, if there was one, branches back to the top, where the current pattern space is appended to the hold space before the next line of input is appended to the pattern space. The substitution is tested and branched on until there are no more lines of input. Then, as the branch fails, the default print is issued.

(For some reason, the last single quote gets interpreted as a linefeed, so this branching approach is Terminal/shell-script only. But I see no reason why it wouldn’t work between some braces. Talk about many ways to skin a cat! :slight_smile: )

Edit
This works with GNU sed, but not with the regular one. The regular one won’t accept the label in the next expression, and the whole thing fails in AppleScript when converted with braces.

Hi McUsrII.

It doesn’t work (in Snow Leopard, at least) because the built-in version of sed doesn’t like anything except a linefeed to follow a label. To make it a one-liner in a shell script, you have to do something like this, with a shell-supplied linefeed character:

"sed  ':s'$'\\n''$ !N ; s/\\n/,/g ;t s'"

The H in your code is entirely superfluous, since you don’t move the held text back into the pattern space for editing/output. Also, although the hold space is empty at the beginning, H appends lines to it with a linefeed, so if you use H instead of h on the first line, you’ll get a linefeed at the beginning of the held text.
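
The point about H on the first line is easy to see with two short test lines. A sketch, with made-up function names: the first helper uses H for every line, the second uses h for the first line and H for the rest.

```shell
# Hypothetical helper: H on every line, so the held text starts with a linefeed.
join_with_H_only() {
    sed -n 'H; $ { g; s/\n/,/g; p; }'
}

# Hypothetical helper: h on the first line, H on the others.
join_with_h_first() {
    sed -n '1 h; 1 !H; $ { g; s/\n/,/g; p; }'
}

printf 'a\nb\n' | join_with_H_only    # leading linefeed becomes a leading comma
printf 'a\nb\n' | join_with_h_first   # no leading comma
```

The first call should print ,a,b and the second a,b.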

Hello.

I tried with and without the H, with GNU sed (which is a “customized” version of our regular sed), to no avail. It had to be like that in order to work.

It was when I had mended the AppleScript that I realized I had been using GNU sed, and that the script didn’t work at all with /usr/bin/sed. But, funny as it may seem with respect to the manual and all that, it actually does work with GNU sed! Obviously I didn’t test it with one line of input, but it worked with 2, 3 and 4.

By the way, I figured out which command it is that short-circuits and starts a new cycle: the d/D command (there’s not much point in continuing when you haven’t got anything in the pattern space).

Enough said! (Pun intended!) :smiley:

It’s slowly becoming clearer. I’ll have to read these posts and the tutorial another 5 to 100 times, just like I did the AppleScriptLanguageGuide. :smiley:

Thanks a lot!

I found my old script that changes newline to comma. What I can’t see is why I couldn’t pipe the line number output to the other sed command. Doesn’t the pipe send the line numbers to a new process like in my original post?


set t to quoted form of "10 ab 123 cde 4910 123"
do shell script "echo " & t & " | sed -e 's/ /\\
/g' \\
-e 's/\\n/,/g'"

I know I could have substituted the space for comma, but was trying to learn the syntax with newlines at the time. Then, it didn’t work with the line numbering ‘=’, so I tried to pipe to a new process. I named this script SedSpaceToNewlineToComma.

Looking more closely at McUsrII’s repeated -e options, it appears that -e in a sed script acts like a shell-supplied linefeed, so another form of my one-liner would be:

"sed -e ':s' -e '$ !N ; s/\\n/,/g ;t s'"

It even works in the middle of a command! Here are three different ways to signify a linefeed in an s replacement regex:

set t to quoted form of "Hello world"
-- Replace the space with a linefeed.
do shell script ("echo " & t & " | sed  's/ /\\'$'\\n''/'") -- Shell linefeed.
-- Or:
do shell script ("echo " & t & " | sed  -e 's/ /\\' -e '/'") -- sed -e
-- Or of course:
do shell script ("echo " & t & " | sed  's/ /\\
/'") -- Literal linefeed.

Hi kel.

I don’t understand what you’re asking here.

If you mean why didn’t your original script (at the top of this thread) work, it’s because sed deals individually with the lines between the linefeeds in its input, sticking everything back together with linefeeds in its output. The second ‘sed’ in your script was trying to edit the linefeeds themselves, which weren’t in the patterns being edited.
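
Both cases can be compared side by side in Terminal. This is a sketch with hypothetical function names. In the first, each line arrives in the pattern space without its linefeed, so the substitution has nothing to match. In the second (the shape of the SedSpaceToNewlineToComma script), the linefeeds are put into the pattern space by the first s command within the same cycle, so the second s can see them.

```shell
# Hypothetical helper: tries to edit linefeeds between lines, which sed never sees.
per_line_attempt() {
    sed 's/\n/,/g'
}

# Hypothetical helper: creates linefeeds inside the pattern space, then edits them.
space_to_newline_to_comma() {
    sed -e 's/ /\
/g' -e 's/\n/,/g'
}

printf '2\n4\n' | per_line_attempt                  # passes through unchanged
printf '10 ab 123\n' | space_to_newline_to_comma    # 10,ab,123
```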

Hello.

The trouble I see with the -e’s is that either they must come in an odd/even number to work, or it works for you because you have returns as the default line ending in your script editor. There is “something” with the -e’s that leads to trouble in do shell script, so I am personally all for braces! :smiley: