sed help

Hi,

The following script gets text from a file and adds line numbers to each paragraph:

set cmd to "sed '=' <~/Desktop/WolfandCrane.txt | sed '
N
s/\\n/.	/
'"
do shell script cmd

That big space in the replacement text is a tab. Can someone please tell me how to do this with just one sed command, without the pipe, if possible? Here’s some text:

Thanks,
kel

Model: MBP
AppleScript: AS 2.2.4
Browser: Safari 536.28.10
Operating System: Mac OS X (10.8)

I think I found out how. The file input has to be at the end.

Edited: no that didn’t work. :confused:

Hi,

That’s ok. I think it can’t be done because of the line numbering. Think I’ve got the input and output from/to file part.

Thanks anyway,
kel

The equal sign “=” prints to standard output the line number. How can you capture that line number? I wonder if this would work:

to remember the line number. Gotta try it later.
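
For reference, here is a minimal shell sketch of what the two sed stages in the original script do, run on two made-up lines instead of the poem file. The sample text and the variable names are assumptions for illustration only. It shows why the second sed is there: '=' writes the number on its own line straight to standard output, so the number never enters the pattern space of the same sed run.

```shell
# Two made-up input lines standing in for the poem file (hypothetical text).
input=$(printf 'first line\nsecond line\n')

# Stage 1: '=' prints the line number on its own line before each input line.
numbered=$(printf '%s\n' "$input" | sed '=')
printf '%s\n' "$numbered"

# Stage 2: N appends the next line to the pattern space, so number and text
# are held together; the embedded newline is then replaced by a period + tab.
tab=$(printf '\t')
joined=$(printf '%s\n' "$numbered" | sed "N
s/\n/.$tab/")
printf '%s\n' "$joined"
```

The tab is built with printf so that the sketch does not depend on a literal tab surviving copy and paste.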

Hello.

Here is a way to do it in one go with awk.

cat poem2 |awk 'BEGIN {i=1} {/    / gsub("        "," "); printf "%d %s",i++,$0 }'

The long spaces are of course tabs. There may be long spaces when you paste it into your editor. :slight_smile:

You have nice poems kel!

An awk script works pretty much like sed, but it has more goodies built in. gsub substitutes globally and, when no third argument is given, operates on the current line of input: gsub(regular expression, replacement string, variable). An awk program can have three sections, BEGIN {} {} END {}. I use the BEGIN section to initialize my line counter, do everything else in the middle “in stream” section, and have nothing to do in the END section. Ah, and I post-increment my line counter i with the ++ after each printed line. (There is no newline “\n” at the end of the format string of the printf function.)
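
A small sketch of the gsub behaviour described above, using a made-up tab-separated line. The sample text and the variable names are assumptions for illustration. It shows the two-argument form rewriting the current line, and the three-argument form rewriting a named variable and returning the number of substitutions.

```shell
# A made-up line with two tabs in it (hypothetical sample text).
tab=$(printf '\t')
line="a${tab}b${tab}c"

# Two arguments: gsub rewrites the current input line ($0).
whole=$(printf '%s\n' "$line" | awk '{ gsub("\t", " "); print }')

# Three arguments: gsub rewrites the variable s and returns the number of
# substitutions it made; $0 itself is left untouched.
counted=$(printf '%s\n' "$line" | awk '{ s = $0; n = gsub("\t", " ", s); print n ":" s }')

printf '%s\n' "$whole"
printf '%s\n' "$counted"
```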

Edit

I’ll just state here that I intend to come back to this some time in the future and use the Thompson NFA algorithm on it. (I am going to optimize it, more than slightly, really.) And now that I have written it here, I have to! :slight_smile:

Hi McUsr,

I tried this:

set f to choose file
set pp to POSIX path of f
set cmd to "cat " & pp & " |awk 'BEGIN {i=1} {/ / gsub(\" \",\" \"); printf \"%d %s\",i++,$0 }'"
do shell script cmd

→ "1 A WOLF who had a bone stuck in his throat hired a Crane, for a large sum, to put her head into his mouth and draw out the bone.2 When the Crane had extracted the bone and demanded the promised payment, the Wolf, grinning and grinding his teeth, exclaimed: …

It’s almost there.

Hi McUsr,

It’s a puzzle as to why the line numbers aren’t on new lines. Good one!

later,
kel

Hello

What is missing? Later, Brazil Italia…

The linefeeds are missing.

Hello.

Here you go. I actually thought you wanted to get rid of them, but it was only the first, wasn’t it?
I have added \n at the end, as \n represents a linefeed in the printf statement, in all printf statements as a matter of fact. Funny that you don’t have printf in Python, by the way.
I have also added a 2 to the format of the digit (%2d), which pads the number to a width of two, so single-digit line numbers get a space in front.

set cmd to "cat " & quoted form of pp & " | awk 'BEGIN {i=1} {gsub(\"\\t\",\" \"); printf \"%2d %s\\n\",i++,$0 }'"
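
To see the effect of the two changes outside AppleScript, here is a short shell sketch on two made-up lines. The sample text and the variable names are assumptions for illustration. The first command leaves "\n" out of the format string, the second adds it together with the %2d width.

```shell
# Two made-up input lines (hypothetical sample text).
sample=$(printf 'alpha\nbeta\n')

# No "\n" in the format string: the records run together on one line.
run_together=$(printf '%s\n' "$sample" | awk 'BEGIN {i=1} { printf "%d %s", i++, $0 }')

# With "\n" and a width of 2: one record per line, numbers right-aligned.
one_per_line=$(printf '%s\n' "$sample" | awk 'BEGIN {i=1} { printf "%2d %s\n", i++, $0 }')

printf '%s\n' "$run_together"
printf '%s\n' "$one_per_line"
```

Unlike print, printf in awk never adds a line ending on its own, which is why the numbers ended up glued to the end of the previous line.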

Sed is really elegant in its simplicity.

Absolutely, but to achieve what you want with it, you’ll have to think harder than with awk. I think you have seen a lot of examples of sed by now, for counting stuff and the like. :slight_smile: You can also google for “sed Tower of Hanoi”. :slight_smile:

For a case like the one above, a read command and some AppleScript prepending each paragraph with the correct number would be faster, let’s not forget that.

property pp : ""
property nl : {}
set pp to "A WOLF who had a bone stuck in his throat hired a Crane, for a large sum, to put her head into his mouth and draw out the bone.
When the Crane had extracted the bone and demanded the promised payment, the Wolf, grinning and grinding his teeth, exclaimed:
\"Why, you have surely already had a sufficient recompense, in having been permitted to draw out your head in safety from the mouth and jaws of a wolf.\"
In serving the wicked, expect no reward, and be thankful if you escape injury for your pains."

set {oldTids, pcount, i, my nl} to {AppleScript's text item delimiters, count paragraphs of my pp, 1, {}}
repeat pcount times
	set {AppleScript's text item delimiters, thisL, AppleScript's text item delimiters} to {tab, text of (paragraph i of my pp), space}
	set {end of my nl, i} to {"" & (i & space & (thisL as text)), i + 1}
end repeat
set AppleScript's text item delimiters to oldTids
nl

This is a more “message oriented approach”. I think it looks better, and I think it should perform as fast as the former.

property pp : ""
property ml : {}
property nl : {}
property thisL : ""
set pp to "A WOLF who had a bone stuck in his throat hired a Crane, for a large sum, to put her head into his mouth and draw out the bone.
When the Crane had extracted the bone and demanded the promised payment, the Wolf, grinning and grinding his teeth, exclaimed:
\"Why, you have surely already had a sufficient recompense, in having been permitted to draw out your head in safety from the mouth and jaws of a wolf.\"
In serving the wicked, expect no reward, and be thankful if you escape injury for your pains."

set {oldTids, my ml, pcount, i, thisLine, my nl} to {AppleScript's text item delimiters, paragraphs of my pp, count paragraphs of my pp, 1, "", {}}
repeat pcount times
	tell (a reference to AppleScript's text item delimiters) to set contents to tab
	tell (a reference to thisL) to set contents to text items of paragraph i of my pp
	tell (a reference to AppleScript's text item delimiters) to set contents to space
	tell (a reference to thisL) to set end of my nl to i & space & contents as text
	set i to i + 1
end repeat
set AppleScript's text item delimiters to oldTids
nl

It’s one of Aesop’s Fables. I happened to read them myself a few days ago as they come as a freebie with the Kindle app. :slight_smile:

Edit: By the way, another alternative to using two seds would be to use grep in place of the first one, which would make the remaining one somewhat simpler:

set cmd to "grep -n '' <~/Desktop/WolfandCrane.txt | sed 's/:/.'$'\\t''/1'"
do shell script cmd
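
The two stages of that pipeline can be tried on their own in a shell. This is a sketch on two made-up lines; the sample text and the variable names are assumptions for illustration. An empty pattern matches every line, so grep -n '' prefixes each line with its number and a colon, and sed then only has to turn that first colon into a period and a tab.

```shell
# Made-up sample text; the first line deliberately contains a colon of its own.
sample=$(printf 'first: line\nsecond line\n')
tab=$(printf '\t')

# Stage 1: every line matches the empty pattern, so each gets "N:" in front.
prefixed=$(printf '%s\n' "$sample" | grep -n '')

# Stage 2: without the g flag, only the first colon on each line is replaced.
result=$(printf '%s\n' "$prefixed" | sed "s/:/.$tab/")

printf '%s\n' "$prefixed"
printf '%s\n' "$result"
```

Because only the first colon is touched, colons inside the text itself survive.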

Hello.

Aesop was a smart guy, I like many of his fables well. Though I am sure I could have learned a little bit more from them.

I timed my two AppleScript versions on a file I downloaded from Gutenberg.org, 46.txt, “A Christmas Carol” by Charles Dickens. It contains 4238 lines of text, which I converted to Unix linefeeds and Mac Roman encoding.

The second, “message oriented” approach used 1.47 seconds on the second of two runs, and the first one, which uses the list “trick”, used 2.02 seconds on the second run.

Both files were read from disk, both scripts used 5 apple events each.

Edit

:smiley: kel’s version used 0.03 seconds.

The kindle sounds like a good idea. I’ll look into fixing up my old Psion organizer for reading as I am not buying anything at the moment as a matter of fact, I’ll have to work up enough money to buy a “black juice can”, that Apple are to sell soon.

--> {"1   A   W O L F   w h o   h a d   a   b o n e   s t u c k   i n   h i s   t h r o a t   h i r e d   a   C r a n e ,   f o r   a   l a r g e   s u m ,   t o   p u t   h e r   h e a d   i n t o   h i s   m o u t h   a n d   d r a w   o u t   t h e   b o n e .", "2   When the Crane had extracted the bone and demanded the promised payment, the Wolf, grinning and grinding his teeth, exclaimed:", "3   "Why, you have surely already had a sufficient recompense, in having been permitted to draw out your head in safety from the mouth and jaws of a wolf."", "4   In serving the wicked, expect no reward, and be thankful if you escape injury for your pains."}

It’s because the delimiters are still “” when you get the text items first time round the repeat. (The values in the right-hand list are all obtained first and THEN applied in order to the destinations in the left-hand list.) But why do you need to use delimiters for anything other than sticking the paragraphs back together?

Hello.

I use it both times to transform tabs into spaces. I am uncertain about using words or anything else, since I only want to lose the tabs.

Thanks for pointing that out about the right-hand list. I thought each value was evaluated as it was assigned, and not beforehand.
But now I use text of a paragraph, instead of text items, or items, and the transform is done as I intended.

I’ll correct the script above as soon as this post is submitted.

Have a nice Sunday Nigel! :slight_smile:

Edit

I always set the text item delimiters to “” at the top of a script, in case I stop a run while other delimiters are in effect, so that the next run starts with the delimiters I intend. That line somehow slipped into the script I originally posted.

And as long as the lines of text are supplied without any read operation, and the number of lines is below, say, 80, I contend that AppleScript is the fastest way to perform a simple transformation of the text without any regular expressions involved.

Here is a complete listing of timing results, using my 46.txt file, which is 4238 lines long:

kel's original:      0.03 seconds
My awk version:      0.05 seconds
Nigel's grep/sed:    0.93 seconds
My AS msg version:   1.46 seconds
My AS list cmd ver:  1.79 seconds

Just for completeness.

What surprised me the most was how well the AS message version really performed. :slight_smile:

:wink:

A young Squirrel went looking for acorns. Realising after a while that he was in an unfamiliar part of the forest and was hopelessly lost, he approached a group of wise old Owls whom he espied sitting in a nearby tree and addressed them thus:

“O sirs! Ye who are the wisest creatures in this forest. Forgive me that I disturb your deliberations, but I have lost my way home and do not know in which direction to turn. Alas, I have searched everywhere, but cannot find the answer. Tell me, I beseech ye: how may I reach the big sycamore by the meadow?”

“Welcome, young Squirrel, to our tree,” replied the Owls. “You’ve come to the right place.”

One of the Owls suggested the young Squirrel make his way to the forest on the other side of the meadow, whence he might find his way back using the sun for navigation.

“That’s all very well,” another Owl reminded the first. “But it would be dark by the time he got there. He would do better to start from the forest by the mountain and navigate his way back from there using the stars.”

“Speaking of stars,” replied the first Owl, “this is a scroll I’ve written describing how to get to the Pole Star. You are permitted to use it but not to change any part of it.”

While they were arguing about the best forest from which to start and the best navigation method to use, another Owl produced a scroll and said: “Here’s a set of directions which take you from here to the city on the other side of the river. You should be able to adapt it for your own purposes. It requires the installation of Fish gills.”

A younger Owl, anxious to give back to the community after having received much help from his elders, piped up: “If you go to your tree and lay a trail of breadcrumbs from it, you could follow them back to it!”

“Hello,” said the first Owl. “I’ve updated my scroll in reply #3 to give the user a choice between going to the Pole Star or to the Moon.”

Overwhelmed …

“Hello. I’ve corrected an error in my scroll.”

… to be offered so much wisdom and learning, the young Squirrel was about to bury his head in a pile of Hedgehog dung when he heard his mother calling from his home tree, which was only fifty yards away.

Experts can’t see the tree for the woods.

:slight_smile:

Much wisdom, and very appropriate here. I know I digress, but it usually follows a sort of natural associative flow.

If I were totally disciplined, I wouldn’t be writing much here, but reading boring documents all day long. I always try to find articles, and it so happens that you can’t always find something written by someone like Bill Cheeseman on a subject.

By the way:
I have actually read something useful in a book by John Pugh recently! (It is John, not Jon.) An old book called Software Engineering, a Programming Approach (old, probably tossed from the libraries by now). By reading the chapter on Smalltalk, I felt I got my bearings between Objective-C and Smalltalk: that the methods in a class are indeed the class’s protocol; that a selector is just the name of a message, without the arguments; that a message is the selector plus the arguments; and that a message also has a receiver, of course, which is either an instance or a class, depending on the scope of the message.

Now this intrigued me a little, since I guess AppleScript plays along with, or is written in, Objective-C/Cocoa nowadays. I had previously seen some of the pros using tell blocks on a reference to globally scoped variables, so I wanted to check out a message oriented approach and see how it worked out when iterating over something.

And it proved to work better than I’d expected, knowing that a list of assignments is indeed very fast.

And sorry for the digression, that is something I probably should stop doing, after having read the fable of Aesop!
:slight_smile:

Hi everybody,

After reading the tutorial at Grymoire:

http://www.grymoire.com/Unix/Sed.html#toc-uh-51b

I went back to try to figure out the line number ‘puzzle’. He states:

I can’t find the “next section” where he uses just one invocation of sed. This might be the section on the Bourne shell. Or, it might be something like Nigel’s grep + sed.

McUsr wrote:

It’s a good thing I didn’t quote from “The Bible”. :slight_smile: Downloaded the Bible from Project Gutenberg.

Thanks for the scripts. Now I can study awk with all the awk scripts you all have posted. Surprising timings on the different scripts, McUsr.

Thanks a lot,
kel