In the sed vs awk (vs perl) debate you have to be aware that sed is the fastest one and perl the most versatile. The reason I prefer awk is that it offers a bit of both: it’s readable, more versatile than sed and faster than perl. In fact, for most regular expression handling and text processing AWK is only about 25% slower than sed, which isn’t much. AWK became so popular in the 80’s that people wrote almost complete programs in it, although Kernighan (and his two co-developers) never intended it to be a complete programming language; for that you should use C instead. But don’t think that every awk is as good as any other. I’m glad that Mac OS X ships with the one-true-awk version, which is IMO the best awk version. The other awk versions are more or less extended versions with more functions etc., but that comes at a cost: less performance. There is a byte-coded version of AWK which is 4 to 6 times faster than sed. Unfortunately it’s known to be very sensitive (read: buggy).
The reason I respond to this topic is that the “benchmark” has to be very inaccurate. First of all, both awk and sed use only a part of the total CPU time needed by the code that’s been executed. To perform a more accurate “benchmark” we need to get rid of most of the overhead:
- first put all the commands in a single shell script
- run ioreg only once and reuse its results for every awk and sed call
I came up with something like this:
return do shell script "#!/bin/sh
# time the overhead
dummy1=$(python -c 'import time; print time.time()')
dummy2=$(python -c 'import time; print time.time()')
# bash doesn't support floating point math so we're using awk (could use bc too)
overhead=$(awk '{print $1-$2}' <<< \"$dummy2 $dummy1\")
# now we only want to invoke ioreg command once to avoid more overhead
buffer=$(ioreg -c IOHIDSystem)
# setting the starttime
starttime=$(python -c 'import time; print time.time()')
# repeat the awk command 1000 times
for (( i=1; i<=1000; i++ ))
do
awk -F = '/HIDIdleTime/{R=$2} END{print R}' <<< \"$buffer\" >/dev/null
done
# we're done; time again
endtime=$(python -c 'import time; print time.time()')
# subtract the time difference so we know the elapsed time but also subtract the overhead
awktime=$(awk '{print $1-$2-$3}' <<< \"$endtime $starttime $overhead\")
echo \"Time elapsed by awk: $awktime\"
# setting the starttime
starttime=$(python -c 'import time; print time.time()')
# repeat the sed command 1000 times
for (( i=1; i<=1000; i++ ))
do
sed -n '/HIDIdleTime/ h;${g;s/[^0-9]*//p;}' <<< \"$buffer\" >/dev/null
done
# we're done; time again
endtime=$(python -c 'import time; print time.time()')
# subtract the time difference so we know the elapsed time but also subtract the overhead
sedtime=$(awk '{print $1-$2-$3}' <<< \"$endtime $starttime $overhead\")
echo \"Time elapsed by sed: $sedtime\"
#just for fun show the difference in percentages
ratio=$(awk '{print ($1/$2-1)*100}' <<< \"$awktime $sedtime\")
echo \"Sed performed the task $ratio% better than awk\""
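The last step of the script turns the two timings into a percentage with (awktime / sedtime - 1) * 100. As a small sketch of that formula on its own: the function name ratio_percent and the timings below are made up for illustration, they are not measurements.

```shell
#!/bin/sh
# Hypothetical helper: by how many percent does the first time exceed the second?
# Same formula as the script: (awktime / sedtime - 1) * 100.
ratio_percent() {
    # LC_ALL=C keeps the decimal point independent of the user's locale.
    LC_ALL=C awk -v a="$1" -v s="$2" 'BEGIN { printf "%.1f\n", (a / s - 1) * 100 }'
}

# Made-up timings in seconds, purely for illustration.
ratio_percent 1.25 1.00   # prints 25.0
ratio_percent 1.50 1.00   # prints 50.0
```

A positive number means the second command (sed in the script) was faster; a negative number would mean the first one was.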
Keep in mind that more than half of the code is still overhead (communication between bash and the commands etc.). I report a time difference ratio because time in seconds is almost useless without any information about the machine. The most remarkable thing that shows up on my machine after running the code about 20 times (to get a good indication) is not that sed is faster, but how much the ratio differs between runs. On my machine sed is mostly 20 to 25% faster, but once in a while sed seems to make a jump in performance and becomes 50% faster, while awk’s time stays steady on every run.
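Because the ratio jumps around between runs, it helps to look at the spread instead of a single number. A minimal sketch, assuming you have collected the ratio of each run by hand: summarize_ratios is a hypothetical helper and the five values are invented, not my results.

```shell
#!/bin/sh
# Hypothetical helper: print min, mean and max of a list of ratio percentages,
# one argument per benchmark run.
summarize_ratios() {
    printf '%s\n' "$@" | LC_ALL=C awk '
        NR == 1 { min = max = $1 }          # seed min/max with the first value
        {
            sum += $1
            if ($1 < min) min = $1
            if ($1 > max) max = $1
        }
        END { if (NR) printf "min=%.1f mean=%.1f max=%.1f\n", min, sum / NR, max }'
}

# Invented ratios: four ordinary runs and one outlier.
summarize_ratios 22 24 21 50 23   # prints min=21.0 mean=28.0 max=50.0
```

A single outlier like the 50 here pulls the mean up noticeably, which is why one run is not enough to judge.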
The 20-25% performance difference between sed and awk is confirmed again, as I predicted. However, the difference in time is almost completely useless, because when we also time the ioreg command, both awk and sed use less than 20% of the total CPU time of the two commands. The overhead of the do shell script command and the coercion of variables puts both sed and awk at about 5% of the total CPU time needed to complete the command in AppleScript. The 20% performance difference between awk and sed is then, with all the surrounding code included, reduced to around 1%.
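The 1% is just back-of-the-envelope arithmetic: a 20% difference on a part that takes about 5% of the total is 0.20 × 0.05 = 0.01. A rough sketch of that estimate, where overall_gain is a hypothetical name and the result is a first-order approximation, not a measurement:

```shell
#!/bin/sh
# Hypothetical helper for a rough estimate:
#   $1 = speed difference of the tool itself, in percent (e.g. 20)
#   $2 = share of the total run time spent in that tool, in percent (e.g. 5)
# Prints the approximate difference on the total run time, in percent.
overall_gain() {
    LC_ALL=C awk -v d="$1" -v s="$2" 'BEGIN { printf "%.1f\n", d * s / 100 }'
}

overall_gain 20 5    # prints 1.0
overall_gain 20 20   # prints 4.0, if only ioreg and the tool are counted
```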
Conclusion
So in short sed is about 20% faster than awk, as expected and confirmed here. Because of the huge overhead of the surrounding commands and AppleScript (including the do shell script overhead), the overall win for sed will be around 1%. With such a marginal speed difference for this particular command it doesn’t matter which one you use; pick sed or awk, whichever you like best. I prefer AWK, as I said before, because of its syntax, because there is not much of a speed difference compared to sed (read: it’s fast) and because it’s more versatile thanks to its built-in functions. But, hey, that’s just my opinion. If performance is a real issue I prefer to write a command line utility myself; awk and sed can’t beat that.
edit: I’ve run the script on an older MacBook Pro 8.2, 2.3 GHz Intel Quad Core i7 with 8 GB 1333 MHz DDR3 memory, using Mac OS X 10.6. I also ran the code on my other machine with Mountain Lion, which is a MacBook Pro 10.1, 2.6 GHz Intel Quad Core with 16 GB 1600 MHz DDR3 memory. The funny thing I found is that the awk command has the same execution time (I would have expected it to be faster), but the weird thing is that on the newer machine the sed command is slower and the time difference between awk and sed is stable at 5%. I never knew that sed could perform worse on a newer machine than on an older machine with an older OS.
So I can imagine that kel1 has a machine where awk even performs faster than sed. I’m surprised by the unstable performance of sed in the first place, but also by the unexpected performance difference between machines. Another reason to like awk even more ;)