Problem with reg exp

Hi, I can’t get the line:


	set p_list to paragraphs of (do shell script "curl -s http://epguides.com/24/ | grep '^[ ]*[0-9]\+\.' | sed -e 's/<[^>]*>//g' | tail -12 | cut -c1-13,28-")
	

to work properly.
I get an error saying that the + in the grep part is unexpected.
Anyone?

Hi Laner,

The “+” is not part of basic regular expressions, only of extended ones. I think it’s erroring because you’re escaping something that doesn’t need escaping.

gl,

Aha!
I need to match lines starting with a space and a number with 2 or 3 digits, followed by a period.
Like " 34." or " 141."
How do I do this without using extended reg exp?

Hi Laner,

You can use extended regexp with the -E option.

grep -E …
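
A rough sketch of the difference, run directly in a shell rather than through do shell script. The sample lines are made up for illustration and are not taken from the page. In basic regex "one or more" can be written as \{1,\}, while with -E the + works unescaped:

```shell
# Hypothetical sample lines standing in for the page text.
sample=' 34. first
141. second
 x7. third'

# Basic regex: "one or more digits" spelled as the interval \{1,\}.
bre=$(printf '%s\n' "$sample" | grep '^ *[0-9]\{1,\}\.')

# Extended regex: + works unescaped, but grep needs -E.
ere=$(printf '%s\n' "$sample" | grep -E '^ *[0-9]+\.')

# Both patterns should select the same two numbered lines.
printf '%s\n' "$bre"
```

Inside an AppleScript string, each backslash in these patterns would still have to be written twice.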

gl,

I get a syntax error:
“Expected “"” but found unknown token.”
With the line:


	set p_list to paragraphs of (do shell script "curl -s http://epguides.com/24/ | grep -E '^[ ]*[0-9]\+\.' | sed -e 's/<[^>]*>//g' | tail -12 | cut -c1-13,28-")

When you use "\" to escape in a do shell script, you need to double it: "\\". You’re escaping the escape for AppleScript, so that the shell script receives the backslash literally.

gl,

Here’s an example of non-extended.


set regexp to "^ [ ]*[^0-9]\\{0,\\}[0-9]\\{2,3\\}[^0-9]\\{0,\\}\\."
do shell script "sed -n '/" & regexp & "/ p' <~/test.txt"

→ " 123.
→ 12. hij"

That’s the result of this text in text file test.txt:

  1. hij

Here I used input from test.txt. sed searches for lines that match the regular expression and prints them. The \{2,3\} says a minimum of 2 and a maximum of 3 digits, but you also have to bound the digits on both sides with 0 or more non-digits.

Edited: the spaces didn’t show up in the post, so I replaced them with dashes:

-123.
-12. hij
12.
–1234.

gl,

Oops. I made a mistake. You don’t want something like:

—123abc.

So the “.” has to follow the digits.


set regexp to "^ [ ]*[^0-9]\\{0,\\}[0-9]\\{2,3\\}\\."
do shell script "sed -n '/" & regexp & "/ p' <~/test.txt"
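
For anyone who wants to try the corrected pattern outside AppleScript, here is a minimal shell sketch. The input lines are an assumption: they are a guess at the test file with the spaces restored, plus the unwanted "123abc." case. Single backslashes are enough here because no AppleScript string is wrapped around the pattern:

```shell
# Assumed test input (spaces restored by guesswork from the dashed listing).
input=' 123.
 12. hij
12.
  1234.
   123abc.'

# Same basic-regex pattern as above, with single backslashes.
regexp='^ [ ]*[^0-9]\{0,\}[0-9]\{2,3\}\.'

# -n suppresses default output; p prints only the matching lines.
matches=$(printf '%s\n' "$input" | sed -n "/$regexp/ p")
printf '%s\n' "$matches"
```

With this input, only the first two lines should come through: the others either lack the leading space, have four digits, or have letters between the digits and the period.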

gl,

Hi Laner,

I went to look at the site again. You probably want something like this:


set regexp to "^[ ]*[^0-9a-zA-Z]\\{0,\\}[0-9]\\{2,3\\}\\."
do shell script "sed -n '/" & regexp & "/ p' <~/test.txt"

I don’t think the three-digit numbers begin with a space. I also excluded letters, just in case.

gl,

I really need it in one line.
Is that possible?

A quick look at the page suggests you have it about right, although I don’t know what you’re trying to get to. You know that grep uses basic regex without the -E, and you just need to double up on the \. Don’t use my example, because I haven’t studied the page you’re searching. I was mainly trying to show you how you need to use double \, and looking at the page, it looks like you don’t need the two-to-three-digit match you mentioned anyway. Keep at it.

gl,

This is the whole script:


property theMonths : "JanFebMarAprMayJunJulAugSepOctNovDec"
set ShowList to {"PrisonBreak", "HowIMetYourMother", "Lost", "Heroes", "Medium", "GreysAnatomy", "24", "BattlestarGalactica"}

set showDates to ""
repeat with i in ShowList
	set showDates to showDates & get_show_data(i)
end repeat
-- display dialog showDates

on get_show_data(show)
	set p_list to paragraphs of (do shell script "curl -s http://epguides.com/" & show & "/ | grep '^ [0-9][0-9]' | sed -e 's/<[^>]*>//g' | tail -12 | cut -c1-13,28-")
	tell (current date) to set cd to it - (its time)
	set target_dates to show & return
	repeat with this_p in p_list
		set w to words of this_p
		set item 5 of w to ((offset of (item 5 of w) in theMonths) div 3 + 1)
		copy cd to d
		set day of d to item 4 of w as integer
		set month of d to item 5 of w
		set year of d to (item 6 of w as integer) + 2000
		if d < cd then
			set target_dates to show & return & " " & contents of this_p & return
		else
			set target_dates to target_dates & "*" & contents of this_p & return & return
			exit repeat
		end if
	end repeat
	return target_dates
end get_show_data

Stefan made it.
I’m trying to get the previous and next air times of my TV shows.
But 24 has over 100 episodes, so the script fails when it checks that one: the episode it outputs isn’t the right one.
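
One possible one-line fix, sketched in plain shell. It assumes that three-digit episode numbers start in the first column with no leading space, as guessed earlier in the thread. The sample lines are invented, and the real page layout has not been checked. The grep pattern from the script requires a leading space, so it would skip such lines; making the space optional and allowing 2 or 3 digits picks them up:

```shell
# Hypothetical episode lines; the real page layout is an assumption.
lines=' 98.   4-24   some title
 99.   5-01   some title
100.   5-02   some title
Season 5'

# Pattern from the script: needs a leading space, so "100." is missed.
old_count=$(printf '%s\n' "$lines" | grep -c '^ [0-9][0-9]')

# Optional leading spaces, then 2 or 3 digits and a period.
new_count=$(printf '%s\n' "$lines" | grep -c '^ *[0-9]\{2,3\}\.')

printf 'old pattern: %s lines, new pattern: %s lines\n' "$old_count" "$new_count"
```

In the do shell script string, the same pattern would be written with doubled backslashes, for example '^ *[0-9]\\{2,3\\}\\.'. Whether the cut column ranges still line up for three-digit lines depends on the page and would need checking.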