Remove strings from a paragraph

I run a DJ website and at the moment I have a fair few users I want to add to a list. The portal system I use stores their information in the source code of a web page, and I want to split that source up and pull out the details I need for each user. I currently download the users file once a week and update my local copy by hand, but the list of users and their details is getting out of hand. Is there a way to get their details out of a string and put them into a list?

For example, the current string I have is this…


and I want to turn it into this…

Hi,

There are probably a few solutions; here is one:

set a to "<br>Nickname: Sindustrial_666<br>Email: Sindustrial_666@DomainName.Com<br>Genre: Hardstyle<br>"

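-- Split the string on <br>, saving the old delimiters so they can be restored
set {TID, text item delimiters} to {text item delimiters, "<br>"}
set b to text items of a
set text item delimiters to TID
set c to {}
repeat with i in b
	-- The empty items before the first <br> and after the last one raise an error, which the try block skips
	try
		-- Keep only the value that follows the ": " after each label
		copy text ((offset of ": " in i) + 2) thru -1 of i to end of c
	end try
end repeat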

Not quite what I am looking for. The problem with this solution is that the string is contained inside a web page, and there are a lot more tags occurring before it, and a lot more text.

A simple example:

This will get you started: it breaks out the relevant lines. You’ll still have to strip the "Nickname: ", "Email: ", and "Genre: " labels from the items in tParts, which I didn’t do here.


set tSource to "<html>
<body bgcolor=\"#FFFFFF\" marginheight=0 topmargin=0>
<table>
<tr>
<td>
<br><h1>Current Member Details</h1>
<br>
<br>Nickname: Sindustrial_111<br>Email: Sindustrial_111@DomainName1.Com<br>Genre: Hardstyle<br>
<br>Nickname: Sindustrial_222<br>Email: Sindustrial_222@DomainName2.Com<br>Genre: Trance<br>
<br>Nickname: Sindustrial_333<br>Email: Sindustrial_333@DomainName3.Com<br>Genre: Tech Trance<br>
<br>Nickname: Sindustrial_444<br>Email: Sindustrial_444@DomainName4.Com<br>Genre: Hardstyle<br>
</td>
</tr>
</table>"

set tParts to {}
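set tid to AppleScript's text item delimiters
-- Keep only the text that follows the "Member Details" heading
set AppleScript's text item delimiters to "Member Details</h1>"
set tSource to paragraphs 3 thru -1 of text item 2 of tSource
-- Split each remaining line on <br>; only the member lines yield more than one item
set AppleScript's text item delimiters to "<br>"
repeat with P in tSource
	tell text items of contents of P to if (count it) > 1 then
		-- Drop the empty first and last items
		set end of tParts to items 2 thru -2 of it
	end if
end repeat
-- Restore the original delimiters once, after the loop
set AppleScript's text item delimiters to tid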
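
To see why the loop keeps items 2 thru -2, it may help to split a single member line on its own. This is only a small sketch; the variable names sampleLine, lineItems and memberFields are made up for illustration and are not part of the script above.

```applescript
-- One member line, copied from the sample page source
set sampleLine to "<br>Nickname: Sindustrial_111<br>Email: Sindustrial_111@DomainName1.Com<br>Genre: Hardstyle<br>"

set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "<br>"
set lineItems to text items of sampleLine
set AppleScript's text item delimiters to tid

-- The line starts and ends with <br>, so the first and last items are empty strings:
-- {"", "Nickname: Sindustrial_111", "Email: Sindustrial_111@DomainName1.Com", "Genre: Hardstyle", ""}
set memberFields to items 2 thru -2 of lineItems
```

A line such as "</td>" contains no <br> at all, so it splits into a single item, and the (count it) > 1 test skips it.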

Here’s the full meal deal (at least for the data you gave):



set tSource to "<html>
<body bgcolor=\"#FFFFFF\" marginheight=0 topmargin=0>
<table>
<tr>
<td>
<br><h1>Current Member Details</h1>
<br>
<br>Nickname: Sindustrial_111<br>Email: Sindustrial_111@DomainName1.Com<br>Genre: Hardstyle<br>
<br>Nickname: Sindustrial_222<br>Email: Sindustrial_222@DomainName2.Com<br>Genre: Trance<br>
<br>Nickname: Sindustrial_333<br>Email: Sindustrial_333@DomainName3.Com<br>Genre: Tech Trance<br>
<br>Nickname: Sindustrial_444<br>Email: Sindustrial_444@DomainName4.Com<br>Genre: Hardstyle<br>
</td>
</tr>
</table>"

set tData to {}
set tParts to {}
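set tid to AppleScript's text item delimiters
-- Keep only the text that follows the "Member Details" heading
set AppleScript's text item delimiters to "Member Details</h1>"
set tSource to paragraphs 3 thru -1 of text item 2 of tSource
-- Split each remaining line on <br>; only the member lines yield more than one item
set AppleScript's text item delimiters to "<br>"
repeat with P in tSource
	tell text items of contents of P to if (count it) > 1 then
		-- Drop the empty first and last items
		set end of tParts to items 2 thru -2 of it
	end if
end repeat
-- Restore the original delimiters once, after the loop
set AppleScript's text item delimiters to tid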

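-- Strip the labels: "Nickname: " is 10 characters long, "Email: " and "Genre: " are 7 each,
-- so the values start at characters 11, 8 and 8
repeat with aPart in tParts
	tell aPart
		set N to text 11 thru -1 of item 1
		set E to text 8 thru -1 of item 2
		set G to text 8 thru -1 of item 3
	end tell
	set end of tData to {N, E, G}
end repeat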
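
If the real page does not always keep each member on one line, or the heading moves, a looser variant is to split the whole source on <br> and pick out the pieces by their labels. The following is only a sketch under that assumption: parseMembers and valueAfterLabel are hypothetical handler names, and it assumes every member has a Nickname, Email and Genre entry in that order.

```applescript
-- Hypothetical helper: returns {{nickname, email, genre}, ...} from the page source
on parseMembers(pageSource)
	set oldTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "<br>"
	set chunks to text items of pageSource
	set AppleScript's text item delimiters to oldTIDs
	
	set members to {}
	set nick to missing value
	set mail to missing value
	repeat with chunkRef in chunks
		set chunk to contents of chunkRef
		if chunk begins with "Nickname: " then
			set nick to my valueAfterLabel(chunk)
		else if chunk begins with "Email: " then
			set mail to my valueAfterLabel(chunk)
		else if chunk begins with "Genre: " and nick is not missing value then
			-- Genre is assumed to be the last field, so the member is complete here
			set end of members to {nick, mail, my valueAfterLabel(chunk)}
			set nick to missing value
			set mail to missing value
		end if
	end repeat
	return members
end parseMembers

-- Hypothetical helper: returns everything after the first ": " in a labelled chunk
on valueAfterLabel(chunk)
	set colonPos to offset of ": " in chunk
	if colonPos + 2 > (length of chunk) then return ""
	return text (colonPos + 2) thru -1 of chunk
end valueAfterLabel

set sample to "<h1>Current Member Details</h1><br>Nickname: Sindustrial_111<br>Email: Sindustrial_111@DomainName1.Com<br>Genre: Hardstyle<br>"
set memberList to parseMembers(sample)
```

Because this looks for the labels rather than counting paragraphs or characters, it should not matter how much other markup comes before the member details, as long as no other chunk happens to start with one of the three labels.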