I looked at some scripts on this forum for downloading a newspaper and wrote one of my own:
tell me to activate
set thedisplay to display dialog "Enter the articles range" default answer ""
--the dialog box collects the first and last article numbers to download from each page
--for example, entering "1,8" downloads articles 1 through 8 on each of the 24 pages
set theresult to the text returned of thedisplay
set AppleScript's text item delimiters to ","
--coerce to integers so the repeat loop counts numerically
set thestarting to (the first text item of theresult) as integer
set maxarticles to (the second text item of theresult) as integer
--restore the default delimiters so later text operations are not affected
set AppleScript's text item delimiters to ""
property maxpages : 24
property destfolderpath : "5:ET:"
set theprefix to "http://epaper.timesofindia.com/Default/Layout/Includes/ETNEW/ArtWin.asp?From=Archive&Source=Page&Skin=ETNEW&BaseHref=ETM"
set thedate to my thedatestring()
set thesuffix1 to "&ViewMode=HTML&GZ=T&PageLabel=1&EntityId=Ar0"
-- & pagenumber & articlenumber &
set thesuffix2 to "&AppName=1"
--outer loop walks the article numbers, inner loop walks the pages (separate loop variables)
repeat with articleindex from thestarting to maxarticles
	set articlenumber to my createarticlenumber(articleindex)
	repeat with pageindex from 1 to maxpages
		set pagenumber to my createpagenumber(pageindex)
		set theURL to theprefix & thedate & thesuffix1 & pagenumber & articlenumber & thesuffix2
		set filename to my createfilename(articlenumber, pagenumber)
		set filepath to destfolderpath & filename
		set qtdposixfilepath to quoted form of POSIX path of filepath
		set command to "curl " & quoted form of theURL & " -o " & qtdposixfilepath
		--set the end of thelist to theURL
		try
			do shell script command
		on error e
			log e
		end try
	end repeat
end repeat
--thelist
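on thedatestring()
	--returns today's date URL-encoded as %2FYYYY%2FMM%2FDD%2F
	set command to "date \"+%Y/%m/%d/\""
	set todaysdatestring to do shell script command
	set olddelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "/"
	set theyear to first text item of todaysdatestring
	set themonth to second text item of todaysdatestring
	set theday to third text item of todaysdatestring
	--restore the delimiters so the caller is not affected
	set AppleScript's text item delimiters to olddelimiters
	set theform to "%2F"
	set thefinal to theform & theyear & theform & themonth & theform & theday & theform
	return thefinal
end thedatestring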
on createpagenumber(i)
	--zero-pad to two digits, e.g. 7 becomes "07"
	return text -2 thru -1 of ("0" & i as text)
end createpagenumber
on createarticlenumber(i)
	--zero-pad to two digits, e.g. 7 becomes "07"
	return text -2 thru -1 of ("0" & i as text)
end createarticlenumber
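on createfilename(articlenum, pagenum)
	--parameters are named in the order the caller passes them: article number, then page number
	--the "/" in the date becomes ":" once the HFS path is converted to a POSIX path
	set command to "date \"+%d/%m\""
	set datestring to do shell script command
	set filename to datestring & "-" & pagenum & "_" & articlenum & ".txt"
	return filename
end createfilename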
I do not know the format or encoding of the downloaded file, though I have given it a “.txt” extension. As far as I know, curl downloads only the URL supplied to it and not the URLs referenced inside that page, so it does not download images. I do not want the images either, but an icon shows up in place of each image when I open the file in TextEdit. I wrote an AppleScript to convert the files to text only (i.e. to remove the image icons) by writing:
However, when I saved it, my original file: http://files.getdropbox.com/u/872430/01%3A08-01_01(original).txt was converted to this file: http://files.getdropbox.com/u/872430/01%3A08-01_01.txt
I really don’t know what the new file is or why it was created; my guess is that it is something related to encoding.
I would like curl to download only the text, without even the icons for images. If that is not possible, I would like to at least save the downloaded files as text only by running another AppleScript.
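For reference, the second option might look something like the sketch below. It rests on an unverified assumption: that the downloaded files contain HTML despite the “.txt” extension (the URL requests ViewMode=HTML). The handler name converttoplaintext and the ".plain.txt" output suffix are made up for illustration; textutil is the command-line converter that ships with Mac OS X.

```applescript
-- Hypothetical helper: convert one downloaded file to plain text with textutil.
-- Assumption: the file contains HTML even though it is named ".txt".
on converttoplaintext(hfspath)
	set inpath to POSIX path of hfspath
	-- write to a new file so the original download is left untouched
	set outpath to inpath & ".plain.txt"
	-- "-format html" forces the input to be read as HTML regardless of its extension
	set command to "textutil -convert txt -format html -output " & quoted form of outpath & " " & quoted form of inpath
	do shell script command
	return outpath
end converttoplaintext

-- example call, assuming this file was downloaded by the script above
converttoplaintext("5:ET:01/08-01_01.txt")
```

Writing to a separate output file keeps the original around for comparison if the result again looks like an encoding problem. If it does, textutil also accepts an -inputencoding option, though which encoding these pages use is not known here.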
Thanks.