I referred to some scripts in this forum which involved downloading a newspaper and I wrote one for myself.
property maxpages : 24
property destfolderpath : "5:ET:"

tell me to activate

-- The dialog box collects the article number to start with and the last article number to download on each page.
-- For example, entering "1,8" downloads the first 8 articles on each of the 24 pages.
set thedisplay to display dialog "Enter the articles range" default answer ""
set theresult to the text returned of thedisplay

set AppleScript's text item delimiters to ","
set thestarting to (the first text item of theresult) as integer
set maxarticles to (the second text item of theresult) as integer
set AppleScript's text item delimiters to "" -- restore the default delimiters

set theprefix to "http://epaper.timesofindia.com/Default/Layout/Includes/ETNEW/ArtWin.asp?From=Archive&Source=Page&Skin=ETNEW&BaseHref=ETM"
set thedate to my thedatestring()
set thesuffix1 to "&ViewMode=HTML&GZ=T&PageLabel=1&EntityId=Ar0" -- pagenumber & articlenumber are appended after this
set thesuffix2 to "&AppName=1"

repeat with i from thestarting to maxarticles
    set articlenumber to my createarticlenumber(i)
    repeat with j from 1 to maxpages -- separate loop variable for the pages
        set pagenumber to my createpagenumber(j)
        set theURL to theprefix & thedate & thesuffix1 & pagenumber & articlenumber & thesuffix2
        set filename to my createfilename(articlenumber, pagenumber)
        set filepath to destfolderpath & filename
        set qtdposixfilepath to quoted form of POSIX path of filepath
        set command to "curl " & quoted form of theURL & " -o " & qtdposixfilepath
        --set the end of thelist to theURL
        try
            do shell script command
        on error e
            log e
        end try
    end repeat
end repeat
--thelist

-- Returns today's date as a URL-encoded fragment of the form %2FYYYY%2FMM%2FDD%2F
on thedatestring()
    set command to "date \"+%Y/%m/%d/\""
    set todaysdatestring to do shell script command
    set AppleScript's text item delimiters to "/"
    set theyear to first text item of todaysdatestring
    set themonth to second text item of todaysdatestring
    set theday to third text item of todaysdatestring
    set AppleScript's text item delimiters to "" -- restore the default delimiters
    set theform to "%2F"
    set thefinal to theform & theyear & theform & themonth & theform & theday & theform
    return thefinal
end thedatestring

-- Zero-pads the page number to two digits
on createpagenumber(i)
    return text -2 thru -1 of ("0" & i as text)
end createpagenumber

-- Zero-pads the article number to two digits
on createarticlenumber(i)
    return text -2 thru -1 of ("0" & i as text)
end createarticlenumber

-- Builds the file name as date-page_article.txt
-- The "/" in the date turns into ":" when the HFS path is converted to a POSIX path
on createfilename(articlenum, pagenum)
    set command to "date \"+%d/%m\""
    set datestring to do shell script command
    set filename to datestring & "-" & pagenum & "_" & articlenum & ".txt"
    return filename
end createfilename
I do not know the format or encoding of the downloaded files, although I have given them a “.txt” extension. As far as I know, curl downloads only the URL supplied to it and not the URLs referenced inside that page, which means it does not download images. I do not want the images anyway. However, an icon shows up in place of each image when I open a file in TextEdit. I wrote an AppleScript to convert the files to plain text (i.e. to remove the image icons) by writing:
However, when I saved the file, my original file: http://files.getdropbox.com/u/872430/01%3A08-01_01(original).txt was converted to this file: http://files.getdropbox.com/u/872430/01%3A08-01_01.txt
I really don’t know what the new file is or why it was created. My guess is that it is something related to encoding.
I would like curl to download only the text, without even the icons for images. If that is not possible, I would like to at least save the downloaded files as plain text by running another AppleScript.
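One possible direction for the second option, sketched under an assumption that has not been verified: since the URL requests ViewMode=HTML, the downloaded files may actually be HTML, which would explain why TextEdit shows image placeholders. If so, the textutil command-line tool that ships with Mac OS X could convert them to plain text. The handler name converttoplaintext, the "-plain.txt" suffix and the example path are all hypothetical.

```applescript
-- Sketch: convert one downloaded file to plain text with textutil.
-- Assumption: the file is HTML even though it was saved with a ".txt" extension.
on converttoplaintext(hfsfilepath)
    set inputpath to POSIX path of hfsfilepath
    -- Hypothetical naming choice: write the result next to the original file
    set outputpath to inputpath & "-plain.txt"
    -- "-format html" forces textutil to read the input as HTML regardless of its extension
    set command to "textutil -convert txt -format html -output " & quoted form of outputpath & " " & quoted form of inputpath
    do shell script command
    return outputpath
end converttoplaintext

-- Example call; the path is hypothetical and follows the naming used in the script above
converttoplaintext("5:ET:01/08-01_01.txt")
```

Writing to a separate output file keeps the original download untouched, so the two versions can be compared if the encoding still looks wrong.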