How do I tell curl to download an arbitrary file? I tried it with a wildcard, but it doesn't work.
set url_of to "http://Webaddress/photo-of-the-day/*.jpg"
do shell script "curl -L " & quoted form of url_of & " > " & quoted form of POSIX path of ((path to desktop folder as text) & "Test.jpg" as text)
URLs and file paths in Bash are two completely different things. Bash has a feature called pathname expansion (globbing): special characters in a path are expanded by the shell into the matching file names, so the command still receives multiple arguments; the expansion is done by Bash, not by the command. URLs are not file paths, so Bash has nothing to expand them against.
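A quick illustration of the difference (the folder and the URL below are just placeholders):

# Pathname expansion: the shell replaces the pattern with one argument per
# matching local file, so ls receives a list of real file names.
ls ~/Desktop/*.jpg

# A URL matches nothing on disk, so the pattern is passed through unchanged
# and curl receives the literal string, asterisk and all.
curl -L "http://example.com/photo-of-the-day/*.jpg"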
wget, on the other hand, does support wildcards, although unlike curl it does not come pre-installed on OS X. You have to install it with MacPorts or Homebrew, or build it from source yourself.
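For example, with either package manager already set up, installing it is a one-liner:

# with MacPorts
sudo port install wget
# or with Homebrew
brew install wget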
wget -r -l1 -np -nd "http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/" -A "*.jpg"
This is a working command, and with the recursion limited to one level it is reasonably safe - but please don't abuse the host too much. The recursive switch is potentially dangerous in general: it is easy to download far more than you intended to.
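For reference, here is the same command with each switch spelled out (descriptions paraphrased from the wget manual):

# -r          recursive retrieval: follow the links found in the fetched page
# -l1         limit the recursion to one level deep
# -np         never ascend to the parent directory
# -nd         do not recreate the remote directory structure locally
# -A "*.jpg"  keep only the downloaded files whose names match *.jpg
wget -r -l1 -np -nd -A "*.jpg" "http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/"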
What wget does here is not actually wildcard matching (it only does that for FTP connections), which is why I think the option name is misleading. What it does is:
It downloads the given page (wget calls it a directory, but it's just a page).
It extracts all links from that page.
It filters those links against the accept pattern and any other options given.
It downloads the remaining links one by one.
For that you don't need to install wget; you can use curl and awk (or sed) and get the same result. Below is a quick example that downloads the same files without wget.
do shell script "cd " & quoted form of POSIX path of (choose folder) & "
directory_url=http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/
curl $directory_url 2>/dev/null | tr [A-Z] [a-z] | AWK 'BEGIN{
RS=\"</a>\"
}
/href/{gsub(/.*href=\\042/,\"\",$0)
if ( $0 ~ \".*\\.jpg\"){
gsub(/\\042.*/,\"\",$0)
print $0
}
}' | while read filename
do
if [[ $filename == http://* ]]
then # the link is absolute
curl -O $filename
else # the link is relative
curl -O $directory_url$filename
fi
done"
Unless it's your own server, you never know whether the returned page actually contains links to all the files in the directory. Also, most commercial web servers won't return a directory index at all; they return a 403 error or a blank page instead. So the example above, just like the wget example, can be useless or incomplete if the server returns nothing or only a small part of the directory. In practice, both approaches will be useless on most commercial web servers.
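If you want a rough upfront check of whether a server exposes a listing at all, fetching only the headers of the directory URL is enough (keep in mind some servers still answer 200 with an empty or templated page):

# Print just the HTTP status line for the directory URL.
curl -sI "http://members.aceweb.com/randsautos/photogallery/ferrari/enzo/" | head -n 1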