Extract a number with 5 digits

Hi,

I am really new to this so apologies if this is a really easy thing to do!

I am trying to extract a number with 5 digits from a line of text eg:

500ml 12345 metal

I can get it to extract the numbers (500 & 12345) but I only want 12345 (5 digits).

Any help is much appreciated.
Thanks.

This is how I extracted the numbers:


set test to "500ml 12345 metal"
set WordCount to count words of test
do shell script "echo " & quoted form of test & ¬
	" | /usr/bin/grep --only-matching '[0-9]\\+'"
set example to result
example

How about this version?


set test to "500ml 12345 metal"
set WordCount to count words of test
try
	do shell script "echo " & quoted form of test & ¬
		" | /usr/bin/grep -Eo '[[:<:]][0-9]{5}[[:>:]]'"
on error
	""
end try
set example to result

example

Brilliant! Thanks Nigel.
Do not understand that bit of code one bit - but hopefully one day!

:slight_smile:

The regex [[:<:]][0-9]{5}[[:>:]] means five consecutive digits starting at the beginning of a “word” and ending at the end of a “word”, that is, five consecutive digits with only white space, punctuation, or nothing immediately to their left and only that to their right. If you need to exclude five-digit numbers adjacent to punctuation, the regex will need modifying.
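
For anyone who does need to exclude numbers that touch punctuation, here is one possible sketch rather than a modified regex. It assumes that "words" are simply whitespace-separated tokens: it puts each token on its own line and keeps only the lines that are exactly five digits. The function name five_digit_tokens is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print every whitespace-delimited
# token of the argument that consists of exactly five digits.
five_digit_tokens() {
    # tr turns each run of whitespace into a single newline, so every token
    # sits on its own line; grep -x then keeps only whole-line matches.
    # "|| true" keeps the exit status at 0 when nothing matches.
    printf '%s\n' "$1" | tr -s '[:space:]' '\n' | grep -Ex '[0-9]{5}' || true
}

# "(67890)" and "54321." are rejected because of the adjacent punctuation.
five_digit_tokens '500ml 12345 metal (67890) 54321.'   # prints 12345
```

From AppleScript this could be run through do shell script in the same way as the grep commands in this thread.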

The grep options are -E (Extended regex), which configures grep to understand the kind of regex used, and -o, which means the same as your --only-matching.

Maybe for other users: The character class names in Nigel’s example aren’t supported in all versions of Mac OS X. It’s better to use the standard word boundaries \b, which seem to work in all versions of Mac OS X.

set test to "500ml 12345 metal djwioq12345dudi"
set WordCount to count words of test

set example to do shell script "/usr/bin/grep -Eo '\\b[0-9]{5}\\b' <<< " & quoted form of test & "|| echo"

I’ve added " || echo" to the end of the command; || is the shell's OR operator. When the command on the left of the operator fails (returns a non-zero exit status), the command on the right is executed. This suppresses grep's error when no match is found, similar to your try ... on error block.
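
A small sketch of what the OR fallback changes, for anyone who wants to try it in a shell. The function names find_five and find_five_or_empty are invented for this example, and it pipes the text in with printf instead of a here-string so that it also runs in plain sh.

```shell
#!/bin/sh
# Hypothetical wrapper: the grep call on its own.
# grep exits with status 1 when it finds no match.
find_five() {
    printf '%s\n' "$1" | grep -Eo '\b[0-9]{5}\b'
}

# Same call with the OR fallback: when grep fails, "true" runs instead,
# so the command as a whole succeeds and simply outputs nothing.
find_five_or_empty() {
    find_five "$1" || true
}

if find_five 'no number here'; then
    echo 'plain grep: success'
else
    echo 'plain grep: failure status (do shell script would raise an error)'
fi

if find_five_or_empty 'no number here'; then
    echo 'with fallback: success, empty output'
fi
```

The first if takes the else branch and the second one succeeds, which is why the fallback version needs no try block on the AppleScript side.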

Thanks, DJ. I hadn’t realised that grep recognises \b. (sed doesn’t.) Thanks for the shell OR construction too. :slight_smile:

Hello.

 || true

does the same thing, and is even more efficient than the echo command, as it only exits with a value of zero.

Could you explain what filters the results so that 12345 is returned only once when it appears twice in DJ Bazzie Wazzie's example?

KOENIG Yvan (VALLAURIS, France) Thursday 15 August 2013 10:26:56

Hi Yvan.

The \bs in DJ’s version and the [[:<:]] and [[:>:]] in mine specify the edges of words. So the regex as a whole means that the five digits have to be a complete “word” in themselves. The second “12345” in DJ’s example is part of the “word” “djwioq12345dudi” and so isn’t returned.
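
To see the effect of the word boundaries, the same text can be run through grep with and without them. This is only an illustrative sketch; the two function names are made up.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the two patterns.

# No boundaries: any run of five digits matches, even inside a longer "word".
any_five_digits() {
    printf '%s\n' "$1" | grep -Eo '[0-9]{5}'
}

# With \b on both sides the five digits must form a complete "word".
whole_word_five_digits() {
    printf '%s\n' "$1" | grep -Eo '\b[0-9]{5}\b'
}

text='500ml 12345 metal djwioq12345dudi'
any_five_digits "$text"          # prints 12345 twice, one per line
whole_word_five_digits "$text"   # prints 12345 once
```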

And the shell script still returns an empty text. Nice. :slight_smile:

Efficiency is not an issue here. Both true and echo (without specifying a path) are bash builtins, which means neither spawns a process; each simply returns a value directly from code inside bash.

edit: But I agree that true makes more sense than echo in terms of what the word itself means. Still, both return a newline (read: not empty text). true not only returns 0 but also prints to stdout, like echo.

edit2: true seems to check how bash was started. When interactive mode is turned on, true prints a newline; when interactive mode is turned off, it doesn't. A lot of processing for a simple true command :stuck_out_tongue:
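
A quick way to check what the two commands write to stdout when run from a script, that is, in a non-interactive shell, is to count the bytes. This is just a sketch; byte_count is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print how many bytes it wrote
# to stdout. tr strips the padding that some wc versions add.
byte_count() {
    printf '%s\n' "$("$@" | wc -c | tr -d '[:space:]')"
}

byte_count true   # prints 0: nothing is written
byte_count echo   # prints 1: a single newline
```

In this setting true writes nothing at all, while echo writes only a newline.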

Thanks Nigel.
I hadn’t paid attention to that.

In fact I was stuck on a sentence of the second message:

I can get it to extract the numbers (500 & 12345) but I only want 12345 (5 digits).

which led me to think that the problem with the first embedded code was not that it returned 500 because it was glued to ml, but only that 500 wasn’t five digits long.

It would be good to get todmeister’s input.

I was wondering too if the line :
set WordCount to count words of test
has some hidden significance.
My understanding is that it’s just wasted characters.

KOENIG Yvan (VALLAURIS, France) Thursday 15 August 2013 15:45:52