Sieve of Erastothenes for Primes (Just for the fun of it)

Adam_Bell · November 22, 2005, 7:02pm

(* 
The Sieve of Erastosthenes

Historically, the most efficient way to find all of the small primes (say all those less than 10,000,000) is by using the Sieve of Eratosthenes (ca 240 BC). In an AppleScript, that number is more like a few thousand unless you are very patient.

 Make a list of all the odd integers less than or equal to a number, n, (and greater than two). Strike out the multiples of all primes less than or equal to the square root of n, then the numbers that are left are the primes.

For example, to find all the primes less than or equal to 100 we first list the odd numbers from 3 to 100 (why even list the evens?).  The first number is 3 so it is the first odd prime and we remove all of its multiples.  Now the first number left is 5, the second odd prime; remove its multiples.  Repeat with 7 and then since the first number left, 11, is larger than the square root of 100, all of the numbers left are primes. Put the 2 back and we're done.
*)

-- The Script

script PL
	-- storing the list we will build as a property of an enbedded script
	-- speeds up the list manipulations required considerably.
	property P_List : missing value -- will be the list of primes
end script

-- Reset the list so it won't retain earlier results. A property is only reset when a script is compiled unless explicitly reset.
set PL's P_List to {}

set howBig to display dialog "Please Enter the Upper Bound on the List" buttons {"Bail Out", "Enter"} default button "Enter" default answer "Please enter an integer here; it is not checked"
if button returned of howBig is "Bail Out" then return
set n to (text returned of howBig) as integer

-- Find the number of multiples to check
set theLimit to round (sqrt (n))

-- Fill the list to be pruned. This whole process would be faster if we eliminated early prime factors at this point, but we'll take the list of odd numbers from 3 to the number asked for.
fill_List(n)

-- Remove multiples of primes from the list
set Factor to 1
repeat
	set LL to length of PL's P_List
	if item Factor of PL's P_List > theLimit then exit repeat
	repeat with j from Factor + 1 to LL
		try
			if (item j of PL's P_List) mod (item Factor of PL's P_List) = 0 then
				set PL's P_List to remove(j, PL's P_List)
			end if
		on error -- when the factors removed make LL too long
			exit repeat
		end try
	end repeat
	set Factor to Factor + 1
end repeat

-- Output the result
set the beginning of PL's P_List to 2 -- 2 is prime, so put it back
set text item delimiters to ", "
set thePrimes to PL's P_List as text
set text item delimiters to ""
display dialog thePrimes buttons {"Copy to Clip", "OK"} default button "OK"
if button returned of result = "Copy to Clip" then set the clipboard to thePrimes

-- Handlers for filling the original list and removing the multiples

to fill_List(num)
	repeat with j from 3 to num by 2
		set end of PL's P_List to j
	end repeat
end fill_List

to remove(anyItem, anyList) -- this is a very useful handler for removing stuff from lists
	set fixedList to {}
        -- get the entries before the item to be removed
	if anyItem > 1 then
		set fixedList to items 1 thru (anyItem - 1) of anyList
	end if
        -- then get all the entries beyond the item to be removed
	if anyItem < length of anyList then
		set fixedList to fixedList & items (anyItem + 1) thru (length of anyList) of anyList
	end if
	return fixedList
end remove

Nigel_Garvey · November 24, 2005, 10:47pm

Hi, Adam.

That is fun. Thanks.

Another approach to removing the numbers from the list would be to zap them with ‘missing values’ and just return the list’s numbers at the end. (I’m not sure if the old extraction limit ” of about 4000 items at a time ” applies here or not, but it can be scripted round if you’re going for that many primes.)

This has the obvious advantage of not requiring the list to be rebuilt every time a number’s removed. But also, the fact that the list positions always correspond to the same numbers (zapped or not) until just before the end means that these can be used to advantage. For instance, say you’re dealing with the prime 5. Instead of testing all the following numbers to see if they’re divisible by 5 and zapping them if they are, you can simply zap every fifth subsequent item.

In this version, the 2 is put into the list by the fill_List() handler:

script PL
	-- storing the list we will build as a property of an enbedded script
	-- speeds up the list manipulations required considerably.
	property P_List : missing value -- will be the list of primes
end script

-- Reset the list so it won't retain earlier results. A property is only reset when a script is compiled unless explicitly reset.
set PL's P_List to {}

set howBig to display dialog "Please Enter the Upper Bound on the List" buttons {"Bail Out", "Enter"} default button "Enter" default answer "Please enter an integer here; it is not checked"
if button returned of howBig is "Bail Out" then return
set n to (text returned of howBig) as integer

-- Find the number of multiples to check
set theLimit to (sqrt (n)) div 1

-- Fill the list to be pruned. We can get its 'length', but we know that's (n + 1) div 2.
fill_List(n)
set list_length to (n + 1) div 2

-- Remove multiples of primes from the list
repeat with i from 2 to list_length
	set Factor to item i of PL's P_List
	if (Factor is missing value) then
		-- skip it
	else if (Factor > theLimit) then
		exit repeat
	else
		repeat with j from (i + Factor) to list_length by Factor
			set item j of PL's P_List to missing value
		end repeat
	end if
end repeat

return PL's P_List's numbers

-- Handlers

to fill_List(num)
	repeat with j from 1 to num by 2
		set end of PL's P_List to j
	end repeat
	set item 1 of PL's P_List to 2
end fill_List

Adam_Bell · November 24, 2005, 11:40pm

As always, Nigel, brisk and efficient.

I assume (without testing) that “set theLimit to (sqrt (n)) div 1” is faster than the line I had: “set theLimit to round (sqrt (n))” i.e. that “div 1” is faster than “round”. I probably thought that AppleScript would have rounded using “div 1”, but now see that round can have parameters [rounding up/down/toward zero/to nearest/as taught in school] so has to do more. Nice catch.

You suggest: “set list_length to (n + 1) div 2” instead of counting the list. It occurs to me that we’ve already run a count in the handler, i.e. this:

to fill_List(num)
   repeat with j from 1 to num by 2
       set end of PL's P_List to j
   end repeat
   set item 1 of PL's P_List to 2
end fill_List

and could have just passed back the length of the list when it was complete from the index of the repeat, i.e.

to fill_List(num)
   repeat with j from 1 to num by 2
       set end of PL's P_List to j
   end repeat
   set item 1 of PL's P_List to 2
   set num to j
   return num -- and then, of course fix the call to receive the returned num.
   -- Not a big issue in the grand scheme of things. We would "set list_Lenth to fill_List(n)"
end fill_List

ADDENDUM - passing back j in the fill_List handler didn’t work correctly on my first attempt - probably something dumb.

I very much like your “missing value” trick - it didn’t occur to me, and I would have thought that it would result in a list that had double commas in it bracketing the missing values. By all odds the best way to “remove(anElement, FromList)” that I’ve seen. Thanks for that trick.

Did you compare these for speed? If not, I will.

Nigel_Garvey · November 25, 2005, 10:21am

Hi, Adam. Yes, ‘div 1’ is faster, though not significantly so over just one instance. It’s a simple bit of vanilla maths, whereas ‘round’ is an OSAX call (StandardAdditions). When no rounding parameter’s specified, ‘round’ rounds ‘to nearest’ in the IEEE-specified manner: so it sometimes rounds up. In the script, that means that theLimit might be set slightly higher than the square root of n. If the square root isn’t a whole number, we only need to go to the nearest whole number below it, so we may as well use div 1, which always rounds towards zero.

You suggest: “set list_length to (n + 1) div 2” instead of counting the list. It occurs to me that we’ve already run a count in the handler […] and could have just passed back the length of the list when it was complete from the index of the repeat, i.e.
to fill_List(num)
	repeat with j from 1 to num by 2
		set end of PL's P_List to j
	end repeat
	set item 1 of PL's P_List to 2
	set num to j
	return num -- and then, of course fix the call to receive the returned num.
	-- Not a big issue in the grand scheme of things. We would "set list_Lenth to fill_List(n)"
end fill_List
ADDENDUM - passing back j in the fill_List handler didn’t work correctly on my first attempt - probably something dumb.

The final value of j is the last number in the list, ie. nearest odd number below or including the original value of num. We’ve omitted the intervening even numbers, so the number of items in the list is less than that.

A nice little paradox: ‘missing value’ is a value. You could in fact use any sort of value that isn’t a number for the idea to work, but ‘missing value’ is fast and almost self-explanatory. If you’re desperate to wring every last ounce of speed from the process, you could assign ‘missing value’ to a variable and use the variable in the source code for the repeat. That way, every eliminated position in the list would get set to the same instance of ‘missing value’, instead of a new ‘missing value’ being created each time. But, as I say, you have to be desperate.

Taking this idea one step further, all multiples of the prime that are also multiples of lower primes will already have been zapped, so there’s no need to begin zapping until the number “prime squared”. In our list, that’s in position ((Factor * Factor + 1) div 2):

script PL
	-- storing the list we will build as a property of an enbedded script
	-- speeds up the list manipulations required considerably.
	property P_List : missing value -- will be the list of primes
end script

-- Reset the list so it won't retain earlier results. A property is only reset when a script is compiled unless explicitly reset.
set PL's P_List to {}

set howBig to display dialog "Please Enter the Upper Bound on the List" buttons {"Bail Out", "Enter"} default button "Enter" default answer "Please enter an integer here; it is not checked"
if button returned of howBig is "Bail Out" then return
set n to (text returned of howBig) as integer

-- Find the number of multiples to check
set theLimit to (sqrt (n)) div 1

-- Fill the list to be pruned. We can get its 'length', but we know that's (n + 1) div 2.
fill_List(n)
set list_length to (n + 1) div 2

-- Assign 'missing value' to a variable for speed in the repeat.
set zapped to missing value

-- Remove multiples of primes from the list
repeat with i from 2 to list_length
	set Factor to item i of PL's P_List
	if (Factor is zapped) then
		-- skip it
	else if (Factor > theLimit) then
		exit repeat
	else
		-- Zap multiples of Factor, starting with Factor squared. (Lower multiples already done.)
		-- This repeat won't be excuted if ((Factor * Factor + 1) div 2) > list_length.
		repeat with j from ((Factor * Factor + 1) div 2) to list_length by Factor
			set item j of PL's P_List to zapped
		end repeat
	end if
end repeat

return PL's P_List's numbers

-- Handlers

to fill_List(num)
	repeat with j from 1 to num by 2
		set end of PL's P_List to j
	end repeat
	set item 1 of PL's P_List to 2
end fill_List

Adam_Bell · November 25, 2005, 1:41pm

This is amazingly fast, Nigel. Even on an ooold G3/(augmented to 1100), it will do 100,000 primes in about 4 seconds. I’ve got a dual-core G5/2.3 on order, and I’ll see how long it takes to do a million. Thanks for the tricks!

One question: would it be any quicker if “zapped” were one of the opening script properties?

Nigel_Garvey · November 26, 2005, 1:04am

I think you’ll be pleasantly surprised ” and not only with this script! If by “100,000 primes” you mean “all the primes between 2 and 100,000”, then “a million” takes 2’ 22" on a mere G5 DP 2.0. .

I’ve been trying to think of a way further to reduce the number of unnecessary zaps in the inner repeat, but I haven’t found it yet. It’s occurred to me that since the position of any number, including the last one before theLimit is exceeded, can easily be calculated, the outer repeat can be tidied up a little:

-- Remove multiples of primes from the list
repeat with i from 2 to (theLimit div 2)
	set Factor to item i of PL's P_List
	if (Factor is zapped) then
		-- skip it
	else
		-- Zap multiples of Factor, starting with Factor squared. (Lower multiples already done.)
		-- This repeat won't be excuted if ((Factor * Factor + 1) div 2) > list_length.
		repeat with j from ((Factor * Factor + 1) div 2) to list_length by Factor
			set item j of PL's P_List to zapped
		end repeat
	end if
end repeat

But this has little effect on the performance, as the outer repeat does relatively few iterations. Even with 1,000,000 numbers, theLimit is only 1000 and so the outer repeat loops from 2 to 500 ” ie. 499 times. It’s the inner repeat that does most of the work.

I don’t think so. The script object trick, which exploits a quirk in the way AppleScript handles lists, is a way of providing a “reference” to a variable holding a list. Using this reference (“PL’s P_List”) instead of just the variable name (“P_List”) speeds up access to the items in the list. It doesn’t speed up actions with the list itself and probably wouldn’t help with a ‘missing value’ either.

In fact, since PL is a top-level script object here, it’s not really necessary. P_List could instead be a property of the main script and could be referred to as ‘my P_List’. A script object would normally be used for this purpose inside a handler, because local variables can’t be referenced. It’s a local way of providing properties that can be referenced.

Adam_Bell · December 6, 2005, 4:12pm

Just for the fun of it, I’ve been playing with this further - as a way to determine whether the number entered is prime. Obviously, the brute force method is to run Nigel’s sieve code up to the square root of the number given and see if the number is in PL’s P_List. Alternatively, one can store the list up to any number as a script property called SmallFactors, say, and then run something like the script below going on to the sieve code starting at it’s highest value only when SmallFactors does not contain a factor (because SmallFactors is not large enough):

repeat with aFactor in PL's SmallFactors
	if n mod aFactor = 0 then
		return "Not Prime; is divisible by " & aFactor
		exit repeat
	end if
end repeat

The question that then arises is what’s the tradeoff - how big should the list be - when will simply running the sieve be faster than storing a huge list of primes and checking its elements one at a time for divisibility? After all, if the number is prime, then you will have checked the entire SmallFactor list (if the square root of n is bigger than the top of the list) and then run the sieve.

Nigel_Garvey · December 7, 2005, 1:37am

I think that finding all the primes in a range and deciding whether or not a particular, isolated number is a prime are slightly different problems. I got caught up in a three-way e-mail conversation about this back in 2002 with Arthur Knapp and Michael Sullivan. Arthur was trying to devise the fastest isPrime() handler ever, and he roped in Michael for his knowledge of maths and something called “wheel factorisation”, and me for optimisation assistance. I’ve just been searching through my Emailer database for that year. It’s been difficult to reconstruct the conversation from the scattered parts, but I think the most sensible attempt up with which we came was the one below. The initial 'if . then . else’s eliminate all multiples of the primes up to 31 as quickly as possible, whereafter (if necessary) a repeat uses “wheel factorisation” to test what’s left. I’m not mathematically educated enough to be able to judge how high it can go and still be effective, but Michael seemed to think it was OK:

on IsPrime(n)
	(*
	 *  Wheel Factorization, MES and NG.
	 *)
	try
		if (n is not equal to n div 1) then
			return false
		end if
	on error
		error ("IsPrime( n )" & return & return & ¬
			"Parameter is not a number.")
	end try
	
	if (n < 37) then
		return (n is in {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31})
	else if (n mod 2) = 0 then
		return false
	else if (n mod 3) * (n mod 5) = 0 then
		return false
	else if (n mod 7) * (n mod 11) * (n mod 13) * ¬
		(n mod 17) * (n mod 19) * (n mod 23) * ¬
		(n mod 29) = 0 then
		return false
	else
		repeat with i from 30 to ((n ^ 0.5) div 1) by 30
			if ((n mod (i + 1)) * ¬
				(n mod (i + 7)) * ¬
				(n mod (i + 11)) * ¬
				(n mod (i + 13)) * ¬
				(n mod (i + 17)) * ¬
				(n mod (i + 19)) * ¬
				(n mod (i + 23)) * ¬
				(n mod (i + 29)) = 0) then return false
		end repeat
		return true
	end if
end IsPrime

Nigel_Garvey · December 8, 2005, 10:44am

OK. The Knapp/Sullivan/Garvey IsPrime() (see previous post) requires the primes up to 29 to be known already. Here, more in keeping with the spirit of Adam’s idea, is a hybrid that uses Erastothenes’s sieve to generate those primes ” except that I’ve still allowed myself the luxury of knowing the first three in advance!

I don’t know the official theory behind “wheel factorisation”, but analysis suggests that that value of the first “spoke” of the wheel has to be the lowest common multiple of the first [any number of] primes. If the values of all the other primes up to this multiple are known too, they can be added to it to give all the primes between it and the next spoke. However, adding the primes that are factors of the spoke just gives more multiples of those primes, so they can be omitted for efficiency.

The only way I can think of to generate the right number of small primes from “first principles” is to use Erastothenes’s sieve to generate a small list of primes, multiple the first so-many of them together, and then run the sieve again with the result as the target. This goes against everything I stand for in scripting, which is why I’ve started out with the foreknowledge of 2, 3, and 5.

on IsPrime(n)
	(*
   *  Wheel Factorization, MES and NG.
   *)
	try
		if (n is not equal to n div 1) then return false
	on error
		error ("IsPrime( n )" & return & return & ¬
			"Parameter is not a number.")
	end try
	
	-- Set an increment (the LCM of the first three primes) for the wheel-factorisation spoke interval.
	-- It's assumed here that we're allowed to know the first three primes (2, 3, and 5) in advance!
	set spokeIncrement to 2 * 3 * 5
	-- Get a list of the enclosed "small primes".
	set smallPrimes to sieve(spokeIncrement)
	
	if (n is less than or equal to spokeIncrement) then
		return (n is in smallPrimes)
	else if (isSmallPrimeMultiple(n, smallPrimes)) then
		return false
	else
		-- Resort to wheel-factorisation.
		-- The first three primes (the factors of the "spoke" increment) are now redundant.
		set smallPrimes to items 4 thru -1 of smallPrimes
		-- But we need a 1 instead.
		set beginning of smallPrimes to 1
		
		script o
			property sp : smallPrimes
		end script
		
		-- Using "spoke" values up to the square root of the number, test if the number's a multiple
		-- of (the spoke + any of the small primes). If it is, it's not a prime itself. Return 'false'.
		set smallPrimeCount to (count smallPrimes)
		repeat with spoke from spokeIncrement to ((n ^ 0.5) div 1) by spokeIncrement
			repeat with i from 1 to smallPrimeCount
				if (n mod (spoke + (item i of o's sp)) is 0) then return false
			end repeat
		end repeat
		
		return true -- The number's a prime.
	end if
end IsPrime

-- Test if a number is a multiple of any numbers in the given list.
on isSmallPrimeMultiple(n, smallPrimes)
	script o
		property sp : smallPrimes
	end script
	
	repeat with i from 1 to (count smallPrimes)
		-- Return 'true' if the number's a multiple of this prime.
		if (n mod (item i of o's sp) is 0) then return true
	end repeat
	
	return false -- The number's not a multiple of anything in the list.
end isSmallPrimeMultiple

-- Sieve of Erastothenes, by Adam Bell and Nigel Garvey.
-- Get a list of all the primes between 2 and a given value.
on sieve(maxValue)
	script PL
		property P_List : {} -- will be the list of primes
	end script
	
	-- Fill the list to be pruned with 2 and the odd numbers from 3 to the value.
	fill_List(PL, maxValue)
	set list_length to (maxValue + 1) div 2
	
	-- Assign 'missing value' to a variable for speed in the repeat.
	set zapped to missing value
	-- Remove multiples of primes from the list
	repeat with i from 2 to (maxValue div 2)
		set Factor to item i of PL's P_List
		if (Factor is zapped) then
			-- skip it
		else
			-- Zap all multiples of Factor, starting with Factor squared. (Lower multiples already done.)
			repeat with j from ((Factor * Factor + 1) div 2) to list_length by Factor
				set item j of PL's P_List to zapped
			end repeat
		end if
	end repeat
	
	-- Return the list of small primes.
	return PL's P_List's numbers
end sieve

-- Fill a list, contained in a given script object, with 2 and the odd numbers from 3 up to a given value.
to fill_List(PL, max)
	set PL's P_List to {2}
	repeat with j from 3 to max by 2
		set end of PL's P_List to j
	end repeat
end fill_List

IsPrime(8796049) --> true

hayne · October 28, 2006, 1:10am

I came upon this thread when googling for an implementation of the “Sieve” in AppleScript.
One thing I immediately noticed is that the script supplied above uses the ‘sqrt’ function which is apparently not a standard part of AppleScript but is (I have read) part of an osax supplied by Satimage.

I modified the Sieve script to avoid the use of ‘sqrt’ by instead comparing the square of Factor with n.

Here’s the modified script:

-- This script is based on a script from: http://bbs.applescript.net/viewtopic.php?pid=49600
-- I modified it to avoid the need for 'sqrt' by comparing FactorSq with n
-- Cameron Hayne (macdev@hayne.net)  Oct 2006

script PL
	-- storing the list we will build as a property of an enbedded script
	-- speeds up the list manipulations required considerably.
	property P_List : missing value -- will be the list of primes
end script

-- Reset the list so it won't retain earlier results. A property is only reset when a script is compiled unless explicitly reset.
set PL's P_List to {}


set howBig to display dialog "Please Enter the Upper Bound on the List" buttons {"Bail Out", "Enter"} default button "Enter" default answer "Please enter an integer here; it is not checked"

if button returned of howBig is "Bail Out" then return
set n to (text returned of howBig) as integer

-- Fill the list to be pruned. We can get its 'length', but we know that's (n + 1) div 2.
fill_List(n)
set list_length to (n + 1) div 2

-- Assign 'missing value' to a variable for speed in the repeat.
set zapped to missing value

-- Remove multiples of primes from the list
repeat with i from 2 to list_length
	set Factor to item i of PL's P_List
	if (Factor is not zapped) then
		set FactorSq to (Factor * Factor)
		if (FactorSq > n) then
			exit repeat
		else
			-- Zap multiples of Factor, starting with Factor squared. (Lower multiples already done.)
			-- This repeat won't be excuted if ((FactorSq+ 1) div 2) > list_length.
			repeat with j from ((FactorSq + 1) div 2) to list_length by Factor
				set item j of PL's P_List to zapped
			end repeat
		end if
	end if
end repeat

return PL's P_List's numbers

-- Handlers

to fill_List(num)
	repeat with j from 1 to num by 2
		set end of PL's P_List to j
	end repeat
	set item 1 of PL's P_List to 2
end fill_List

Nigel_Garvey · October 28, 2006, 12:49pm

Agh! Well spotted, Cameron. We should have noticed that during the original discussion. I’m only glad I happened to use the vanilla equivalent (n ^ 0.5) in my last isPrime() post. :rolleyes:

Your alternative is very nice and appears to have about the same speed as the script on which it’s based.