Creating sets of valid CAS registry numbers

If you also work in an industry that uses a lot of chemicals, then chances are good that you already know CAS registry numbers, which provide unique numerical identifiers for chemical compounds, polymers, biological sequences, mixtures and alloys.

The AppleScript code at the end of this post provides a function named «createcasnos», which creates all valid CAS registry numbers whose first part lies between a given start and end number:


set casnos to my createcasnos(101, 102)
return casnos
-- {"101-00-8", "101-01-9", "101-02-0", ... , "102-97-6", "102-98-7", "102-99-8"}

This can come in handy when you need to create folder names for chemical products or validate existing CAS registry numbers. For example, I use the «createcasnos» function in an AppleScript that helps me maintain our patent database for electroplating additives.


my testfunc()

-- I am a test function
on testfunc()
	tell me
		display dialog "This test function will first ask you to choose or create an empty folder and then create subfolders therein which are named after a certain set of CAS registry numbers (start number: 101, end number: 105)."
	end tell
	set folderpath to POSIX path of ((choose folder with prompt "Please choose or create an empty folder:") as Unicode text)
	set casnos to my createcasnos(101, 105)
	set subfolderpaths to ""
	repeat with casno in casnos
		set subfolderpath to quoted form of (folderpath & casno)
		set subfolderpaths to subfolderpaths & space & subfolderpath
	end repeat
	set command to "mkdir -p" & subfolderpaths
	set command to command as «class utf8»
	do shell script command
end testfunc

-- I am generating a list of all valid CAS registry numbers whose first part
-- lies between a given start and end number (100 numbers per first part)
-- WARNING: Creating a large number of CAS registry numbers can take quite a while!
on createcasnos(startno, endno)
	-- validating the given start and end number
	set errmsg to missing value
	if startno < 0 then
		set errmsg to "The start number must be equal to or greater than 0."
	else if startno > 999999 then
		set errmsg to "The start number must be smaller than 1000000."
	end if
	if endno < 0 then
		set errmsg to "The end number must be equal to or greater than 0."
	else if endno > 999999 then
		set errmsg to "The end number must be smaller than 1000000."
	end if
	if startno > endno then
		set errmsg to "The start number must not be greater than the end number."
	end if
	if errmsg is not missing value then
		error errmsg
	end if
	-- a CAS registry number looks like this: 7732-18-5 (CAS# for water)
	-- this handler supports a first part of up to 6 digits
	set casnos to {}
	repeat with i from startno to endno
		set casnopartone to (i as Unicode text)
		-- the second part of a CAS registry number always has 2 digits
		-- (using a separate loop variable so the outer counter i is not shadowed)
		repeat with j from 0 to 99
			-- adding a preceding zero if necessary -> '1' = '01'
			if j < 10 then
				set casnoparttwo to ("0" & j) as Unicode text
			else
				set casnoparttwo to (j as Unicode text)
			end if
			-- the third part of a CAS registry number is a calculated checksum
			-- the checksum is calculated by taking the last digit times 1,
			-- the next digit times 2, the next digit times 3 etc.,
			-- adding all these up and computing the sum modulo 10
			-- example for 7732-18: 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, 105 mod 10 = 5
			-- flip it! (keeping the reversed digits as a list, so the result
			-- does not depend on the current text item delimiters)
			set strnumbers to reverse of (characters of (casnopartone & casnoparttwo))
			set counter to 0
			set checksum to 0
			repeat with strnumber in strnumbers
				set counter to counter + 1
				set checksum to checksum + (counter * (strnumber as integer))
			end repeat
			set casnopartthree to checksum mod 10
			-- putting it all together to generate a valid CAS registry number 
			set casno to (casnopartone & "-" & casnoparttwo & "-" & casnopartthree)
			-- appending in place avoids copying the whole list on every iteration
			set end of casnos to casno
		end repeat
	end repeat
	return casnos
end createcasnos
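
The validation use case mentioned above could be sketched with a small companion handler. This is only a minimal sketch, not part of the original script: the name «isvalidcasno» is hypothetical, and the handler assumes the same limit of up to 6 digits for the first part as «createcasnos». It splits the candidate at the hyphens, checks the lengths and digits of the three parts, and then recomputes the checksum as described in the comments above.

```applescript
-- Hypothetical helper (an assumption, not part of the original script):
-- I am checking whether a string such as "7732-18-5" is a well-formed
-- CAS registry number with a matching checksum digit
on isvalidcasno(casno)
	-- splitting the candidate at the hyphens and restoring the delimiters afterwards
	set olddelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "-"
	set casnoparts to text items of casno
	set AppleScript's text item delimiters to olddelims
	if (count of casnoparts) is not 3 then return false
	set casnopartone to item 1 of casnoparts
	set casnoparttwo to item 2 of casnoparts
	set casnopartthree to item 3 of casnoparts
	-- assumed lengths: 1 to 6 digits (the limit used by createcasnos), 2 digits, 1 digit
	if (length of casnopartone) < 1 or (length of casnopartone) > 6 then return false
	if (length of casnoparttwo) is not 2 or (length of casnopartthree) is not 1 then return false
	-- every character must be a decimal digit
	repeat with strnumber in characters of (casnopartone & casnoparttwo & casnopartthree)
		if (contents of strnumber) is not in "0123456789" then return false
	end repeat
	-- weighting the digits of the first two parts from right to left with 1, 2, 3, ...
	set counter to 0
	set checksum to 0
	repeat with strnumber in reverse of (characters of (casnopartone & casnoparttwo))
		set counter to counter + 1
		set checksum to checksum + (counter * ((contents of strnumber) as integer))
	end repeat
	return (checksum mod 10) is (casnopartthree as integer)
end isvalidcasno
```

Called as «my isvalidcasno("7732-18-5")», the sketch should report true for the CAS number of water, while a candidate with a wrong checksum digit such as "7732-18-4" should be rejected. Checking the checksum directly avoids generating a whole list of numbers just to test a single candidate.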

I also once wrote a similar function in Python, which you can find here.