Standard Deviation, The Mean, accommodating fluctuating values

Hello there,

Rather than just focussing on short monophonic samples, I’ve decided to incorporate tonal movement into my project, so in addition to frequency & confidence readings, for this section of code I have needed to include some standard deviation calculations.

Given the western scale, it would be sensible to set the conversion from Hertz to the tempered 12-note scale at 50 cents (half a semitone) each side. I intend to add microtonal features, but for this task it makes sense to set the SD limit at semitones. So once the signal deviates by the frequency interval ratio, I would like to store the mean of the current readings as one value, then move on to the next batch of readings, which will be collected until the pitch once again deviates beyond (above or below) the frequency interval ratio. But yeah, I’m stuck on syntax and wanted to ask if anyone might be able to offer some help.

So, I would like to take the mean value calculated in the first part of the stddev handler and store it as one reading, then move on to the next values until the pitch deviates past the limit, then store that average, and so on. Most of the tonal anomalies and unhelpful harmonic content have been removed by introducing a confidence threshold, so I should, for the most part, be able to rely on the information that’s there. Or that will hopefully be there if I can successfully procure the separate average readings.

I hope that makes some sense, and thank you in advance for any advice!

Here’s what I have, minus the confidence value shell task/AS code, and there’s no CSV so I just added some numbers.

Thanks again

Doug


use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use framework "AVFAudio"
use scripting additions
property freqData : ""

-- reading the CSV
set my_data to read file "Mojave:Users:dh:CSV:example.f0.csv" -- this is an HFS path, so read it as a file
set csvData to paragraphs 2 thru -2 of my_data as list -- there's a text row I want to be ignored

set freqData to {}

set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","

repeat with a_reading in csvData
	
	set x to text item 1 of a_reading -- timestamp from CSV
	set y to text item 2 of a_reading -- frequency readings from CSV
	set z to text item 3 of a_reading -- confidence readings from CSV
	set end of freqData to y --frequency readings  stored
	
end repeat
set AppleScript's text item delimiters to oldDelims -- restore the saved delimiters



on stddev(readingHz) -- standard deviation handler
	set n to count of readingHz
	
	set sum to 0
	repeat with val in readingHz       -- calculating the mean
		set sum to sum + val
	end repeat
	
	set mean to sum / n
	
	set sum to 0
	
	repeat with val in readingHz
		set sum to sum + (val - mean) ^ 2
	end repeat
	
	return (sum / (n - 1)) ^ 0.5
end stddev

stddev({61.05, 67.23, 71.23, 70, 65.84, 60.49}) -- pretend this is the freqData data
--> 4.466536316506
set sD to the result

Model: 2019 2.9 or something xeon core 16gb
AppleScript: Script Debugger 8
Browser: Safari 605.1.15
Operating System: macOS 10.15

Hi. I’m not certain I understand the question, as posed. You indicated you skipped a row intentionally, but did you intend to skip two? “2 thru -2” ignores the first and last lines. Assume nobody here knows anything about tonal movements, scales, monophonics, etc.—which is at least true for myself. What constitutes “the limit?” Is it based on some difference inside the value set in a given list or between lists? Because I happen to have already created some code for measuring stats and have it handy, I’ll leave a modified sample here to see if this may be your goal; hopefully, it won’t further confuse the issue.

set iterations to {}
set pretendCSV to {{61.05, 67.23, 71.23, 70, 65.84, 60.49}, {1, 2, 3, 4, 5, 6}}

repeat with freqData in pretendCSV
	set countGivenList to count freqData
	set sorted to my sort(freqData)
	
	#Mean
	set endvalue to 0
	repeat with x in freqData
		set endvalue to endvalue + x
	end repeat
	set mean to endvalue / countGivenList
	
	#Variance
	--differentiate
	set diffList to {}
	repeat with x in freqData
		set diffList's end to x - mean
	end repeat
	--square
	set squareList to {}
	repeat with x in diffList
		set squareList's end to roundRATIS((x ^ 2), 3)
	end repeat
	--sum squares
	set sum to 0
	repeat with x in squareList
		set sum to sum + x
	end repeat
	set variance to (sum / (countGivenList - 1))
	
	#Standard Deviation
	set SD to variance ^ 0.5
	
	set iterations's end to return & "MEAN: " & roundRATIS(mean, 3) & return & "VARIANCE: " & variance & return & "Standard Deviation: " & roundRATIS(SD, 3) & return & "RANGE: " & ((sorted's item -1) - (sorted's item 1)) & return & "---------------------------------------"
end repeat
iterations

# subroutines------------------------------------------------------------------------------

on sort(thelist)
	set AppleScript's text item delimiters to linefeed
	set new_string to do shell script "echo " & (thelist as text)'s quoted form & " | sort -g"
	set sorted to {}
	repeat with anItem in new_string's paragraphs
		set sorted's end to anItem as number
	end repeat
	set AppleScript's text item delimiters to ""
	sorted
end sort

on roundRATIS(num, decimalPlaces) --rounding as taught in school
	if num < 0 then return -(roundRATIS(num - (num * 2), decimalPlaces))
	set raiseTens to 10 ^ decimalPlaces
	(((num * raiseTens) + 0.5) div 1) / raiseTens
end roundRATIS

Hey Marc,
Thanks for your reply. A semitone corresponds to multiplying a frequency in Hz by 2^(1/12). I’d want 1/2 of a semitone each side of the mean.
Yeah, I’m running something like this at the beginning:


set theCSV to (do shell script "awk -F ',' '$3>=0.6' " & quoted form of "/Users/dh/CrepeCSV/example.f0.csv")

This filters out any row with a confidence reading under 0.6. The confidence readings are values between 0.0 and 1.0 that show the confidence in the accuracy of the pitch estimate. It usually whittles the results down by about 2/3, eliminating non-useful harmonics and other sounds that just make the reading less accurate.

That reduces the number of rows, but there are still 3 columns. This bit of code reduces the columns to a single list of frequencies. By this point the confidence readings and timestamps have served their purpose.


set freqData to {}

set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to ","

repeat with a_reading in csvData
	
	set x to text item 1 of a_reading -- timestamp from CSV
	set y to text item 2 of a_reading -- frequency readings from CSV
	set z to text item 3 of a_reading -- confidence readings from CSV
	set end of freqData to y -- just storing the frequency readings in y
	
end repeat

This leaves me with one continuous list that looks more like the one here. With 1 set of curly braces and a comma and space after every value.


set pretendCSV to {61.05, 67.23, 71.23, 70, 65.84, 60.49, 1, 2, 3, 4, 5, 6, 4, 7, 2, 7, 4, 9, 4, 2, 1, 32, 34, 35, 34, 32, 32, 39, 40, 54, 56, 55.5, 59.6, 53, 54.2, 57, 55, 55, 56, 55, 57, 123, 121, 124, 123, 126, 119, 120, 122, 122, 123}

Which I would like to end up looking like this:

{65.05, 4.75, 38.5, 55, 123}
Doug
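
For what it's worth, the confidence filter and the column extraction described in this post could probably be done in one pass in plain AppleScript, without awk. This is only a sketch: confidentFrequencies is a made-up handler name, the three sample rows are invented, and it assumes a header row, comma-separated fields, and a system locale that uses "." as the decimal separator (the "as real" coercions depend on that).

```applescript
-- Hypothetical handler: keep the frequency column of every row whose
-- confidence (column 3) is at least minConfidence.
on confidentFrequencies(csvText, minConfidence)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ","
	set freqData to {}
	set dataRows to rest of (paragraphs of csvText) -- skip the header row
	repeat with aRow in dataRows
		set fields to text items of aRow
		if (count of fields) ≥ 3 then -- ignores a blank trailing line
			if ((item 3 of fields) as real) ≥ minConfidence then
				set end of freqData to (item 2 of fields) as real
			end if
		end if
	end repeat
	set AppleScript's text item delimiters to oldDelims -- restore the delimiters
	return freqData
end confidentFrequencies

-- Invented sample data, only to show the shape of the input
set sampleCSV to "time,frequency,confidence" & linefeed & "0.00,220.5,0.91" & linefeed & "0.01,431.2,0.42" & linefeed & "0.02,221.0,0.88" & linefeed
confidentFrequencies(sampleCSV, 0.6)
--> {220.5, 221.0}
```

Because the values are coerced to real as they are stored, the resulting list can go straight into a handler like stddev.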

Hi. Three items in your final goal aren’t in the pretend parent list but two are. How did you arrive at that result?

It’s just an example. The information in the CSV represents estimated frequency readings over time with an accuracy reading.

Between the timestamp, frequency and confidence readings, the analysis tool renders 3 readings every 10 msec, around 3,000 values x3 in an average set. So I want to remove rows under a level of certainty, get rid of the timestamp and confidence readings, and then take the average of each set of readings that stays within an SD range, measured as 12*(log (fn/440) / log(2)). Each mean, taken over the course of its range, is stored as a single reading, using only frequency readings with a confidence in accuracy higher than .6

So yeah, when StanDev exceeds the result of (2^(n/12)*mean-mean), with n being 1 (100c/semitone), the mean of that list of frequency values is stored as 1 reading in a list and the task repeats until the end.

I’m not sure where in the script I would add the request to store the averages, whether I’d do it inside or outside the handler.

Your script has helped me understand a few things.

If I had a list of lists, your script would work for what I need to do, but I just have one long list. Is there a way to split the list into a list of lists, each new list starting when SD is equal to or greater than (2^(1/12)*mean-mean)?
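
One possible way to split a single long list as asked above is sketched here. Note that it swaps in a simpler test than the SD one: a run ends as soon as a reading falls more than a given number of cents above or below the running mean of the current run (50 cents being the half semitone each side mentioned earlier). The handler name segmentMeans and the sample numbers are made up for illustration.

```applescript
-- Hypothetical handler: walk a flat list of Hz readings, group them into runs,
-- and return the mean of each run as one value.
-- A run ends when a reading is more than centsTolerance cents away from the running mean.
on segmentMeans(readingsHz, centsTolerance)
	set ratio to 2 ^ (centsTolerance / 1200) -- 50 cents gives roughly 1.0293
	set meansList to {}
	set runSum to 0
	set runCount to 0
	repeat with aReading in readingsHz
		set hz to (contents of aReading) as real
		if runCount > 0 then
			set runMean to runSum / runCount
			if hz > runMean * ratio or hz < runMean / ratio then
				-- the pitch has moved: store the finished run as one reading, start a new run
				set end of meansList to runMean
				set runSum to 0
				set runCount to 0
			end if
		end if
		set runSum to runSum + hz
		set runCount to runCount + 1
	end repeat
	if runCount > 0 then set end of meansList to runSum / runCount -- store the last run
	return meansList
end segmentMeans

segmentMeans({100, 101, 99, 200, 202, 198, 50}, 50)
--> {100.0, 200.0, 50.0}
```

To get a list of lists instead of a list of means, the same loop could collect the readings of each run in a list and append that list where the mean is appended here. An SD-based test could replace the ratio comparison by calling stddev on the current run, at the cost of recomputing it for every reading.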

It sounds like you’re asking to have the list itself change, and I don’t see that as being a way to provide a reliable baseline for your test, if my understanding is correct. You have a set with frequencies and a formula to convert them to another unit, which I understand will serve as the upper range for each item. I’m assuming a lower range. Does this generally do what you’re expecting?

set pretendCSV to {61.05, 67.23, 71.23, 70, 65.84, 60.49, 1, 2, 3, 4, 5, 6, 4, 7, 2, 7, 4, 9, 4, 2, 1, 32, 34, 35, 34, 32, 32, 39, 40, 54, 56, 55.5, 59.6, 53, 54.2, 57, 55, 55, 56, 55, 57, 123, 121, 124, 123, 126, 119, 120, 122, 122, 123}

--arrive at mean, s via formula; below are static values for the example set
set mean to 51.473
set SD to 41.928

set filtered to {}

repeat with Hertz in pretendCSV -- are these Hertz frequencies?
	set semitone to Hertz * (2 ^ (1 / 12))
	set tolerance to semitone / 2
	tell Hertz to if it is less than (SD - tolerance) or it is greater than (SD + tolerance) then
		--possibly do some out of range action?
	else
		set filtered's end to contents
	end if
end repeat

filtered

Thank you so much for your help, Marc. As the results all have a high confidence, I found I could simply loop a round command and eliminate duplicates. The purpose of this script is to analyse an audio file and work out which notes populate the sample. I know there are 2 round commands in there; I’m going to remove the first one. I need to work the (2 ^ (1 / 12)) ratio into the rounding process, or rather, going back the other way, it’s a log2 formula: ¢ or c = 1200 × log2 (f2 / f1)
log10(2) = 0.301029995. Is it possible to call log2 or any log function in AS? The conversion can be expressed in log 10 too.
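
On the log question: AppleScript itself has no log function as far as I know, but since the script already shells out to awk, one option is to let awk do the arithmetic. A minimal sketch, assuming a made-up handler name centsBetween and a system locale that writes decimals with "." (number-to-text coercion follows the locale, so a decimal comma would break the awk call):

```applescript
-- Hypothetical handler: interval in cents between two frequencies,
-- c = 1200 * log2(f2 / f1).
-- awk's log() is the natural log, so log2(x) is computed as log(x) / log(2).
on centsBetween(f1, f2)
	set cmd to "awk -v a=" & f1 & " -v b=" & f2 & " 'BEGIN { printf \"%.3f\", 1200 * log(b / a) / log(2) }'"
	return (do shell script cmd) as real
end centsBetween

centsBetween(440, 880) -- one octave
--> 1200.0
```

The same change-of-base trick works with base 10: log2(x) = log10(x) / log10(2), where log10(2) is the 0.301029995 quoted above. Calling the shell for every reading would be slow on a few thousand rows, so for bulk conversion it may be better to do the whole column inside a single awk call.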

I might also look into compiling a library for all the note to frequency stuff
C0 16.35
C#0/Db0 17.32
D0 18.35
D#0/Eb0 19.45
E0 20.60
F0 21.83
F#0/Gb0 23.12
G0 24.50
G#0/Ab0 25.96
A0 27.50
A#0/Bb0 29.14
B0 30.87
C1 32.70
C#1/Db1 34.65
D1 36.71
D#1/Eb1 38.89
E1 41.20
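
Rather than typing the whole table in by hand, the values could be generated. The sketch below assumes 12-tone equal temperament with A4 = 440 Hz and the usual MIDI numbering (A4 = note 69, C0 = note 12); noteFrequency and noteTable are made-up names.

```applescript
-- Hypothetical handler: equal-tempered frequency of a MIDI note number, A4 (note 69) = 440 Hz.
on noteFrequency(midiNote)
	return 440 * (2 ^ ((midiNote - 69) / 12))
end noteFrequency

-- Build {name, Hz} pairs for C0 (MIDI note 12) up to E1 (MIDI note 28)
set noteNames to {"C", "C#/Db", "D", "D#/Eb", "E", "F", "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"}
set noteTable to {}
repeat with midiNote from 12 to 28
	set octave to (midiNote div 12) - 1
	set noteName to (item ((midiNote mod 12) + 1) of noteNames) & octave
	set hz to (round ((noteFrequency(midiNote)) * 100)) / 100 -- two decimal places
	set end of noteTable to {noteName, hz}
end repeat
noteTable
```

The names come out as "C#/Db0" rather than "C#0/Db0", which is easy to change if the exact labels matter. Raising the upper bound of the repeat extends the table to higher octaves.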


use AppleScript version "2.4"
use framework "Foundation"
use framework "AVFAudio"
use scripting additions

-- property parent : class "NSObject"
property ca : current application
property theFormat : "WAV"

tell application "Finder"
	
	set volume alert volume 0
	delete (every item of folder "Mojave:Users:dh:CSV:sample:" whose name extension is "wav")
	delete (every item of folder "Mojave:Users:dh:CSV:sample:" whose name extension is "csv")
	delete (every item of folder "Mojave:Users:dh:CSV:sample:" whose name contains "example")
	delete (every item of folder "Mojave:Users:dh:Desktop:suf:")
	delete (every item of folder "Mojave:Users:dh:Desktop:pre:")
	delete (every item of folder "Mojave:Users:dh:Desktop:./recording:")
	do shell script "cd /Users/dh/Desktop/.:resources/; > output.txt"
	delay 0.5
	set fileName to (choose file with prompt "Please Select a WAV file for Processing")
	set audioWav to fileName
	tell application "Finder"
		set audio_wav to duplicate audioWav to "Mojave:Users:dh:CSV:sample:"
	end tell
	
	tell application "Finder"
		set f to (the first file of ¬
			container (alias "Mojave:Users:dh:CSV:sample:") of ¬
			application "Finder" whose name ends with "wav")
		set ti to text items of (get name of f)
		if number of ti is 1 then
			set name of f to "example.wav"
		else
			set name of f to "example" & "." & "wav"
		end if
	end tell
	
		
	set copyPerc to (path to resource "Oerc.scpt" in directory "Mojave:Users:dh:CSV:sample:")
run script copyPerc
		
	
	
	
	tell application "Finder"
		set recCsv to "/Users/dh/CSV/sample/example.f0.csv"
	end tell
	
	
	set my_data to read recCsv
	set my_data to do shell script "awk -F ','  '$3>=0.75' " & (recCsv)'s POSIX path's quoted form -- remove rows with confidence readings lower than 0.75
	-- set my_data to do shell script "awk -F ','  '$1<=25.111' " & (recCsv)'s POSIX path's quoted form -- remove column 1
	-- set my_data to do shell script "awk '{ $2 = $2 * 2 }' " & (recCsv)'s POSIX path's quoted form
	
	set csvData to paragraphs 2 thru -2 of my_data as list
	
	set freqData to {}
	
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ","
	
	repeat with a_reading in csvData
		
		--		try
		set x to text item 1 of a_reading -- timestamp from CSV
		--	set y to ((text item 2 of a_reading) * 2)               -- for some reason couldn't do y*2
		set y to ((round ((text item 2 of a_reading) / 5)) * 5) -- frequency readings from CSV
		set z to text item 3 of a_reading -- confidence readings from CSV
		set end of freqData to y
		--		end try
	end repeat
	set fullList to freqData
	set hzList to {}
	--
	repeat with i from 1 to count of items of fullList -- removing duplicate readings
		if item i of fullList is not in hzList then
			set hzList to hzList & item i of fullList
		end if
	end repeat
	
script
		on |λ|(x)
			x * 2
		end |λ|
	end script
	map(result, hzList)
	
	set hz_List to the result
	
	set frequencies to {}
	repeat with frequency in hz_List
		set end of frequencies to ¬
			{((round ((frequency / 256) / 5)) * 5), ((round ((frequency / 128) / 5)) * 5), ((round ((frequency / 64) / 5)) * 5), ((round ((frequency / 32) / 5)) * 5), ((round ((frequency / 16) / 5)) * 5), ((round ((frequency / 8) / 5)) * 5), ((round ((frequency / 4) / 5)) * 5), ((round ((frequency / 2) / 5)) * 5), ((round (frequency / 5)) * 5), ((round ((frequency * 2) / 5)) * 5), ((round ((frequency * 4) / 5)) * 5), ((round ((frequency * 8) / 5)) * 5), ((round ((frequency * 16) / 5)) * 5), ((round ((frequency * 32) / 5)) * 5), ((round ((frequency * 64) / 5)) * 5), ((round ((frequency * 128) / 5)) * 5), ((round ((frequency * 256) / 5)) * 5), ((round ((frequency * 512) / 5)) * 5)}