Reading headers of audio files

Hi all,

Is it possible to use AppleScript to read the headers of WAV or AIFF files (or any audio file) and spit them out in a format of my choosing?

Any idea where I would start to achieve that?

My goal is to read the metadata contained in Broadcast Wave files.

Cheers

Simon

AppleScript can read any kind of file, but you have to parse the data yourself (e.g., if you know that the first four characters of a WAV file header identify the chunk, you can read the header and assign the first four characters to a variable). For example, in a Shockwave Flash file the header contains: signature (3 bytes), version (1 byte), file length (4 bytes), etc.
But there is no built-in command called “readWAVHeader” which returns the info. Maybe somebody has already written such a utility that you can use (a command-line tool, or a library such as read-swf-header or image-info).
To read data from a file:

read file "path:to:file" for 1000 --> returns first 1000 bytes
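AppleScript gives you the raw bytes, but it has no built-in binary-unpacking commands, so as a point of comparison here is the same idea sketched in Python with the standard struct module. The header bytes are hand-made for illustration, not read from a real file:

```python
import struct

# A minimal hand-made RIFF/WAVE header (illustration only, not from a
# real file): ChunkID, ChunkSize (little-endian DWORD), then Format.
header = b"RIFF" + struct.pack("<I", 36) + b"WAVE"

chunk_id = header[0:4]                            # b"RIFF"
(chunk_size,) = struct.unpack("<I", header[4:8])  # 36, little-endian
wave_format = header[8:12]                        # b"WAVE"

print(chunk_id, chunk_size, wave_format)
```

The same slicing can be done on the string AppleScript's `read … for 1000` returns; the hard part is only converting multi-byte numeric fields, which is what `struct.unpack` is doing here.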

Thanks JJ,

I have successfully read the header, but there are some format-specific things that don’t make sense. If I look at the WAV file in HexEdit I can see the different parts of the header, but a few sections from the specification are not ASCII text, for instance the fmt chunk, which describes the file format. I can see the letters “fmt ” but the ASCII after it is all messed up…

fmt ĪÄ2cue 4

The hex for that…
00 00 00 00 00 00 00 00 66 6D 74 20 10 00 00 00
01 00 01 00 80 BB 00 00 80 32 02 00 03 00 18 00
63 75 65 20 34 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00

The Broadcast Wave (BWF) spec calls for the following for the WAVE format chunk…


<fmt-ck> -> fmt( <common-fields> <format-specific-fields> )
<common-fields> ->
  struct{
      WORD  wFormatTag;       // Format category: (0x0001) for PCM or (0x0050) for MPEG-1
      WORD  nChannels;        // Number of channels: 1 or 2
      DWORD nSamplesPerSec;   // Sampling rate: 22050, 44100 or 48000
      DWORD nAvgBytesPerSec;  // For buffer estimation. Equal to (nChannels x nSamplesPerSec x nBitsPerSample) / 8
      WORD  nBlockAlign;      // Data block size. Equal to (nChannels x nBitsPerSample) / 8
  }
<PCM-format-specific> ->
  struct{
      WORD  nBitsPerSample;   // Sample size: 8, 16, 24 or 32
  }

But I don't see how that corresponds to the hex above, and I don't understand what value (0x0001) etc. represents. Anyone care to comment?

Hmmm… I’m not an expert, but it seems there is something odd here (?)
A WORD is two bytes (16 bits = 2 bytes = 4 hex chars), and a DWORD is four bytes.
If the analyzed file has a single channel, the common fields start right after the “fmt ” tag and the chunk size, where we would expect to see a first “00 01” (format PCM) followed by another “00 01” (number of channels). But what the dump actually shows is “01 00 01 00”.
Likewise, if the file has a 48000 sample rate, in hex notation we should be looking for “BB 80”, while the dump shows “80 BB 00 00”.

Unless this is a LE (little-endian) file, where the bytes within each field are stored least-significant first. Then where we read (in the second line)
01 00 01 00 80 BB 00 00 80 32 02 00 03 00 18 00
we should understand (reversing the bytes of each WORD and DWORD)
00 01 | 00 01 | 00 00 BB 80 | 00 02 32 80 | 00 03 | 00 18
and the first DWORD, “00 00 BB 80”, is exactly the value we need: 48000.

The last value in the second line (if LE) would be “00 18”, which matches 24 decimal, a valid value for the nBitsPerSample field. The one before it (nBlockAlign) would also be valid (“00 03”), defined as “(nChannels x nBitsPerSample) / 8” → (1 x 24) / 8 = 3.
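Just to double-check the reading above: the little-endian interpretation decodes cleanly if you unpack the sixteen data bytes of the fmt chunk with every field taken least-significant-byte first. A quick sketch in Python (standard struct module), using the exact bytes from the hex dump:

```python
import struct

# The 16 data bytes that follow "fmt " + chunk size in the hex dump above.
data = bytes.fromhex("01 00 01 00 80 BB 00 00 80 32 02 00 03 00 18 00")

# "<" = little-endian; H = WORD (2 bytes), I = DWORD (4 bytes).
(wFormatTag, nChannels, nSamplesPerSec,
 nAvgBytesPerSec, nBlockAlign, nBitsPerSample) = struct.unpack("<HHIIHH", data)

print(wFormatTag)       # 1  (PCM)
print(nChannels)        # 1
print(nSamplesPerSec)   # 48000
print(nAvgBytesPerSec)  # 144000 = (1 x 48000 x 24) / 8
print(nBlockAlign)      # 3      = (1 x 24) / 8
print(nBitsPerSample)   # 24
```

So the file is 1-channel, 24-bit PCM at 48000 Hz, and the derived fields (nAvgBytesPerSec, nBlockAlign) agree with the formulas in the spec.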

I don’t know why I find this funny…

Yes, it is little-endian, being a Windows-based format.

Thanks for the explanation. I’m a little lost with the conversions between the different formats. Is there a good online reference, and some conversion tools or formulas, that could help me?

Thank you so much for your help.

Simon

Sorry, all I know is always guess-and-try :oops: