Audio

For audio players, see Mediaplayers.

For video, see Video.

For images, see GraphMediaTech.

For general info on data compression, see Compress.

1.  About digital audio

Digital audio became publicly useful in 1982 when the CD (“Compact Disc”) came to the market, but it took more than 10 further years before playing, recording and editing audio on computers became possible and feasible. The audio format on the CD is stereo, 16-bit, with a 44.1 kHz sampling frequency, and is NOT compressed. It also became the most common format for audio files on computers. The lack of storage space on computers of that era made lossy codecs inevitable, and the MP3 codec of that time (developed as part of the MPEG-1 video system) was pretty bad and caused substantial quality damage. Later the codec technology improved a lot, both for MP3 and for other codecs, but storage space also grew so much that lossless codecs and wider use of uncompressed files became possible. Recently a sampling frequency of 48 kHz has come to be preferred over 44.1 kHz. Files with 48 kHz and 16 bits are considered fully sufficient for archiving and listening; more “generous” formats (32-bit integers, 32-bit floats, 96 kHz or 192 kHz) do not make sense for those purposes, but can make sense as an intermediate format for audio processing.

2.  Gain, loudness and normalization

A problem of digital audio is that putting files from multiple sources together (onto a playlist or into a portable player) usually results in inconsistent loudness. The process of adjusting loudness is called “normalization”. Unfortunately, for years the common way to do this, both among professional audio content vendors and in common audio programs including SOX (see below), was the infamous “peak normalization”. This means that the peak positive and peak negative values are found and the signal is then amplified so that at least one of them touches the limit (that is +32′767 or −32′768 with 16-bit samples). While this seems smart, it is actually NOT, neither “as-is” nor with the additional 3 dB reserve commonly suggested. Peak normalization will usually NOT make files from various sources sound equally loud. A “side effect” of peak normalization is that a single “broken” sample (originating from whatever technical problem during recording, decoding, …) touching the limit makes the normalization fail completely, even if the loudness is actually very low. Even without “obviously broken” samples the results are not optimal. A more useful approach is to evaluate the loudness and apply the “Replay Gain” technology. Here the “true loudness” is evaluated by an algorithm that tries to mimic the human ear. This cannot be done perfectly, as human ears both vary and degrade with age, but it gives much better results than a dumb search for the highest sample.
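To illustrate the “single broken sample” problem, here is a minimal Python sketch working on plain lists of 16-bit sample values. The function names are made up for this example, and the RMS level used for comparison is only a crude loudness proxy; real Replay Gain additionally applies an equal-loudness filter and statistics over short blocks.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit sample value

def peak_normalization_gain_db(samples, reserve_db=0.0):
    """Gain in dB that peak normalization would apply (optionally keeping a reserve below full scale)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(FULL_SCALE / peak) - reserve_db

def rms_level_dbfs(samples):
    """RMS level relative to full scale: a crude loudness proxy, much simpler than Replay Gain."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square / FULL_SCALE ** 2)

# One second of a quiet 1 kHz tone at 44.1 kHz, peaking at about 1 % of full scale
quiet = [round(327 * math.sin(2 * math.pi * 1000 * n / 44100)) for n in range(44100)]

# The same signal with a single "broken" sample stuck at the negative limit
broken = list(quiet)
broken[1000] = -32768

print(f"peak normalization gain, clean:  {peak_normalization_gain_db(quiet):6.2f} dB")
print(f"peak normalization gain, broken: {peak_normalization_gain_db(broken):6.2f} dB")
print(f"RMS level, clean:  {rms_level_dbfs(quiet):6.2f} dBFS")
print(f"RMS level, broken: {rms_level_dbfs(broken):6.2f} dBFS")
```

The clean tone would get roughly 40 dB of gain, while the copy with one bad sample gets practically none, although both sound equally (very) quiet. The RMS level moves by less than 2 dB, which shows why loudness-based measures are far more robust than a search for the highest sample.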

3.  Codecs and file formats

3.1  WAV

The (usually) uncompressed audio file format. It supports different sampling frequencies, bit resolutions and numbers of channels. One hour of sound in standard CD quality (2 channels, 16 bits per channel, 44.1 kHz) needs about 0.6 GiB of space (some 635 million bytes).
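The storage figure follows directly from sampling rate × bytes per sample × channels × seconds. A small Python sketch of that arithmetic, cross-checked with Python's standard `wave` module (the helper name `pcm_bytes` is made up for this example):

```python
import io
import wave

def pcm_bytes(seconds, sample_rate=44100, bits=16, channels=2):
    """Size of raw (uncompressed) PCM audio data, without any file header."""
    return seconds * sample_rate * (bits // 8) * channels

one_hour_cd = pcm_bytes(3600)
print(one_hour_cd, "bytes =", round(one_hour_cd / 2**30, 3), "GiB")

# Cross-check: write one second of CD-quality silence into an in-memory WAV file
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)  # bytes per sample, i.e. 16 bit
    w.setframerate(44100)
    w.writeframes(bytes(pcm_bytes(1)))
print(len(buf.getvalue()), "bytes for 1 s, including the 44-byte WAV header")
```

Changing `sample_rate`, `bits` or `channels` shows how quickly the size grows with the “generous” intermediate formats mentioned above.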

3.2  MP3

The first “useful” and up to now most popular lossy audio compression algorithm and file format, originally designed as part of the MPEG video/multimedia “standard” (MPEG-1 Audio Layer III). MP3 files contain just “raw” compressed audio data without any container (see Video), and of course no video. Many people don’t know anything besides MP3 and consider MP3 a synonym of “sound” :-D . It has open source implementations, but is considered patented. There are several versions and extensions of it, like “MP3 Pro”, “VBR”, as well as some DRM systems (“secure MP3”). Several good reasons exist for using OGG VORBIS instead. If you really have to use MP3, the open source encoder is called “LAME”, and DOS ports exist of versions 3.92, 3.97b2 (compiled by Blair), 3.97 (compiled by FloX), 3.98.4 (compiled by Robert Riebisch), and 3.99.5 (preferred, compiled by RayeR). Those binaries offer both compressing (native LAME business) and decompressing (using the external “MPG123” library) in one executable.
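Since an MP3 file has no container, it is basically a sequence of frames, each starting with a 4-byte header that carries sync bits, bitrate, sampling frequency and channel mode. The Python sketch below decodes such a header for MPEG-1 Layer III only (MPEG-2/2.5, free format and ID3 tags are deliberately ignored); the function name is hypothetical, while the bit layout and tables follow the common MPEG-1 Layer III description.

```python
# MPEG-1 Layer III bitrates in kbit/s, indexed by the 4-bit bitrate field (0 = "free format", 15 = invalid)
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
# MPEG-1 sampling frequencies, indexed by the 2-bit field (3 = reserved)
SAMPLE_RATES = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]

def parse_mpeg1_layer3_header(header: bytes) -> dict:
    """Decode a 4-byte MPEG-1 Layer III frame header (MPEG-2/2.5 and free format are not handled)."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    h = int.from_bytes(header, "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:  # 11 sync bits, all set
        raise ValueError("no frame sync")
    version = (h >> 19) & 0b11  # 0b11 = MPEG-1
    layer = (h >> 17) & 0b11    # 0b01 = Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("not MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or invalid header")
    padding = (h >> 9) & 1
    # 1152 samples per frame / 8 bits per byte = 144, hence bytes = 144 * bit/s / sample rate
    frame_bytes = 144 * bitrate * 1000 // sample_rate + padding
    return {
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        "frame_bytes": frame_bytes,
    }

print(parse_mpeg1_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
# {'bitrate_kbps': 128, 'sample_rate': 44100, 'channel_mode': 'joint stereo', 'frame_bytes': 417}
```

A real reader would scan for the sync pattern, skip any ID3 tag first, and then jump from frame to frame using `frame_bytes`.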

LAME history:
LAME 3.100        not yet released   (many bugfixes)
LAME 3.99.5       February 28 2012   (one bugfix)
LAME 3.99.4       January 25 2012
...
LAME 3.98.4       March 22 2010      (one bugfix)
LAME 3.98.3       February 27 2010   (many bugfixes)
LAME 3.98.2       September 22 2008  (one bugfix)
LAME 3.98.1       September 21 2008
...
LAME 3.97         September 24 2006  ("3.97 beta 3 becomes 3.97")
LAME 3.97 beta 3  August 19 2006     (one bugfix)
LAME 3.97 beta 2  November 26 2005   (many bugfixes)
...
LAME 3.92         2002-Apr-14
...
LAME 3.81beta     2000-May-08        ("all ISO code removed")
...
LAME 3.0 May      1999-Oct
...
October 1 1998        ("Updated web page and released LAME v1.0")
Up to September 1998  ("Working on the 8hz source code")

3.3  AAC

The “second generation” proprietary audio codec, technically superior to MP3. It is less popular for audio-only files and occurs mostly in videos, inside the MPEG-4 container together with H264 video. Audio-only files use the same container too, just with no video included. Some open source implementations also exist. It has an additional “unofficial” name: “MP4”.
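That audio-only files use the same container can be seen by walking the MPEG-4 “boxes” (32-bit big-endian size, 4-character type, payload) down to each track's `hdlr` box, whose handler type is `soun` for audio and `vide` for video. A minimal Python sketch, assuming a simple file layout (64-bit box sizes and most box types are ignored; the helper names and the synthetic test data are hypothetical):

```python
import struct

# Boxes that only contain other boxes on the path down to the track's "hdlr" box
CONTAINER_BOXES = {b"moov", b"trak", b"mdia"}

def iter_boxes(data, start, end):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        if size < 8:  # size 0 ("until end of file") and 1 (64-bit size) are not handled here
            break
        yield box_type, pos + 8, pos + size
        pos += size

def handler_types(data, start=0, end=None):
    """Collect the handler type of every track: "soun" for audio, "vide" for video."""
    end = len(data) if end is None else end
    found = []
    for box_type, payload_start, box_end in iter_boxes(data, start, end):
        if box_type in CONTAINER_BOXES:
            found += handler_types(data, payload_start, box_end)
        elif box_type == b"hdlr":
            # full box header (version + flags, 4 bytes), pre_defined (4 bytes), then handler_type
            found.append(data[payload_start + 8:payload_start + 12].decode("ascii"))
    return found

def box(box_type, payload):
    """Build a box: 32-bit big-endian size (header included), 4-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def track(handler):
    """Hypothetical minimal track: trak > mdia > hdlr (real files contain many more boxes)."""
    hdlr = box(b"hdlr", bytes(8) + handler + bytes(12) + b"\x00")
    return box(b"trak", box(b"mdia", hdlr))

ftyp = box(b"ftyp", b"M4A " + bytes(4) + b"M4A isom")
audio_only = ftyp + box(b"moov", track(b"soun"))
movie = ftyp + box(b"moov", track(b"vide") + track(b"soun"))
print(handler_types(audio_only))  # ['soun']
print(handler_types(movie))       # ['vide', 'soun']
```

The same code works on both files: an “MP4” audio file is simply the movie container with only a sound track in it.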

3.4  [OGG] Vorbis (and Speex)

A lossy algorithm, a replacement for MP3, and a “second generation” codec. It compresses better (better quality at the same file size, or a smaller file at the same quality), is free, open source (liberal BSD license) and unpatented, and supports no DRM. Commandline compressors and decompressors for Win32 are available, although they are not “official”, not easy to find and very big, and they work in DOS using HX-DOS. MPXPLAY supports it too. The only disadvantage is that many cheap portable players don’t (note that some players exist that support it sufficiently well as an “undocumented feature”). There are multiple implementations of the encoder:

  • Official Xiph libvorbis
  • aoTuV Vorbis (derivative of Xiph libvorbis, additional size vs quality optimizations, by Aoyumi )
  • FFvorbis (part of the FFMPEG library, very inferior; when using FFMPEG binaries, use “libvorbis”, which refers to the “official” Xiph libvorbis included in most builds, instead of “vorbis”, which refers to FFvorbis)
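In practice, picking the encoder with FFMPEG means passing `-c:a libvorbis` rather than `-c:a vorbis`. A tiny Python sketch that only builds the command line (the helper name and file names are hypothetical; `-q:a` selects the variable-bitrate quality level, and the value 5 is just an example):

```python
def vorbis_encode_command(src, dst, quality=5):
    """FFMPEG command line using Xiph's libvorbis encoder instead of the built-in "vorbis" one."""
    return ["ffmpeg", "-i", src, "-c:a", "libvorbis", "-q:a", str(quality), dst]

cmd = vorbis_encode_command("input.wav", "output.ogg")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a system where an FFMPEG build with libvorbis is installed.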

There also exists (or existed) a Speex codec, a niche product intended for speech at very low bitrates (complementing Vorbis, which is meant for high quality audio, mainly music), but it is now officially obsolete in favor of OPUS.
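Vorbis, Speex and OPUS are usually stored in the OGG container, so the file extension alone may not tell which codec is inside. The first packet of each stream starts with a codec-specific signature, which the Python sketch below checks after skipping the fixed 27-byte Ogg page header and its segment table. It is a simplified illustration: the CRC is not verified, only the first stream is examined, and the test page builder is hypothetical.

```python
import struct

# Signatures at the start of the first packet of a logical Ogg stream
CODEC_SIGNATURES = [
    (b"\x01vorbis", "Vorbis"),
    (b"OpusHead", "Opus"),
    (b"Speex   ", "Speex"),
    (b"\x7fFLAC", "FLAC"),
]

def ogg_first_codec(data: bytes) -> str:
    """Guess the codec of the first stream from the first Ogg page (CRC is not verified)."""
    if data[:4] != b"OggS":
        raise ValueError("not an Ogg page")
    segment_count = data[26]             # last field of the fixed 27-byte page header
    payload = data[27 + segment_count:]  # skip the segment (lacing) table
    for signature, name in CODEC_SIGNATURES:
        if payload.startswith(signature):
            return name
    return "unknown"

def fake_first_page(packet: bytes) -> bytes:
    """Build a minimal first page holding one short packet (hypothetical test data, CRC left at 0)."""
    header = b"OggS" + bytes([0, 0x02]) + bytes(8) + struct.pack("<II", 1, 0) + bytes(4)
    return header + bytes([1, len(packet)]) + packet

print(ogg_first_codec(fake_first_page(b"\x01vorbis" + bytes(23))))  # Vorbis
print(ogg_first_codec(fake_first_page(b"OpusHead" + bytes(11))))    # Opus
```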

3.5  OPUS

In 2012 Xiph, together with Mozilla, introduced OPUS, a “third generation” lossy audio codec and the only one at that time. Tests showed it to be superior to “second generation” codecs like Vorbis or AAC. It combines two technologies: the former CELT codec of Xiph and the former SILK codec used in Skype. As an additional benefit it offers low latency, so it is suitable for communication (telephony) too, while previously playing (music) audio files and communication were two separate niches. OPUS is specified by RFC 6716 “Definition of the Opus Audio Codec” from 2012-Sep. There are Win32 binaries of the encoder and decoder that work in DOS, and MPXPLAY 1.60 supports this format too (a DLL is needed).
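The combination of SILK and CELT is visible in every OPUS packet: its first “TOC” byte holds a 5-bit configuration number that selects SILK-only, Hybrid or CELT-only mode together with the audio bandwidth and frame duration, followed by a stereo flag and a 2-bit frame count code (RFC 6716, section 3.1). A Python sketch of that mapping; the function name is hypothetical. The very short CELT frames (down to 2.5 ms) are what make the low latency possible.

```python
def parse_opus_toc(toc: int) -> dict:
    """Decode the TOC byte at the start of every Opus packet (RFC 6716, section 3.1)."""
    config = toc >> 3                # 5 bits: mode, audio bandwidth and frame duration
    stereo = bool((toc >> 2) & 1)    # 1 bit: mono or stereo
    frame_count_code = toc & 0b11    # 2 bits: 1, 2 or an arbitrary number of frames in the packet
    if config < 12:
        mode, bandwidth = "SILK-only", ["NB", "MB", "WB"][config // 4]
        frame_ms = [10, 20, 40, 60][config % 4]
    elif config < 16:
        mode, bandwidth = "Hybrid", ["SWB", "FB"][(config - 12) // 2]
        frame_ms = [10, 20][config % 2]
    else:
        mode, bandwidth = "CELT-only", ["NB", "WB", "SWB", "FB"][(config - 16) // 4]
        frame_ms = [2.5, 5, 10, 20][config % 4]
    return {
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": frame_ms,
        "samples_at_48k": int(frame_ms * 48),
        "stereo": stereo,
        "frame_count_code": frame_count_code,
    }

print(parse_opus_toc(0x00))  # SILK-only, narrowband, 10 ms, mono
print(parse_opus_toc(0x60))  # Hybrid, super-wideband, 10 ms, mono
print(parse_opus_toc(0xFC))  # CELT-only, fullband, 20 ms, stereo
```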

3.6  FLAC, WAVPACK and TAK

Three very similar products with the same goal: lossless audio compression. Audio files compress badly; classical archivers achieve compression factors of about 1.05 to 1.4. FLAC, WAVPACK and TAK are optimized for audio and compress better, by a factor of about 1.1 to 1.8. No miracles either: bad compressibility is a given fact, and it is simply impossible to compress audio significantly more without loss. All three can vary the compression effort (higher effort results in a drastic slowdown and marginally better compression) and decompress the file to be byte-identical to the original one. FLAC and WAVPACK are available under a BSD license. TAK is closed source; it is/was intended to be “opened soon” by its author Thomas Becker, but this didn’t seem to happen, so FFMPEG developers reverse-engineered it and added a TAK decoder (TAK 2.xx only) in FFMPEG 1.1. WAVPACK is slightly slower and compresses marginally better than FLAC. TAK is the newest and should offer compression at least equal to WAVPACK at the speed of FLAC. All three offer Win32 console compressing and decompressing apps that work well in DOS using the HX-DOS Extender. Further, some external developers ported some versions of FLAC (now obsolete) and WAVPACK (still quite recent) to DOS. MPXPLAY can play FLAC and WAVPACK, but no TAK so far.
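Why audio-specific compressors beat classical archivers can be shown with the principle FLAC uses: predict each sample from previous ones and store only the small prediction error (residual) with a Rice code. The Python sketch below is a strongly simplified illustration under stated assumptions: one fixed 2nd-order predictor and a single Rice parameter for the whole signal, no frames, headers or partitions, and a synthetic test tone. It only counts bits; it does not write a real FLAC file.

```python
import math
import random
import struct
import zlib

def order2_residuals(x):
    """Residuals of a fixed 2nd-order polynomial predictor: e[n] = x[n] - 2*x[n-1] + x[n-2]."""
    return x[:2] + [x[n] - 2 * x[n - 1] + x[n - 2] for n in range(2, len(x))]

def restore(e):
    """Exact inverse of order2_residuals, so decoding is bit-for-bit lossless."""
    x = e[:2]
    for n in range(2, len(e)):
        x.append(e[n] + 2 * x[n - 1] - x[n - 2])
    return x

def rice_bits(values, k):
    """Bits needed to Rice-code signed integers with parameter k."""
    total = 0
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1  # fold signed to unsigned: 0,-1,1,-2,... -> 0,1,2,3,...
        total += (u >> k) + 1 + k            # unary quotient, stop bit, k remainder bits
    return total

# One second of a 440 Hz tone plus a little noise, 16-bit mono at 44.1 kHz (synthetic test data)
rng = random.Random(1)
samples = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) + rng.randint(-8, 8)
           for n in range(44100)]

raw_bits = 16 * len(samples)
residuals = order2_residuals(samples)
predictive_bits = min(rice_bits(residuals, k) for k in range(16))
zlib_bits = 8 * len(zlib.compress(struct.pack(f"<{len(samples)}h", *samples), 9))

assert restore(residuals) == samples  # lossless round trip
print(f"general-purpose (zlib) factor: {raw_bits / zlib_bits:.2f}")
print(f"prediction + Rice factor:      {raw_bits / predictive_bits:.2f}")
```

Raising the noise amplitude quickly pushes both factors down, which matches the “no miracles” point: noise cannot be predicted and therefore cannot be compressed without loss.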

TAK history:

  • 1997 (??) development started in “secret”
  • 2006-Apr-01 announced at HydrogenAudio forum (no release yet, this was NOT an April fool, rather bad timing)
  • 2007-Jan-23 - version 1.0
  • 2010-Jan-07 - version 2.0.0 (marginally better compression, breaking compatibility with TAK 1.xx)
  • 2011-Jul-08 - version 2.2.0
  • 2012-Oct - FFMPEG developers announced a reverse-engineered TAK decoder (2.xx only) and released the code; Thomas Becker was “shocked” but not “very angry”
  • 2013-Jan-07 - FFMPEG 1.1 released with TAK 2.xx decoding support
  • 2013-Jun-18 - version 2.3.0

3.7  WMA

Micro$oft’s closed, proprietary codec for lossy audio compression, with optional DRM support. A (potentially illegal ??) open source implementation allowing decompression (now also creation ??) of WMA files (only those without DRM) is included in the FFMPEG library and has “leaked” into many mediaplayers, incl. MPXPLAY for DOS (!!).

4.  Programs

4.1  SOX

Commandline audio converter / editor, in development since 1991.

Version 14.4.1 seems to mostly work in DOS with HX. Long ago there used to be 16-bit DOS versions, and some older 32-bit DJGPP ports exist too.

Features:
  • native support of many raw, uncompressed, and obscure formats
  • popular compressed formats (Vorbis, MP3, AAC, FLAC, …) may be supported using external libraries, depending on how the binary is compiled (no support, a huge binary, or many DLLs)
  • change sampling frequency
  • low-pass and high-pass
  • cut / trim
  • reverse
  • gain / change volume / peak normalization (see above)
  • compand / compress dynamic range
  • dumb merge and “smart/smooth” merge (untested)
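Most of the features listed above map to SOX “effects” that follow the input and output file names on the commandline. A Python sketch that just builds example command lines (file names and parameter values are made up; `rate`, `lowpass`, `highpass`, `trim`, `reverse` and `gain -n` are SOX effects, with `gain -n -3` performing the peak normalization with a 3 dB reserve discussed above):

```python
def sox_command(src, dst, *effects):
    """Build a SOX command line: input file, output file, then the effect chain."""
    return ["sox", src, dst, *effects]

examples = {
    "change sampling frequency to 48 kHz": sox_command("in.wav", "out.wav", "rate", "48k"),
    "low-pass at 3 kHz": sox_command("in.wav", "out.wav", "lowpass", "3000"),
    "high-pass at 100 Hz": sox_command("in.wav", "out.wav", "highpass", "100"),
    "keep 30 s starting at 10 s": sox_command("in.wav", "out.wav", "trim", "10", "30"),
    "reverse": sox_command("in.wav", "out.wav", "reverse"),
    "peak-normalize with 3 dB reserve": sox_command("in.wav", "out.wav", "gain", "-n", "-3"),
}

for purpose, cmd in examples.items():
    print(f"{purpose:36} {' '.join(cmd)}")
```

Several effects can be chained in one call, e.g. `sox in.wav out.wav highpass 100 rate 48k`; SOX applies them left to right.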

Issues running the official SOX 14.4.1 Win32 binary:

  • works with HX, no missing imports
  • needs at least a Pentium III processor to run
  • depends on just one unconditionally loaded DLL, “zlib1.dll”, which is included in the package
  • SOX needs temporary files for many (but not all) operations
  • temporary files can become huge (twice the size of the input WAV file)
  • the option “--temp” can be used to specify the temporary directory, otherwise the variables TEMP and TMP are used; they should NOT point to a root directory and NOT end with a slash, thus “E:\” is bad while “E:\CRAP” is good (the directory must exist)
  • unfortunately the temporary files use very long names with double extensions, so they do not work on plain DOS; UI21DEB is also insufficient to fix this, thus DOSLFN is needed
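The TEMP/TMP rules above are easy to get wrong, so here they are encoded as a small Python checker, together with a SOX call using the global `--temp` option. Both helpers are hypothetical and only reflect the rules as described on this page; the optional existence check uses `os.path.isdir` on whatever system runs the script.

```python
import ntpath
import os

def temp_dir_problems(path, check_exists=False):
    """List violations of the TEMP/TMP rules for SOX (DOS/Windows-style paths)."""
    problems = []
    _drive, rest = ntpath.splitdrive(path)
    if rest.strip("\\/") == "":  # "E:\" (and a bare drive like "E:") names no subdirectory
        problems.append("points to a root directory")
    if path.endswith(("\\", "/")):
        problems.append("ends with a slash")
    if check_exists and not os.path.isdir(path):
        problems.append("directory does not exist")
    return problems

def sox_with_temp(temp_dir, *args):
    """Prefix a SOX invocation with the global --temp option."""
    return ["sox", "--temp", temp_dir, *args]

for candidate in ["E:\\", "E:\\CRAP\\", "E:\\CRAP"]:
    print(candidate, temp_dir_problems(candidate) or "OK")
print(" ".join(sox_with_temp("E:\\CRAP", "in.wav", "out.wav", "reverse")))
```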

One of the flaws of SOX is that it does not provide loudness normalization; its manual more or less “advertises” peak normalization with a 3 dB reserve, which is what SOX can perform automatically. Like many other audio programs, SOX can apply any gain specified on the commandline, but (as the manual also notes) it cannot find out what gain needs to be applied in order to normalize the loudness.
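One workaround is to measure the level with a separate tool and then hand the result to SOX's `gain` effect. The Python sketch below measures the RMS level of a 16-bit WAV file with the standard `wave` module and builds the matching `sox … gain <dB>` command. Assumptions: RMS is used as a simple stand-in for real loudness measurement (it is NOT Replay Gain), the −20 dBFS target is an arbitrary example, and a positive gain may clip peaks, which this sketch does not check.

```python
import io
import math
import struct
import wave

TARGET_RMS_DBFS = -20.0  # example target level, chosen arbitrarily for this sketch

def wav_rms_dbfs(f):
    """RMS level of a 16-bit PCM WAV file (all channels together), relative to full scale."""
    with wave.open(f, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = w.readframes(w.getnframes())
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)
    mean_square = sum(s * s for s in samples) / count
    if mean_square == 0:
        raise ValueError("file is silent")
    return 10 * math.log10(mean_square / 32767 ** 2)

def sox_gain_command(src, dst, measured_dbfs, target_dbfs=TARGET_RMS_DBFS):
    """SOX command applying the gain that moves the measured level to the target level."""
    return ["sox", src, dst, "gain", f"{target_dbfs - measured_dbfs:.2f}"]

# Test input: one second of a 1 kHz tone peaking at 10 % of full scale, mono, 44.1 kHz
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(44100)
    tone = [round(3277 * math.sin(2 * math.pi * 1000 * n / 44100)) for n in range(44100)]
    w.writeframes(struct.pack(f"<{len(tone)}h", *tone))
buf.seek(0)

level = wav_rms_dbfs(buf)
cmd = sox_gain_command("in.wav", "out.wav", level)
print(f"measured {level:.2f} dBFS ->", " ".join(cmd))
```

Applying the same target to every file of a playlist gives consistent levels in a way that peak normalization cannot.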

4.2  AUDACITY

Not (yet ??) usable in DOS …


5.  See also

Page last modified on August 14, 2016, at 10:07 AM