The Finite Capacity of Short-Term Memory or Why ABX Tests in HiFi Are Silly

Proponents of ABX testing in audio suggest, or insist depending on the person, that knowing what we are listening to when making comparisons makes any results invalid due to bias. In order to remove this perceptual bias, we must remove the knowledge of what we’re listening to. Thus, the ABX test.

How does an ABX test work? From Wikipedia:

A subject is presented with two known samples (sample A, the first reference, and sample B, the second reference) followed by one unknown sample X that is randomly selected from either A or B. The subject is then required to identify X as either A or B.

Simple, right? Listen, switch, listen, switch, listen, switch, and so on. If you can’t identify X as either A or B reliably enough to clear the usual bar for statistical significance, a 95% confidence level, then you may as well buy the cheaper one. [footnote 1] Simple, right?
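For the curious, here is roughly what that 95% bar means in practice. This is a minimal sketch, assuming each trial is an independent coin flip for a listener who is purely guessing, scored with a one-sided binomial test. The function names are mine, made up for illustration, and are not part of any standard ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right by pure guessing.

    Assumes each trial is an independent 50/50 guess (one-sided binomial test).
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the outcomes with `correct` or more hits, out of 2**trials equally likely ones.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def min_correct_for_significance(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose guessing probability is at or below `alpha`.

    Returns trials + 1 when even a perfect score cannot reach the bar.
    """
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) <= alpha:
            return correct
    return trials + 1


if __name__ == "__main__":
    for n in (10, 16):
        needed = min_correct_for_significance(n)
        print(f"{n} trials: need {needed} correct, p = {abx_p_value(needed, n):.4f}")
```

Under those assumptions, a listener needs 9 correct out of 10 trials, or 12 out of 16, to clear the bar. With only 4 trials, even a perfect score falls short.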

ABX tests in audio are simply silly because short-term memory, which is what we’re actually testing in an ABX test, is short: it has a finite capacity, roughly 18 seconds, depending on the person taking the test and the material used. [footnote 2] At best, ABX test results will tell us something about the abilities of each tester under the specific conditions of the test. Nothing more, nothing less. [footnote 3]

Of course you can get better at taking ABX tests with time and experience, which is all well and good if that’s your goal—to become proficient at taking ABX tests. If you want to decide what hifi to buy, ABX tests are nothing more than a distraction.


1. The ABX test, first developed by Bell Labs in 1950, was initially suggested as a way to test the limits of auditory perception in various types of tests including “thresholds of acuity, masking tests, difference limens, etc.” It has since become a symbolic tool, used in support of a belief and as an argument against people who buy their hifi the same way they make decisions about everything else in life.

2. “…the ability to recall words in order depends on a number of characteristics of these words: fewer words can be recalled when the words have longer spoken duration; this is known as the word-length effect, or when their speech sounds are similar to each other; this is called the phonological similarity effect. More words can be recalled when the words are highly familiar or occur frequently in the language. Recall performance is also better when all of the words in a list are taken from a single semantic category (such as games) than when the words are taken from different categories.” see Short-term memory

3. “Baddeley used this finding to postulate that one component of his model of working memory, the phonological loop, is capable of holding around 2 seconds of sound.” see The Magical Number Seven, Plus or Minus Two