Music and Language

“Music is a conversation” — my violin teacher told me this at every lesson during my school years. Is it true?

June Y.
9 min read · May 6, 2022
In chamber music, the musicians sit facing one another rather than the audience, as if having a conversation, or perhaps even arguing. Source: Hiroyuki Ito/The New York Times

Every day, musicians around the world talk about “call and response,” “phrases,” and “sentences,” borrowing all sorts of terms from verbal grammar in order to discuss their music. This isn’t just pulled from some small world of musical jargon, either — I’m sure we’ve all heard something along the lines of, “music is a universal language.” Playing music as if engaging in dialogue has been a favorite analogy of music educators for ages.

It’s easy to see why this comparison is so commonplace: after all, both music and language can loosely be reduced to the act of producing specific sounds at specific points in time. In that sense, it seems natural that people would find some aesthetic gratification in equating the two concepts through this connection alone. But however appealing that intuition may be, is there a real scientific basis for treating music as a language? In other words, are the linguistic phenomena we observe in music just a reflection of our pattern-seeking apophenia, or does the human brain really process music in the same way that it does language?

Background: The Syntax of Music

Before we delve into its similarities with spoken language, here is a brief, simplified overview of the mechanics of music as a form of communication.

Note: when I talk about “music” in this article, I refer primarily to music of the Common Practice Period, or European art music between the 17th and 20th centuries, largely because of my familiarity with that genre and its prevalence in the research literature. However, the discussion as a whole is not necessarily specific to this form of music.

Music, at its lowest level, is made up of individual notes, or pitches. When two or more of these notes are played simultaneously, the resulting collection of pitches forms a chord, much in the same way that sounds come together to form words in language. Furthermore, each of these chords can be put into one of three groups based on its harmonic function: tonic, subdominant, or dominant. Consider these the nouns, verbs, and adjectives of our “language.”
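
To make the analogy concrete, here is a minimal Python sketch (my own illustration, not drawn from any of the studies below) that maps the Roman-numeral chords of a major key onto these three functions. The groupings follow common textbook convention, simplified:

```python
# Toy mapping from Roman-numeral chords (major key) to harmonic functions.
# The groupings are conventional but simplified; real harmony has edge cases.
HARMONIC_FUNCTION = {
    "I": "tonic", "iii": "tonic", "vi": "tonic",
    "ii": "subdominant", "IV": "subdominant",
    "V": "dominant", "vii°": "dominant",
}

def function_of(chord):
    """Return the harmonic function of a chord, e.g. 'V' -> 'dominant'."""
    return HARMONIC_FUNCTION[chord]

print(function_of("V"))   # dominant
print(function_of("vi"))  # tonic
```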

Two syntactic trees; the left shows a musical progression broken down into chords, and the right shows an English sentence broken down into words. Notice the similarities. Sources: Rohrmeier, M. (2007), University of North Georgia Press

Now, just as pulling random words from a hat will most likely produce gibberish, these chords must come in a logical order to be comprehensible to a listener. One of the foundational rules of Western music, for instance, is that a dominant chord must always be followed by a tonic chord (hang on to this — it’ll be important in a bit). In fact, whether they are aware of these rules or not, many people are able to hear the difference when chords come in an order that doesn’t follow the normal structure. This is simply because these syntactic rules are so prevalent in the music we hear in our daily lives that they have become a part of our subconscious. Try listening to these two examples:

Here, the rule is not followed, and so it may sound like we didn’t arrive quite where we were expecting, almost like a sentence trailing off before its conclusion.
When the rule is followed, the musical idea sounds complete and assured.
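
That difference you may have just heard can be captured almost mechanically. Below is a toy Python sketch, using the same simplified function mapping as above, that scans a progression and flags any dominant chord that fails to resolve to a tonic; real tonal syntax is of course far richer than this single rule.

```python
# Flag any dominant chord that is not immediately followed by a tonic chord.
FUNCTION = {
    "I": "tonic", "iii": "tonic", "vi": "tonic",
    "ii": "subdominant", "IV": "subdominant",
    "V": "dominant", "vii°": "dominant",
}

def violates_dominant_rule(progression):
    """Return True if any dominant chord fails to resolve to a tonic."""
    funcs = [FUNCTION[chord] for chord in progression]
    return any(cur == "dominant" and nxt != "tonic"
               for cur, nxt in zip(funcs, funcs[1:]))

print(violates_dominant_rule(["I", "IV", "V", "I"]))   # False: V resolves to I
print(violates_dominant_rule(["I", "IV", "V", "IV"]))  # True: V moves to IV
```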

Sentence Processing

Now that we understand a bit about how music works, let’s return to some linguistics. More specifically, we will look at how the human brain processes long, complex sentences, and note the evidence for similarities with music.

According to Hartsuiker and Barkhuysen (2006), the complexity of the sentences that humans are able to process is restricted by our working memory. Suppose we have a simple sentence: “the dog ran.” Now, add in some extra details about the dog: “the dog the neighbor owned ran.” And now about the neighbor: “the dog the neighbor my cat hated owned ran.” This sort of recursion could be carried on without end; in any case, the sentence quickly becomes harder and harder to understand as the number of things to keep track of increases. Using various sentence-formation tests based on this concept, the study probed these faculties and concluded that the intelligibility of such sentences is limited by the nature of our verbal working memory as a finite resource.
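
To see how quickly this recursion piles up, here is a small Python sketch (my own illustration, not taken from the study) that generates such center-embedded sentences. Each added clause forces the reader to hold one more unresolved subject in memory before any verb arrives.

```python
# Build center-embedded sentences: nouns stack up first, and their verbs
# then resolve in reverse order, so every pending subject must be held
# in working memory until its verb finally appears.
def center_embed(pairs):
    """pairs: (noun, verb) tuples, outermost clause first."""
    nouns = " ".join(f"the {noun}" for noun, _ in pairs)
    verbs = " ".join(verb for _, verb in reversed(pairs))
    return f"{nouns} {verbs}"

print(center_embed([("dog", "ran")]))
# -> the dog ran
print(center_embed([("dog", "ran"), ("neighbor", "owned")]))
# -> the dog the neighbor owned ran
print(center_embed([("dog", "ran"), ("neighbor", "owned"), ("cat", "hated")]))
# -> the dog the neighbor the cat hated owned ran
```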

Now, if it is true that musical ideas are indeed processed by the same mechanisms of the brain that process verbal sentences, then this resource of working memory must also be accessed by the brain’s music-processing functions. Furthermore, Hartsuiker and Barkhuysen established that it is possible to “overload” this memory, restricting the complexity of the sentences the brain can parse. By extension, then, it should be possible to overload linguistic memory with musical information, and vice versa, to the point where the ability to process one or the other is reduced.

To investigate this possibility, Hoch et al. (2011) conducted a study testing the two faculties against each other. In this series of experiments, Hoch et al. presented subjects with a sentence-processing lexical decision task: a sequence of words was read to them one at a time in slow succession, the last of which was either syntactically expected or unexpected. As soon as the sentence was completed, the subject was asked to identify, as quickly as possible, whether the last word was expected or unexpected. While this task was being performed, a sequence of chords was played in the background, one for each word. This sequence, too, was one of two types: expected, moving at the end from a dominant to a tonic (recall our rule that a dominant chord must move to a tonic chord!), or unexpected, moving to a subdominant instead.
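
For concreteness, here is a schematic of that crossed design in Python; the condition labels are my own shorthand, not the authors’ materials.

```python
# Schematic of the 2x2 design: the sentence's final word and the chord
# sequence's final chord are each either expected or unexpected.
# Condition names are my own shorthand, not taken from Hoch et al. (2011).
from itertools import product

FINAL_WORD = ("syntactically expected", "syntactically unexpected")
FINAL_CHORD = ("tonic (expected)", "subdominant (unexpected)")

trials = [{"final_word": w, "final_chord": c}
          for w, c in product(FINAL_WORD, FINAL_CHORD)]

for trial in trials:
    print(trial)
```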

Results of the experiment, showing markedly longer response times for syntactically expected words when paired with the subdominant rather than the tonic. Responses to syntactically unexpected words remain unaffected. Source: Hoch et al. (2011)

Through this study, Hoch et al. found that an unexpected chord sequence hindered the subjects’ accuracy and speed on the lexical decision task, specifically when identifying syntactically expected words. In other words, when an unexpected, syntactically incorrect chord concluded the musical idea, it interfered significantly with linguistic processing and slowed the subjects down in evaluating the syntactic correctness of the lexical stimulus. From these results, the researchers concluded that the neural resources for musical and linguistic processing strongly overlap.

While this is a particularly interesting case of a parallel between musical and linguistic processing, it is not the first time researchers have arrived at this conclusion. For instance, Poulin-Charronnat et al. (2005) showed a similar overlap between musical and linguistic processing, except with semantic stimuli rather than syntactic ones. Moreover, a 2008 study by Steinbeis and Koelsch examined the relationships between the more complex, elaborate chord patterns found in classical music and several linguistic stimuli, directly measuring the responses of different parts of the brain, and ultimately came to the same conclusion. Koelsch (2011) provides an in-depth neural model of music and language processing, pointing to specific neural generators that are activated in common by both processes.

Music in Practical Language

So far, we have found that music and language really do appear to be processed by similar cognitive regions of the brain. Now, let’s look at an example of research that highlights how music can fit into the larger landscape of language, while reaffirming our previous findings.

Although languages differ in many ways, one of the lowest-level differences between two languages is the set of sounds that they use. More precisely, a language may group together sets of sounds it considers similar into single phonemes — units of sound — that serve one purpose linguistically. For instance, whereas the sounds [p] (unaspirated, as in spun) and [pʰ] (aspirated, as in pun) are different phonemes in Hindi (/kapi/ and /kapʰi/ are distinct words), they are simply two allophones of the same phoneme in English — that is, the difference between these two has no distinctive role in distinguishing one word from another, and there does not exist a word pair that differs by these sounds alone.
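
One way to picture this distinction is through voice onset time (VOT), the main acoustic cue separating [p] from [pʰ]. The sketch below is a deliberately crude model: the 30 ms boundary and the category mappings are simplifications I chose for illustration.

```python
# Crude model of phoneme categorization by voice onset time (VOT).
# Aspiration is contrastive in Hindi (two phonemes) but allophonic in
# English (one phoneme). The 30 ms threshold is an illustrative choice.
def categorize(vot_ms, language):
    aspirated = vot_ms > 30  # long VOT -> aspirated [pʰ]
    if language == "hindi":
        return "/pʰ/" if aspirated else "/p/"   # distinct phonemes
    if language == "english":
        return "/p/"                            # same phoneme either way
    raise ValueError(f"unknown language: {language}")

for vot in (10, 60):
    print(f"VOT={vot}ms -> hindi: {categorize(vot, 'hindi')}, "
          f"english: {categorize(vot, 'english')}")
```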

According to Werker (1995), most English speakers have lost, or at least have a diminished, ability to differentiate between sounds such as [p] and [pʰ], owing to this lack of functional difference between them. This holds across the globe: speakers of a language typically retain only the ability to differentiate between sounds that are contrastive for them, and otherwise group them as allophones of sounds that do exist in their language. This raises the question: what about musicians?

Ott et al. (2011) conducted a study on the differences between musicians and non-musicians in their ability to distinguish various sounds. They recorded EEG (electroencephalography) data from the subjects while the subjects responded to various voiced and unvoiced sound stimuli, measuring the speed and accuracy with which they correctly identified each stimulus.

The musicians are the blue bars in figure A. Source: Ott et al. (2011)

As visualized above, and perhaps unsurprisingly, the musicians were significantly faster than the non-musicians at identifying every type of sound. However, according to the researchers, the “major finding of our study is that musicians process unvoiced stimuli differently than non-musicians,” referring to the musicians’ improved accuracy with unvoiced stimuli compared to the control group. This may show that musicians carry in their phonological vocabulary sounds that are not typically used in human language, or that they can distinguish finer gradations of unvoiced sound that would be considered allophonic, or simply noise, in their native language. In either case, the conclusion is that expertise in music produces a real change at a very basic level of sound processing in the brain, in ways that seem to mirror the acquisition of another language.

Further Reading and Concluding Remarks

For further reading, I would like to call attention to a topic that I found highly interesting but slightly too tangential to cover above. The phenomenon of absolute pitch, also known as “perfect pitch,” is the ability to recognize and name musical notes without a reference pitch. This ability, which can only be acquired during early childhood, is quite rare, with frequency estimates going as low as 1 in 10,000. According to Deutsch (2002), however, speakers of tonal languages such as Mandarin and Vietnamese have been reported to possess this ability at anywhere from 4 to 10 times the rate of the general population, which serves as further evidence for shared faculties between music and language, this time in the developing brain.

In any case, much work remains to be done on the exact linguistic properties of music and how the brain responds to these structural similarities. Ironically, the very physical similarities between music and language that prompt this discussion in the first place also make it difficult to separate the two variables completely and observe them independently, though brain imaging has made real strides in that area. In this article, I have only presented the arguments highlighting the overlap between music and language, but it is worth noting that doubts about the neurological sharing of music and language processing have been raised since Bever and Chiarello (1974) and earlier, and the debate continues to this day.

A beautiful piece by Antonín Dvořák. The dialogue between the violin and the piano continues throughout the whole piece. Performed by Bohuslav Matoušek and Petr Adamec

References

Bacain2357. (2018, September). A linguist’s tree of knowledge: tree diagrams. University of North Georgia Press. https://blog.ung.edu/press/a-linguists-tree-of-knowledge/

Bever, T. G., and Chiarello, R. J. (1974). Cerebral dominance in musicians and nonmusicians. Science 185, 537–539.

Deutsch, D. (2002). The puzzle of absolute pitch. Current Directions in Psychological Science. doi:10.1111/1467-8721.00200

Hartsuiker, R., and Barkhuysen, P. (2006). Language production and working memory: the case of subject-verb agreement. Language and Cognitive Processes 21, 181–204. doi:10.1080/01690960400002117

Hoch, L., Poulin-Charronnat, B., and Tillmann, B. (2011). The influence of task-irrelevant music on language processing: syntactic and semantic structures. Front. Psychol. 2:112. doi:10.3389/fpsyg.2011.00112

Ito, H. (2016). Beethoven String Quartet Cycle at Lincoln Center. The New York Times. https://www.nytimes.com/2016/01/31/arts/music/beethoven-string-quartet-cycle-at-lincoln-center.html

Jäncke, L. (2012). The relationship between music and language. Front. Psychol. 3:123. doi:10.3389/fpsyg.2012.00123

Koelsch, S. (2011). Toward a neural basis of music perception — a review and updated model. Front. Psychol. 2:110. doi: 10.3389/fpsyg.2011.00110

Ott, C. G., Langer, N., Oechslin, M., Meyer, M., and Jäncke, L. (2011). Processing of voiced and unvoiced acoustic stimuli in musicians. Front. Psychol. 2:195. doi:10.3389/fpsyg.2011.00195

Poulin-Charronnat, B., Bigand, E., Madurell, F., and Peereman, R. (2005). Musical structure modulates semantic priming in vocal music. Cognition 94, B67–B78.

Rohrmeier, M. (2007). “A generative grammar approach to diatonic harmonic structure,” in Proceedings of the 4th Sound and Music Computing Conference, Lefkada, 97–100.

Steinbeis, N., and Koelsch, S. (2008). Shared neural resources between music and language indicate semantic processing of musical tension-resolution patterns. Cereb. Cortex 18, 1169–1178.

Werker, J. F. (1995). Exploring developmental changes in cross-language speech perception. In L. R. Gleitman & M. Liberman (Eds.), Language: An invitation to cognitive science (pp. 87–106). The MIT Press.
