It’s easy to think about music as just a sequence of sounds – recorded and encoded in a Spotify stream, these days, but still: an acoustic phenomenon that we respond to because of how it sounds. The source of music’s power, according to this account, lies in the notes themselves. To pick apart how music affects us would be a matter of analysing the notes and our responses to them: in come notes, out tumbles our perception of music. How does Leonard Cohen’s Hallelujah work its magic? Simple: the fourth, the fifth, the minor fall, the major lift…
Yet thinking about music in this way – as sound, notes and responses to notes, kept separate from the rest of human experience – relegates music to a special, inscrutable sphere accessible only to the initiated. Notes, after all, are things that most people feel insecure about singing, and even less sure about reading. The vision of an isolated note-calculator in the brain, taking sound as input and producing musical perceptions as output, consigns music to a kind of mental silo.
But how could a cognitive capacity so removed from the rest of human experience have possibly evolved independently? And why would something so rarefied generate such powerful emotions and memories for so many of us?
In fact, the past few decades of work in the cognitive sciences of music have demonstrated with increasing persuasiveness that the human capacity for music is not cordoned off from the rest of the mind. On the contrary, music perception is deeply interwoven with other perceptual systems, making music less a matter of notes, the province of theorists and professional musicians, and more a matter of fundamental human experience.
Brain imaging produces a particularly clear picture of this interconnectedness. When people listen to music, no single ‘music centre’ lights up. Instead, a widely distributed network activates, including areas devoted to vision, motor control, emotion, speech, memory and planning. Far from revealing an isolated, music-specific area, the most sophisticated technology we have available to peer inside the brain suggests that listening to music calls on a broad range of faculties, testifying to how deeply its perception is interwoven with other aspects of human experience. Beyond just what we hear, what we see, what we expect, how we move, and the sum of our life experiences all contribute to how we experience music.
Subscribe to Aeon’s Newsletter
If you close your eyes, you might be able to picture a highly expressive musical performance: you might see, for instance, a mouth open wide, a torso swaying, and arms lifting a guitar high into the air. Once you start picturing this expressive display, it’s easy to start hearing the sounds it might produce. In fact, it might be difficult to picture these movements without also imagining the sound.
Or you could look – with the volume muted – at two performances of the same piano sonata on YouTube, one by an artist who gesticulates and makes emotional facial expressions, and the other by a tight-lipped pianist who sits rigid and unmoving at the keyboard. Despite the fact that the only information you’re receiving is visual, you’ll likely imagine very different sounds: from the first pianist, highly expressive fluctuations in dynamics and timing, and from the second, more straightforward and uninflected progressions.
Could it be that visual information actually affects the perception of musical sound, and contributes substantially to the overall experience of a performance? Numerous studies have attempted to address this question. In one approach, the psychologist Bradley Vines at McGill University in Canada and colleagues video-recorded performances intended to be highly expressive as well as ‘deadpan’ performances, in which performers are instructed to play with as little expressivity as possible. Then the researchers presented these recordings to the participants, either showing them just the video with no sound, or playing them just the audio with no video, or playing them the full audiovisual recording – or, in a particularly sneaky twist, playing them a hybrid video, in which the video from the expressive performance was paired with the audio from the deadpan performance, and vice versa.
It turns out that participants tend to describe as more expressive and emotional whichever performance is paired with the more expressive video – rather than the recording with the more expressive sound. In a separate experiment, the psychologist Chia-Jung Tsay at University College London showed that people predicted the winners of music competitions more successfully when they watched silent videos of their performances than when they merely heard the performances, or watched the video with the sound on.
Music, it seems, is a highly multimodal phenomenon. The movements that produce the sound contribute essentially, not just peripherally, to our experience of it – and the visual input can sometimes outweigh the influence of the sound itself.
Visual information can convey not only information about a performance’s emotional content, but also about its basic structural characteristics. Work by the psychologists Bill Thompson at Macquarie University in Sydney and Frank Russo at Ryerson University in Toronto showed that people could judge the size of an interval being sung even when they couldn’t hear it – merely by watching facial expressions and head movements. When video of a person singing a longer interval was crossed with audio from a shorter one, people actually heard the interval as longer. Similarly, when Michael Schutz and Scott Lipscomb, then both at Northwestern University in Illinois, crossed video of a percussionist playing a long note with audio from a short note, people actually heard the note’s duration as longer.
Multisensory integration at this basic level feeds into some of the higher-level effects of vision on perceived emotion. For example, pairing audio of a sung minor interval, typically heard as sad, with video footage of someone singing a major interval, typically heard as happy, leads to the minor interval being rated as happier.
A musical experience is more than an audiovisual signal. Maybe you’re trying out a new band because your best friend recommended it, or because you’re doing your parent a favour. Maybe you’re experiencing a concert in a gorgeous hall with a blissed-out audience, or maybe you’ve wandered into a forlorn venue with a smattering of bored-looking folks, all of whom seem to have positioned themselves as far from the stage as possible. These situations elicit markedly different sets of expectations. The information and inferences brought to the concert can make or break it before it even starts.
Joshua Bell is a star violinist who plays at the world’s great concert halls. People regularly pay more than $100 per ticket to hear him perform. Everything about the setting of a typical concert implies how worthy the music is of a listener’s full attention: the grand spaces with far-away ceilings, the hush among the thousand attendees, the elevation of the stage itself. In 2007, a reporter from the Washington Post had an idea for a social experiment: what would happen if this world-renowned violinist performed incognito in the city’s subway? Surely the exquisiteness of his sound would lure commuters out of their morning routine and into a rhapsodic listening experience.
Instead, across the 35 minutes that he performed the music of Bach, only seven people stopped for any length of time. Passers-by left a total of $32 and, after the last note sounded, there was no applause – only the continued rustle of people hurrying to their trains. Commentators have interpreted this anecdote as emblematic of many things: the time pressures faced by urban commuters, the daily grind’s power to overshadow potentially meaningful moments, or the preciousness of childhood (several children stopped to listen, only to be pulled away by their parents). But just as significantly, it could suggest that the immense power of Bell’s violin-playing does not lie exclusively in the sounds that he’s producing. Without overt or covert signalling that prepared them to have a significant aesthetic experience, listeners did not activate the filters necessary to absorb the aspects of his sound that, in other circumstances, might lead to rhapsodic experiences. Even musicianship of the highest level is susceptible to these framing effects. The sound just isn’t enough.
Other studies also suggest a powerful role for context in the experience of music. In 2016, my colleague Carolyn Kroger and I, at the University of Arkansas, exposed participants to pairs of performances of the same excerpt, telling them that one was performed by a world-renowned professional pianist and the other by a conservatory student: people consistently preferred the ‘professional’ performance – whether they were listening to the professional, to the student, or had in fact just heard the exact same sound played twice. Listeners also tended to prefer the second excerpt in each pair – another factor unrelated to the sound itself. When these two factors coincided – when the second performance was also primed as professional – the tendency to prefer it was especially strong. My own subsequent neuroimaging work using the same paradigm revealed that reward circuitry was activated in response to the professional prime and remained active throughout the excerpt; this finding is in line with previous neuroimaging studies demonstrating that the reward network is sensitive to contextual information, which can heighten the pleasantness of a sensory experience.
It’s not only our sense of the quality of a performance that is manipulable by extrinsic information; our sense of its expressive content can also vary. In a recent study, we told people that we had special information about the musical excerpts that they were going to hear: in particular, we knew something about the composer’s intent when writing it. Unbeknown to the participants, we created the intent descriptions so that some were highly positive, some highly negative, and some neutral. For example, we could say that a composer wrote the piece to celebrate the wedding of a dear friend, to mourn the loss of a friend, or to fulfil a commission. We scrambled the description-excerpt pairings so that the same excerpts were matched with different descriptions for different participants. In each trial, participants read the composer-intent description, listened to the excerpt, and answered questions about it.
When told that the excerpt had been written for some positive reason, people heard the music as happier, but when told that the excerpt had been written in a negative circumstance, they heard it as sadder. Recasting the emotional tenor of an excerpt had important consequences for the listeners’ experience of it. People liked the excerpts more and were more moved by them when they thought they had been written for a happy reason (intriguingly, another part of the same study showed that people liked and were more moved by poetry when they thought it had been written for a sad reason). The social and communicative context within which a performance occurs – crudely approximated by the intent descriptions in this study – can imbue the same sounds with very different meanings.
The right music can get a roomful of people dancing. Even people at classical concerts that discourage overt movement sometimes find it irresistible to tap a finger or foot. Neuroimaging has revealed that passive music-listening can activate the motor system. This intertwining of music and movement is a deep and widespread phenomenon, prevalent in cultures throughout the world. Infants’ first musical experiences often involve being rocked as they’re sung to. The interconnection means not only that what we hear can influence how we move, but also that how we move can influence what we hear.
To investigate this influence, the psychologists Jessica Phillips-Silver and Laurel Trainor at McMaster University in Ontario bounced babies either every two or every three beats while they listened to a rhythmically ambiguous musical excerpt – one that could be heard as accented either every two or every three beats. During this exposure phase, the babies were hearing the same music, but some of them were being moved in a duple pattern (every two beats, a march) and some in a triple pattern (every three beats, a waltz). In a later test phase, the babies were presented with versions of the excerpt featuring added accents every two or every three beats, translating the emphasis from the kinaesthetic to the auditory domain. They listened longer to the version that matched the bouncing pattern to which they had been exposed – babies who had been bounced every two beats preferred the version with a clear auditory duple meter, and babies who had been bounced every three beats preferred the version with the triple meter. To put it another way, these infants transferred the patterns they had learned kinaesthetically, through movement, to the patterns they were experiencing auditorily, through sound. What they perceived in the sound was framed by the way they had moved.
Testing whether this transfer from movement to sound occurs in adults required a few modifications to the study design – it’s not as easy to pick up adults and bounce them. Instead, adults were taught how to bend their knees every two or three beats as a musical excerpt played. And rather than devising a listening-time paradigm to infer aspects of perception from preverbal infants, researchers simply asked participants which of two excerpts sounded more similar to the one in the exposure phase. Participants chose from versions of the excerpt to which auditory accents had been added every two or three beats. Mirroring results with the infants, the adults judged the version to be most similar when it featured the accent pattern that matched the way they’d moved. The effect persisted even when participants were blindfolded while moving, demonstrating that perception could transfer from movement to sound even in the absence of a mediating visual influence. Movements much subtler than full-body bouncing can also influence auditory perception. Participants asked to detect target tones occurring on the beat from within a series of distractor tones performed better when they tapped a finger on a noiseless pad than when they listened without tapping.
Together, these findings paint an embodied picture of music-listening, in which the experience is shaped not only by what you see, hear and know about the music, but also by the way you physically interact with it. This is true in the more common participatory musical cultures around the world, where everyone tends to join in the music-making, but also in the less common presentational cultures, where circumstances seem to call for stationary, passive listening. Even in these contexts, when and how a person moves can shape what they hear.
The musical vocabularies and styles that people hear while growing up can shape the structures and expressive elements they are capable of hearing in a new piece. For example, people show better recognition memory and different emotional responses to new music composed in a culturally familiar style, as compared with new music from an unfamiliar culture. But it’s not just previous musical exposure that shapes their perceptual system: the linguistic soundscape within which a person is raised also reconfigures how they orient to music.
In languages such as English, the pitch at which a word is pronounced doesn’t influence its dictionary meaning. Motorcycle means a two-wheeled vehicle with an engine whether I say it in a really high or really low voice. But other languages, such as Mandarin Chinese and Thai, are tone languages: when Chinese speakers say ma with a high, stable pitch it means ‘mother’, but if they say it with a pitch that starts high, declines, then goes back up again, it means ‘horse’. The centrality of pitch to basic definitional content in these languages means that tone-language speakers produce and attend to pitch differently than non-tone-language speakers, day in and day out over the course of years. This cumulative sonic environment tunes the auditory system in ways that alter basic aspects of music perception. Speakers of tone languages, for example, detect and repeat musical melodies and pitch relationships more accurately than non-tone-language speakers.
The psychologist Diana Deutsch at the University of California, San Diego concocted tritones (two pitches separated by half an octave) using digitally manipulated tones of ambiguous pitch height. People heard these tritones as ascending or descending (the first note lower or higher than the second) depending on the linguistic background in which they had been raised. Speakers of English who grew up in California tended to hear a particular tritone as ascending, but English speakers raised in the south of England tended to hear it as descending. Chinese listeners raised in villages with different dialects showed similar differences. A striking characteristic of this ‘tritone paradox’ is that listeners who hear the interval as ascending generally experience this upward motion as part of the perception, and have trouble imagining what it would be like to experience it the other way, and vice versa for listeners who hear it as descending. The effect influences what feels like the raw perception of the sound, not some interpretation layered on later. Culture and experience can change how music is heard, not just how people derive meaning from it.
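Stimuli of this kind are typically built from octave-spaced partials under a bell-shaped spectral envelope, so that a tone has a clear pitch class but no unambiguous pitch height. A minimal sketch of how such an octave-ambiguous tone might be synthesised – the envelope width, base frequency and duration here are illustrative choices, not Deutsch’s actual parameters:

```python
import math

def ambiguous_tone(pitch_class, duration=0.25, sr=44100,
                   center=261.63, octaves=6):
    """Synthesize an octave-ambiguous (Shepard-like) tone.

    Partials at octave spacings share a pitch class but give little
    cue to pitch height -- the property that lets a tritone pair be
    heard as either ascending or descending.
    """
    n = int(duration * sr)
    samples = [0.0] * n
    for k in range(octaves):
        # one partial per octave, shifted by the pitch class in semitones
        f = center * (2.0 ** (k - octaves // 2)) * (2.0 ** (pitch_class / 12.0))
        # bell-shaped amplitude over log-frequency, centred on `center`,
        # so that no single octave dominates the percept
        amp = math.exp(-0.5 * (math.log2(f / center) / 1.5) ** 2)
        for i in range(n):
            samples[i] += amp * math.sin(2 * math.pi * f * i / sr)
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]

# A tritone pair: pitch classes six semitones apart (e.g. C and F#)
c = ambiguous_tone(0)
f_sharp = ambiguous_tone(6)
```

Played in succession, two such tones a half-octave apart give the listener no acoustic basis for deciding which is ‘higher’ – which is precisely why linguistic background can tip the percept one way or the other.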
Music’s interconnection with so many diverse capacities likely underlies some of its beneficial and therapeutic applications. As the late neurologist Oliver Sacks showed in Musicophilia (2007), when a person with dementia listens to music from her adolescence, she can become engaged and responsive, revealing the extent to which these tunes carry robust autobiographical memories.
Music cannot be conceptualised as a straightforwardly acoustic phenomenon. It is a deeply culturally embedded, multimodal experience. At a moment in history when neuroscience enjoys almost magical authority, it is instructive to be reminded that the path from sound to perception weaves through imagery, memories, stories, movement and words. Lyrics aside, the power of Cohen’s Hallelujah doesn’t stem directly from the fourth, the fifth, or even the minor fall or the major lift. Contemporary experiences of the song tend to be coloured by exposure to myriad cover versions, and their prominent use in movies such as Shrek. The sound might carry images of an adorable green ogre or of a wizened man from Montreal, or feelings experienced at a concert decades ago.
Despite sometimes being thought about as an abstract art form, akin to the world of numbers and mathematics, music carries with it and is shaped by nearly all other aspects of human experience: how we speak and move, what we see and know. Its immense power to sweep people up into its sound relies fundamentally on these tight linkages between hearing and our myriad other ways of sensing and knowing.
Elizabeth Hellmuth Margulis
is director of the music cognition lab at the University of Arkansas, a trained concert pianist, and the author of On Repeat: How Music Plays the Mind (2013).
The 55 Bar in Greenwich Village, with its bulging ceiling tiles and strings of fairy lights taped haphazardly to the walls, looks more like the clubhouse of a rural Irish sports team than a New York City jazz venue. Yet some of the musical experiences I’ve had in that dingy basement have bordered on the otherworldly. When I’m pinned to the back of my seat by the mind-warping rhythms of a drummer, or the harmonic ingenuity of an improvising guitarist, I often have the feeling that my body ‘gets’ things in a way my brain can’t. I find myself physically responding to nuances in the musical texture that have been and gone before I have time to formulate thoughts about them. I can speculate to some extent about what I’ve heard after the fact – that snare hit was perhaps a shade early; that cadence resolved just a fraction too late – but in the moment, I can’t quite articulate what it is that I’m reacting to. My grasp on what I’m hearing doesn’t seem cognitive. It seems visceral.
But talk of ‘visceral, non-cognitive grasping’ sounds hopelessly vague from a philosophical standpoint. In philosophy, it’s common to describe the mind as a kind of machine that operates on a set of representations, which serve as proxies for worldly states of affairs, and get recombined ‘offline’ in a manner that’s not dictated by what’s happening in the immediate environment. So if you can’t consciously represent the finer details of a guitar solo, the way is surely barred to having any grasp of its nuances. Claiming that you have a ‘merely visceral’ grasp of music really amounts to saying that you don’t understand it at all. Right?
Humans do, of course, represent features of the world, and perform mental operations on that information. We owe many of our most striking successes as a species to doing just that: it’s how we built aqueducts, and steam engines, and computers. But just as often, we allow ourselves to be borne along by the currents of what’s swirling around us without abstracting away from it. Getting swept up in a musical performance is just one among a whole host of familiar activities that seem less about computing information, and more about feeling our way as we go: selecting an outfit that’s chic without being fussy, avoiding collisions with other pedestrians on the pavement, or adding just a pinch of salt to the casserole. If we sometimes live in the world in a thoughtful and considered way, we go with the flow a lot, too.
I think it’s a mistake to dismiss these sorts of experiences as ‘mindless’, or the notion of a merely visceral grasp of something as oxymoronic. Instead, I think that the lived reality of music puts pressure on philosophers to broaden their conception of what the mind is and how it works, and to embrace the diversity of ways in which we can grapple with the world around us.
Discussions about how we gain access to reality usually begin with perception. Yet philosophers of perception tend to be almost exclusively concerned with vision. Music, as a consequence, seldom makes it onto the agenda. This comes at a cost: not only has the immediate experience of events attracted far less philosophical attention than the experience of objects, but the role of the body in our experience of movement and change has been sidelined, too.
Now, the world contains many things that we can’t perceive. I am unlikely to find a square root in my sock drawer, or to spot the categorical imperative lurking behind the couch. I can, however, perceive concrete things, and work out their approximate size, shape and colour just by paying attention to them. I can also perceive events occurring around me, and get a rough idea of their duration and how they relate to each other in time. I hear that the knock at the door came just before the cat leapt off the couch, and I have a sense of how long it took for the cat to sidle out of the room.
Both objects and events have a structure. My desk lamp has parts – a square base, a hinged ‘neck’, a circular shade – which are related to each other in space in a particular way: the base is connected to the neck, which is connected to the shade, and so on. Similarly, events have temporal structure: they have parts that are related to each other in time (the knock at the door, for instance, is composed of three sequential raps roughly equivalent in duration). But events and objects differ in an important respect. If I want to examine the parts of my lamp, or figure out how exactly they fit together in space, I can squint at it, pick it up, or turn it around. But while the lamp obligingly submits to my investigations, events extend me no such courtesy. The ‘happenings’ in my environment are constantly sliding into the past, out of reach. And though I could chase after the lamp, were it suddenly to gather up its cable and flee, I can’t pursue a fleeting event to ‘get a good look’ at it.
We can discern some coarse-grained properties of a drum-beat just by listening – the kick happens first, then a snare, with a hi-hat somewhere in the middle – but figuring out its precise temporal structure is much less straightforward. Was that snare slightly early, or was it slightly late? It’s like glimpsing the outlines of an intricate architectural filigree through a thick fog, without being able to clear the air. But even if fine-grained temporal structure is opaque to perception, it might not be entirely beyond our ken – because, fortunately, action is more sensitive to temporal detail than perception.
We can move our bodies in response to temporal details too fine for us to consciously experience. In a study published in 2000, the psychologist Bruno Repp at Yale University asked subjects to tap along with a rhythmic sequence of tones, delaying all the tones after a particular point in the sequence by the same tiny amount. He observed that subjects’ tapping patterns compensated rapidly for the change, despite the fact that they were unaware of it. Outside the lab, live performances often feature changes in temporal structure, such as small tempo increases, that even the players producing the sounds fail to notice while playing along.
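Compensation of this kind is commonly modelled in the sensorimotor-synchronisation literature with a linear phase-correction rule: each new inter-tap interval is the metronome period minus some fraction of the previous tap’s asynchrony. A toy sketch – the correction gain and the 10 ms shift are illustrative values, not Repp’s measured parameters:

```python
# Linear phase-correction model of tapping to a sequence of tones:
# each inter-tap interval equals the period minus a fraction (alpha)
# of the previous asynchrony, so errors decay geometrically.
def simulate_tapping(n_tones=40, period=500.0, shift_at=20,
                     shift=10.0, alpha=0.5):
    """Return tap asynchronies (tap time minus tone time) in ms."""
    tone = 0.0
    tap = 0.0
    asynchronies = []
    for i in range(n_tones):
        if i == shift_at:          # all tones from here on arrive late
            tone += shift
        async_i = tap - tone
        asynchronies.append(async_i)
        tone += period
        tap += period - alpha * async_i   # the correction step
    return asynchronies

asyncs = simulate_tapping()
# Before the shift the asynchrony is zero; right after it, the tap
# lands 10 ms early relative to the delayed tone, and the error then
# shrinks by a factor of (1 - alpha) on every subsequent tap.
```

The model captures the striking feature of Repp’s result: the correction happens tap by tap, well below the threshold at which the shift itself becomes consciously detectable.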
Sometimes the temporal detail we’re tracking physically does manifest in our conscious awareness, in the guise of a characteristic ‘feel’. The beats played by the drummer Questlove on the album Voodoo (2000), by American songwriter and producer D’Angelo, have a distinctive temporal structure – the precise details of which we might fail to represent, but can experience as a kind of characteristic looseness, or trippiness. Likewise, you can experience a Viennese waltz as graceful without having any idea that its grace arises from its distinctive temporal patterning, where the first beat is lengthened, the second shortened, and the third given the barest of accents.
The subliminal tracking of temporal structure, which hovers around the fringes of conscious awareness, doesn’t just happen when we listen to and play music. It’s a core component of how we comprehend speech, too. In fact, everyday speech is saturated with fine-tuned musical features that are crucial to making ourselves understood. Say the following two sentences aloud:
I was happy.
I *was* happy.
You probably lengthened the word ‘was’ the second time around. By doing so, you managed not only to convey ‘I was happy in the past,’ but also to imply ‘… though not any more.’ Detecting temporal structure in sound is key to grasping what other people mean, and also to conveying meaning ourselves.
But the success of a face-to-face conversation involves more than just processing an interlocutor’s utterance and emitting a series of comprehensible noises. Consider the following everyday exchange:
Good morning! How are you doing?
I’m very well, thanks. How are you?
I’m doing great. It’s a beautiful day out there.
It certainly is!
Now imagine this brief conversation happening again, but this time with each utterance beginning half a second before the previous one has finished. Or imagine each utterance happening 10 seconds after the previous one. It’s not just what you say that matters, or how you say it: the timing and rhythm matters, too.
A 2009 study by the sociologist Tanya Stivers at the University of California, Los Angeles, and her colleagues found that it’s the norm in most languages and cultures to avoid overlaps and to take turns in conversation, with some local variation. Delivering an affirmative response to a question within 36 milliseconds is judged ‘on-time’ in Japan, while in Denmark you can take 203 milliseconds and still be judged timely. Even though the ‘huge’ inter-turn Nordic silences observed by non-Nordic anthropologists aren’t all that large, such comments reveal that deviations from one’s own acculturated norms are seen as highly salient. In other words, what is experienced as a ‘delay’ – and thus as an indicator of dissent, since confirmations are generally delivered faster than opposing statements – differs across cultures. A congenial Danish tourist in Japan might well be puzzled to find herself taken for something of a contrarian.
Rhythmic turn-taking is not the only musical aspect of speech. Greetings and farewells are ordinarily delivered in the upper part of the vocal register (which is why it’s off-putting when someone flatly intones: ‘Goodbye’). The difference between expressing sincerity or sarcasm – ‘Well, isn’t that just great!’ – boils down to differences of pitch, syllable duration and articulation. And it’s hard to address a small baby without finding oneself using hugely exaggerated pitch contours, not to mention repeating words ad nauseam (an instinct for which we shouldn’t punish ourselves, however, since there’s evidence that repetition and over-the-top prosodic features aid a child’s linguistic learning). The most stirring parts of political speeches often involve repetition, and sometimes even embryonic rhythms (‘we shall fight on the beaches, we shall fight on the landing grounds’). As Cicero put it in his History of Famous Orators, the would-be master of rhetoric needs to realise that ‘even in Speaking, there may be a concealed kind of music’.
There might also be a concealed kind of movement. In a 1970 study, the psychologist Adam Kendon noticed that when a speaker singles out an individual within a group, the person being addressed begins to move and nod. Kendon speculated that the addressee thereby ‘differentiates himself from the others present, and at the same time he heightens the bond that is being established between him and the speaker’. The addressee also tended to move in time with emergent rhythms in the utterances of the speaker (an observation that recent studies have confirmed). Kendon hypothesised that the coordination of movement between speaker and listener might enable the listener to time his own entry as a speaker, much as a musician might begin to move conspicuously with the music before she enters with her part.
Movement clearly plays a role in speech, yet its role is importantly different from the role it plays in music. If you were to draw your interlocutor’s attention to the ways in which you were timing your movements, everyone would start feeling a bit awkward and the whole communicative project would derail. But attending to musical movement does not destroy its effect. If anything, it heightens it: dancing becomes more enjoyable the more you pay attention to your movements, and the movement of those around you. Musical rhythms call for conscious (as opposed to unconscious) movement in a way that visual, tactile and even spoken rhythms do not: we seem not only to hear musical beats, but to feel them, too. So just how is it possible to feel a sound in the first place?
It’s 1665. The pressing need to find a reliable way of measuring longitude at sea has led to an arms race among astronomers and mathematicians, who are scrambling to find an accurate method of measuring duration. The Dutch astronomer Christiaan Huygens has recently been catapulted into pole position by the accuracy of his new invention, the pendulum clock.
On 22 February, Huygens writes to R F de Sluse to tell him about a curious phenomenon he has observed in his workshop. Having hung two of his clocks from a common wooden beam placed across the backs of two chairs, Huygens had gone about his business before returning to find the clocks showing an ‘odd sympathy’. The pendula had synchronised. Initially baffled, Huygens eventually realised that each clock was producing small vibrations in the wooden beam, and that it was the interaction of these two patterns of vibration that was responsible for the sympathetic movement.
The spontaneous synchronisation of oscillating systems has since become known as ‘entrainment’, and it has been observed in a vast array of physical and biological systems – from the illumination patterns of fireflies to the wingbeats of free-flying barnacle geese to the tendency of an applauding audience to start clapping in synchrony.
Movement to musical rhythms used to be cast in terms of computation: the listener extracts information from musical sounds, forms a temporal representation and transforms that into an action signal. But more recently, psychologists have begun to model rhythmic musical movement as a process of entrainment, whereby oscillations inside the listener become synchronised with rhythmic cues in the environment in a relatively automatic, spontaneous way. No intervening computations are required: the existence of natural resonances between brain, body and world is enough.
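The entrainment of coupled oscillators – Huygens's clocks nudging each other through their shared beam until their pendulums swing in sympathy – can be captured in a few lines of simulation. The sketch below (an illustration of the general idea, not a model drawn from the essay or from any particular study) uses the standard Kuramoto equations for two phase oscillators: each advances at its own natural frequency while being pulled toward the other's phase, and with sufficient coupling the pair locks.

```python
import math

# Two coupled phase oscillators (Kuramoto model): each oscillator's phase
# advances at its own natural frequency, nudged toward the other's phase.
# With strong enough coupling, the phase difference settles to a constant -
# an analogue of Huygens's clocks falling into 'odd sympathy'.

def simulate(w1, w2, K, steps=20000, dt=0.001):
    """Integrate two coupled oscillators; return their final phases."""
    th1, th2 = 0.0, 2.0   # arbitrary, far-apart initial phases
    for _ in range(steps):
        d1 = w1 + K * math.sin(th2 - th1)
        d2 = w2 + K * math.sin(th1 - th2)
        th1 += d1 * dt
        th2 += d2 * dt
    return th1, th2

# Natural frequencies differ by 0.4 rad/s; the coupling K = 1.0 is strong
# enough (2K > |w2 - w1|) for the pair to phase-lock.
th1, th2 = simulate(w1=1.0, w2=1.4, K=1.0)
diff = (th2 - th1 + math.pi) % (2 * math.pi) - math.pi
print(f"final phase difference: {diff:.3f} rad")  # small, steady offset
```

No oscillator 'computes' the other's rhythm: synchrony falls out of the dynamics of the coupling itself, which is the point of modelling rhythmic movement as entrainment rather than as calculation.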
Appealing to little oscillators inside us might seem worryingly occult until one recalls that the brain isn’t just an inert chunk of meat. The activity of neurons can give rise to macroscopic patterns as a consequence of how they’re connected to each other – in the same sort of way that individual spectators at a football match, sensitive to the movement of their neighbours, can collectively make a Mexican wave.
Studies have shown that neuronal groups in our brains do, indeed, entrain to rhythmic stimuli. Rhythm-processing involves increased coupling between auditory and premotor cortex, a part of the brain involved in planning and executing bodily movement. It also recruits the basal ganglia, a group of structures deep in the brain involved in motor control, action selection and learning. Intriguingly, even when subjects are instructed not to move in response to what they hear, the basal ganglia are recruited in the processing of auditory beats – though not when subjects are presented with regular visual rhythms. Patients with Parkinson's disease, who suffer from impaired basal ganglia function, show deficits in duration-discrimination and in the ability to synchronise their finger taps with auditory rhythms.
It seems that moving in response to temporal structure is not something we have to ‘work out’ how to do. Detecting and responding to temporal patterns, in music and elsewhere, is more likely a matter of allowing oneself to be borne along by the natural, spontaneous resonances that already exist between our bodies, our brains and the temporal contours of the sounding world.
Most creatures, even our nearest primate relatives, don't seem to experience musical beats in quite the same movement-involving way that we do. If we are the only speaking apes, we would appear to be the only dancing apes, too. But we shouldn't be too hasty in our self-congratulation. Entrainment to other rhythmic stimuli in the environment is ubiquitous in the animal kingdom – and the uses to which our fellow beasts can put environmental rhythms are impressive indeed.
Where do birds go in the winter months? The Ancient Greeks hypothesised that they hibernated in holes in the ground, or transformed into other species of birds; other civilisations thought that they became barnacles, or concealed themselves at the bottoms of lakes. Such bizarre theories are, in a way, less implausible than what we now know to be true: that creatures weighing less than a box of matches can fly non-stop for thousands of miles over land and sea with no navigational aids, consuming their own bodies as fuel, calculating their route with such precision that they often end up landing not only in the same tree, but on the same twig as they did the year before. And a few months later, they do it all again in reverse.
So-called 'calendar birds' migrate at the same time every year, regardless of weather. Magnetic sensitivity and the sense of smell are thought to be instrumental to the success of these voyages. But scientists also think that migrating birds are highly sensitive to time: both to elapsed duration, and also to the presence of circannual, or yearly, environmental rhythms. The ability of these birds to 'know' exactly when to depart is thought to rely on entrainment to patterns in the environment that repeat annually, such as changes in the light-dark cycle. Once they get to their wintering grounds, where the light-dark cycle is reversed, an internal 'clock' is thought to keep track of how much time has elapsed since their departure; a cascade of biological events, such as fat deposit and even the shrivelling of internal organs, begins in the weeks before it's time to return home. Once the voyage is underway, in either direction, entrainment is what allows the bird to keep track of regularities in the Earth's magnetic field, and the 'clock' keeps count of how long it has been flying on each 'bearing'.
The tiny Northern wheatear doesn’t travel the 15,000 km from Alaska to southern Africa twice a year by consciously representing the route, or the environmental patterns by which it is calibrated. Maybe the bird blindly implements the instructions of its biological sat-nav like a computer executing code: there might be ‘nothing it’s like’ for the bird to be sensitive to circannual rhythms. However, the contrary is also possible: perhaps at least some of those environmental patterns ‘feel’ a certain way to the bird, much as particular rhythmic patterns feel ‘trippy’ to us despite our failure to represent their precise structure. In 1851, the English writer Henry Mayhew noted that, as the season for migration approaches, ‘the caged nightingale shows symptoms of great uneasiness, dashing himself against the wires of his cage or his aviary, and sometimes dying in a few days.’ It is difficult to read such accounts and not sense what it is like for a bird to feel the pull of the voyage.
Entrainment provides a powerful theoretical tool for exploring how we manage to resonate with the world, and each other, in real time. It offers an embryonic account of how we can act astutely even when there’s no time for conscious thought. And while many of the entrainment processes that regulate the functioning of our brains and bodies never make it into awareness, some of them – like viscerally ‘getting’ a guitar solo – arguably do.
Our conscious experience of time is philosophically puzzling. On the one hand, it's intuitive to suppose that we perceive only what's happening right now. But on the other, we seem to have immediate perceptual experiences of motion and change: I don't need to infer from a series of 'still' impressions of your hand that it is waving, or work out a connection between isolated tones in order to hear a melody. These intuitions seem to contradict each other: how can I perceive motion and change if I am only really conscious of what's occurring now? We face a choice: either we don't really perceive motion and change, or the now of our perception encompasses more than the present instant – each of which seems problematic in its own way. Philosophers such as Franz Brentano and Edmund Husserl, as well as a host of more recent commentators, have debated how best to solve the dilemma.
But the experience of time involves more than just the perception of events occurring at a distance from us. We also experience time by instigating events through our actions, as well as encountering the actions of others. To relish the flow of a chat with a friend, or to feel the groove of a beat, is to have a distinctive kind of temporal experience where the observation of time becomes entwined with how one inhabits it – but in each case, the experience is less a matter of representing temporal structure than of entraining to it, resonating with it.
Is resonance without representation always a mindless affair? Not necessarily. Reason wasn’t always thought of in terms of representation, for one thing. In 1769, the French philosopher Denis Diderot offered the following characterisation of the thinker, in his dialogue with his friend Jean Le Rond d’Alembert:
The sensitive vibrating string oscillates and resounds for a long time after one has plucked it. It's this oscillation, this sort of inevitable resonance, that holds the present object, while our understanding is busy with the quality which is appropriate to it. But vibrating strings have yet another property – to make other strings quiver. And thus the first idea recalls a second, and those two a third, then all three a fourth, and so it goes, without our being able to set a limit to the ideas that are aroused and linked in a philosopher who meditates or who listens to himself in silence and darkness.
This is a far cry from the modern characterisation of the philosopher as one who contemplates propositions from a position of detachment, in order to reflect on the world without being moved by it. For Diderot, at least, the philosopher must listen keenly, and attune himself to the patterns that he seeks to understand. But even cursory introspection reveals that the processes of reason themselves are saturated with resonance. Reasoning is often a matter of being ‘struck’ by a thought, of having one’s intellect set in motion by ideas. We say that a speaker’s message ‘resonated’ with us when we not only comprehend it, but find it compelling. Far from being at odds with reflection, then, resonance might be its close companion.
Human attempts at making sense of the world often involve representing, calculating and deliberating. This isn’t the kind of thing that typically goes on in the 55 Bar, nor is it necessarily happening in the Lutheran church just down the block, or on a muddy football pitch in a remote Irish village. But gathering to make music, play games or engage in religious worship are far from being mindless activities. And making sense of the world is not necessarily just a matter of representing it.
Music is a reminder to philosophers of mind that perceptual experience isn’t exhausted by vision. It prompts the recognition that conscious experience is dynamic, encompassing motion and change. But music also nudges philosophers toward a conception of the mind as more than just a very sophisticated calculator. If humans are representing machines, we are resonant bodies, too.
is a philosopher, musician and writer-at-large whose work has appeared in The Guardian, The Philosopher's Magazine and Medium's subscription programme. She holds a PhD in musicology from the University of Cambridge, and is currently working on a second PhD in philosophy at NYU. Her research interests include the philosophy of mind, cognitive science and aesthetics.