Bird Song, Primate Calls, and the Evolution of Language

Society for Anthropological Sciences, annual conference, Pittsburgh, PA, 2015



Language Evolution and Animal Communication

E. N. Anderson

Dept. of Anthropology

University of California, Riverside





Speculations on the origins of language were so rife already in 1865 that the Société Linguistique de Paris banned discussion of them.  The Philological Society of London followed suit in 1873 (Luef and Pika 2015).  It did no good.  The tradition continues, and I will now add to it.

W. Tecumseh Fitch’s masterful review The Evolution of Language (2010; Fitch et al. 2005) leaves very little to say on that topic. He covers an amazing range of topics, from physiology to formal language and from evolutionary theory to the sign-language of the deaf, with brilliant insight, impeccable logic, and total control of the material.  Also, the book is clearly written, and spares us the dreadful musings on “consciousness” and “mind” that ruin so much of the cognitive-evolutionary literature.  All in all, this is one of the great books of social science, and deserves to become a classic.

However, he does leave something.  He knows so much more than I do about most fields—anatomy, neurology, formal linguistics, and linguistic theory, for instance—that I would not dream of saying anything, but in a few areas I may have a tiny bit to add, primarily in the field of animal communication.  Like Fitch, I started out in that area of study, but I drifted into ecological anthropology while he (later) drifted into linguistics.

All communication systems in the animal world are used primarily for social ends.  Usually the most complex systems are involved in mating and in raising young.  Often they are also involved in maintaining the social group and relationships within it, or in cooperative hunting, or nesting.  Lifelong association with dogs and coyotes teaches me that animal communication is not all about territory and mate-getting (as most older books allege).  Most, in these species, is about social bonding, and negotiating social systems.

Bird song is one area I know from research and observation.  The songs of birds are multifunctional:  they declare territory, attract mates, hold mates, keep up contact with young, warn rivals, signal to friendly neighbors that their old friend is still singing, warn predators that the singer is alert and ready to chase them off, and probably serve several other functions.  Signaling to the friendly neighbor community is a newly discovered function that turns out to be extremely important, but was totally missed until recently because biologists assumed birds were too dumb, individualist, combative, etc. to do that.  (See Kroodsma 2005 for both a good introduction to bird song and a sadly, and inaccurately, limited take on their meanings; see also Marler and Slabbekoorn 2004, a major source for Fitch.)


Clearing Away Underbrush: What We Can and Can’t Learn from Animal Communication


One of the biggest problems with studies of the evolution of language is that rather few of the people who deal in that field know the details of recent findings on animal communication.  Admittedly, it is a rather arcane field.  The increasing tendency of scholarship to take place in “silos” keeps linguists from finding out about the bird song or wolf literature.

Another problem is the sort of arid speculation in a vacuum that the learned societies attempted to ban.  A recent article in Current Anthropology (Scott-Phillips 2015), with full CA* commentary, manages to spend 24 pages without one single item of data, either in the main text or the commentaries.  It is all on a high philosophic level.  Unsurprisingly, the authors show no detectable understanding of animal communication.  I found little of remark in this work.

Most of the classic barriers that were supposed to separate human language from animal communication broke down long ago. The classic Cartesian view that since animals cannot talk they are mere machines, without souls or “reason” (Descartes 1999 [1637]), is of course mere Catholic religious dogma.  It has absolutely no scientific excuse.  Modern pseudo-scientific repetitions of it, e.g. “Lloyd Morgan’s canon,” are long disproved.  (For a superb review of animal sociality, showing how it can all be explained by standard Darwinian selection without recourse to group selection, see Bourke 2011.)

Animal sounds are structured and often show duality of patterning.  The smarter animals—the more highly social, highly encephalized birds and mammals—clearly intend to communicate, as opposed to producing mere “instinctive sounds” or “conditioned reflexes.”

When a bird learns a song to communicate its territorial boundary or its desire for a mate, it is clearly engaging in symbolic communication: there is no relationship per se between a song and a territory or a mate.  The song is not iconic or indexical; it is a symbol.  There is obviously some instinctive/inborn tendency to sing, but humans have an inborn tendency to talk.  Singing and talking are used deliberately to communicate various meanings that have nothing to do with the vocal sounds as such.


Bird Song as Model?


Birds learn their songs, and have a FOXP2 gene similar to the human one associated with language learning capability, as Fitch notes (and see Haesler 2007).  He also knows that birds need to hear their own songs to perfect them (Keller et al. 2009).  Since Fitch wrote, even more human-like details of hemisphere specialization, song development, and other bird expression have surfaced (Moorman et al. 2012).  And birds, like primates, have mirror neurons (Prather et al. 2008).

Learning of songs and calls is widespread.  (What follows is summarized from a lifetime of reading and field work; for good up-to-date summaries of the science, see Kroodsma 2005; Marler and Slabbekoorn 2004.)  Most bird singers, and all good singers, are members of one group, the suborder Oscines within the order Passeriformes.  It is clear how song evolved: the suboscines in that order give purely, or almost purely, instinctual calls that are like songs in that they loudly announce territory and attract mates; these in turn evolved from similar, even simpler, calls common to essentially all bird groups.  Song—partially learned—independently evolved in hummingbirds and some other groups.  There is little doubt how it evolved: in small birds with tightly-packed territories, something of this sort is needed to stake out territory and attract mates to same and to allow, simultaneously, neighbor recognition—a very important function that is critical to song learning.

The most dramatic cases are of those birds that not only imitate other sounds, but weave these sounds into their songs, changing the sounds progressively to fit the song.  Mockingbirds are the familiar example here in southern California.  They imitate neighboring birds such as kingbirds, woodpeckers, killdeer, and orioles, and once they have learned a harsh note they will work at transforming it to make it more songlike.  (This spares perceptive birdwatchers a lot of errors—one learns to tell imitations from the actual kingbirds, etc.)  Many other imitators do the same, as I can testify from experience with nightingales, goldfinches, and some others.

Mockingbirds and thrashers (mockingbirds are actually a kind of thrasher) can learn far more than 1000 song phrases.  Some obsessive birdwatcher counted, and found the champion in his count was the brown thrasher, with over 2000 known song phrases (Botero et al. 2009; Kroodsma 2005).  Mockingbirds are close.  It turned out that the farther north a species in the mockingbird and thrasher family bred, the more phrases it knew (Botero et al. 2009).  The investigators thought that migration required health and strength, and the females would assess a male’s strength by his song knowledge.  Another possibility is that the migrants have to regain and retake territory every year (instead of holding it year-round) and thus have to work harder.  These are not mutually exclusive; I dare say both are right.  Mockingbird song is elaborated from the more simple phrases of the wren group of birds.  Most wrens sing songs of one to four or five notes, with many variants of each song.  Combining the variants into one long song, and fantastically increasing the number of them, gives us mimid song.  Some wrens have paralleled mimids in this, or have developed complexity in another direction: elaborate duets (see below).

Much less dramatic, but much more widespread, is the tendency of songbirds to have a single, simple template for their songs, but to learn the refinements and modifications of that from their parents and other adults in their neighborhoods.  Apparently all songbirds sing many different songs (per bird), and all make their own personal songs unique.  However, they follow the style of the neighborhood.  Song dialects were first studied in chaffinches, but were made really famous by studies of white-crowned sparrows, which were already known for such dialects and were easily available on the UC Berkeley campus when Peter Marler (who started out with the chaffinches) set out to study song dialects.  Often, young birds go on to invent their own songs, patterned closely after the adults’, thus producing a regional dialect with countless variations.

Bird song structure is phrasal.  Typically, a bird song is made up of phrases ranging from two to 12 notes, usually around four to six.  A song may be one phrase; the magnificent song of the canyon wren is one grand, swooping descent of the scale.  Most birds sing two or three phrases.  The mockingbird and its kin sing an open-ended number—in a day a mockingbird can work through dozens of phrases.

Margaret Morse Nice (1937-1943), in her classic work on the song sparrow that was the first and perhaps still the best monograph on a single bird species’ behavior, noted that a given song sparrow will have many songs, which are similar to neighbors’ songs but which are original to the individual.  (Nice apparently never held an academic job, in spite of being one of the greatest ornithologists of her or any other time—such was the fate of women in those days.)  A male song sparrow (males do the singing) will sing one song several times, switch to another, sing it several times, and so on all day—working through a monumental number of songs. Each song is made up of three or four phrases, and these are clearly patterned and organized in an overall way—not just Markov chaining but a very rudimentary sort of “grammar,” a type of Chomsky’s “phrase structure grammar.”  I have noted that each song is somewhat related to the past one; the birds evidently pattern their song sequences.
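The contrast between Markov chaining and a rudimentary phrase structure grammar can be made concrete with a minimal sketch in Python.  Everything in it is an assumption for illustration: the phrase labels ("intro," "trill," "buzz," "whistle"), the transition table, and the rewrite rules are hypothetical and are not transcribed from Nice's song sparrows or from any recording.  In the Markov version each phrase depends only on the one before it; in the grammar version a song is planned as an overall shape (opening, body, closing) and each slot is then filled in.

```python
import random

# Hypothetical phrase labels, chosen only for illustration.
# First-order Markov chain: the next phrase depends only on the current one.
MARKOV = {
    "START": ["intro"],
    "intro": ["trill", "buzz"],
    "trill": ["trill", "buzz", "END"],
    "buzz": ["trill", "END"],
}

def markov_song(rng, max_len=8):
    """Walk the chain from START until END is drawn or max_len phrases are sung."""
    song, state = [], "START"
    while len(song) < max_len:
        state = rng.choice(MARKOV[state])
        if state == "END":
            break
        song.append(state)
    return song

# Phrase structure rules: a song is a sequence of named slots,
# and each slot is rewritten into one of its possible phrase sequences.
GRAMMAR = {
    "SONG": [["OPENING", "BODY", "CLOSING"]],
    "OPENING": [["intro"], ["intro", "intro"]],
    "BODY": [["trill"], ["trill", "buzz"]],
    "CLOSING": [["buzz"], ["whistle"]],
}

def expand(symbol, rng):
    """Recursively rewrite a symbol until only terminal phrases remain."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal phrase: sing it as is
    production = rng.choice(GRAMMAR[symbol])
    return [phrase for part in production for phrase in expand(part, rng)]

if __name__ == "__main__":
    rng = random.Random(0)
    print("Markov song: ", markov_song(rng))
    print("Grammar song:", expand("SONG", rng))
```

The grammar version always yields a song with an opening, a body, and a closing, whatever the random choices, while the Markov walk has no such overall plan and may stop after a single phrase or run on to the length limit.  A toy grammar this small has no recursion, so a finite-state model could generate the same songs; the sketch shows only the difference in organization, not a claim about what the birds' brains compute.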

Most birds learn the songs of their natal neighborhood, but at least one, the South Island Saddleback of New Zealand, disperses and then learns the songs of the neighborhood in which it finally settles down.  (See Marler and Slabbekoorn 2004.  Alas, there is now little chance to study this fascinating bird, because its vulnerability to introduced rats, cats and weasels—especially as nest robbers—has led to its extinction except on a couple of tiny predator-free islets.)  Probably other birds do this also.

Another complex behavior carried out by birds is song duetting.  Pairs of plain-tailed wrens of South America, for instance, coordinate very closely their long, intricate songs.  This takes appreciable thinking.  It involves timing and song type coordination that go far beyond anything explicable by simple instinct or conditioning (Fortune et al. 2011).  Many other birds duet, including several closely related wren species.  They all probably have the same conscious planning behind it, but this has not been studied.

Songs may vary systematically with ecology.  Blackcaps—small European warblers with notable songs—have some migratory populations and some permanent-resident ones.  The former have more elaborate courting songs, but less elaborate male-rivalry songs, than stationary ones; this flags the quite different mating strategies of the two ecotypes (Collins 2009).

The complexity of bird song was greatly underestimated until better field studies, with modern recording equipment, appeared.  We now know much more about alternation of song types, song learning, and phrase construction.  Every new study reveals new complexity.

Song, however, does not exhaust bird communication.  Birds display visually, build bowers and nests, manipulate objects, engage in courtship feeding, and otherwise communicate by visual channels.  They also use touch, as in the well-known “billing and cooing” of mating pigeons and doves.  Some birds may even communicate by scent, though most birds have little sense of smell.

The most complex communication known in a nonhuman species is not found in primates but in the Satin Bowerbird of Australia (Johnsgard 1994:215-216).  Males build huge, complex bowers, which they decorate with all kinds of objects, especially bright blue ones that highlight the blue sheen of the black males.  In the wild, they use flowers and berries, but today they have taken to seeking out bits of blue glass, plastic, and painted materials.  One famous photograph, much reproduced, shows a bower beautifully decorated with a whole clothesline-full of blue plastic clothespins.  The human family must have wondered why their laundry was all on the ground.  This, however, is only the beginning: the bird not only artfully arranges all these objects into patterns pointing to its own display place, it even paints the bower.  Using a small wad of vegetable matter for a brush, it applies colored clay or similar material to the grass and twigs of the bower.  The whole construction can involve many cubic metres of material.  Within this bower, at its focal point, the male displays, singing a complex song that is highly imitative.  As with the mockingbird, the male satin bowerbird weaves any available sound into a full repertoire of phrases.  He also engages in acrobatics on his perch.

The reason for all this is that the bird that builds the best bower, and does the best display, gets the most attention from the females.  Bowerbirds are lek species: the males display competitively to many females, and the most attractive male gets by far the most sex.  So runaway selection happens.  In bowerbirds, the learning component is so large that only a long-lived male can compete.  A good bower builder is thought to need ten to twelve years or so of experience to be competitive.  Thus, the females, in choosing the best builder and singer, are choosing a bird that has succeeded in dodging cats, owls, hawks, and other enemies for years—even while showing off in a most conspicuous way.  The females presumably do not think about this; they think only of who has the most impressive show.  But, underneath, natural selection has led to the oldest and smartest leaving the most descendants.

This is truly astonishing in that it is the only animal communication known to me that goes beyond simple phrase structure grammar, or that goes beyond one or at most two levels of recursion.  The bowerbird is combining learned sounds (unnatural to him) into songs, combining found objects into a bower, and then combining gestures into a display—and combining all these into one bravura performance.  Other species of bowerbird are only somewhat less complex.  No other group of birds comes close, but many do build extra nests, modify twigs, clear areas of ground, or otherwise construct managed spaces.

Mimicry can be deployed in positively Machiavellian ways.  The fork-tailed drongo (Dicrurus adsimilis), an African bird, can mimic almost anything, and has learned that if it mimics various alarm calls it can scare birds away from their food and steal the food (Flower et al. 2014).  If it mimicked only one alarm call, the other birds would soon learn, and not fall for the trick (birds learn such things easily).  This is, I think, the first time anyone has shown that any nonhuman animal can deliberately learn, choose, and invoke other species’ alarm calls for a particular functional reason.  It is awfully hard to believe that this is not conscious, wilful deception.  It is deployed situationally and appropriately.

Song learning can get amazingly complex: nest-parasitizing African indigobirds learn to sing the song of their foster parents, then attract members of the opposite sex with that song, making sure that specific lines of parasitizing go on and on (Payne and Sorenson 2006).

All this is clearly important for understanding the development of structural complexity and use of symbol in communication.  Unfortunately, it does not get us far in understanding the functional side of human language, and thus the whole Darwinian explanation of language evolution.

The reason is that bird song, complex though it may be, carries the simplest and most unchanging of messages.  The old idea that it is about nothing more than territory and mating is long disproved, but it does not really do much more.  The only demonstrated uses are:

(1) things they apparently do with some awareness:

–creating, marking and holding territory

–attracting and holding a mate (many birds have special pair-bonding songs; others sing to lure females for extra-pair copulations)

–sheer enjoyment and good spirits (birds clearly love to sing, or else they wouldn’t do it)

–recognizing the neighbors and being friendly with them, and, in consequence,

–knowing when a stranger comes into the neighborhood, to be attacked or at least evaluated for attack

–expressing high energy levels and personal excitement

–greeting the new day (for whatever reason, birds usually do their main singing at dawn; sometimes also at dusk or in the dawn-like situation of clearing rainstorm clouds)

(2) things that were presumably evolutionary factors in the development of song:

–serving notice that the bird in question is on territory, healthy, and willing to fight or love

–signaling to the females (assuming male bird singing, which is the usual case) that the male in question is healthy, strong, intelligent, a survivor, and thus a good Darwinian match

–possibly, signaling good conditions: good climate, safe environment, food available, and the like.  This is by no means certain.

The first seven of those functions are clearly conscious and deliberate.  The male sings when those contingencies arise and does not choose to sing otherwise.  The last three, however, are probably “instinctive” in that they are not consciously communicated.

This is a fairly impressive list of messages for a brain weighing a few grams, but the problem is that it never changes.  The mockingbird and the brown thrasher, with their thousands of phrases, say only this—over and over.  Even the satin bowerbird says nothing more than this.  All these birds love to learn and try out new sounds, and work to get them to fit their singing style.  They obviously become quite creative and involved in the process.  But they use them for the same tired old purposes.  This is expectable from brain size: the song center makes up a fourth or more of the brain of a mockingbird or other good singer, leaving very little space for any other type of thinking.  This is the biggest difference between bird and human communication.  We can write novels like War and Peace and books like The Decline and Fall of the Roman Empire.  Birds can manage “Judy, will you marry me?” and “Hi gang!” but not a lot more.


Bird Song Not as Model


The real problem with bird song as a model for human language, however, is a very different one, not heretofore noted (you are hearing it for the first time!):  the most social songbirds do not sing.  Several species of songbirds have evolved away from singing.  One can be sure their ancestors were singing because these nonsinging species are standouts within their family or even within their genus, and they are clearly the more recently evolved and specialized members of said family or genus.  Also, some intelligent birds that can learn human words are not songbirds at all (e.g. parrots; Pepperberg 1999).

The extreme case is that of the crows and ravens.  Jays, the more ancestral forms in the family Corvidae, sing (at least the ones I know do).  Crows and ravens are by far the most complexly social of birds, and probably the most intelligent (see Heinrich and Bugnyar 2007; Marzluff and Angell 2005, 2012; Taylor et al. 2012).  They occur in flocks that may number hundreds in some species (rooks, Sinaloan crows) or even thousands (American crows, Mexican crows).  These flocks have a structure: they consist of nuclear families associated into extended families that extend outward into the whole population.  They are vast kin groups.  Mates in-marry and lone birds can join, but, basically, flocks are stable over very long time frames, and exist as stable descent groups.  Communication involves very complex displays and vocal sounds, but no songs.  (This paragraph is largely from my own observation and the work of Russell Balda and John Marzluff on pinyon jays—which are small crows, not jays—and crows; Marzluff and Balda 1992 and pers. comm.)

Other notable nonsingers include the western bluebird, whose congeners the eastern bluebird and mountain bluebird do sing; the obvious difference is that western bluebirds nest in loose colonies and flock in winter, while the other two are more solitary.  Cedar waxwings do not sing and are always in flocks, while their close relatives the phainopeplas are less flock-oriented and have complex songs.  Among marsh blackbirds, there is a continuum from the fairly colonial but still territorial redwing—a good singer—to the more colonial tricolored and yellow-headed blackbirds, which have much less impressive and complex songs, but which have smaller territories if they have territories at all.  Other social species with reduced songs include chickadees, the smaller nuthatches, and the more colonial swallows.  There are many songbirds that nest fairly colonially and still sing, but the converse is not true: I can think of no non-singing songbirds that are not colonial or flock-oriented.

Conversely, the truly great singers are all fiercely territorial.  They live at fairly high population densities—this seems critical; they need to be in earshot of several rivals—but defend sizable territories against all comers.  This does not guarantee a good song (the California towhee is famously territorial and famously rudimentary as a singer) but it does seem that only strong territory-defenders have good songs. The conclusion is inescapable that songs are first and foremost about holding territory, and competing to see who can sing the best and thus hold territory the best.  We can therefore learn very little here about an insanely social species like Homo sapiens.  Humans are like crows: they are never happier, and never noisier and more vocal, than when they are in flocks of thousands.  (See any shopping mall, football game….)  Humans are not territorial in the way birds are; they have absolutely no tendency to defend standardized small areas against all comers (except family).  Like crows, they always occur in large, complex groups.  Territorial defense by groups has essentially nothing in common with bird territoriality; it is a highly flexible behavior engaged in by relatively large social units.


Many songbirds not only sing; they display.  This reaches an incredible pinnacle among the bowerbirds of Australia and New Guinea.  The males build elaborate stages on which to perform their songs and dances.  They are polygynous, and the most spectacular bower and display lures more females than the others, so the male who creates them leaves a disproportionate number of descendants.  This has led to runaway evolution.  The Satin Bowerbird of eastern Australia not only makes an elaborate bower, but he paints it with colorful clays, and decorates it with bright blue objects that set off his iridescent blue plumage.  Satin Bowerbirds are famous for stealing blue clothespins and other bright blue plastic and metal objects.  On top of that, the male sings a brilliant song full of imitations of other birds (see Johnsgard 1994:204-223).  It is really impossible to imagine all this being done without a great deal of intelligence and self-awareness.


Mammals as Model


Turning to mammals, there are few studies of song.  Mammals as diverse as grasshopper mice, whales, and bats have long, patterned, musical sound sequences that appear to be true songs.  Many species of bats have partially-learned songs that display a “grammar” comparable to that of birds: they structure their phrases and songs quite deliberately.  They communicate species and individual identity, and also place.  They apparently are unusual in that they do not defend territory by song; they live in dense colonies, and use the songs to define colonies and hold them together (Morell 2014).

Howler monkeys (which I have studied in the field) use their howling choruses exactly the way birds do:  to hold territory, communicate with neighbors and tell neighbors from strangers, attract and hold mates, express excitement, and generally express high spirits (personal observation; details on request—I have lived among them and spent countless hours with them).  The African rock hyrax, a small mammal similar to a marmot but related to elephants, has a complex song (Morelle 2012).  Rock hyrax males sing long songs, which they learn, and they have local song dialects.  Apparently the songs are sung to impress females.  These songs have some sort of grammar, comparable to that of whale songs, etc.; presumably phrase structure grammar.  Their song consists of “wail, chuck, snort, squeak and tweet” (Morelle 2012).

Better studied are the howling choruses of dogs (Horowitz 2010), wolves (Mech 1988) and coyotes.  These appear to be true songs in that they are sharply marked off from ordinary, largely instinctive noises (barking, whining, growling, etc.).  Howling is more consciously managed, more patterned, more unique to individuals (thus presumably more learned), and more consciously given in special, marked situations.  It serves the same functions as bird song, but it also holds the packs together, communicates the place and situation of members who cannot see each other, and communicates various kinds of emotional arousal.  Sirens and other howl-like noises always set off a chorus.

Some personal observations on howling and canine music reveal that it is more complex than usually understood.  Many humans howl to or with their dogs, and the dogs often respond. The Hollywood movie Never Cry Wolf (based on Farley Mowat’s book of that title, 1963) has a scene in which the actor plays his bassoon to the wolves and they answer.  This was Hollywood romance, but it turns out not to be totally so.  I was once camped in a remote part of British Columbia.  At the next campsite was a professional saxophonist who had obviously seen the movie.  He played his saxophone to the wolf pack howling nearby, and they answered.  He would play a riff (as close to the wolves’ pitch and cadence as he could) and they would answer perfectly, throwing in some improvisation of their own.  They kept it up half the night.  The wolves answered him on key.   Sometimes they seemed to be copying his tune, but this was impossible to determine.  He was most certainly imitating them.  I would bet that he played the riffs in his next concert and said he learned them from the wolves.


Mammals and Society


Closer to home, the three packs (pairs with young) of coyotes that range into Two Trees Canyon, though they are all related, have quite different howling behaviors, which have persisted over at least the last several years.  The southfork pack’s vocalization is typical of local coyote behavior, howling at dawn and when sirens sound.  The summit pack is notably more quiet.  The northfork pack is amazing: not only do they vocalize far more than the others do, almost always starting any group chorus, but they produce not only howls but an incredible range of squeals, moans, barks, yodels, and sounds beyond description or name.  I have heard other coyote packs do this too, especially in remote desert areas, but it seems definitely a behavior limited to certain highly musical and improvisational packs.  In Two Trees Canyon the northfork pack almost always does it if they howl at all.  The other two packs almost never do.

Some dogs go on to produce music by other means.  I have no idea how much conditioning goes into the circus and TV acts in which dogs play horns (by squeezing bulbs with their mouths).  One—only one—of the dogs in my life learned spontaneously to play tunes on his squeaky toys, and spends many minutes at a time doing so.  He will create a three- or four-note phrase and repeat it several times, then switch to another, just like a mockingbird.  He clearly does it for pleasure; he will leave playing with the other dogs to do it, and it has never been rewarded (except for some verbal praise, after days of doing it).  He sometimes plays along with a music record, matching the timing.  The interesting thing here is that he is expressing himself in a totally unnatural and purely learned way.

As to ordinary communication, dogs, wolves, and coyotes do not limit their communication to vocal channels.  A full interaction between two canids, especially if they have not seen each other for a while, is quite striking.  They make a range of appropriate sounds: bark, whine, growl, etc.  Meanwhile, they bow, wag their tails (even coyotes do this when packs reunite), run about, present their throats to each other, flag their ears, crouch, sometimes roll over, and do a whole range of other behaviors (Horowitz 2010).  There is a whole language of ear elevation/depression, tail elevation/depression, fur erection/flattening, mouth corner drawback, etc.  (Drawing back the mouth corners to show deference is a pan-mammalian behavior that is the origin of the human smile.)

This, however, is far from all.  A human with a sensitive nose learns that the dogs are also releasing a strikingly wide range of pheromones from their anal, foot, shoulder, head, and other scent glands.  Each of these glands has a different mix of chemicals, particular to that gland but also varying by individual, allowing individual recognition.  Dogs communicate not only identity but emotional level and situation by scent.  Song—howling—is only one small, very public part of their repertoire.  Fine-tuning intimate social interaction in canids is not done by howling but by soft sounds, postures, scent, and other low-key means, all of which can be varied and adapted.  The basic behaviors are innate, but the fine-tuning and the combination are not.  They are learned, or innovated, and deployed according to situation, with every appearance of being under some (though not necessarily much) conscious control.

Although this has been known for decades, it is surprisingly little studied, and the standard books on canine behavior barely mention it—quite amazingly, since everyone knows, or should know, that dogs live in a world of scent just as much as humans live in a world of sight.  The degree to which pheromone release is under voluntary control remains mysterious.  A striking example of how clueless behavioral scientists are about this is their use of the “spot on the forehead, in the mirror” test to see if dogs have “a concept of self.”  I have made it a point to watch dogs confronting mirrors for the first time.  They invariably look startled at the strange dog, sniff at it, and instantly lose all interest.  They never bother with mirrors again.  Of course dogs recognize self and other, but they do it by scent, as every dog owner knows from watching dogs identify their own toys, blankets, and so forth, and from watching dogs “read the local newspaper” at the neighborhood fire hydrant or lamppost.

A dog communicates mate-getting and territorial messages, but also fear, pain, excitement, pleasure, annoyance, anger (defensive or aggressive), desire to play, desire to take a walk, and much more.  Dogs have specific barks for specific situations; the play-bark in particular is quite different from the aggressive bark. A dog can combine vocalizations, ear flagging, tail wagging or drooping, stance, and facial expression to transmit exceedingly precise and subtle messages about level of arousal, intended behavior, and so on.  Dogs communicate not only a desire to play, but exactly what level of play they want, and how active and violent it is to be.  When one dog gets carried away in play and hurts the other, the hurt animal gives out a characteristic yelp, whereupon play immediately ends and an apology is forthcoming—often by the dogs stretching their necks out and touching noses.  Then play resumes.

Consider an interaction that happens almost every evening in my house:

Pup:  “Hey, let’s play!”

Older dogs:  “Lay off, we’re resting.”

Pup:  “You can always rest.  It’s evening! Active time for dogs!  C’mon!”

Older dogs:  “Well, maybe.  What you wanna play?”

Pup:  “We might play with this squeaky-toy.  Kit, you especially—here it is, just to tempt you—right in your face.”

Older dogs:  “Oh, OK, I guess we can play a bit, if Mom lets us get away with it.”

All this is communicated by gestures and barks, but the messages are clear.  “We might play with this squeaky-toy” actually is: play-bow with frantic tail-wagging; presentation of toy to older dogs, especially Kit, the more playful one; dancing around a bit; more play-bow and wagging and shoving toy in Kit’s face till he relents.  Note that it involves a clear subjunctive mood (the pup is just suggesting it); it involves a purely learned category (squeaky-toys are not part of canine evolution, but the dogs know everything about them); and a rudimentary theory of mind: the pup knows the older dogs have to be coaxed, with Kit being easier to coax.  Furtive glances at my wife express that final “if” clause better than words.

Even a human can generally tell when an angry dog is about to attack directly, or too scared to attack, or not interested in attacking.  Above all, however, dogs and coyotes communicate very complex and subtle messages to pups:  family bonding, support, reassurance, nurturance, alarm, and so on.  I have very often watched coyotes tell their pups exactly how afraid of me they should be, and exactly what to do about it—how far to run, when to hide, and so on.  I have also watched adults teaching young what to eat and how to eat it.  I have watched enthusiastic reunions with young who had been out wandering.


Important here is that all this communicative work, unlike bird song, is eminently social, directed at the pack and at any other canids in the neighborhood.  Coyotes and dogs howl to each other, and at closer range go through a wide range of typical social behaviors if they know each other.  Unlike bird song, these various canine modes are highly productive functionally as well as structurally.  They convey a wide range of social messages and nuances, from playful friendship to savage attack, from warm interest to cold disdain.

On the other hand, though wolves have over ten times the brain mass of songbirds, they are not up to the human level.  There is no indication that canids are saying much that is new and different.  They can fine-tune their very complex and interesting social lives in a way that birds do not approach, but (contrary to what my wife says about our border collie) they do not write philosophy books, speculate on astrophysics, or do calculus.  To be sure, most humans do not do those things either, and my wife may be right that “Bandit is smarter than a lot of people” (as she regularly says), but still we do not expect much new or exciting from the Canidae.  I fear that Gary Larson was right in the Far Side cartoon that showed a scientist who had invented a machine to translate barking into English and found that it meant just “Hey!  Hey!  Hey!”

This leaves us somewhat lost, for, frankly, I have never seen believable accounts of primates being any better at communicating than wolves are.  The one exception would be the few chimps and bonobos that have been taught to do limited signing.  However, dogs can clearly beat this when it comes to following and executing commands expressed in symbols.  Sheepdogs routinely learn 150 to 200 whistled and/or gestured commands, and we are not talking “sit” and “stay” but “pick out the black sheep, cut him out of the flock, and bring him here.”  One dog, Chaser, has learned over 1000 English words (Pilley 2013).  Dogs can easily figure out that if you tell them to “bring the X,” where X is a new word and the set that is indicated has one strange object in it, they are to bring the strange object; and thereafter they will remember it is the X.  Dogs learn to follow pointing fingers to find objects (wolves do not), and dogs learn to point in that way with their muzzles.  They may do that naturally, but pointers learn to do it with a combination of muzzle and foreleg, and that is not natural.

Dogs do plan ahead, and chimps apparently do, and canids most certainly recognize other individuals and act on the basis of what they know about those others.  Every owner of more than one dog knows how one dog will figure out ways to take advantage of personal differences from other dogs to get some food.  I once had two dogs, one a smart leader, one a born follower.  The leader was smaller and less competitive at the food dish.  So she quickly learned to lead the other one into the house, then quickly whirl, leave the other inside, and rush outside to the food.  It worked every time.  All my other dogs have learned similar games, with humans as well as with dogs.  They immediately learn which of their human pets is an easier mark for food, which for walks, and so on.


Mammals Not as Model


We have, however, run into a blind alley here.  Chimps do not sign in nature.  Dogs do not learn 1000 words or 200 commands in nature, nor do wolves or coyotes.  Dogs do not plan ahead more than to the next snack or walk.  Neither, apparently, do chimps.  They can plan a little, and they can “read minds” a little, but they are apparently quite far behind humans (Suddendorf 2013).  They have, in captivity, learned to call for apples by different notes, and chimps kept in the Netherlands learned new apple calls when moved to a British zoo (Nature 2015), but this is a one-“word” case in a totally unnatural situation.

Thomas Suddendorf, in an excellent recent review of the human-animal divide, sees a “gap” between animal basics—ability to communicate and remember, social reason, physical reasoning, empathy, tradition—and strictly human things, which he lists as “language, mental time travel, mindreading, theories, morality, culture” (Suddendorf 2013:216).  He hints rather broadly that all the human skills seem to boil down to one thing: an ability to see the world more deeply and widely than animals can do.  This is close to Marc Hauser’s five (or more) levels of recursion.  More to the point, it was anticipated by David Kronenfeld in 1979, in his article that pointed out that language skills are an expectable corollary of the ability of humans to plan ahead in complex ways, as shown by Karl Lashley long ago (Lashley 1960).  The human tendency to think in tactics, strategies, objectives, goals, and the overall mission reprises the five levels of recursion.

No nonhuman animal has been shown to create genuinely new messages in the sense that a folktale or a myth is a new message.  (The isolated “water bird” story in the chimp literature—actually bonobo literature; it was Kanzi the bonobo, studied by Sue Savage-Rumbaugh [Suddendorf 2013:85]—is wildly suspect.  I think the bonobo was signing “water” and “bird” separately.  Kanzi did produce simple commands, like a two-year-old human.)

Canids can communicate endlessly about the latest changes in their social situations, and can indicate there is food and lead pack-mates to it, but this seems about the extent of their productivity.  To my knowledge (and I have a lot of field hours with several species of monkeys, as well as a fair knowledge of the literature), this is about the situation with primates too.  (For a recent review of primate intelligence, see De Waal and Ferrari 2012.  For a particularly fine account of one species, see Cheney and Seyfarth 2007; I know the actual baboons they studied, from personal experience in the Okavango Delta—and can testify that they are fiendishly intelligent; they routinely stole our stuff and shook us down for food.  They also combine vocal and other behavioral signals in context-appropriate and innovative ways, as canids do; I have watched this at some length.)

Chimps can communicate incredibly complex and detailed messages about immediate social states and situations (see e.g. De Waal 1982, 1995), and quite a bit about immediate food prospects, but they cannot—for example—tell a friend what happened last week, or describe in detail how to find a food item that is out of sight.  (They can indicate hidden items by signs, however.)  This sort of displacement in time and space is a purely human ability, and must have had considerable effect on driving language evolution.


Humans as Distinctive


We now turn to another biological matter:  human physical evolution for talking.  This is largely beyond my competence, but a few things need to be stressed.  One is the wide distribution of language and/or speech over the brain: not only Broca’s and Wernicke’s areas, but involvement of the frontal cortex more or less in its entirety, as well as specialization in the motor sectors for fine-tuning motion of mouth, tongue, lips, throat, and hands.

This accompanies major evolution of the vocal apparatus.  The glottal cords do not exist in any other mammal (but songbirds do have perfect analogues).  They not only allow sound production of many types; they also are part of a mechanism to allow us to talk while eating, which many of us do far beyond what our mothers permitted at the table.  In mammals, the breathing tract crosses the swallowing tract, one of the many proofs that if we are the product of “intelligent design” the designer got drunk pretty frequently.  So a specific mechanism had to evolve.  The human mouthparts, including the lips and especially the tongue, are highly evolved to allow extremely precise articulation of a wide variety of sounds.  Apes cannot do this.

The sheer amount of gross physical modification in the mouth and throat, and the concomitant neurological remodeling of the brain, did not happen in a week.  This is not at all comparable to the cases of rapid evolution in which one gene flips, conveys a dramatic Darwinian advantage, and becomes fixed in the population in relatively few generations (as in the case of lactose tolerance in Europe and East Africa).

Of course, this refers to the faculty of language in the broad sense (FLB; Fitch et al. 2005).  Language in the narrow sense (FLN) is the mental representation side of language: language as an “instrument of thought” (Fitch et al. 2005; Chomsky 2013).  Language in that strict sense is the ability to arrange concepts in complex structures that can be subjected to transformations; it is not necessarily spoken, since it can be expressed perfectly well by gestures or various forms of writing, or not expressed at all—we talk inwardly to ourselves all the time (at least I do).  And there is, of course, a strong case to be made for language having evolved in gesture mode and only later become spoken (as Kronenfeld suggests, and he has a lot of company; Suddendorf 2013:81).  I doubt this, but the point is that language, as it appears now in Homo sapiens, is totally unlike bird song or wolf howling in that it is decoupled from the vocal and is an internal capacity.

We simply cannot escape the conclusion that language (FLB) evolved slowly, over a long time. And even FLN was not—cannot have been—an “invention,” or the result of a rapid mutational process.  It simply involves too many genes, structures, and systems, variously coopted to linguistic use (for some of the complexities, see Christiansen and Kirby 2003).  Brain scans show that a huge percentage of the brain is involved, to say nothing of whatever means—tongue and mouth, hands, or writing arm—is used to get it out in public.  The extremely oversold, and now largely disproved, idea of evolution by rapid jumps could work—if it ever worked—only if one or a very few genes controlled the trait in question.  This is not the case with language (even FLN).

It was not a matter of evolving a “mind,” whatever that is, and then suddenly putting it to use to talk.  It was not a matter of switching from gesture to vocal, unless that switch was done very early (which is, of course, possible).  (“Mind” is even more poorly defined than “consciousness,” which at least has a recognized operational meaning: “awake and aware as opposed to being knocked out.”  “Mind” has no operationally meaningful definition at all, so far as I can determine from the literature.  The currently popular line “mind is what the brain does” is a capitulation, not a definition.  What the brain does is use glucose to fuel transmission of neural electric impulses via a whole series of complex neurotransmitter chemicals, facilitating or inhibiting specific synaptic connections in the process.  If that is mind, so be it.)

Given the well-known link of brain size, social group size (Bourne 2011), and communication, and in birds of brain song center size and song complexity, it is impossible for me to escape the conclusion that language evolution tracked closely the evolution of brain size from around 350-400 cc (chimps and Australopithecines) to the modern 1400 cc.  Sayers et al. (2012) caution anthropologists against the “chimpanzee referential doctrine,” the idea that the chimp is a perfect copy of the last common ancestor of chimps and humans.  They point out that the two lineages have been diverging for millions of years—they argue for eight million—and there are many striking differences that have clearly evolved in the chimp line.  One is huge canines; early hominids and all hominins have small canines.  This probably indicates that the violent male-male conflicts over territory and harems are a recently evolved trait in apes, not an ancestral condition.  All this makes me wonder if chimps have dumbed down.  Their brains are large, but their social and communicative behavior, based on what I have read, seems less impressive than that of wolves.  Possibly Australopithecines had a more complex communication system, involving more conscious shaping, deployment, and combination of instinctive sounds.

I think Homo erectus must have had an intermediate ability, developing from very limited productive communication ca. 1.5-1.8 mya up to something very simple but possibly definable as “language” by 350,000 ya.  I think language and linguistic capacity then continued to evolve, up to the Neanderthals and very possibly even more in modern Homo sapiens sapiens.  The relative simplicity of Neanderthal material culture and its lack of anything clearly artistic or ornamental leads one to suspect that they had a simple, practical sort of language, possibly somewhat lacking in the poetic and speculative flights we associate with the medium.


Humans Not as Distinctive


Why did language evolve?  If we consider the development of complex communication systems among animals, we find that all of them without exception developed to talk about complex social situations.  Birds have to negotiate territory, neighborhood (familiar fellow members of the species), mating, and family life.  Wolves have to deal with packs and neighboring packs, and wolf packs are socially very complicated.  Coordinating pack hunting is especially complex.

Primates are almost all quite social.  Baboon troops sometimes number in the hundreds; such large troops have an internal structure, with family and kindred groups.  (One might think that there is much more going on in these troops than the descriptions attest, but my observations—which are limited—do not go beyond published documentation.  Cheney and Seyfarth—who studied the same baboons I observed in Botswana—have the full story pretty well down [2007].)

In all cases, the more complex the social life, the more complex the communication, other things being equal.  Recall that brain size varies this way too; more social animals have larger brains than their socially simple relatives—among the primates, among canids, and in many other groups.  I would assume, in fact I cannot believe otherwise, that language, brain size, and social group size all evolved together.

The expansion in human brain size (from around 400 cc in Australopithecus to 1400 cc today) took place largely in the last 2,000,000 years, mostly in the Homo erectus stage.  On the whole, throughout the animal kingdom, brain size tracks body size and sociability—including the complexity of social messages.  Since humans have had about the same body size since Homo erectus came in, this brain expansion is clearly related largely to social factors.  (Neanderthals had brains the size of ours or larger, but also had larger bodies, so their bigger brains probably have to do with bigger bodies rather than with more sophisticated communication.  There is still rather little evidence that they had sophisticated symbolic communication.)

Robin Dunbar and many, many others have made the obvious assumption that human language must have evolved primarily as a social tool.  Dunbar’s finding that most human talk is “gossip” certainly fits this.  Dunbar’s number of 50-150 is a very robust finding across all human societies for the number of people in the ordinary face-to-face group.  (Several people, including some of my students working on this issue, have independently corroborated it.  See e.g. Binford 2001.)  I think the expansion of the brain from 400 to 1400 cubic centimetres tracks perfectly the expansion of the group from 20 to 500.

Language as it exists today is vastly overengineered for the needs of a nuclear family; it is hard to imagine it being useful enough to evolve unless it was required to fine-tune society and sociability in groups of a hundred or more.  Dunbar is surely correct in seeing modern humans as having evolved in groups of 50-150 (Dunbar 1993, 2004).  Dunbar found that about 2/3 of human talk is about immediate social relationships, this figure being broadly consistent from Cambridge dons to working-class and rural people (Dunbar 1993, 2004, 2010; Fitch uses Dunbar’s findings).

A further number of 500 is equally robust across societies as the total number of fairly regular contacts a person has.  The corroboree group in Australia, the tribe among Native Americans, the village in medieval Europe, and the widest friend circle in modern America all approximate this.  (A quick survey of my Facebook friends shows a modal number around 300, with variation from around 10 to several thousand—but those last are a very few cases of people using Facebook as a professional tool.  The vast majority of my friends report 100-700 friends.)

Increasingly complex communication would have to emerge for such groups.  This must have entailed more and more communication over wide distances and at night.  That soon made gestures and instinctive cries inadequate.  I think Dunbar has the strongest case.  I suspect that, indeed, the first words included “father” and other kin terms, as in the development of infants’ speech: ontogeny recapitulating phylogeny again.

Such compulsive sociability influences language; Fitch emphasizes the importance in modern humans of Mitteilungsbedürfnis, the human need to talk all the time about feelings and thoughts.  (Some, especially of the female gender, seem to have more Mitteilungsbedürfnis than others.)

Humans, over the millennia, became more and more specialized on finding rich patches of food, as they became more intelligent and social.  The bigger the brain and the bigger the social group, the more the group depends on rich patches of food.  They needed these because there were more and more people to feed and lots of brain to fuel; the human nervous system uses 400 calories/day.  This would have been another positive feedback loop.  Language must have evolved partly for the purpose of telling all one’s kin where the rich food patches were.

Naming plants, animals, and landscape features, and describing how to find them, must have been very early.  The idea that language arose to talk about “hunting” is flagrant male chauvinism at its silliest.  Gathering must have had just as much to do with it.  One wishes to report to the group where the best berry-picking is, where the seeds are ripe, where the greens are springing up, and where the flowers indicate a good future fruit crop, just as much as one needs to talk about where the game is.

Tools are another possibility; my mentor in such matters, Sherwood Washburn, always used to say that “language evolved to talk about tools” (Washburn, in countless lectures).  Philip Wilke and Leslie Quintero believe that even Homo erectus tools are too complicated to make without at least some verbal instruction.  They are world-class flintknappers and instructors of younger flintknappers, so they know whereof they speak.  The need for more and more rich, uncommon, and hard-to-find patches of stone also gave people a reason to talk about where to find it.  Fitch somewhat minimizes the tool theory, noting that people usually learn to make things by observing and imitating rather than by listening.  I certainly agree—that is my experience—but some talking is often necessary, in stone tool making as in other processes.

My experience is that traditional people around the world talk largely for social reasons, just as we do, and that they learn about food-getting and toolmaking with a minimum of verbal instruction.  But the verbal instruction, little though it may be, can be critically important.

The larger the group, the more internal differentiation would inevitably have arisen.  Gender roles, age roles, and differential expertise and skills would have appeared immediately.  Even chimp troops have rudimentary forms.  These differences would have increased over time.  Differences in status and power and in social function would eventually have arisen and increased.  The oldest profession—which is that of healer, not prostitute—must go back to the very dawn of human society as we know it.  Perhaps before the dawn.  Tools and feeding arrangements were getting more complex too, and these would have had their effect.  Above all, humans would have been scattering out to forage, and would have needed to tell each other where the best food was.  I think this was probably not only a direct cause of language, but also a direct cause of social complexity and therefore an indirect cause of language.

Social communication extends to talking about social place, status, role, mating, childrearing, balancing obligations, reminding people of debts and favors, inviting people to share food and tools, instructing the young on life skills, talking about hunting and gathering, coordinating any and all activities, discussing territory and place and range, and much, much more.  No wonder a complicated communication system was needed.  But without a large group it would not have been necessary, even for an intelligent omnivore.  Raccoons manage fine without it.

An interesting point is the complexity of grammar and syntax in all languages.  Academics often miss the import of this, since they are used to academic prose, where the vocabulary is learned, arcane, and diverse, but the syntax is usually limited to declarative sentences.  Ordinary everyday speech, on the other hand, can be a wild tangle of conditionals, subjunctives, dependent clauses, dangling participial phrases, and everything else.  You may not need to know the four different types of infinitives, all inflected, that Finnish displays, if you merely read Finnish newspapers, but you will have to know them to talk.  I have noticed this extreme contrast of academic prose (simple grammar, rich vocabulary) and everyday speech (vice versa) in English, Spanish, Turkish, and other languages.  (It is less pronounced in Chinese, which has a very simple grammar, and in German and Maya, whose complex grammars are hard to escape even in simple declarative prose.)  All this suggests that the complex syntax that distinguishes human from animal communication is probably very old and very deeply rooted, contra a theory sometimes espoused by Chomsky and others that it must be a recent and rapid development.

Given the above, the idea that language was “invented” (for whatever reason—one recalls the old “bow-wow” theories) makes about as much sense as saying that people first developed complicated tools and then invented hands to work with them!  I would agree with Chomsky in thinking that his highly philosophical, mentalistic, even spiritual language (his FLN) was a late development (though I would guess on the basis of zero evidence that it began to appear in the Homo erectus stage).  But I assume it came long after productive, expressive speech had already developed the capacity to talk, or gesture, about a wide range of things, from kinship to cutting up dead elephants with stone tools.


Biological Notes and Queries


I find it plausible to think that talking, gestures, and music evolved as one single symbolic-communication system.  I do not find plausible the idea that gesture was first.  For one thing, all our evolution has been in the vocal-auditory system.  Also, all higher animals communicate largely by vocal signs (or by smells, but not in higher primates).  None uses gestures as a principal channel.  Almost all do use gestures and bodily poses and signs as major ancillary markers, however, so I expect gesture was involved from the start in human linguistic evolution.  In modern humans, gesture communicates a great deal, and in deaf language it bears the whole load.  People gesture when talking on the telephone.  In fact, I routinely see people endanger their lives by driving alone while holding a cellphone in one hand and gesturing wildly with the other.  People blind from birth gesture to each other when talking.  This suggests that gesture has indeed maintained a linguistic function straight through from the chimp, and has always been used to convey specific information.

As to music, I am always intrigued by the idea, first floated by Giambattista Vico (2000 [1725]) and later by Darwin (1871) and by Steven Mithen (2007), that language and music forked off from a single original form, a sort of warbling or chanting.  Communicating mood states vocally is also standard for humans.  Cries of pain, lulling noises to babies, noises of surprise, shouts of anger, and the like are very close to chimp calls.  We now culturally construct vocal communication of mood into music.  However, I have reluctantly abandoned this position in light of the fact that the brain wirings for language and for music are very different, and also in light of the fact that the mix would be more, rather than less, complicated than either descendant.  I still like the idea, and I still am convinced that language and music—the two very complex, highly recursive, highly symbolic modes of communication—evolved together.  At present, and probably always, language deals with specific concrete concepts, music with mood and broad emotionality.  This is somewhat similar to the positions that cries and songs, respectively, occupy among songbirds.


A suite of features has developed in humans for the purpose of talking: improved vocal cords, extremely fine musculature on tongue and lips, and appropriate innervation.  Fitch shows these are not particularly confined to humans, but they are certainly more fine-tuned in the human species.  The human brain is heavily involved in language.  Everyone knows about Broca’s and Wernicke’s areas, but language turns out to be widely distributed over the brain.  Often, the localities that show up in brain scans are ones that are used for other things and seem to have been secondarily used for language as well as their original functions.  This area is controversial.  Two things emerge:  first, gesture does seem to draw on many of these areas, making the gestural theory of linguistic origin plausible, and, second, music (or music-related functionality) is extremely widely and deeply distributed in the brain, making Mithen’s (or Vico’s) theory plausible.

It is worth remembering that a huge brain not only has a huge calorie cost, it has a huge cost in difficulty of birth, leading to more maternal and infant deaths and to a need for midwives.  (A recent claim that human birth is not all that difficult is simply wrong.)  The human brain is about ¼ adult size at birth; a chimp’s is about ½.  So humans are born very immature and in need of total care for months.  This has the interesting side function of making language more developable—the brain is so immature and plastic that language can shape it massively from the start.  This is part of Fitch’s case for the importance of parentese in the development of language (see below).


And What’s Right?


I tend to follow Chomsky (1957; see also 2013) in defining “language” by the presence of true sentences (or similar strings):  long sequences of morphemes that can be rearranged to make questions, negative clauses, etc., and can have dependent clauses embedded.  No animal does anything like this.  Even the mockingbirds, which are known to sing around 12,000 different phrases and to learn other birds’ songs and work those into their own, apparently never do anything beyond a simple phrase.  They string phrases together endlessly, but they appear to have no higher-level grammar at all.  The only communications in the animal kingdom that seem almost Chomskian-grammatical are the aforementioned bowers of the most evolved bowerbirds.  The Satin Bowerbird combines materials according to rules, and even seeks out plant fibres to make little paintbrushes, then seeks out colored wet clay, and paints his bower.  The bower and his display in it are all structured according to fairly sophisticated rules, and the process of successfully combining them with his song takes us almost into the realm of transformational grammar.  But all this is done simply to lure females.  It is all so different from human society that it has nothing much to tell us.

So, why and when did we evolve full syntax in the Chomskian sense?  There is no way of knowing.  It must have been gradual.  I suppose Homo erectus had something, perhaps three-word sentencelets.  I suspect that full-scale language with dependent clauses embedded in questions embedded in long utterances probably came only with Homo sapiens.  But, of course, we will probably never know.  If “ontogeny recapitulates phylogeny,” and child language development shows us anything, humanity started with single words, then got to two-word phrases, then to three, then to very simple declarative sentences, then to imperatives, questions, and negations, and finally to embedded clauses and the like.  But, as Stephen Jay Gould pointed out in his book Ontogeny and Phylogeny, ontogeny does not always recapitulate phylogeny.  (Better go with James Joyce:  “Hagiography recapitulates proctology.”)  Fitch (chapters 10-14) thus maintains that human evolution may not be well reflected in child language.

We now know that Homo sapiens in the narrow sense appeared only about 150,000 years ago, in east Africa.  H. sapiens radiated out from Africa into the rest of the world a mere 70,000-100,000 years ago.  This makes the possibility of reconstructing a bit of “proto-world,” or at least “proto-diaspora,” actually thinkable.  I expect this language was inflected, at least with noun/pronoun cases such as nominative, accusative and dative, and verb conjugations to show past, present, and future, and to indicate continuing vs. one-shot actions (i.e. “imperfect” vs. “perfect” forms).  This seems to be something close to a common denominator worldwide.  Exceedingly complex grammars (like the “polysynthetic” grammar of Inuit) are rather rare.  Isolating languages (i.e. without any grammatical endings) do not exist.  The claim that Chinese is such a language is based on classical Chinese, which was originally a sort of court speedwriting, probably introduced by scribes to write down court actions as they occurred.  We know that ancient Chinese, like modern Chinese, had inflections and functional grammar words, because there are some verbatim quotes preserved in historical texts.

Songbird evolution again proves useful here.  Leaving aside a few nonpasserines that have independently evolved song, we can consider the passerine or songbird order, Passeriformes.  It includes several families that are vocally primitive: they have songs, but the songs are purely instinctive, identical throughout the species’ range.  This is true of flycatchers, for instance.  Then there are some more evolved groups that have very simple but still partially learned songs, such as the chickadees, nuthatches, and creepers.  Then there are the many that have songs with a strong component of learning, much individual variation and innovation, and local dialects, such as many of the sparrows.  Finally, the mockingbirds and some other groups have incredibly complicated songs, involving learning from other species as well as their own.  Except for the fact (noted above) that social songbirds often do not sing, this provides a model for how human language could have evolved.

Fitch closes his book with considerations of gestures and of music as linguistic bases.  I am duly convinced that people have been gesturing straight through the entire course of evolution from the apes, that gesture was an integral part of language evolution, and that gesture is a reasonable place to look for evolutionary innovations.  The main reason for considering gesture so basic is that apes gesture all the time, and can sign quite complicated messages even when left to themselves, to say nothing of the degree to which they can learn sign language.  (Fitch emphasizes this, but even he understates it; more and more data keep coming out on ape skills in this regard.)  I am also convinced by long discussions with David Kronenfeld.  But what really convinced me was observing not just one but many drivers, driving alone in their vehicles, holding cellphone to ear with one hand and gesturing wildly with the other.  (S. Goldin-Meadow 2010 has studied cellphone gesturing, and also notes the fact that congenitally blind people gesture.)  Surely it must be a strong instinct that can thus overcome both minimal regard for safety and minimal obedience to California’s strict driving laws.

As to music: here Fitch finally ran out of energy, not surprising in a book well over 600 pages long.  He lists some design features that supposedly separate music from language, and gets some sadly wrong.  Among these are discrete pitches and isochrony (p. 479), which, if he means these words the way they are normally used, are universal in tonal languages, and thus anything but distinctive of music.  Fitch discusses tone languages and notes that most of the world’s languages are tonal, so either he is using the terms in a strange and incomprehensible way, or he has made an oversight.  Another design feature he considers distinctive of music is repeatability.  For Fitch, a song can be repeated over and over again, but language is about “pervasive novelty” (480) and is not normally repeated.  The only exception he allows is the case of minor greeting rituals.  He is thinking too much of academic discourse.  In the wider world, poems, stories, slogans, political rhetoric, taglines, and so on are repeated endlessly, and no one seems bored by that (except possibly some academics).

More serious a deficiency is Fitch’s failure to tell us whether music was fully evolved before speech came along, or whether they evolved together, or whether, as suggested by Giambattista Vico almost 300 years ago, they differentiated from a more primitive warbling or chanting, imperfectly worded and imperfectly musicalized (Vico 2000 [1725]; cf. Mithen 2005; Vico is oddly uncited by Fitch).  Vico thought modern epics were survivals of this stage—or more accurately of the stage just after it, when language existed but was still sung rather than spoken.  His general model fits well with the music of small-scale societies, which is usually extremely simple.  It might represent this next-level-up, where chanting with meaningless syllables is still common, but real tunes and words have also entered the soundscape.  Fitch writes as if music were as complex from the beginning as modern folk and even concert music is.  But the music of the San, Inuit, and many comparable groups is almost as simple as Vico’s model suggests.  Finally, Fitch does not say much about the differences in brain wiring for music as opposed to language, though these are substantial.




I conclude that language probably emerged from a mixed-channel system like that of many other animals.  Birds have displays and songs.  Dogs have scent, voice and motion.  Chimpanzees have body positions, gestures, and vocalizations.  Ancestral humans had gestures, protolanguage, and protomusic.

I think protolanguage evolved from the single instinctive cries of apes to longer but still instinctive utterances; then to phrasal language with limited phrase-structure grammar; then to more complex, sophisticated, and innovative protolanguage, and finally to full-scale language as we know it.  I believe David Kronenfeld is right in arguing that this developed along with wider abilities to plan, think ahead, and in general use recursion to bootstrap thought.  But children develop language so much faster than they develop recursion in other fields that there is clearly something more going on than mere carryover from task planning to speech planning (Fitch, pp. 492-494; there is a large literature on this issue; see also Chomsky 2013).

I agree totally with Fitch that language must have evolved in groups with relatively high relatedness, and that from the first it had a great deal to do with child care and upbringing.  Parent-child communication was critically important to every aspect of language evolution.  I need only cite him for discussion (see esp. pp. 492-494).  As he points out, this is necessary to explain why children learn language so fast, and why language is not confined to adult males, or adult males and females, as song is in birds.

Music took a separate but parallel course.  It evolved from simple singing, no doubt including lullabies, courting songs, work songs, dance songs, and possibly other types.  Fitch’s insightful comments about the value of language for dealing with very young children certainly hold for music too, and I suspect lullabies were the very first musical performances.  (Darwin and others held that music evolved for male display, but this does not fit the facts; Fitch’s comments on language and children work for music too.)  It gradually became more melodically and harmonically complex, with musical instruments being added very slowly over time.  It now differs from language in being a holistic, wordless way of communicating emotions and moods, as opposed to an open-ended system characterized by separate phonemes and morphemes and by the ability to combine and recombine these into highly precise but also innovative propositions.

A point too softly made by Fitch is that language, in its full-blown form, is vastly overengineered for mere family life (Bickerton 2014).  Most nuclear families get by without a rich vocabulary or a full range of linguistic performances.  The full play of language requires a wide arena, at the very least a tribal group with politics, religion, ritual, song and story, arguments, debates, discussion of different people, organizing against other groups, and all the other things for which we use language in public spaces.

So, here is a scenario.

Australopithecus:  “Instinctive” cries, used in unproductive and noninnovative situations, but complex, and used by conscious decision to fine-tune social situations.  However, no innovation, no complex grammar, no learning, no productivity.

Homo habilis:  Cries have mutated into a system in which there is some putting together of cries into two-cry or three-cry phrases, which are somewhat learned.  Increasing importance of learning over time. Homo habilis is a not-at-all-missing link; a stage fossil showing some mental and manual development.

Homo erectus:  Brain size increases; group size increases with it.  I think selection for larger and larger groups drove the whole process.  Language evolves from simple phrases made by combining semi-instinctive noises (“ug, wug, zug, phthslug” meaning “I want dinner”) in very early millennia to a simple but functional language-like system late in the career of this temporal species (“I want dinner and there is a dead mastodon.”)

Homo sapiens sens. lat.  I agree with Chomsky (1957) that the real watershed between nonhuman and human communication is the sentence—a long, complex utterance that can be transformed by head-down planning processes.  I assume that the phrase-structure grammars of Homo erectus developed into the capacity to produce actual sentences, that can be changed to questions, negatives, passives, etc. by grammatical rearrangement, over the last 300,000-400,000 years, possibly only the last 200,000.  Derek Bickerton (2007, etc.) could well be right that pidgin and creole languages re-create something like the original human sentence and grammar structure.  Brain size levels off with Neanderthal, but brain function evidently keeps growing.  We just don’t have any art or symbolic-type items of any kind from the pre-sapiens world.

I find it difficult to imagine the steps, but assume people slowly evolved the ability to plan more and more complex recursions, or at least embedded constructions, and transformations.  (Luuk and Luuk have recently pointed out that embeddedness does not really require recursion in the formal sense; iteration can model it.)  At present, the simple phrases of bird and dog communication require about two levels of embedding: planning the phrase and vocalizing it.  Humans can plan whole narratives, within which are nested paragraphs, within which are sentences, within which are phrases, within which are morphemes, within which are phonemes—six levels of embeddedness.  This is pushing the “magical number seven” (Miller 1956) awfully hard, and I doubt if we can go much higher.  Music is recursive, just as language is:  notes to phrases to tunes to themes to songs to symphonies….  Both can produce infinite numbers of sentences/compositions.  (Luuk and Luuk point out that the numbers could not really be infinite, given the finite number of humans and finite time for them to talk and write, but what matters is the potential infinity.  No animal has that capacity, except in a trivial way; no two animal calls are quite identical, but the messages are the same dull stuff.)

An interesting sidelight is that we academics tend to think our typical language is complicated while “low-class” people have what a monumental snob of a British linguist once called “restricted codes.”  The truth is rather the opposite.  Academic talk is largely simple declarative sentences, however long the words:  “The orbitofrontal cortex of Homo sapiens is greatly expanded in comparison with that of the Australopithecines.”   “Low-class” speech tends to be simpler in vocabulary but complex in grammar, idiom, style, and metaphor:  “And I was like wow when I realized that Susie would have gone with Mike if she had thought to do that, but since she was, you know, like, in the middle of something she thought was more important…”  I have noted the same contrast of simple grammar and complex vocabulary with complex grammar and simple vocabulary in Spanish and in Maya.  (Not, however, in Chinese—all simple—or German—all complex.)  Similarly, traditional languages of small-scale societies almost all have fantastically complex grammars compared to English or Chinese.




Thanks to Alan Beals, David Ellerman, Alan Fix, and David Kronenfeld for discussion of these points.





Bickerton, Derek.  2007.  “Language Evolution: A Brief Guide for Linguists.”  Lingua 117:510-526.


Bickerton, Derek.  2014.  More Than Nature Needs: Language, Mind, and Evolution.  Cambridge, MA: Harvard University Press.


Binford, Lewis.  2001.  Constructing Frames of Reference:  An Analytical Method for Archaeological Theory Building Using Ethnographic and Environmental Data Sets.  Berkeley:  University of California Press.


Botero, Carlos A.; Neeltje J. Boogert; Sandra L. Vehrencamp; Irby J. Lovette.  2009.  “Climatic Patterns Predict the Elaboration of Song Displays in Mockingbirds.”  Current Biology 19:10


Bourke, Andrew F. G.  2011.  Principles of Social Evolution.  Oxford:  Oxford University Press.


Cheney, Dorothy L., and Robert M. Seyfarth.  2007.  Baboon Metaphysics:  The Evolution of a Social Mind.  Chicago:  University of Chicago Press.


Chomsky, Noam.  1957.  Syntactic Structures.  The Hague:  Mouton.  Janua Linguarum 4.


—  2013.  “What Kind of Creatures Are We?  Lecture I:  What Is Language?  Lecture II: What Can We Understand?”  Journal of Philosophy CX:12:645-662, 663-684.


Christiansen, Morten, and Simon Kirby.  2003.  “Language Evolution:  Consensus and Controversies.”  Trends in Cognitive Sciences 7:300-307.


Collins, Sarah, et al.  2009.  “Migratory Strategies and Divergence in Sexual Selection on Bird Song.”  Proceedings of the Royal Society 276:585-590.


Darwin, Charles. 1871.  The Descent of Man and Selection in Relation to Sex.


Descartes, René.  199.  Discourse on Method and Related Writings.  New York: Penguin.


De Waal, Frans.  1982.  Chimpanzee Politics. Jonathan Cape.


—  2005.  Our Inner Ape.  Riverhead.


De Waal, Frans, and Pier Francesco Ferrari (eds.).  2012.  The Primate Mind. Cambridge, MA:  Harvard University Press.


Dunbar, Robin I. M.  1993.  “Coevolution of Neocortical Size, Group Size and Language in Humans.”  Behavioral and Brain Sciences 16:681-735.


—  2004.  Grooming, Gossip, and the Evolution of Language. New York: Gardners Books.


—  2010.  How Many Friends Does One Person Need?  Dunbar’s Number and Other Evolutionary Quirks.  Cambridge, MA:  Harvard University Press.


Fitch, W. Tecumseh.  2010. The Evolution of Language. Cambridge:  Cambridge University Press.


Fitch, W. Tecumseh; Marc D. Hauser; Noam Chomsky.  2005. “The Evolution of the Language Faculty:  Clarifications and Implications.”  Cognition 97:179-210.


Flower, Tom P.; Matthew Gribble; Amanda R. Ridley.  2014.  “Deception by Flexible Alarm Mimicry in an African Bird.”  Science 344:513-516.


Fortune, Eric S.; Carlos Rodriguez; David Li; Gregory F. Ball; Melissa J. Coleman.  2011.  “Neural Mechanisms for the Coordination of Duet Singing in Wrens.”  Science 334:666-670.


Goldin-Meadow, Susan.  2010.  “Hands in the Air:  Gestures Reveal Subconscious Knowledge and Cement New Ideas.” Scientific American Mind, Sept.-Oct., 48-55.

Haesler, Sebastian.  2007.  “Programmed for Speech.”  Scientific American Mind, June/July, 67-71.


Heinrich, Bernd, and Thomas Bugnyar.  2007.  “Just How Smart Are Ravens?”  Scientific American, April, 64-71.


Horowitz, Alexandra.  2010.  Inside of a Dog.  New York: Scribner.

Johnsgard, Paul.  1994.  Arena Birds: Sexual Selection and Behavior.  Washington, DC: Smithsonian Institution Press.


Keller, Georg B., and Richard H. R. Hahnloser.  2009.  “Neural Processing of Auditory Feedback During Vocal Practice in a Songbird.”  Nature 457:187-190.

Kronenfeld, David B.  1979.  “Innate Language?”  Language Science 1:2:209-239.


Kroodsma, Donald.  2005.  The Singing Life of Birds:  The Art and Science of Listening to Birdsong.  New York:  Houghton Mifflin.


Lashley, Karl S.  1960.  The Neuropsychology of Lashley.  New York: McGraw-Hill.


Luef, Eva M., and Simone Pika.  2015.  Comment on Scott-Phillips, Thomas C.  2015.  “Nonhuman Primate Communication, Pragmatics, and the Origins of Language.”  Current Anthropology 56:69-70.


Luuk, Erkki, and Hendrik Luuk.  2014?  “Natural Language: No Infinity and Probably No Recursion.”


Marler, Peter, and Hans Slabbekoorn (eds.).  2004.  Nature’s Music:  The Science of Birdsong. Amsterdam:  Elsevier.


Marzluff, John, and Tony Angel.  2005.  In the Company of Crows and Ravens. New Haven: Yale University Press.


— — 2012.  Gifts of the Crow: How Perception, Emotion and Thought Allow Smart Birds to Behave Like Humans.  Free Press.


Marzluff, John M., and Russell P. Balda.  1992.  The Pinyon Jay:  Behavioral Ecology of a Colonial and Cooperative Corvid. London:  T. and A. D. Poyser.


Mech, L. David.  1988.  The Arctic Wolf: Living with the Pack.  Stillwater, MN: Voyageur Press.


Mithen, Steven.  2007.  The Singing Neanderthals:  The Origins of Music, Language, Mind, and Body. Cambridge, MA: Harvard University Press.


Moorman, Sanne; Sharon Gobes; Maaike Kuijpers; Amber Kerkhofs;  Mattijs A.Sandbergen; Johan J. Bolhuis.  2012.  “Human-Like Brain Hemispheric Dominance in Bird Song Learning.”  Proceedings of the National Academy of Sciences 109:12782-12787.


Morell, Virginia.  2014.  “When the Bat Sings.”  Science 344:1334-1337.


Morelle, Rebecca.  2012.  “The Rock Hyrax Surprises with Syntax Skills.”  BBC News Online, Apr. 18.


Mowat, Farley.  1963.  Never Cry Wolf.  Boston: Little, Brown.


Nature.  2015.  “Chimps Learn New Calls for Food.”  Nature 518:141.


Nice, Margaret Morse.  1937, 1943.  Studies in the Life History of the Song Sparrow.  New York:  Linnaean Society of New York.  (Dover reprint 1964.)


Payne, Robert B., and Michael D. Sorenson.  2006.  “Song Lines.”  Natural History, Sept, p. 41.


Pepperberg, Irene Maxine. 1999.  The Alex Studies:  Cognitive and Communicative Abilities of Grey Parrots. Cambridge, MA: Harvard University Press.


Pilley, John W., with Hilary Hinzmann.  2013.  Chaser: Unlocking the Genius of the Dog Who Knows a Thousand Words.  Boston: Houghton Mifflin Harcourt.


Sayers, Ken; Mary Ann Raghanti; C. Owen Lovejoy.  2012.  “Human Evolution and the Chimpanzee Referential Doctrine.”  Annual Review of Anthropology 41:119-138.


Scott-Phillips, Thomas C.  2015.  “Nonhuman Primate Communication, Pragmatics, and the Origins of Language.”  Current Anthropology 56:56-80.


Suddendorf, Thomas.  2013.  The Gap: The Science of What Separates Us from Other Animals.  New York: Basic Books.


Taylor, Alex; Rachael Miller; Russell Gray.  2012.  “New Caledonian Crows Reason about Hidden Causal Agents.”  Proceedings of the National Academy of Sciences 10.1073/pnas.1208724109.


Vico, Giambattista.  2000.  New Science.  Tr. David Marsh.  Italian original, 1725. New York:  Penguin.


