1

1.1 Why music?
Recall a time when you heard music that arrested you. It may have been a fragment or something more. Did it make you long to listen again?1 Did its persistent hold disarm your mind’s ear?2 Did you divulge this new fascination? Perhaps you found a recording and shared it with a friend, or sang a few bars, or simply told them about it. Maybe you went further and committed this experience to a diary.

Still, you may not feel free from it even now. When you recollect the music, does it captivate you anew? Do you understand why? So binding can it be, that the experience of music3 demands reflection in order to yield its meaning. Even then, thought might only offer you scattered notes on its value.

I recall an occasion when I was asked to write program notes for a concert. Before starting, I listened to recordings of the pieces, all of them familiar. Generally, I know what music will flush me with tears, and can predict what will detonate a laugh, and yet I was not prepared for this. As the last piece began to play, my listening was shattered by torrents of joy. Awash in tears, I was as helpless to understand the flood as to halt it. I confess that even now, seated in the library writing this, I feel those sobs threatening to roil up and unmask me to the other inmates studying quietly here.

Once my composure resurfaced, I turned to write about the music. As I did so, its meaning began to rise from the words. Was this enabled simply through reminiscence? Surely not (no memory is so periscopic); rather, the act of writing had dug a channel from music to mind that granted access to the inner life of the music itself. That music was once again made present to me, and as I struggled to articulate my experience in words, my understanding was transformed and, somehow, so was the music4.

1.2 Music’s Nature
How does writing accomplish this exceptional transformation? After all, successive performances present their differing musical interpretations, and discussions shape opinions, both of which are transformative in their own right5. Before examining the relationship between music and writing, we must briefly consider the nature of music.

‘What music is remains open to question at all times and in all places’ (Bohlman 1999:17). Bohlman’s aphorism neatly acknowledges the historical and social complexity of music, without discussing its obvious abstract or physical characteristics. Though we experience it as an acoustic event in daily life, music also exists beyond any manifestation of itself in time; no performance or notation can be said to represent the totality of the phenomenon, what Nattiez terms the total social fact (Nattiez 1990:42), nor can a single hearing reveal the entire substance of a piece.

Because time is at once a necessary condition for the manifestation of music and extrinsic to its existence6, ontologies of music often begin from the nature of the musical experience (manifest in time) and that of the musical work (existent through time) (Kania 2017; Scruton 2016)7. Musical experiences are auditory events; musical works are discrete, notated pieces. Setting aside the ontological intricacies, here it will suffice to speak of musical experiences as performances and works as compositions. To illustrate, consider Johann Sebastian Bach’s Brandenburg Concerto No. 1. A performance of this work is identifiable by place, time, and participants. It is further distinguished by performance practices, mistakes, listeners’ reactions, and various other factors. Such a musical experience is a unique occurrence, not equivalent to the work. As a composition, the work — including its place of origin, composer, form of notation, history of performance and reception — is a unity larger than any one time or place (Goehr 1992). Thus, experiences of the work manifest at a given time, while the work itself exists through time (Young 2011; Rohrbaugh 2003). Because of their intra-temporal nature, musical works can be objects of investigation beyond their manifestations.

However, not all musical experiences are actualizations of musical works. Indeed, some cultures neither employ notation nor have a notion of the work. Still, musical experiences not definable as works may resemble each other to the degree that they can be classified together, possibly by form or function, and may thus serve as objects for consideration beyond their manifestations. This would be especially evident when a given experience employs a great deal of repetition. For instance, this could be true of music performed for a specific ritual, or of various solos heard in different performances of the same jazz standard. Though not identical, such experiences can be conceptualized as instances of the same category. In this manner, investigation of the category may grant insight into the music that was not possible in the moment of the experience.

Given their temporality, all musical experiences share the essential property of non-duplicability (Matheson & Caplan 2011; Gadamer 2004). The sheer number of variables in a performance prohibits any two musical experiences from being the same. Were it physically possible to interchange them, something would be altered. As a consequence, anyone seeking to understand the meaning of a category or work, or even of musical materials such as rhythm and pitch, must contemplate them as existent through time, rather than as individual manifestations (Davies, S. 2011).

1.3 Writing Music
Longfellow’s famous paean declares that ‘Music is the universal language of mankind’ (1835:202).8 Much as with language, every child learns about the music of their world, at least at the local level (Jackendoff 2011). Indeed, the human brain processes music much as it does language (Patel 2008). However, music may also exceed language in its ability to express that which cannot be spoken. In this way, it shares with the visual and performing arts, poetry and religion, a creative communion of the ineffable.9 This helps explain why there has never been a society devoid of music (Brown 1991; Savage et al. 2015; Egermann et al. 2015).10 Music is imperative.

Though music may exceed language in this respect, expression of its socio-historical significance still requires words. Because music is nonverbal, it is philosophically ‘an unknowable truth’, and therefore discourse about it falls within the domain of metaphysics (Blacking 1982:16). Writing is more advantageous than speaking for such philosophical discussion because it affords opportunities for detailed analysis in permanent form. Since writing can be developed over time and revised, it is especially conducive to thorough examination of musical works and categories, as well as experiences. Insomuch as writing boasts ample space for ideas to grow and play, it is uniquely capable not only of describing musical meaning but moreover of reconstituting it (Kramer 2003:128). Because music — at least that music which does not rely upon language11 — lacks clearly discernible external signification, it depends on this reconstitutive property (Bowman 1998; Nattiez 1990). This ratifies Bohlman’s assertion that musical meaning depends on context (1999).

Naturally, this dependence opens music to varied interpretations (Kania 2017). Thus, writing not only represents music (for it is more than designation), it re-presents12 it, making it available for present consideration. Writing intercedes on behalf of the music for the purpose of presenting fresh interpretations to the reader. By asserting these novel interpretations, writing effects a transformation of music that renders it newly meaningful (Gadamer 2004; Kramer 2003). In the case of analytical writing, this transformation may present the music to us in ways that cannot even be physically experienced by listeners (Zbikowski 1998). In thus making music present again, writing enables us to step beyond the temporality of experience to contemplate music as it exists through time.

This anamnestic13 quality, wherein writing re-presents music, is not unique to writings about music — similar arguments could be made for dance, drama or ritual. However, it is of unparalleled value for music description precisely because music is ‘fundamentally ineffable’ (Kramer 2012:101). ‘Music relies neither on linguistic order nor on physical context, but on organization that can be perceived in sound itself, without reference to context or to semantic conventions’ (Scruton 2016:5). As a result, writing is a potent resource for conferring meaning. It serves as a hermeneutic of music, whereby it may wield enormous influence over subsequent performance and reception. Indeed, successive applications of this hermeneutic have established it as a prominent re-creative act that defines our understanding of music.

1.4 Writings on Music
From ancient times, music has provoked description. Given its universality, it is easy to grasp why music should be an object of curiosity. Extant analytical writings from East to West, from Confucius to Plato,14 attest to this fascination. Indeed, multiple extant sources witness to the early Greeks’ interest in music.15 Those writings inaugurated a discourse16 on music in the Western hemisphere that endures to this day. Given its re-presentative character, writing offers an unusually dynamic means of interacting with music. In its most sophisticated forms, this interaction employs analytical description to seek an ever-deepening and transformative knowledge of music. Music possesses such tremendous worth because of its power to sound, to announce, whereas writing invests music with value of another kind by ordering its potency to a particular end. This is most emphatically true of analytical discourse wherein the purpose is to inform and persuade.

English writings about music can trace their ancestry back to the Greek philosopher Pythagoras (6th century BCE), who is credited with the discovery of the ratios of musical consonances (Barbera 2001). This tradition of theoretical study carried through late antiquity into the Medieval university curriculum. Therein it served as a branch of the quadrivium, a designation coined by the early 6th century Roman writer, Boethius, for the mathematical subjects: arithmetic, music, geometry, astronomy. Boethius himself wrote original reformulations of earlier music sources, contending that music was unique on account of its connection to ethics and reason (Bower 2001). Here already is an instance of writing re-presenting musical meaning.

Interestingly, the earliest music writings in English, dating from the 15th century, are pedagogical treatises on performance. Three English treatises appear, together with 17 others in Latin, in a late 15th century manuscript (Lansdowne MS. 763). The first and second treatises, by Leonel Power and an anonymous writer, respectively, are instructions for singers on counterpoint. These two are followed by a third on musical proportions by Chilston (Bent 2001). Together, these earliest English writings address questions of musical structure for the purpose of influencing performance practice. Thus from its inception, English music discourse has also sought that re-creative power over music that is peculiar to prose.

The succeeding century witnessed a flourishing of dialogic treatises on music throughout Europe, particularly in Italy. By the late 16th century, Thomas Morley had adapted this form in English to write an introduction about the Italian musical styles of his day (Judd 2008). This inaugurated a period in England, throughout the 17th and 18th centuries, in which scholarly and analytical writings on music came into their own. Roger North’s (1651-1734) late commentaries formed the basis of music historiography (Kassler 2001c). J.C. Pepusch (1667-1752), a musical antiquarian and German immigrant to England, penned books on harmony (Boyd et al. 2001). The philologist and musician, Charles Butler (c1560-1647), authored a manual for performers. Curiously, some of the musical rudiments Butler presents hark back to Boethius (Pruett & Herissone 2001), and his volume remained known to the later music historian, Charles Burney. These texts sought to instruct performers, as well as to situate works in a historical context.

In the 18th century, James Grassineau (?1715-1767) compiled and edited A Musical Dictionary (1740). Though it borrowed heavily from other sources, Grassineau also included original material, making it the first lexicographical volume of its kind in English (Kassler 2001b). Charles Avison (1709-1770) was an influential music writer who contributed to performance practice and criticism (Stephens 2001). In 1763, John Brown (1715-1766) published a volume of music historiography to argue for the restoration of ‘superior taste’ (Kassler 2001a). A singular volume, Daniel Webb’s (1719-1798) Observations on the Correspondence between Poetry and Music (1769) defended his belief that the combination of music and poetry could influence the mind (Kassler 2001d), while James Beattie’s (1735-1803) Essays on Poetry and Music was dedicated to musical aesthetics (Johnson 2001). These last few volumes attest to a growing awareness of writing’s influence on the future production and reception of music.

In the latter half of the century, two giants came to dominate music writing in England. Charles Burney (1726-1814) was a larger-than-life musical personality in London society, where he maintained contacts with some of the most notable figures of his day, aristocrats and musicians alike. His voluminous General History of Music from the Earliest Ages to the Present is still valuable for musicologists today, and was appreciated in its own time for Burney’s ability to connect with his readers. Though some of the material covered in General History has been questioned in light of later scholarship, it remains an impressive addition to historical English writings on music, and is notable for its influence on public opinion (Grant 2001). Published contemporaneously with Burney’s history was John Hawkins’ (1719-1789) five-volume General History of the Science and Practice of Music. Printed in its complete form in 1776, it represents a major contribution to music history and antiquarianism in England (Scholes 2001). Both writers dedicated themselves to historical research in order to inform the musicians and audiences of their day.

The above writings seeded music as a social science in the field of the 19th century academy, thus inaugurating the modern era of musicological scholarship. Taken in its broadest sense, Musicology is ‘the scholarly study of music’ (Duckles et al. 2001). Among the most historically significant publications and persons during this period, mention must be made of John Sainsbury’s (c 1793-c 1862) Dictionary of Musicians (1824), the first such publication in English to cover international musicians (Langley 2001). Research into ancient music was promoted by Walter Howard Frere (1863-1938), who wrote musicological studies on Medieval music and its paleography (Berry 2001), while Edmund H. Fellowes’ (1870-1951) critical editions of Tudor music first introduced that repertoire to modern performers and audiences (Shaw 2001). Oscar G. T. Sonneck (1873-1928), one of the first American musicologists to make substantial contributions to the field, compiled a number of music catalogues and scores that form a sizable portion of the Library of Congress’ music collection. He also wrote about older music, particularly folk music of the United States (Newsom & Hitchcock 2001). Otto Kinkeldey (1878-1966), first chair of Musicology at an American university, Cornell, is revered as the founder of the discipline in the American academy (Grout & Davidson 2001). Lastly, Edward Dent (1876-1957) held a chair in music at Cambridge for fifteen years, from where he was able to influence a generation of musicians, most notably with his insistence that scholarship should aid and influence performance (Lewis & Fortune 2001). Visible among these exemplars is a shared belief that analytical writing can transform the performance and reception of music.

This era also witnessed the burgeoning of encyclopedic volumes and musicology journals, notably: The Musical Times (since 1844), Grove’s Dictionary of Music and Musicians (completed 1890), Music & Letters (founded 1920), and The Oxford History of Music (1901-1905) (Duckles et al. 2001). In addition, some of today’s leading music associations and societies, each with their own scholarly publication, began during these years, including: the Musical Association (founded 1874), renamed Royal Musical Association (from 1944), and its journal Proceedings, renamed Journal of the Royal Musical Association (from 1987); the American Musicological Society (founded 1934) and its Journal of the same name; Society for Ethnomusicology (founded 1954) and its journal, Ethnomusicology; Society for Music Theory (founded 1977) (Duckles 2001). These and several other journals, along with book publications, constitute the foundation of modern Musicology’s literature. Found therein are the following areas of scholarship, many of which can be recognized from the preceding historical sketch: historical method; theoretical and analytical method; textual scholarship; archival research; lexicography and terminology; organology and iconography; performing practice; aesthetics and criticism; sociomusicology; psychology, hearing; gender and sexual studies (Duckles et al. 2001). It should be noted that several other fields of study connected to music are not listed in this inventory, such as music education and therapy, as their theoretical and methodological concerns hew closer to disciplines other than Musicology.

1.5 The Problem
This vignette of English writings on music highlights the salient features of such music discourse through time; namely, it was intended to influence performance, was therefore both pedagogical and analytical in nature, and in time attained a mass of writings sufficient to coalesce a community of scholars about it. As a study grounded in Applied Linguistics, the present investigation focuses both on the community of music scholars who present their arguments through writing and the novices who aspire to such expertise. This ‘expert-novice relationship is born out of differential experience with and access to community practices’ (Vickers 2010:116). Given its status as the ‘currency of intellect’ (Pratt 2011:ix), writing is both a vital practice and a critical skill which musicians are compelled to develop if they wish to participate in the expert discourse of the scholarly community. Despite this fact, no linguistically informed analysis of musicological writings has been undertaken, nor are musicologists typically able to offer a linguistically informed perspective of their own disciplinary writing, a situation common to academics in many areas (Gebhard et al. 2013).

There have been a few investigations into other discourses about music, though even these are limited in number.17 Similarly, musicologists have taken up questions about the intersections of music and language, but again these are not linguistic analyses.18 Perhaps this lacuna is due to the relative scarcity of linguistic research into the humanities, or perhaps it is an accidental byproduct of the copious research focused on IMRD (Introduction, Methods, Results, Discussion) structure writing, to which music discourse generally does not conform. Whatever the case, any variety of academic discourse requires a common ground, a standard, by which knowledge may be disseminated. Without such an understanding, academics risk missing or disregarding the musical trends that are an outgrowth and reflection of, as well as commentary on, our world today. What is at stake is nothing short of understanding who we are.

In light of the universality of music, its tremendous social value, the power of writing to re-present and transform musical meaning, the disciplinary demand for such writing, the lack of linguistic research into this literature, the dearth of subject specialists’ knowledge regarding their own discourse, and the imperative to explore the world, an analysis of Music Discourse is urgently overdue. Therefore, this project is undertaken for music experts and novices alike, whose scholarly economy is dependent upon this common currency, and whose mandate is to know our world more abundantly.

1.6 Research Aims
Because there is no existing research into Music Discourse on which to draw, this project will follow Biber et al.’s ‘deliberately exploratory’ approach by mining two specialized corpora for frequent formulaic items, categorizing them (2004:376), and analyzing how they connect with the music content of the discourse. Since one of the greatest challenges for novice writers is the production of typical structure (Winberg et al. 2010), it is hoped that this approach will afford a top-down perspective of disciplinary epistemology through a bottom-up investigation of formulaicity.
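By way of illustration only, the corpus-mining step described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure adopted in this study: the function names, the simple tokenizer, the three sample sentences, and the frequency and dispersion thresholds are all hypothetical, chosen so that the toy example produces a result. It counts every contiguous four-word sequence and keeps those that recur often enough and in enough different texts.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Yield every contiguous n-word sequence in a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def extract_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return n-grams that meet a raw-frequency and a dispersion threshold.

    min_freq and min_texts are illustrative placeholders, not values
    taken from this study.
    """
    freq = Counter()            # how often each n-gram occurs overall
    spread = defaultdict(set)   # which texts each n-gram occurs in
    for text_id, text in enumerate(texts):
        for gram in ngrams(tokenize(text), n):
            freq[gram] += 1
            spread[gram].add(text_id)
    bundles = {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }
    # Most frequent first; ties broken alphabetically for a stable order.
    return dict(sorted(bundles.items(), key=lambda item: (-item[1], item[0])))


# Hypothetical sample sentences standing in for a specialized corpus.
sample_texts = [
    "At the end of the movement the theme returns in the tonic.",
    "The cadence arrives at the end of the phrase.",
    "In the second section the melody is heard at the end of the exposition.",
]
bundles = extract_bundles(sample_texts, n=4, min_freq=3, min_texts=3)
print(bundles)
```

In a real corpus the thresholds would normally be expressed as a rate per million words together with a minimum number of texts, so that a sequence favored by a single writer is not mistaken for a community-wide convention. Categorizing the retrieved items and relating them to music content remain interpretive tasks that a count of this kind cannot perform.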

As in all human endeavors, language is constituted of the conventions which govern it. Such governance is imposed through the act of using the language within a given community. For this reason, Peregrin describes language as ‘propriety’, in the sense that communities maintain agreed standards of correctness in language production (2012:210). Ultimately, this research aims to address the needs of the music community, as any subject expert may benefit from a modicum of linguistic training in order to assist novices (Kennedy 1983) with the production of whole texts.

Before proceeding, it will be useful to make a distinction about the subsequent use of the term Music Discourse. As the present research proposes to investigate discourse about music, rather than music as discourse24, the majuscule ‘M’ will be used throughout this research as an orthographic aid to refer specifically to the academic discipline of Music, and to disambiguate the same from other senses of that word, which may include, but are not limited to, phenomena of music such as performance or the experience of listening to music. As a further convenience, Music Discourse will herein specifically denote writing, both expert and novice, originating from any subfield within the discipline of Musicology.

2 LITERATURE REVIEW

KEYWORDS: Applied Linguistics, ELT, ESP, EAP, novice, expert, communicative competence, discourse community, formulaic language, lexical bundles

2.1 Applied Linguistics
As a corpus-driven analysis of an academic discourse, the present research is a project in Applied Linguistics. Though definitions fluctuate somewhat,19 the essential facet of this discipline remains the move from practice to theory. This does not mean that applied linguists stir together a handful of problems, adding a seasonal theory, as salt to soup — no. On the contrary, Applied Linguistics is concerned with theoretically informed solutions to the problems of language in the real world (Simpson 2011). This is the sense of Brumfit’s well circulated definition: ‘The theoretical and empirical investigation of real-world problems in which language is a central issue’ (Brumfit 1995:27). It is noteworthy that his description introduces a chapter focused on education and teachers — a nexus of those ‘real-world problems’ to which Applied Linguistics dedicates its efforts, and the departure point for the present study.

2.2 From the General to the Specific: ELT, ESP, EAP
Following World War II, Britain and America promoted English Language Teaching (ELT) as a means of extending their global influence (Howatt & Widdowson 2004; Kaplan 2010). ELT is a more complex endeavor than its name alone might suggest. It encompasses both teaching and research activities intended to address the needs of English learners from any background (Hall 2016). Thus, the language requirements of any discipline fall within its purview.

As ELT spread internationally, another approach, English for Specific Purposes (ESP), began to grow in prominence. ESP is sometimes contrasted with general English, in that ESP learners study language not for the communicative demands of daily living but for those of a particular skill or domain of knowledge. ESP naturally concentrates on ‘authentic material…simply because of the orientation towards purpose’ (Carver 1983:133). From its inception in the early 1960s (Johns 2013), the field of ESP was embraced in numerous places (Hall 2016) and its theoretical scope enlarged through encounters with other theories of language (Starfield 2016). For instance, Halliday et al. proposed the study of register, drawn from Systemic Functional Linguistics, to address the particular problems of teaching professionals (1964). Subsequently, ESP research adopted register analysis in order to prioritize high frequency features in the teaching of ESP courses (Hutchinson & Waters 1987). This focus on recurrent features naturally attracted the attention of corpus linguists, because corpora have proved ‘extremely useful for ESP teachers in that they are able to show how language is used in the context of particular academic genres’ (Paltridge 2013:351).

Over the past half century, the proliferation of ESP specializations has ranged across medicine, business and science (Paltridge & Starfield, eds. 2014; Atkinson 1999). ESP remains a concern for applied linguists because it addresses the needs of practitioners. Indeed, it continues to benefit both students and teachers by enriching pedagogical materials, often by identifying subject-specific vocabulary (Basturkmen 2010), and also by helping language instructors learn content unique to various disciplines (Starfield 2016).

This focus on academic subjects prompted the emergence from ESP of another area, English for Academic Purposes (EAP). Drawing upon Flowerdew (2015c), Basturkmen and Wette define English for Academic Purposes as follows: ‘The term EAP refers to the teaching of varieties of English to assist students of all ages to manage the linguistic, conceptual and social demands of academic study, as well as to support the dissemination and exchange of research and scholarship’, thus emphasizing that theory and research are essential complements to EAP’s practical dimension (2016 PAGE ###???). As a result, EAP tends to focus on specific skills, such as the structure of written argumentation within a given context (Lee & Subtirelu 2015).

Because English continues to gain ground as the lingua franca of the academy (Mauranen 2003), and writing is integral to academic discourse (Pratt 2011), much EAP research and practice has focused on writing. The result has been an abundance of discoveries about the nature of academic writing. Some of the more prominently researched features include nominalization, use of passive voice, hedging and boosting (Carter & McCarthy 2006). These findings have contributed to generalizations about the nature of certain disciplinary discourses; hence, scientific discourse is characterized by technicality, while philosophic discourse depends more on abstraction (Hyland 2012). Though Music Discourse has yet to be researched, Pérez-Sobrino and Julich found that verbal descriptions of music employ metaphor to convey meaning (2014).

2.2.1 EAP: from Novice to Expert
As English Language Teaching spread globally, its classrooms filled both with native and non-native English speakers. In time, however, these labels were problematized as vestiges of imperialism, reincarnated culturally (Canagarajah 1999; Phillipson 1992, 2012; Llurda 2016), and were replaced or interchanged with more neutral labels, such as L1 and L2 (Slabakova 2016). These designate a person’s first and second (or additional) languages, respectively (Johnson & Johnson 2001; Ortega 2011). Considering the challenges to anyone learning an academic subject, however, Solly notes that notions of native and non-native speakers are less relevant in EAP than might be the case in other areas of ELT (2016). Instead, a more useful distinction may be drawn between novices and experts. Vickers (2010) follows Jacoby and Ochs (1995), and Jacoby and Gonzalez (1991), in stating that the roles of novice and expert are bestowed upon persons through the ratification or rejection of their contributions to their academic community. In this sense, these roles are assumed or presumed through social mediation, thus offering a dynamic scale, imposed by the community, for the evaluation of a person’s work (ibid.). Writing often serves as evidence of the writer’s role, an identification badge of expert or novice, and publication acts as the merit by which the former may be recognized. This is not to say that all forms of publication confer equal merit, but only that acceptance and revision are prerequisites for dissemination to the community through print.

Because English for Academic Purposes focuses on the language needs of novices and promotes research intended to inform expert teaching practice, it often centers on the role of English in academic publication (Basturkmen & Wette 2016). Expert writings may be analyzed to establish criteria for judging a writer’s competence, and novice writings may then be compared against this standard to identify areas to be addressed in teaching. Within EAP, the notion of competence covers the expanse of linguistic knowledge necessary for acceptable communication within a given situation (Widdowson 1983; Bruce 2011). Moreover, the ability to integrate disciplinary knowledge with the structures of disciplinary writing ‘to process and create extended discourse is referred to as discourse competence’ (Bruce 2008:2). Thus, as expert status is recognized through publication (Pérez-Llantada 2014), and publication is a mark of professional achievement, perceptions of competence and professionalism are interwoven (Solly 2016).

As Widdowson’s qualification of ‘acceptable communication’ indicates, an individual’s competence is assessed by a community of language practitioners. Thus, ‘communicative competence is the knowledge a speaker must have to function as a member of a social group and is based on language use and socialization within cultures’ (Morgan 2014:37). Not addressed here, however, are questions of degree: how much knowledge and of what specificity, and what are the gradations of functionality? In other words, there surely is a spectrum of ability along which anyone may move. A novice’s place on that spectrum is judged sufficient or not by experts, who are often their teachers, based on transparent communication of meaning within communally accepted standards.

2.2.2 The Novice
Though Music Discourse is a product of the Musicology community, such writings may be directed toward both colleagues and performers. Academically, the distinction between Performance studies and Musicology is institutionalized at the postgraduate stratum, wherein Master of Music (MM) and Doctor of Musical Arts (DMA) degrees are often awarded for performance, while the Master of Arts (MA) and Ph.D. may be conferred for original research. In my own experience as a music student several years ago, all music majors were required to enroll in writing courses at the bachelor’s, master’s and doctoral levels, each of which was taught by a professor of music during the first year of the program. The explicit purpose of these courses was to train music students in the norms of disciplinary writing. Faigley and Hansen note that such writing courses, taught within a student’s major discipline, are a typical requirement of first-year university study (1985).

In the three writing courses I studied, instruction focused exclusively on research, citation and style. Regarding this last, novices were admonished to write with accuracy and clarity, which were judged by grammar and intuition, respectively. The required texts for these courses were guides to style and citation20. As a consequence, it became swiftly evident that neither structure nor writing technique would be taught. It also became apparent that the expert teachers were not prepared to give constructive feedback beyond correcting grammatical errors and streamlining punctuation. Among these professors, one was an established performer who wrote his own program notes and two were published academics. All three were capable of producing engaging prose, and are not to be faulted for a lack of knowledge about the mechanics of academic writing, not even specifically for music. As Gebhard et al. note, teachers typically do not possess a detailed understanding of the disciplinary discourses they themselves read and write, and are therefore neither able to adequately indoctrinate students in those same discourses, nor able to demonstrate how such discourses forge disciplinary-specific meanings (2013). As a result, subject teachers tasked with writing instruction seldom specify expectations for language acquisition pertinent to their field, thus leaving novices to flounder (Hyland 2006).

Given that writing competence develops under the influence of disciplinary learning (North 2005), this is a particularly lamentable situation, as it means that students are not fully benefitting from their education. In particular, this represents a failure of the expert community to adopt new members as fully and efficiently as possible. Sinclair notes that academics often hold dear the notion that their formal discourses are tautly organized, yet they remain uncertain how to explain such organization, and thus leave this work to the linguist, disregarding the assumption that such a structure exists (2004b).

Several years after completing music studies, I inaugurated a second career in English Language Teaching (ELT) by moving to China.21 Being placed in the position of expert instructor afforded me a new appreciation for the challenges of academic writing, especially for novices writing in a second language. The majority of those students fit the profile, outlined by Gebhard et al., of second language learners who possess ‘the semiotic resources required to construct everyday meanings in a second language, but…struggle to construct discipline-specific meanings despite years of schooling’ (2013:107). Not only did this prove true for my students; it was also evident in the writings of professionals I met in other fields.

This observation turned my curiosity back to students of music and, when I returned to university as a master’s student in Applied Linguistics, I chose to research language needs for those learners. That research focused on International Baccalaureate (IB) Programmes22 in China, and found that the student participants overwhelmingly aspired to a greater ability to express their musical experiences and ideas.23 Meanwhile, their instructors, all expert musicians, bemoaned the state of their students’ writing abilities (Berg 2015). Throughout the research project, though, no writing textbooks were observed in classrooms or with students, nor was there any specific instruction about the structure of written discourse. At best, classrooms were outfitted with bilingual music dictionaries.24

While younger learners may be exposed to certain conventions of writing, they are increasingly unlikely to be schooled in more complex forms as they progress through higher grades. At those levels, academic language is typically relegated to a position secondary to subject-differentiated courses (Gebhard et al. 2013). Naturally, teachers of those courses are subject experts. They are not expected to have specialized linguistic knowledge, yet both instructors and students could benefit greatly from a more detailed understanding of Music Discourse’s inner workings. Therefore, it is hoped that this first attempt to analyze Music Discourse will prove valuable to experts and novices.

2.2.3 Expert Expectations
The aforementioned music writing classes relied on style guides as textbooks. There are several such volumes available on the current market, including: Bellman (2006), Herbert (2009), Holoman (2014), Irvine and Radice (2003), Poultney (1995), and Wingell (2008). Interestingly, all six of these books include the word Writing in their titles, which speaks to the prestige of, and demand for, disciplinary writing. As mentioned previously, these guides all agree on the need for accuracy and clarity, and attention to purpose. The following excerpt illustrates how such guides advise novice writers:

The basic points to emphasize about content, form, and style are:
Your writing must be fit for its purpose: its content, form, and style must serve the needs of your intended readership.
Irrespective of those aspects of style that distinguish your writing from that of others, your writing must be clear; it must be so clear that everyone who reads it takes from it only the meaning that you wish to convey.
What you write must be founded on a clear and accurate understanding of your subject (Herbert 2009:13-14).

While there is little here with which to argue (certainly this is valid counsel), there is also little here from which to learn. Similarly, Wingell’s well-known volume Writing about Music (2008, Fourth Edition) does little more than decry the state of undergraduate writing (not an entirely original complaint), admonish students to avoid subjectivity, and offer checklists for grammar, terminology and citation. There is irony in an exhortation to lucid prose when the same authority offers the reader little insight. Other guides offer even less, such as Holoman’s tepid suggestion to consult university writing websites for assistance producing purposeful prose (Holoman 2014).

Obviously, these books leave a great deal about Music Discourse to be rendered in finer detail. For example, what of linguistic structure above the sentence level? Presumably, these writers hold in esteem the overall communicative goals of such writing (they are musicians, after all), yet throughout these books there is barely any acknowledgement, beyond admonishments regarding topic sentences, of the structures that bring sentences together in larger units of meaning. Surely these writers assume the importance of writing competently, yet they do not discuss what specifically constitutes such competence. Bhatia divides competence into textual, generic and social dimensions, of which the first is demonstrated by texts that exhibit grammatical correctness, cohesion and coherence (Bhatia 2004). While grammatical correctness is dear to these writers (as to any serious writer, no doubt), these guides do not speak of cohesion or coherence, nor do they address more pressing questions, such as ‘How is propositional content woven together in Music Discourse?’ This suggests that recent linguistic insights into the nature of academic writing have yet to infiltrate the music department, despite the fact that scholars continue to highlight the necessity of teaching to discipline-specific situations, rather than academic writing generally (Coffin & Hewings 2003; Zbikowski 2008; Durrant 2009:165). Precisely for this reason, Morton advocates teaching a functional approach to language, from the lexicogrammatical to the generic, as a means of inducting students into the culture of a given community (Morton 2010).34

2.2.4 Discourse Community
Bhatia’s tripartite division of competence recognizes the evaluative role played by the expert community (2004). This concern with the communal dimension of discourse draws upon the constellation of mid-20th century intellectual movements that formed the so-called social turn. This reorientation of philosophical thought, away from individual and behaviorist theories toward social constructions and relationships, situates discourse as a social phenomenon (Gee 1999). One of the leaders of this movement was Paul-Michel Foucault, whose magisterial contributions to the intersection of epistemology and language (Foucault 2000a, 2002b) proposed a paradigm of discourse as ‘language-in-action’ (Blommaert 2005:2). From this vantage, context is viewed as paramount to discourse because it is ‘the totality of conditions under which discourse is being produced, circulated and interpreted’ (ibid.:251). Much research has seized on this idea, such as Escudero’s study which finds that patterns of language use reflect institutional identity (2011), demonstrating that ‘a community of practice arises from prolonged, shared experience’. The latter is true of academia generally, and of its constituent disciplines individually (English & Marr 2015).

The notion of discourse community is one of the salient features of EAP, especially as it applies to written genres within an academic community (Bruce 2011). The community maintains ‘a threshold level of members’ who possess expert knowledge of content and discourse (Swales 1990:27), and who ‘own’ and ‘operate’ that discourse (Gee 1992:107). Since the Musicology community promulgates its identity largely through writing and publication, novices must master this discourse content, as well as the linguistic forms used to present it, to earn ‘credentials as an insider’ (Gee 2014b:147). Hyland confirms this requirement: ‘There is now compelling evidence across the academic spectrum that disciplines present characteristic and changing forms of communication which students must learn to master in order to succeed’ (Hyland 2006:3).

These expert forms must be elucidated in order to empower novices seeking admittance to the discourse community. The research required to accomplish this could benefit any music student, but may prove particularly beneficial to those whose first language (L1) is not English. As the world’s most populous nation and its second largest economy,25 China is gaining an ever larger presence on the world stage. Musically, this stage presence manifests in a frenzy of piano and other instrumental practice, as students prepare for competitions and examinations.26 China is also sending ever more students abroad for study, and approximately half of those study in English-speaking countries (?????????? 2016).27 Though not all of these students study music, those who do will surely have an impact on music teaching in the relatively near future, both within and without China, and this gives every reason to prepare them for full participation in the global community of Music academics. The present study’s investigation into the structure of Music Discourse could facilitate these students’ bid to join the expert discourse community.

2.2.5 What needs teaching?
Having averred that Music Discourse should be investigated for the benefit of experts and the novices they instruct, and bearing in mind that this research is deliberately exploratory, three basic questions present themselves regarding the focus and process of this investigation. What linguistic items should be researched and why? How can such items be identified? Once found, how can these items serve to illuminate the structure of Music Discourse for the benefit of novices? Formulaic language has been chosen as a fruitful field of investigation in response to the first question. Corpus Linguistics provides a methodological solution to the second problem, while discourse analysis will be employed to answer the final question by examining how formulaicity interacts with propositional content to create Music Discourse.

2.2.6 Formulaic Language
Question: Why should formulaic language be researched?
As mentioned above, the expert authors of music writing guides advocate for grammatical correctness and clarity of purpose in disciplinary writing, yet the resources available for such instruction never touch upon the structures that bind sentences together as whole texts. In other words, they do not discuss how to move from grammar to purpose. No doubt this reflects an over-dependence on dictionaries and grammar textbooks as aids to writing, which would seem in keeping with the course requirements for Music Discourse study. What else, then, may be taught to help music novices deepen their understanding of writing? Though various candidates can be put forward, formulaic language has been chosen for this investigation on account of its ubiquity and ability to braid meaning throughout a text.

Why begin with formulaicity, and exactly what is it? The designation formulaic suggests the repetition of a sequence of items, taken as a unit. In the case of language, such formulaic sequences are any series of contiguous words that exist as a unit, and which seem to be stored and retrieved from memory as such (Wood 2015), which readily corresponds to the colloquial (and sometimes grammatical) sense of a phrase. Indeed, the terms formulaic language and phraseology are transpositions of one another found under different theoretical frameworks (ibid.).

Study of formulaic language may be traced to J.R. Firth’s work on collocation (Firth 1957; Léon 2006). In that tradition, Sinclair’s ‘idiom principle’, the notion of partially fixed phrases that function as single units within text, often serves as a point of departure for definitions of language formulae (Sinclair 1991:110). The recurrent nature of formulae has led to the general acceptance that ‘there is undoubtedly some sort of relationship between frequency and formulaicity, both in the sense that some formulaic sequences are very frequent, and that formulaic output is frequently called upon’ (Wray & Perkins 2000:6-7). Whereas searching for instances of formulaicity was laborious study in Firth’s day, the advent of computing has greatly expedited the process, permitting rapid searches of large corpora of text. This area of linguistic investigation, Corpus Linguistics, will be the subject of Section 2.3.

By now, the sheer volume of research into formulaic language has generated a detailed image of its constituent parts. Language formulae exist across registers and domains, and generally are ‘1. Multi word, 2. Have a single meaning or function 3. and are prefabricated or stored and retrieved mentally as if a single word’ (Wood 2015:3). Indeed, so ubiquitous and well-studied are these formulae, that they have come to be regarded simply as multi-word lexical units (Wray 2008). The facts that formulae are pervasive, ‘are not amenable to lexical and structural re-formulations’ and ‘tend to occur in particular styles of language tied to particular communicative situations’ (Corrigan et al. 2009:XIV), have enabled the realization that they are an essential ingredient for achieving language fluency; this further validates the study of both their function and context (Liu & Nelson 2016).

Similarly, the value of formulaicity in academic writing has been demonstrated by numerous studies (e.g. Hyland 2008a; Staples et al. 2013; Kashiha & Heng 2014; Pérez-Llantada 2014; AlHassan & Wood 2015; Peters & Pauwels 2015), many of which have provided lists of the formulae found in such writing (e.g. Biber et al. 1999; Biber et al. 2004; Simpson-Vlach & Ellis 2010). As Pérez-Llantada observes, ‘formulaicity is a key feature of the academic written register across language variables’ (2014:92). Because of their centrality to academic writing, Wray argues that formulaic sequences promote survival within an academic community (2008). Wood further underscores the importance of formulae to academic discourse: ‘Nowhere is the particular nature of academic writing more apparent than in its phraseology and use of formulaic sequences’ and ‘Formulaic sequences are, in essence, a major part of the foundation of successful academic writing skills because they comprise the basic elements of academic discourse and are specific to particular disciplines, registers, and genres’ (Wood 2015:103).

As the understanding of formulae in academic writing deepens, it is increasingly apparent that they are vital for achieving competence in disciplinary writing (AlHassan & Wood 2015). This in turn has created awareness of their pedagogical value. In fact, formulaic sequences present a significant challenge to learners, hence the interest in them among EAP researchers (Hiltunen & Mäkinen 2014). Thus, one study by Hiltunen and Mäkinen investigated differences between student writing and published writings in the domain of business and economics (2014), using the Academic Formulas List (AFL) of Simpson-Vlach and Ellis (2010). Having mined their research corpus for formulaic sequences, they then analyzed individual student writing samples for examples of usage. Comparing usage frequencies between experts and novices suggests areas to be addressed pedagogically. Just such a comparison could yield valuable insights into the structure of Music Discourse, the expectations of expert writers, and the ways in which novices can improve their writing skills. Given the sheer number of formulaic sequences, Hatami advises that those sequences most typical of a given discipline be carefully prioritized for teaching (Hatami 2015).
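A comparison of usage frequencies between experts and novices depends on normalizing raw counts, since an expert corpus and a novice corpus will rarely be the same size. The following is a minimal sketch in Python of such a normalization, assuming hypothetical bundle counts and corpus sizes; none of these figures is drawn from the studies cited, and the function names are illustrative only.

```python
def per_million(raw_count, corpus_tokens):
    """Normalize a raw frequency to occurrences per million words."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count * 1_000_000 / corpus_tokens


def compare_usage(expert_counts, novice_counts, expert_tokens, novice_tokens):
    """Return {bundle: (expert rate, novice rate)} for bundles found in either corpus."""
    bundles = set(expert_counts) | set(novice_counts)
    return {
        bundle: (
            per_million(expert_counts.get(bundle, 0), expert_tokens),
            per_million(novice_counts.get(bundle, 0), novice_tokens),
        )
        for bundle in sorted(bundles)
    }


# Hypothetical counts and corpus sizes, for illustration only.
expert = {"in the case of": 120, "at the end of": 90}
novice = {"in the case of": 5, "at the end of": 30}
table = compare_usage(expert, novice, expert_tokens=1_000_000, novice_tokens=200_000)

for bundle, (expert_rate, novice_rate) in table.items():
    print(f"{bundle:<16} expert {expert_rate:7.1f}   novice {novice_rate:7.1f}")
```

With rates expressed on a common scale, a bundle that novices use far less (or far more) often than experts can be flagged for closer qualitative inspection; the rates alone do not show whether a bundle is used appropriately.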

2.2.7 Lexical Bundles
The preceding discussion has sought to expound on formulaic language by defining it, by noting its pervasiveness and frequency in academic writing, and by discussing its potential value for discipline-specific teaching. To address this last point, the present study proposes to build two corpora (one expert and one novice) for the investigation of lexical bundles in Music Discourse. Lexical bundles (LB) represent one form of phraseology, or formulaic sequence. Altenberg (1993, 1998) originated the investigation of lexical bundles, concentrating both on frequency and function. Comprehensive research into the form and nature of lexical bundles was published by Biber et al. in the Longman grammar of spoken and written English (1999). Therein, lexical bundles are defined as ‘recurrent expressions, regardless of their idiomaticity, and regardless of their structural status’ (Biber et al. 1999:990). According to Biber et al., they offer a particularly useful view of the structural and semantic content of a given register (2004), such as academic discourse, because they ‘have been shown to be discipline-bound, with each discipline or academic community having its own unique recurrent word-combinations’ (Qin 2014:230). From recent research (see Nesselhauf 2003), it is possible to deduce that lexical bundles may be advantageous instructional aids if they are taught with the content words that often surround them.

Chen and Baker note a variety of terms used to designate such word co-occurrences, including ‘clusters (Hyland 2008a; Schmitt et al. 2004; Scott 2017, WordSmith Tools), recurrent word combinations (Altenberg 1998, 2001; De Cock 1998), phrasicon (De Cock et al. 1998), n-grams (Stubbs 2007c; Anthony 2018, AntConc), lexical bundles (Biber & Barbieri 2007; Cortes 2002)’ (Chen and Baker 2010:30). To qualify as a lexical bundle, a contiguous sequence must both be frequent and have a single meaning or function (Biber et al. 1999). Lexical bundles are dually categorized according to means of identification and function. While they occur across disciplines, the combination of relative frequency and type serves as a marker or characteristic of a particular discipline (Wood 2015).

These strings of contiguous words (two or more), which were identified solely by frequency across a given register, are further described in the Longman Grammar according to a list of characteristics. Among those observations, several are pertinent to an analysis of written Music Discourse, namely: three-word bundles outnumber four-word ones by a factor of 10 (Biber et al. 1999:993); bundles in academic writing typically consist of noun and prepositional phrases, and end in function words, and therefore tend to be nominal rather than clausal (ibid.:1000); when bundles cross structural boundaries, such as clauses, the following slot often contains the content specific to a given situation (ibid.:995, 999); bundles often overlap each other, forming longer sequences of formulaic text (ibid.:999). Considering that three-word bundles are far more frequent than four-word bundles, it seems possible that these smaller bundles will display more content specific to a given situation.
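Identification of bundles solely by frequency can be sketched in a few lines of Python. The sketch below is illustrative rather than a statement of the procedure in the Longman Grammar: the tokenizer, the cut-off values (min_freq, min_texts) and the three invented sentences are all assumptions made for demonstration, and a real study would use naturally occurring texts and cut-offs normalized to corpus size. Note also that this simple tokenizer allows sequences to run across sentence boundaries.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only (no tagging)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-word sequence as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_bundles(texts, n=3, min_freq=2, min_texts=2):
    """Keep n-grams that meet a raw-frequency cut-off and a dispersion cut-off.

    min_freq:  minimum number of occurrences across the whole corpus
    min_texts: minimum number of different texts in which the n-gram occurs
    """
    freq = Counter()
    found_in = defaultdict(set)
    for text_id, text in enumerate(texts):
        for gram in ngrams(tokenize(text), n):
            freq[gram] += 1
            found_in[gram].add(text_id)
    return {
        gram: count
        for gram, count in freq.items()
        if count >= min_freq and len(found_in[gram]) >= min_texts
    }


# Invented toy sentences, for illustration only.
texts = [
    "At the end of the movement the theme returns.",
    "The theme returns at the end of the piece.",
    "At the end of the day the theme returns again.",
]
three_word = lexical_bundles(texts, n=3, min_freq=3, min_texts=3)
four_word = lexical_bundles(texts, n=4, min_freq=3, min_texts=3)
print(sorted(three_word))
print(sorted(four_word))
```

Even in this toy sample the retrieved sequences overlap (at the end, the end of, end of the), which mirrors the observation that bundles often chain together into longer stretches of formulaic text.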

2.2.8 Taxonomy of Lexical Bundles
Though it can be problematic to create a taxonomy for formulaic language, Simpson-Vlach and Ellis maintain that it is crucial for pedagogical purposes. They also urge that these formulae be studied in context, as doing so sometimes expands a given taxonomic category (Simpson-Vlach & Ellis 2010). Howarth put forward one of the earlier taxonomies for phraseology (1998), but this has been replaced by more recent work, particularly that of Biber et al. the following year. While the Longman Grammar grouped lexical bundles in academic prose into 12 structural categories (Biber et al. 1999:1014-1015), the later study of Biber et al. offers three structural types for bundles, incorporating fragments of verb phrases (e.g. this is a), dependent clauses (e.g. that there is), and noun and/or prepositional phrases (e.g. one of the, at the end of) (2004:384). These structural types were then used to propose a provisional tripartite taxonomy by function.

The three functions for lexical bundles are: stance expressions, discourse organizers, and referential bundles (ibid.). Biber et al. define each of these in turn (emphasis added): ‘Stance bundles provide a frame for the interpretation of the following proposition, conveying two major kinds of meaning: epistemic and attitude/modality’ (ibid.:389). They may be either personal or impersonal. ‘Discourse organizing bundles serve two major functions: topic introduction/focus and topic elaboration/clarification’ (ibid.:391). ‘Referential bundles generally identify an entity or single out some particular attribute of an entity as especially important,’ and are common in academic prose (ibid.:393). Furthermore, there is a consistent correlation between structural and functional dimensions in frequent bundles that points to an interweaving of structure, function, register and form in the formation of a given discourse; thus, evidence continues to mount suggesting that lexical bundles are ‘a basic linguistic construct’ which is heavily dependent upon the context of situation (ibid.:398).

Though alternate taxonomies have been proposed — Hyland predicates his on Halliday’s metafunctions (2008a:49), while Cunningham coins his own tripartite categorization: aboutness, coherence, variable level discourse (2017:76) — Biber et al.’s is adopted for the present research on account of its clear tripartite division, its attention to referential expressions in academic discourse, and because it is established among researchers (Wood 2015; Wray 2008, 2012).

Another advantage to this taxonomy, which is particularly relevant to the present study, is that referential expressions can further be divided into sub-categories that include designations for ‘identification/focus’, ‘specification of attributes’, and ‘time/place/text reference’ (Biber et al. 2004:393-394). Each of these indicates a different function by which referential bundles may introduce propositional content. ‘Propositions are the linguistic meanings of the sentences we utter or write down…they are the contents of our thoughts’ (Crawford 2006:162). This offers a point of access for a fine-grained analysis of lexical bundles as nodes linking propositional content throughout Music Discourse. Bundles ‘can be regarded as structural ‘frames’ followed by a ‘slot’. The frame functions as a kind of discourse anchor for the ‘new’ information in the slot, telling the reader how to interpret that information’ (Biber et al. 2004:399). Such slots will most likely contain the semantic content necessary to reinterpret bundles within a disciplinary context, and given the fact that bundles can cross structural boundaries, it is reasonable to consider the information preceding such bundles as another slot. Thus, not only do bundles direct an interpretation of information surrounding them, but that same information may reframe the function of the bundles.
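The frame-and-slot view lends itself to a simple illustration. The Python sketch below collects the word immediately following a bundle (the slot described by Biber et al.) and, following the suggestion made here, the word immediately preceding it. The function name, the one-word slot width and the invented sample sentences are assumptions for demonstration only; in practice a slot may span several words.

```python
import re
from collections import Counter


def slot_fillers(text, bundle, side="right"):
    """Count the single word found immediately after (or before) each occurrence of a bundle."""
    if side not in ("right", "left"):
        raise ValueError("side must be 'right' or 'left'")
    tokens = re.findall(r"[a-z]+", text.lower())
    frame = bundle.lower().split()
    n = len(frame)
    fillers = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == frame:
            j = i + n if side == "right" else i - 1
            if 0 <= j < len(tokens):  # skip occurrences at the very edge of the text
                fillers[tokens[j]] += 1
    return fillers


# Invented sentences, for illustration only.
sample = (
    "At the end of the exposition the key changes. "
    "At the end of the coda the key returns. "
    "A cadence marks the end of the exposition."
)
following = slot_fillers(sample, "the end of the", side="right")
preceding = slot_fillers(sample, "the end of the", side="left")
print(following.most_common())
print(preceding.most_common())
```

In this toy sample the following slot is filled by discipline-specific nouns (exposition, coda), while the preceding slot holds the words that lead into the frame, which is the kind of surrounding content the analysis proposes to examine.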

However, simply generating lists of formulas, even with functional taxonomies, is insufficient to help either teachers or students; research into applicability is very much needed — ‘formula in context is what is pedagogically relevant’ (Simpson-Vlach & Ellis 2010:502). Hence for Music Discourse, it would be fruitful to investigate how these sequences interweave with disciplinary content to construct knowledge. Along those lines, Hunston and Francis surmise that formulae may also be said to form patterns in texts (2000), yet they do not explore the idea further. However, it may be precisely these intersections of formulae and content that provide the vital ‘missing’ link between grammar and purpose in the teaching of Music Discourse. This idea is supported by AlHassan and Wood: ‘The view of academic discourse as replete with formulaic sequences implies that academic writing skills surpass the mastery of lexicon and syntax to encompass the successful implementation of these sequences that are viewed as the building blocks of academic discourse’ (2015:52).

Learning to deploy formulae successfully presents challenges for all novice writers. Wray and Perkins comment that novices sometimes write the same bundles as professionals, yet employ them in unusual or inappropriate ways (Wray & Perkins 2000). The challenge seemingly looms larger when English is the writer’s second language (L2) (Yuldashev et al. 2013). Still, research has demonstrated that L2 students learn language in retrievable chunks from an early age, thus suggesting the value of teaching lexical bundles to novices (Ellis 2012). Moreover, it may be that novices’ interest in discipline-specific content would improve their acquisition and production of bundles in written discourse. Indeed, recent studies have questioned previously accepted wisdom that generic skills, such as forms of argumentation, are transferable across disciplines. Instead, it is increasingly suggested that such skills can only be effectively developed within the proper disciplinary context (Swales 1990; Gimenez 2011). As previously noted, however, discipline instructors typically do not possess expert linguistic knowledge of their own discourse.30

KEYWORDS: Corpus Linguistics, corpus-driven, representativeness, sampling, balance, frequency cut-offs and dispersion

2.3 Corpus Linguistics
Question: How are lexical bundles identified?
‘Corpus linguistics demonstrates that much of communication makes use of formulaic sequences’ (O’Donnel et al. 2015:83-84). In the present study, corpus methodologies will be used to investigate formulaicity in Music Discourse, but first a definition is needed: Corpus linguistics (CL) ‘is an area which focuses upon a set of procedures, or methods, for studying language’ (McEnery & Hardie 2012:1). However, corpus linguists are not in universal agreement as to the precise definition of the field. Thus, an explanation that encompasses all activity deemed part of CL is difficult to formulate. In general, such a description would note that, in Corpus Linguistics, machine-readable texts are compiled according to criteria in order to address specific research questions (McEnery & Hardie 2012). These criteria are reflected in Conrad’s definition: ‘A corpus is a large, principled collection of naturally-occurring texts that is stored in electronic form (accessible on computer)’ (Conrad 2002:76).

The most pertinent criterion here is that corpora consist exclusively of naturally occurring, or non-invented text. They are thus built for a variety of practical ends, such as investigation into varieties of academic discourse, or comparison of expert and novice usage. To ensure applicability for a given research problem, corpora are constructed within the limits of a sampling frame, consisting of multiple orientations to a research question(s), and are designed for balance and representativeness, as achieved through collection of diverse sources. Consequently, a corpus should never be assembled without first clearly defining the questions it is expected to address (McEnery 2004). With such clear criteria, corpus linguistics boasts several advantages. As an empirical methodology, corpus affords a reduction in researcher bias (Baker 2006). Its reliance on computers renders it more efficient and accurate than human computation, and also offers a convenient means of replicating research (McEnery et al. 2006). Teubert further argues that CL is uniquely positioned to give insights into meaning because it deals in natural language, itself an outgrowth of shared experience (2004).

The discovery of lexical bundles, their pervasiveness and varying function throughout different forms of language, attests to the validity of Teubert’s claim. Wood concurs, adding: ‘The remarkable and paradigm-shifting effects of the discovery of lexical bundles have uncovered the internal working of academic discourse, providing us with an observable and tangible element of language which is woven deeply into the fabric of discourse’ (2015:165). No doubt, lexical bundles have gained prominence in language research because their study employs an uncomplicated methodology coupled with wide agreement as to their importance in language learning (O’Donnel et al. 2015).

As lexical bundles are inherently recurrent, they are mined from corpora by frequency. The elegant simplicity of this method supports Altenberg and Granger’s rationale for employing corpus-driven approaches. They contend that much information pertinent to language teaching can still be gleaned by employing simple corpus tools (e.g. lemmatizers, concordances and formulae) because these open a window onto students’ strengths and weaknesses alike (Altenberg & Granger 2001). No doubt this helps explain Flowerdew’s observation that corpus methodologies are enjoying increasing popularity in discourse analyses, as they may be employed to search the content of texts from a given context (2016). The suggestion to use simple tools to investigate content forms the methodological point of departure for this study.
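Of the simple tools mentioned above, the concordance is perhaps the easiest to sketch. The following Python fragment produces key-word-in-context (KWIC) lines for a given phrase; the function name, the context width and the invented sample sentence are assumptions for illustration and do not reproduce any particular concordancing program.

```python
def kwic(tokens, phrase, width=4):
    """Return (left context, node, right context) for each occurrence of a phrase.

    tokens: list of lowercase word tokens
    width:  number of words of context kept on each side of the node
    """
    node = phrase.lower().split()
    n = len(node)
    lines = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, " ".join(node), right))
    return lines


# Invented sentence, for illustration only.
tokens = "the theme returns at the end of the first movement".split()
for left, node, right in kwic(tokens, "at the end of", width=2):
    print(f"{left:>20} | {node} | {right}")
```

Reading such lines side by side is what allows a frequency-derived bundle to be examined in its surrounding content rather than as an isolated list item.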

2.3.1 Corpus-Driven vs. Corpus-Based
As mentioned above, not all corpus linguists agree as to the definition of the field, not even as to its theoretical stance. For instance, McEnery et al. note that a fundamental distinction exists between corpus-driven and corpus-based methodologies, which is essentially a difference in the stated degree of commitment to empiricism (2006). Thus, in corpus-driven studies the corpus data is not tagged, leading to the claim that findings mined from such corpora reflect largely objective procedures. To that end, Sinclair advocates that non-tagged corpora limit themselves to the text with punctuation (2005). In this manner, a corpus-driven study is both data-driven and largely automated (Ädel & Erman 2012). In the case of corpus-based studies, texts in the corpus are typically tagged according to various schemes. Though this facilitates types of investigations not possible with a non-tagged corpus, it also opens the data to accusations of researcher bias, as judgments must be made regarding the use of tags (ibid.). In light of this objection, and of the fact that tagging increases the complexity of research, the present investigation adopts a corpus-driven approach, in which frequency is taken as the prime measure of significance for lexical bundles in Music Discourse (Pérez-Llantada 2014).

Notwithstanding its multiple assets, a word about the limitations of Corpus Linguistics is in order. Though CL is able to explain various aspects of language, it is not in possession of an independent explanation of language (McEnery et al. 2006). It is therefore understood herein as a method rather than a theory. Furthermore, because corpus linguistics is empirical and inductive, it must hedge its claims about any language system with the caveat that it may speak confidently only of that data which it has directly investigated. Thus, CL offers a view of language behavior under particular conditions, which in turn permits an opportunity for inference of a larger reality (Stubbs 2007a). Such an inference is best fortified when all samples of a given discourse are theoretically available for consideration, even though practical constraints may preclude the inclusion of all samples in the corpus.

2.3.2 History of Corpus Linguistics
The earliest corpus studies were undertaken at the close of the 19th century but were limited in scope by the necessity of conducting the work manually (Baker 2006). By the mid-20th century, CL was becoming an established field. It grew contemporaneously with Noam Chomsky’s work on linguistics, but unlike Chomsky, corpus linguists focused on naturally occurring language. This empirical approach began to gain prominence in English with Randolph Quirk’s ‘Survey of English Usage’, a project begun in the 1950s, though not computerized until some three decades later (Teubert 2004).

As CL expanded, it developed types of corpora. General corpora typically include a range of genres and domains in order to offer a balanced view of a given language, and are usually large. A specialized corpus, however, is most often smaller and is constructed to represent a sub-language, either by domain or genre. Learner corpora focus on language acquisition, either longitudinally or in cross section, and often for L2 learners, or those students for whom English is a second language (Granger et al. 2015; McEnery et al. 2006), and they offer an empirical basis for language teaching, illuminating aspects of language that require specific attention (Granger 2009).

Because corpora can be built to specific research concerns, ‘corpus-based methodologies lend themselves well to answering the questions relevant to disciplinary specificity’ (Friginal & Hardy 2014:26), and thus specialized corpora are inherently well-suited to discourse analysis (Biber, Connor & Upton 2007). Meunier illustrates this by noting how corpus studies have revealed the lexical, grammatical, syntactical and discoursal fingerprints among various forms of writing (2002). As a project focused on one discipline, that of music, this study proposes to construct two specialized corpora, one of expert writings in the form of academic journal articles, the other of novice essays, to facilitate the analysis of Music Discourse for the benefit of music students. A bottom-up approach will enable investigation of occurrences of particular features throughout the texts in each corpus (Flowerdew 2014; Conrad 2002) in order to highlight the complexity of seemingly simple patterns (Altenberg & Granger 2001).

While the study of learner data dates to the late 1960s, the construction of learner corpora only began in the 1990s, yet the field now boasts several prominent examples, including the Longman Learner Corpus (LLC), the Cambridge University Press (CUP) Learner Corpus, and the International Corpus of Learner English (ICLE). Still, teaching materials remain largely unaffected by such research, with the exception of some dictionaries. Such corpus studies can help to determine what language features should be taught, such as formulaic sequences (Nesselhauf 2004b), as proposed in this investigation.

2.3.3 Representativeness
Biber generally defines representativeness as ‘the extent to which a sample includes the full range of variability in a population’ (Biber 1993:243). In order to focus on a specific discourse, rigorous selection methods must guide the design of the corpus employed in the discourse analysis (Graesser et al. 2003). Thus, Teubert and Čermáková state: ‘we are only justified in claiming that a given corpus is representative of a discourse, however we have defined it, if we have, at least in principle, access to all the texts the discourse consists of’ (2004:117). While certainly principled, such a consideration normally exists only in principle given the impracticality of collecting all possible texts. Biber also stresses the need to define situational and linguistic variables when building a specialized corpus. This entails consideration of the text types to be included, and their distribution (number of tokens and texts) within the corpus (Biber 1993).

Since no corpus can provide an exhaustive picture of a language, Sinclair urges care be taken to prioritize texts that reflect the communicative functions of the discourse in order to achieve maximum representativeness (2005). To this end, it is necessary to consider the texts produced and consumed by a discourse community, as well as their relative distribution. Sinclair offers the following criteria: choose and apply structural criteria to the corpus framework; decide which text types are available, which should be included, and prioritize them; estimate the size, quantity and importance of texts to be gathered (Sinclair 2005:7).

For a corpus of writing, there are so many variables related to representativeness that this can be a difficult question to address fully (Nelson 2010). Again, Sinclair offers a list of criteria for building a corpus, including the text’s mode (e.g. writing), type (e.g. journal article), domain (e.g. academia), language (e.g. English), location (e.g. originated in U.S.), and date (Sinclair 2005:7). This difficulty can further be addressed when building small corpora (even one million words counts as small) by focusing on a single context that will permit extensive research of, and insight into, that context (Flowerdew 2004). In fact, a specialized corpus may be so focused as to reflect a particular context, text type, subject and variety of English (Koester 2010:68). Even within such limits, however, it is still critical that the corpus be representative.

2.3.4 Sampling
The purpose of sampling is to ensure that a corpus is representative of a population. A population can be delimited in terms of language production, reception, or the products of language (McEnery et al. 2006). The populations of the two corpora in the present study both consist of prose texts written and submitted for the approval of specific discourse communities: professional music editors, in the case of expert writers, and music teachers, in the case of novice writers. The list of items to be included in the sample is termed a sampling frame, and it must be constructed thoughtfully and defensibly. Typically, samples are chosen so as to maintain balance among the various text types within a corpus. Indeed, samples are often determined by range of genres or portions thereof. Thus McEnery et al. advise ‘we generally agree with Sinclair (1995) when he says that the texts or parts of texts to be included in a corpus should be selected according to external criteria so that their linguistic characteristics are, initially at least, independent of the selection process’ (2006:14).

2.3.5 Balance
Balance may generically be taken to mean equal amounts of text from different kinds of sources (Hunston 2002). As position within a single text is known to influence the significance of an item, it is best to select complete texts and, though it is not strictly essential, to consider texts of nearly equal length. Otherwise, individual texts may exert undue influence over the whole (Sinclair 2005). As the proposed sources for this study are restricted to one type per corpus (journal articles for the expert corpus and novice essays for the other), the one outstanding question of balance will be the amount of text per item. As IB Extended Essays have a prescribed limit of 4,000 words, the only consideration will be the word counts for expert journal articles.

2.3.6 Frequency
Starting from Firth in the 1950s (1957, cited in Sinclair et al. 2004), identification of formulaicity has relied on frequency (Wood 2015). Corpus frequencies are often used to determine which linguistic items should be incorporated into teaching materials (Frankenberg-Garcia 2016), and these items may range from individual words to formulaic sequences. A variety of extraction measures for these sequences may be combined for greater accuracy of identification; however, not all possible measures should be simultaneously employed, as some will prove unsuited to the corpus under investigation, while others will simply prove redundant (Antoch et al. 2013). As a result, measures relevant to a large general corpus may not be as well suited to a small specialized one.

2.3.7 Cut-offs and Dispersion
When using frequency to mine for lexical bundles, a cut-off must be chosen for the number of occurrences in the corpus, and for dispersion, or the number of texts in which a bundle appears across the corpus (Biber et al. 1999). These numbers are set arbitrarily for the purpose of delimiting the sheer volume of data to be analyzed, and may thus be adjusted according to a study’s proposed scope (Wray & Perkins 2000). Numbers for frequency are N-times per million, typically 20-25 times per million, while those for dispersion set a minimum number of texts in which a lexical bundle must occur to be counted, often three to five texts (Hunston 2002; McEnery 2004; Sinclair 2005; McEnery et al. 2006; McEnery & Hardie 2012; Crawford & Csomay 2016). Furthermore, normalized frequencies can be applied to large written corpora, often 20-40 instances per million words, while raw cut-offs may be used for smaller corpora of spoken text. The normalized frequency chosen may also be calculated so as to equally affect and represent each corpus under consideration. Thus, in their study, Chen and Baker chose a minimum frequency of 25 times per million words, and followed Biber et al. (1999) in setting a dispersion cut-off between three and five texts for four-word bundles (2010). They took a length of four words as the typical unit of research because four-word bundles are common in writing, of a manageable length, and often contain bundles of fewer items (ibid. 2010:32).
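The interplay of the two cut-offs described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as the procedure used by AntConc or by any of the studies cited: the function name extract_bundles, its parameters, the list-of-strings input format, and the naive whitespace tokenization are all hypothetical choices made for this example. The default values of 25 per million and three texts simply echo the figures reported for Chen and Baker (2010).

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_per_million=25.0, min_texts=3):
    """Return n-word sequences that meet both a normalized frequency
    cut-off and a dispersion cut-off.

    texts: list of strings, one string per corpus text (hypothetical format).
    Result: dict mapping bundle -> (raw count, count per million, text count).
    """
    counts = Counter()            # raw occurrences of each n-gram
    text_ids = defaultdict(set)   # texts in which each n-gram occurs
    total_tokens = 0

    for text_id, text in enumerate(texts):
        # Assumption: lowercase whitespace tokenization; real studies
        # normally handle punctuation and sentence boundaries explicitly.
        tokens = text.lower().split()
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            text_ids[gram].add(text_id)

    bundles = {}
    for gram, raw in counts.items():
        per_million = raw / total_tokens * 1_000_000  # normalized frequency
        dispersion = len(text_ids[gram])              # number of texts
        if per_million >= min_per_million and dispersion >= min_texts:
            bundles[" ".join(gram)] = (raw, per_million, dispersion)
    return bundles


# Toy corpus of four very short "texts", invented for illustration only.
toy_corpus = [
    "on the other hand the theme returns",
    "on the other hand the key changes",
    "on the other hand nothing changes",
    "the theme returns in the coda",
]
print(extract_bundles(toy_corpus, n=4, min_per_million=25, min_texts=3))
```

In a corpus this small, a single occurrence already exceeds any per-million threshold, so only the dispersion cut-off does any filtering here. This is one reason a raw cut-off may be preferable to a normalized one for small corpora.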

Though many studies do focus on four-word bundles, often in order to manage the size of the data, three-word bundles are so plentiful as to indicate that they are yet more foundational to a given discourse. Indeed, Biber et al. observed an inverse proportion between increasing factors of ‘n’ and decreasing orders of magnitude of n-gram tokens; thus, three-word bundles are more common in academic prose than four-word bundles by at least a factor of 10 (1999:994). This invites speculation regarding their frequency, function and content. What does such remarkably high frequency reveal about the ways in which these bundles interlace with surrounding content in a particular discourse? Although frequency may indicate their weight within the discourse, it does not explain what these bundles demonstrate about a specific discourse. Given their shorter length, it seems reasonable to wonder if they are more semantically tethered to a given academic discipline than longer bundles. For example, how might their content reflect disciplinary concerns in Music Discourse? Additionally, gaining perspective on such a volume of data would necessitate the use of some taxonomy, but do three-word bundles fit into the present functional taxonomies? Would these bundles require the extension or retooling of these taxonomies? After all, it is these functions that characterize the interaction of bundles with surrounding text, which in turn lends them distinctive roles in various discourses; as a consequence, the content of the surrounding context reinterprets these bundles according to a given situation. This work calls for analysis of the discourse, which is the subject of Section 2.4.

2.3.8 Corpus Software
The present investigation makes use of AntConc (Anthony 2018), the free concordancing and text analysis software developed by Laurence Anthony, Ph.D., currently a professor in the Faculty of Science and Engineering at Waseda University, Japan. This software permits searches for n-grams (i.e. lexical bundles), with cut-offs for frequency and range, and a file view function that permits researchers to see the relative location of the same in each given text within the corpus (ibid.). Additionally, Anthony’s free software, AntFileConverter, is employed for the conversion of PDF and Word (DOCX) files to plain text files (Anthony 2017).

KEYWORDS: Discourse Analysis, cohesion, coherence, ideational metafunction, textual metafunction

2.4 Discourse Analysis
Question: How can lexical bundles illuminate the structure of Music Discourse?

Having chosen lexical bundles as the structural unit for analysis, this study will examine their role in the structure of Music Discourse. To that end, methodology will be triangulated using corpus frequencies and discourse analysis to study how the most recurrent bundles interact with propositional content throughout complete texts.

2.4.1 Definitions
Broadly speaking, discourse analysis is the investigation of ‘language-in-use’ (Brown & Yule 2000). Adding texture to this illustration of discourse as situated (Lillis & McKinney 2003), Fetzer draws attention to its structural relationships, stating: ‘Discourse is fundamentally concerned with the nature of the connectedness between parts and wholes’ (2012:454). Zellig Harris originated the term discourse analysis (1952, ‘Discourse Analysis: A Sample Text’, as given in Todd 2016). Presently, Discourse Analysis (DA) blankets a range of methodological approaches, largely separable into three categories: the study of language use, the study of linguistic structure larger than a sentence, and the study of social practices (Tannen et al. 2015). The second of these, text analysis, is occupied with the fine-grained investigation of relationships within a text (Sanders & Sanders 2006), such as coreference across spans of text (Grimshaw 2003). Other relationships of interest for text analysis, particularly in the present study, include context and content, as well as cohesion and coherence (Graesser et al. 2003:2). Regarding the former pair, although content analysis enjoys the distinction of ‘longest established method of text analysis’, it is also now difficult to define, as it has ranged widely across concerns and categories. As a result, content analysis has outgrown any single definition to encompass multiple research strategies rather than specific methodologies (Titscher ??? 2000/2007:55 NEEDS CORRECTION). As for the latter pair of cohesion and coherence, Systemic Functional Grammar furnishes an ample theoretical resource for study of these relations, and has produced substantial scholarship about them. Primary examples include the discussion of cohesion and coherence in Halliday and Matthiessen (2014) and the treatment of cohesion by Halliday and Hasan (2013). Thompson gives the following explanation:

Cohesion refers to the linguistic devices by which the speaker can signal the experiential and interpersonal coherence of the text — and is thus a textual phenomenon — we can point to features of the text which serve a cohesive function. Coherence, on the other hand, is in the mind of the writer and reader: it is a mental phenomenon and cannot be identified or quantified in the same way as cohesion (2014:215).

Of additional interest for analysis of textual relationships are the six systems of meaning that Rose and Martin list as constituent of discourse: periodicity, conjunction, identification, ideation, appraisal, negotiation (Rose & Martin 2012:270). Considering that lexical bundles A) often overlap phrase boundaries, B) are most commonly referential in academic writing, and C) may contain disciplinary-specific content (most potentially in shorter bundles), the first four systems listed by Rose and Martin are of greatest relevance for the study of lexical bundles in Music Discourse. Ideation and conjunction belong to Halliday’s ideational metafunction, and periodicity and identification to his textual metafunction (Halliday & Matthiessen 2014). Both of these metafunctions are central to the analysis of lexical bundles as nodes that link propositional content. As Pérez-Llantada writes, ‘Bundles bridge structural units in the discourse, framing semantic meanings’ (Pérez-Llantada 2014:86). The component systems of these metafunctions are defined by Rose and Martin: Periodicity considers how information flows from end points at any level of text to successive departure points; conjunction is concerned with logical relationships; identification refers to lexis that identifies and tracks nominal content through a text; and ideation entails lexical relations, those words that express meanings for processes, people, things, places and qualities (2012:270).

2.4.2 Analyzing Expert and Novice Discourse
The existence of these multiple relationships presents a complex textual web for investigation. Fetzer therefore advises conducting analysis from both the macro and micro levels to simultaneously accommodate quantitative and qualitative dimensions (2012). This accords with Martin’s advice to ‘shunt around’ between levels of language when analyzing text (2012/2009:335). In this study, mining two specialized corpora for lexical bundles represents the micro level, while the analysis of how those bundles interact with propositional content to form Music Discourse represents the macro. Herein, the quantitative dimension, embodied in the corpus portion of this project, is understood as the first stage of the discourse analysis. This phase cannot stand on its own, however, since lexical bundles may be frequent only on account of constituent parts that are themselves frequent (e.g. grammatical words), and such cases do not contribute substantially to an understanding of the discourse (O’Donnel et al. 2015). Nevertheless, the use of corpora to conduct a discourse analysis accords with Biber et al.’s statement that ‘Corpus linguistic studies are generally considered to be a type of discourse analysis because they describe the use of linguistic forms in context’ (2007:2). As noted previously, the second stage of this analysis will be conducted by scrutinizing selected whole texts. This triangulation of methodology is designed to generate as comprehensive a view of Music Discourse as possible. Still, Martin offers a crucial reminder that a given analysis ‘cannot presume to have exhausted the meaning of the discourse’ (Martin 2012/2009:357).

The preceding analytical concerns all address questions germane to the study of both expert and novice text. Of special concern for the latter, however, are questions related to competence. As discussed in Section 2.2.1, competence is judged on the continuum from novice to expert, often in response to perceptions of typicality. For instance, novices may have retained a store of lexical bundles, yet still be unable to deploy them in ways typical of, and acceptable to, a given academic discipline’s community (Barton 1993). Ädel and Erman reiterate the point that competence may be achieved by learning the conventional deployment of formulae specific to a particular disciplinary register (2012). Various studies have addressed this question. Investigations comparing L1 and L2 learners, for example, have found that both groups’ overall knowledge of bundles tends to be somewhat restricted to discourse-markers, to the neglect of referential bundles (Nekrasova 2009). This clearly poses a difficulty for novices, as referential bundles are known to be common in academic writing (Biber et al. 1999). Likewise, it would be useful for music students to know which bundles are underused in novice writing. Hyland also stresses the need for additional exploration of how bundles function differently from expert to learner writings (2008), which again touches upon the question of typicality. In fact, Chen and Baker’s research demonstrated significant differences in the use of lexical bundles between academics and students (2010), which suggests that a similar situation may be found between expert and novice musicologists. All of this intensifies Bestgen’s appeal to replicate research into formulaic language on various corpora to continue yielding vital information for the teaching of writing (Bestgen 2017).

3 RESEARCH QUESTIONS

As this study is exploratory in nature, the following research questions are posed:
Which lexical bundles are constituents of expert and novice Music Discourse? Are any of these semantically specific to music?
How do these constituent bundles interact with propositional content in the discourse to realize the disciplinary concerns of Musicology?
How competently do novices deploy these bundles to realize ideational and textual metafunctions in their writing?