3 Material and method

The decision to construct a dedicated corpus for this thesis was made when it became clear that using the available digital corpora was not feasible. The only large and relatively contemporary corpus found that allowed searches specifically for punctuation in newspaper headlines was the British National Corpus (BNC), which contains only British English. Furthermore, a preliminary search of the BNC did not yield a single example of this form of asyndetic coordination in newspaper headlines.

The corpus on which this thesis is based was built by examining 8,821 headlines from the online editions of 12 different English-language newspapers from four countries (the US, the UK, Canada and Australia). The initial aim was to search through headlines until 50 headlines containing coordinated elements in accordance with the definitions stated above had been found for each newspaper. This approach was successful for every newspaper except the two Australian ones, which are dealt with further in section 3.4.

Apart from noting whether the elements are coordinated by a comma, an and or an ampersand (referred to below as the comma form, the and-form and the &-form), the headlines have also been categorized according to the phrase type of the coordinated elements: noun phrase, verb phrase, adjective phrase or two different phrases. A more detailed description of the corpus is found in section 3.5.

3.1 Online newspapers

Focusing on online newspapers was a matter of convenience, since they are easily available. All headlines have been culled from the official RSS feeds of the various newspapers, which turned out to be the only effective way to search through the number of headlines needed. An RSS feed is a very common type of web feed used to distribute frequently updated works, such as blogs or newspapers. Such feeds typically include metadata, a summary of the content (or the entire content) and a link to the original article, and they can be collected and read with web- or desktop-based RSS reader software. For convenience, the searches began with the latest posts and then worked back in time until 50 appropriate headlines had been found.

However, RSS feeds from newspapers vary in what they include. Some include items such as crossword puzzles and comics, and the same articles are frequently re-posted, so the number of examined headlines (8,821) must be seen as an approximation. Attempts have been made to avoid counting re-posts and to skip RSS posts that were obviously not articles.
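Nothing in this section suggests that the feeds were processed automatically. Purely to illustrate what an RSS item contains and how the two clean-up steps just described (skipping re-posts and obviously non-article items) could be mechanized, here is a minimal Python sketch. The sample feed, the function name `unique_headlines` and the keyword list used to spot non-articles are hypothetical assumptions, not part of the procedure actually used; a keyword filter is also far cruder than a human reader.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal RSS 2.0 feed; real newspaper feeds carry more metadata.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example paper</title>
  <item><title>Obama, Cameron warm up for talks with a little basketball</title>
        <link>http://example.com/a</link></item>
  <item><title>Crossword No. 123</title><link>http://example.com/c</link></item>
  <item><title>Obama, Cameron warm up for talks with a little basketball</title>
        <link>http://example.com/a</link></item>
</channel></rss>"""

def unique_headlines(feed_xml, skip_words=("crossword", "comic")):
    """Return item titles in feed order, dropping re-posts and items whose
    title contains a (hypothetical) non-article keyword."""
    root = ET.fromstring(feed_xml)
    seen = set()
    headlines = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        key = title.casefold()
        if not title or key in seen:
            continue  # empty title, or a re-post of a headline already counted
        if any(word in key for word in skip_words):
            continue  # obviously not an article (crossword, comic, ...)
        seen.add(key)
        headlines.append(title)
    return headlines

print(unique_headlines(SAMPLE_FEED))
```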

I do not consider the somewhat inexact number of headlines scanned a problem for this study, since it was only meant to give a rough indication of how common this construction is. All papers except the Australian ones yielded 50 hits within a reasonable time span, from February 20 to March 31, 2012. The largest number of headlines was examined for The Telegraph (972) and the smallest (excluding the Australian newspapers) for the Washington Post (556). The figures of real interest for this study are, of course, those describing the distribution of the 513 hits.

Another aspect worth commenting on is the question of what is considered “regular” editorial content as opposed to official blogs on the website of an online newspaper. Having studied the newspapers chosen, I consider this an increasingly irrelevant distinction, and one that is not important for this study. Nothing indicated that the official blogs differed stylistically from “regular” editorial content in their headline language. No efforts have therefore been made to eliminate such material, which would also have been unacceptably time-consuming.

3.2 The countries

The choice of countries is based on the initial research described in section 2.2, which indicated a US-UK divide when it comes to this particular feature: it appeared to be commonly used, and even preferred, in the US and generally avoided in the UK. Since the US and the UK are arguably the most culturally and linguistically influential countries in the English-speaking world, four newspapers from each of those two countries were chosen.

Two other large English-speaking countries, Canada and Australia, were chosen to see whether any clear trends emerged that could be linked to either a US or a UK style. Canada is especially interesting: it has a historical connection with the UK as a long-time part of the British Empire, but it is very likely strongly influenced culturally by the US through geographic proximity and shared history. In its written language, it has historically been influenced by both the US and the UK (Baugh & Cable 2002: 331).

3.3 The newspapers

For a study of this size and scope, it seemed wise to stay within a clearly defined newspaper genre. Thus, the newspapers chosen are exclusively general-interest daily newspapers. On the assumption that the best-known newspapers with the largest circulation probably exert a measure of stylistic influence over other newspapers, newspapers with a large circulation, a national reach, or both have been selected.

Although, as mentioned earlier, there was reason to believe that there would be a US-UK divide, it could not be assumed that all newspapers in a country would follow the same stylistic lines when it came to headlines. Therefore, four different newspapers were included from each of the US and the UK, which constitute the core of this study, and two each from Canada and Australia.

The main concern was to get some measure of diversity across the stylistic spectrum while staying within the realm of general-interest daily newspapers. The one stylistic continuum that came to mind in this genre was that between “serious” and “tabloid” newspapers. Here, these terms refer not to the actual print format but to the content.

Scientific definitions or discussions of which newspapers are “serious” or “tabloid” are again beyond the scope of this thesis, and such classifications may even vary over time. Nor is such a definition essential for this study. I have thus followed public consensus, looking at how these media are typically referred to by readers, experts, commentators or other media online, in order to identify newspapers that could at least be considered to be at different ends of this continuum and thus obtain some stylistic diversity.

Country     Newspapers
US          Los Angeles Times, New York Post (“tabloid”), Washington Post, USA Today (national)
UK          Daily Mail (national and “middle market”, considered slightly more serious than “tabloid”), Daily Mirror (“tabloid”, national), The Guardian (national), The Telegraph (national)
Australia   The Australian (national), The Age
Canada      Toronto Star, Calgary Sun (“tabloid”)

Table 1 shows the newspapers chosen for the corpus. Links to websites are found in the references section.

3.4 Australia

Of the countries included, Australia requires special comment. It quickly became apparent that suitable headlines were much less frequent in Australian newspapers, simply because these papers favored very short headlines that very seldom contained coordinated elements. This held true not only for the two papers eventually included in the corpus but also for at least three other major Australian newspapers examined (two briefly and one in greater detail). This would seem to be the normal headline style for Australian newspapers, and no major newspaper that differed substantially could be found.

After 500 headlines had been scanned for each of the two selected newspapers, only 9 and 4 hits respectively had been found. At these rates, it would have been necessary to go through roughly 2,800 and 6,250 headlines respectively to reach 50 hits for these two newspapers alone, which was not feasible. Nevertheless, it seemed worthwhile to include Australia as a reference, so the line was drawn at 500 headlines.
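The extrapolation for the Australian papers rests on a simple proportion: if a paper yields h hits in the first n headlines scanned, reaching 50 hits at the same rate would take about 50n/h headlines. The Python sketch below works this out, assuming the hit rate stays constant (which real feeds need not do); the function name `headlines_needed` is hypothetical. It also checks that ten newspapers with 50 hits each plus the 9 and 4 Australian hits add up to the 513 headlines in the corpus.

```python
import math

def headlines_needed(hits, scanned, target=50):
    """Estimate how many headlines must be scanned to reach `target` hits,
    assuming the rate observed so far (hits / scanned) stays constant."""
    if hits <= 0:
        raise ValueError("cannot extrapolate from zero hits")
    return math.ceil(target * scanned / hits)

# Hits found after 500 headlines in each of the two Australian newspapers.
australian_hits = (9, 4)
estimates = [headlines_needed(h, 500) for h in australian_hits]

# Ten newspapers reached 50 hits each; the Australian ones stopped at 500 headlines.
total_hits = 10 * 50 + sum(australian_hits)

print(estimates, total_hits)  # [2778, 6250] 513
```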

It proved difficult to find RSS feeds for Australian newspapers, which led to a limited choice of candidates. Since I had chosen to focus on major and/or national newspapers, I ended up without an obvious “tabloid” among the Australian newspapers.

3.5 The corpus

The corpus (Appendix 1) consists of 513 headlines, presented per newspaper, with a separate table for each phrase type. The phrase types and their abbreviations, each followed by an example, are:

– Noun phrases (NP):

(19) Police could face annual fitness tests and compulsory redundancies [TGNP15]

– Verb phrases (VP):

(20) Glen Folkard: Australian shark attack surfer shows scar and tells of miraculous survival [DMAVP14]

– Adjective phrases (AP):

(21) Seuss’ ‘Lorax’ fun, fast-paced [CSAP1]

– Multiple phrases (MP) (my own term for the few cases where, for instance, an adjective and a noun are coordinated as modifiers of a noun head):

(22) Voters face ‘confusion’ over mayoral and police chief elections [TGMP1]

These terms are used in a broad sense and simply signify the word class to which the heads of the coordinated phrases belong. Each phrase table is in turn divided into subsections for the three coordination forms, namely:

  • comma
  • and
  • ampersand (&)
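The division by coordination form can be made concrete with a small Python sketch: given a headline and its two coordinated elements, it looks at whatever stands between them and assigns the comma form, the and-form or the &-form. The function name `coordination_form` is hypothetical, and the sketch assumes the coordinated elements have already been identified by hand, which is the analytically difficult part.

```python
def coordination_form(headline, first, second):
    """Classify what links two coordinated elements in a headline.

    Assumes `first` and `second` have already been identified and that
    `second` occurs after `first` in the headline.
    """
    start = headline.index(first) + len(first)
    end = headline.index(second, start)
    link = headline[start:end].strip()
    forms = {",": "comma", "and": "and", "&": "ampersand"}
    if link not in forms:
        # Anything else (e.g. "…and" or "or") falls outside the three forms.
        raise ValueError(f"unexpected coordinator: {link!r}")
    return forms[link]

print(coordination_form("Seuss’ ‘Lorax’ fun, fast-paced", "fun", "fast-paced"))  # comma
```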

Each table has been given a code that combines the newspaper and the phrase type. These are the abbreviations used:

  • Los Angeles Times (LT)
  • New York Post (NYP)
  • Washington Post (WP)
  • USA Today (UST)
  • Daily Mail (DMA)
  • Daily Mirror (DMI)
  • The Guardian (TG)
  • The Telegraph (TT)
  • The Australian (AS)
  • The Age (AG)
  • Toronto Star (TS)
  • Calgary Sun (CS)

The tables thus have names such as TSVP, meaning “Toronto Star, verb phrase”. The headlines are numbered within each table, which generates codes such as [LTNP5], meaning “Los Angeles Times, noun phrase, headline number 5”; these codes are placed after the headline examples. Headline examples are numbered along with the other examples (running throughout the thesis) and are also easily identified by being set in the Courier New typeface, with the coordinated elements, including the coordinator, in bold type:

(23) Man saws off his own foot to avoid work and continue claiming jobless benefits [DMAVP7]

(24) Obama, Cameron warm up for talks with a little basketball [USTNP34]

In the case of verb phrases, I use boldface for the entire construction, including objects. In non-corpus examples, such as those in section 2.1, I use the Times New Roman typeface, and I also use boldface freely for emphasis.
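Because the coding scheme is regular, it can be generated and decoded mechanically. The following is a minimal Python sketch, assuming only the abbreviations listed above; the names `make_code` and `parse_code` are hypothetical and not part of any tool used for this thesis. Table names such as TSVP are handled as codes without the trailing headline number.

```python
import re

# Abbreviations as listed in section 3.5.
NEWSPAPERS = {
    "LT": "Los Angeles Times", "NYP": "New York Post", "WP": "Washington Post",
    "UST": "USA Today", "DMA": "Daily Mail", "DMI": "Daily Mirror",
    "TG": "The Guardian", "TT": "The Telegraph", "AS": "The Australian",
    "AG": "The Age", "TS": "Toronto Star", "CS": "Calgary Sun",
}
PHRASE_TYPES = {"NP": "noun phrase", "VP": "verb phrase",
                "AP": "adjective phrase", "MP": "multiple phrases"}

# Newspaper code, phrase-type code, then an optional headline number.
CODE_RE = re.compile(
    "(" + "|".join(NEWSPAPERS) + ")(" + "|".join(PHRASE_TYPES) + r")(\d+)?"
)

def make_code(paper, phrase, number):
    """Build a headline code such as [LTNP5]."""
    if paper not in NEWSPAPERS or phrase not in PHRASE_TYPES:
        raise ValueError("unknown newspaper or phrase type")
    return f"[{paper}{phrase}{number}]"

def parse_code(code):
    """Decode a table name (TSVP) or a headline code ([LTNP5])."""
    match = CODE_RE.fullmatch(code.strip("[]"))
    if match is None:
        raise ValueError(f"not a corpus code: {code!r}")
    paper, phrase, number = match.groups()
    return NEWSPAPERS[paper], PHRASE_TYPES[phrase], int(number) if number else None

print(parse_code("[LTNP5]"))  # ('Los Angeles Times', 'noun phrase', 5)
print(make_code("UST", "NP", 34))  # [USTNP34]
```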

3.6 Not included in the corpus

A few headlines in the and-category where I consider a comma impossible, or at least extremely unlikely, have been excluded from the corpus. The categories and the reasons for their exclusion are given below. (Full reference codes are not given for these examples, since they are not part of the corpus.)

1. Headlines where the surrounding punctuation would make it absurd to replace and with a comma:

(25) Katy Perry’s hugging people in London…and getting ready for war (DMI)

2. Set phrases or expressions (that it seems very unlikely even the most notoriously space-saving editor would break up):

(26) The pros and cons of ditching cash for electronic currency (WP)

(This is a difficult category, since it is not always possible to tell whether something is a set phrase. What about “mother and child” or “joys and struggles”? Such examples were very few, however, and had no overall statistical impact.)

3. Verbless single-phrase or coordinated-phrase headlines where replacing and with a comma would make the whole headline nonsensical or change its meaning completely:

(27) “Hunger Games” and the great Hollywood hype machine (WP)

4. Direct quotes in headlines (since it seems highly unlikely that these would be edited to replace an and with a comma, and the coordination is thus not the result of any stylistic choice on the part of the newspaper):

(28) FW de Klerk says Nelson Mandela was ‘brutal and unfair’ (TT)