Frequency

Word cloud produced using Wordle (SydTV-Std, untagged, without common English words)

On this page I have made available for download different kinds of frequency lists so that other researchers can compare their own data with SydTV and SydTV-Std (a partially standardized version of SydTV). Alternatively, you can use this online interface to undertake frequency analyses of the corpora. To help with calculations of normalized frequencies, Table 1 shows the corpus size depending on different token definitions.

Table 1 Corpus size depending on token definition
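As a minimal sketch of how a normalized frequency can be calculated from a raw count and a corpus size: the function name, the default basis of 10,000 tokens and all figures below are illustrative assumptions, not values from Table 1. The corpus size should be the one that matches the token definition used to produce the raw count.

```python
def normalized_frequency(raw_count: int, corpus_size: int, per: int = 10_000) -> float:
    """Return the number of occurrences per `per` tokens."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size * per


# Hypothetical figures for illustration only; substitute the corpus size
# from Table 1 that corresponds to your chosen token definition.
raw_count = 150
size_under_definition_a = 300_000  # assumed size under one token definition
size_under_definition_b = 250_000  # assumed size under another token definition

freq_a = normalized_frequency(raw_count, size_under_definition_a)
freq_b = normalized_frequency(raw_count, size_under_definition_b)

print(f"Definition A: {freq_a:.2f} per 10,000 tokens")
print(f"Definition B: {freq_b:.2f} per 10,000 tokens")
```

The same raw count yields different normalized values depending on the corpus size, which is why the token definition behind a frequency list needs to match the one behind the corpus size used for normalization.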

Frequency lists produced with Wordsmith (Version 7):

Hyphens do not separate words; apostrophe (') not allowed within word

  • Wordsmith, Excel and text files: SydTV
  • Wordsmith, Excel and text files: SydTV-Std

Hyphens separate words; apostrophe (') not allowed within word

  • Wordsmith, Excel and text files: SydTV
  • Wordsmith, Excel and text files: SydTV-Std

Hyphens do not separate words; apostrophe (') allowed within word

  • Wordsmith, Excel and text files: SydTV
  • Wordsmith, Excel and text files: SydTV-Std

Hyphens separate words; apostrophe (') allowed within word

  • Wordsmith, Excel and text files: SydTV
  • Wordsmith, Excel and text files: SydTV-Std
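The effect of the four settings above on token counts can be illustrated with a rough sketch. This is a simplified regular-expression tokenizer written for illustration only; it is an assumption about how such settings behave in general and does not reproduce the actual tokenization of Wordsmith or any other tool. The function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter
from itertools import product


def build_token_pattern(hyphens_separate: bool, apostrophe_in_word: bool) -> re.Pattern:
    """Build a regex for one of the four token definitions (illustrative only)."""
    joiners = ""
    if not hyphens_separate:
        joiners += "-"       # hyphen may join parts of one token
    if apostrophe_in_word:
        joiners += "'\u2019"  # straight or curly apostrophe may occur inside a token
    if joiners:
        return re.compile(f"[a-z0-9]+(?:[{re.escape(joiners)}][a-z0-9]+)*")
    return re.compile("[a-z0-9]+")


def tokenize(text: str, hyphens_separate: bool, apostrophe_in_word: bool) -> list[str]:
    """Lowercase the text and return tokens under the given settings."""
    pattern = build_token_pattern(hyphens_separate, apostrophe_in_word)
    return pattern.findall(text.lower())


# Hypothetical sample sentence, not taken from the corpus.
SAMPLE = "Let's get out of here, it's a well-known stock-phrase."

for hyphens_separate, apostrophe_in_word in product([False, True], repeat=2):
    tokens = tokenize(SAMPLE, hyphens_separate, apostrophe_in_word)
    counts = Counter(tokens)
    print(
        f"hyphens separate={hyphens_separate}, "
        f"apostrophe in word={apostrophe_in_word}: "
        f"{len(tokens)} tokens, {len(counts)} types"
    )
```

Even on one short sentence the token count differs between settings, which is why separate lists and separate corpus sizes are provided for each token definition.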

Frequency lists produced using AntConc (Version 3.4.4) – text files

Token definition settings: letter

Token definition settings: letter; "append following definition" set to the apostrophe (')

Frequency lists produced using Sketch Engine – CSV files

Additional frequency lists, based on a more limited dataset, can be accessed on my website, which also includes discussion of selected items. There is much overlap between these lists, for example in relation to the trigram "out of here", which sometimes occurs in the utterance "Let’s get … out of here", often cited as the most clichéd line or stock phrase in cinema (see montage below).