Finding Adjectives and Adverbs
November 7th, 2010
It has taken me several attempts to find an effective method for isolating adverbs, partly due to the many different grammatical contexts in which they can occur, and partly because (unlike nouns and verbs) they are uninflected in English. Inflection makes a difference because it determines how many word forms are needed to create a false positive. For uninflected categories only one form is needed, so false positives occur more readily than when a complete paradigm must be seen.
To overcome this problem I decided to look for adverbs and adjectives in pairs as a means to provide the cross-checking that was needed. Derivation of adverbs from adjectives is highly productive in English, and morphologically very regular, so there are few adverbs that are excluded in principle by this methodology.
(Some authors have argued that the relationship between adjectives and adverbs is so productive that it should be modelled as inflection rather than derivation. The traditional objection to this idea is that inflection should not change the lexical category of a word; however, that begs the question because it assumes that adverbs and adjectives are separate categories. My difficulty lies with the semantics rather than the syntax: the usual meaning of an adverb derived from an adjective is ‘in an X manner’, but there are too many exceptions to comfortably get away without explicit entries in the lexicon.)
The morphological relationship is sufficient to act as an initial filter for the adverbs. For adjectives I looked for words preceded by a hedge such as ‘slightly’ or ‘very’. The script can be found here and the results were as follows:
| threshold | matches | modifiers | mistakes | accuracy | efficiency |
|---|---|---|---|---|---|
| 1 | 935 | 913 | 22 | 97.6% | — |
| 2 | 594 | 584 | 10 | 98.3% | 3.6% |
| 3 | 463 | 457 | 6 | 98.7% | 3.1% |
| 4 | 386 | 383 | 3 | 99.2% | 4.1% |
| 6 | 302 | 300 | 2 | 99.3% | 1.2% |
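The accuracy column is the share of matches that are genuine modifiers. The efficiency column is not defined in the text; the sketch below assumes it is the number of mistakes removed per genuine modifier lost when moving from the previous threshold to the current one. That reading is a guess, but it reproduces all four published values. The function and variable names are made up for illustration.

```python
# (threshold, matches, modifiers) copied from the table above.
ROWS = [
    (1, 935, 913),
    (2, 594, 584),
    (3, 463, 457),
    (4, 386, 383),
    (6, 302, 300),
]


def accuracy(matches, modifiers):
    """Percentage of matched pairs that are genuine modifiers."""
    return 100.0 * modifiers / matches


def efficiency(previous, current):
    """Assumed definition: mistakes removed per genuine modifier lost,
    as a percentage, when raising the threshold by one table row."""
    _, prev_matches, prev_modifiers = previous
    _, curr_matches, curr_modifiers = current
    prev_mistakes = prev_matches - prev_modifiers
    curr_mistakes = curr_matches - curr_modifiers
    mistakes_removed = prev_mistakes - curr_mistakes
    modifiers_lost = prev_modifiers - curr_modifiers
    return 100.0 * mistakes_removed / modifiers_lost


accuracies = [round(accuracy(m, g), 1) for _, m, g in ROWS]
efficiencies = [round(efficiency(p, c), 1) for p, c in zip(ROWS, ROWS[1:])]

print(accuracies)    # [97.6, 98.3, 98.7, 99.2, 99.3]
print(efficiencies)  # [3.6, 3.1, 4.1, 1.2]
```

On this reading, going from a threshold of 1 to 2 removes 12 mistakes at the cost of 329 genuine modifiers, which gives the 3.6% in the second row.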
Most of the false positives (17 out of 22) were noun-adjective pairs that had slipped through the hedge filter. A significant fraction of these were due to the use of ‘very’ as an adjective, notably within the phrase ‘the very time’. Others formed part of expressions such as ‘time consuming’ and ‘time critical’.
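The pairing idea, and the way ‘the very time’ slips through it, can be illustrated with a minimal sketch. This is not the original script. The hedge list holds only the two words named above, the spelling rules are a simplification, and the threshold is assumed to mean the minimum number of hedged occurrences required; all names are hypothetical.

```python
import re
from collections import Counter

# Only the two hedges named in the text; the real list is presumably longer.
HEDGES = {"slightly", "very"}


def candidate_adverbs(adjective):
    """Return plausible -ly derivations of an adjective (simplified rules)."""
    forms = [adjective + "ly"]                 # quick -> quickly
    if adjective.endswith("y"):
        forms.append(adjective[:-1] + "ily")   # happy -> happily
    if adjective.endswith("le"):
        forms.append(adjective[:-1] + "y")     # gentle -> gently
    if adjective.endswith("ic"):
        forms.append(adjective + "ally")       # basic -> basically
    return forms


def find_pairs(text, threshold=1):
    """Pair each hedged word with an attested -ly form, if there is one."""
    tokens = re.findall(r"[a-z]+", text.lower())
    vocabulary = set(tokens)
    # Count words that directly follow a hedge.
    hedged = Counter(
        word for previous, word in zip(tokens, tokens[1:]) if previous in HEDGES
    )
    pairs = {}
    for word, count in hedged.items():
        if count < threshold:
            continue
        for adverb in candidate_adverbs(word):
            if adverb in vocabulary:
                pairs[word] = adverb
                break
    return pairs


SAMPLE = (
    "She was very quick and answered quickly. "
    "He was slightly slow and walked slowly. "
    "At the very time a timely reply came."
)

print(find_pairs(SAMPLE))
# {'quick': 'quickly', 'slow': 'slowly', 'time': 'timely'}
```

The last pair is the kind of noun-adjective false positive described above: ‘very’ is used as an adjective before ‘time’, and ‘timely’ happens to fit the adverb pattern. Raising the threshold removes it in this sample only because every hedged word occurs once; in a real corpus a frequent phrase could survive.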
The total number of pairs found is quite low, even without setting a threshold. This is largely attributable to the hedge filter, for two reasons. Firstly, by its nature it is specific to comparable adjectives and adverbs (or to be more precise, those that have been used comparably in the corpus, hence the appearance of ‘unique’ in the output). Secondly, even for comparable adjectives, only about one instance in a hundred is accepted. Both of these characteristics are undesirable, but I don’t currently have a better solution.
In the interests of consistency I will be trying to stick to the policy of using a threshold of two. The resulting word list can be found here.