Simple Parser for Natural Language?

Bob Ingria ingria at world.std.com
Tue Jul 27 14:42:41 UTC 1999


At 12:13 AM 7/27/99 +0400, Luciano Notarfrancesco wrote:
>> Bob wrote:
>>
>> At 01:09 AM 7/17/99 +0400, Luciano Notarfrancesco wrote:
>> >I was planning to implement NGrammars for doing syntactic analysis in order to do better prosody generation for a text-to-speech system. I believe the N-grams stuff could be useful for your project.
>> 
>> I've seen statistical techniques used for NL analysis, but never N-grams, because they're not really suited for understanding/analysis.  The typical application in a speech recognition program is to filter the recognizer's hypotheses.  They can be very useful in that application.  Tri-gram filtering usually leaves in a lot of amusingly non-English schmutz.  But four-gram filtering produces utterances which are all valid English utterances.
>
>Bob, actually N-Grams are used for ambiguity resolution in NL processing. These techniques have been used for part-of-speech tagging, for estimating lexical probabilities

I was thinking of the parsing/interpretation phase proper, rather than the various support stages (e.g. tokenizing, tagging, stemming).  You're right, of course: N-grams can be quite useful for those kinds of tasks.
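
To make the filtering idea above concrete, here is a rough sketch -- in Python rather than Smalltalk, with a made-up toy corpus -- of scoring recognizer hypotheses against trigram counts; raw counts stand in for a properly smoothed language model, so treat it as an illustration, not a recipe:

    # Toy illustration: count trigrams in a small corpus, then prefer the
    # recognizer hypothesis whose trigrams were seen most often.
    from collections import Counter

    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "a cat sat on the rug",
    ]

    def trigrams(tokens):
        # Pad so that sentence-initial and sentence-final trigrams count too.
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]

    counts = Counter()
    for sentence in corpus:
        counts.update(trigrams(sentence.split()))

    def score(hypothesis):
        # Crude additive score; a real system uses smoothed log-probabilities.
        return sum(counts[t] for t in trigrams(hypothesis.split()))

    hypotheses = ["the cat sat on the rug", "the cat mat on sat the"]
    print(max(hypotheses, key=score))   # prefers the English-like word order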

                      and for building probabilistically based parsing algorithms.

I'll have to see a reference on using N-grams in parsing proper, even excluding interpretation, to believe this.  An N-gram parser is a contradiction in terms: parsers build recursive structures that can be used to create compositional semantic representations; N-grams deal with linear adjacency.  There are some essential, non-trivial structures in parsing/interpretation (e.g. WH questions, such as 'Who did the papers report that the police were accusing ___ of the crime?') where the dependency that needs to be captured is potentially unbounded, and way beyond the scope of any N-gram system to handle, because (1) the distances are greater than a trigram (the existing annotated training data, large as it is, supports trigrams at best, and four-grams if you use a clever back-off strategy; going beyond that would require a far larger corpus than it is economically feasible to annotate); and (2) there is no way to factor the dependency into smaller chunks that could be consumed by successive trigrams, since there are no intermediate indications of the 'passing' of the WH phrase.
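
A trivial way to see the locality problem (Python again, purely illustrative): no three-word window of that WH question contains both the filler 'Who' and 'accusing', the verb whose object it supplies.

    sentence = ("Who did the papers report that the police "
                "were accusing of the crime ?")
    tokens = sentence.split()

    windows = [tokens[i:i + 3] for i in range(len(tokens) - 2)]
    spanning = [w for w in windows if "Who" in w and "accusing" in w]

    print(tokens.index("accusing") - tokens.index("Who"))  # 9 words apart
    print(spanning)   # [] -- no trigram window spans the dependency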

>            I read that the Viterbi algorithm, using bigram or trigram probability models, can attain accuracy rates of over 95 percent for part-of-speech tagging.

Always take success/failure rates with a large grain of salt if you're interested in doing something with the results of the system being evaluated.  Speech people are obsessed with statistics and love to reduce any domain to 'one number' on some scale so they can tune on that.  But sometimes the quality of the system's performance is more important than the raw coverage figures.  When I last looked seriously at the errors a tagger was producing in detail, I noticed that the error rate was misleadingly high.  The tagger very rarely mistook major categories for one another in the head position of a constituent; e.g. it rarely mistook a head noun for a head verb or vice versa.  Where the errors lay was in the pre-modifiers of nouns; e.g. is 'steel' in 'steel box' an adjective or a noun?  But these were precisely the sorts of alternations I would expect a robust system to be relatively immune to.  It doesn't matter whether 'steel' is labelled as an adjective or a noun; what does matter is that its relation to 'box' is determined correctly (i.e. that it most likely means the material the box is made of, with a secondary interpretation of being a box for (storing) steel).
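
For what it's worth, here is roughly what such a bigram Viterbi tagger does, as a Python sketch with probabilities I have simply invented (a real tagger estimates them from an annotated corpus).  It also shows the 'steel box' point: whichever tag you hand-tune 'steel' toward, 'box' still comes out as the head noun.

    import math

    tags = ["DET", "ADJ", "NOUN"]

    # P(tag | previous tag), with <s> as the start state.  All numbers invented.
    trans = {
        "<s>":  {"DET": 0.6,  "ADJ": 0.2,  "NOUN": 0.2},
        "DET":  {"DET": 0.05, "ADJ": 0.45, "NOUN": 0.5},
        "ADJ":  {"DET": 0.05, "ADJ": 0.25, "NOUN": 0.7},
        "NOUN": {"DET": 0.1,  "ADJ": 0.2,  "NOUN": 0.7},
    }

    # P(word | tag); 'steel' is deliberately ambiguous between ADJ and NOUN.
    emit = {
        "DET":  {"the": 0.9, "a": 0.1},
        "ADJ":  {"steel": 0.5, "red": 0.5},
        "NOUN": {"steel": 0.3, "box": 0.7},
    }

    def viterbi(words):
        # best[t] = (log-probability of the best path ending in tag t, that path)
        best = {t: (math.log(trans["<s>"][t]) +
                    math.log(emit[t].get(words[0], 1e-10)), [t])
                for t in tags}
        for w in words[1:]:
            new = {}
            for t in tags:
                prev, (score, path) = max(
                    ((p, best[p]) for p in tags),
                    key=lambda item: item[1][0] + math.log(trans[item[0]][t]))
                new[t] = (score + math.log(trans[prev][t]) +
                          math.log(emit[t].get(w, 1e-10)), path + [t])
            best = new
        return max(best.values(), key=lambda sp: sp[0])[1]

    print(viterbi("the steel box".split()))
    # ['DET', 'ADJ', 'NOUN'] with these made-up numbers; nudge the emission
    # probabilities and 'steel' flips to NOUN, but 'box' stays a NOUN either way.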

Also, where more than one performance figure is available, speech people tend to choose the most favorable one.  So, for example, the error rate for speech recognition systems is stated in terms of word error rate, which was down below 10% the last time I looked.  However, sentence error rate was much higher, around 40%.  So, while only 10% of the words may be wrong (and this figure is itself probably misleading, since it is a composite score combining insertions (i.e. the recognizer added a word that was not actually in the speech input), deletions (i.e. the recognizer dropped a word), and substitutions), almost every other sentence had an error of some sort in it.
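
To make the arithmetic concrete, here is a small sketch (Python, with reference/hypothesis pairs I have invented) of how word error rate and sentence error rate come apart: the word-level figure averages over everything, while one stray word is enough to count a whole sentence as wrong.

    # WER = (substitutions + deletions + insertions) / reference length,
    # taken over the best word alignment; SER = fraction of sentences
    # with at least one error.
    def word_errors(ref, hyp):
        # Standard edit-distance dynamic program over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution / match
        return d[len(ref)][len(hyp)]

    pairs = [  # (reference, recognizer output) -- all invented
        ("show me the flights to boston", "show me the flights to boston"),
        ("i want to leave on tuesday",    "i want to leave on a tuesday"),
        ("book the earliest one",         "book the earliest one"),
    ]

    errs = sum(word_errors(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    bad = sum(1 for r, h in pairs if word_errors(r.split(), h.split()) > 0)

    print("WER: %.0f%%" % (100.0 * errs / words))        # 6%
    print("SER: %.0f%%" % (100.0 * bad / len(pairs)))     # 33%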

Finally, statistically based systems are touted as being simpler to build than the dread 'knowledge-based' systems (hence, I tend to regard statistical systems as 'ignorance-based'), but this claim fails to take into account the cost of constructing the annotated training (not to mention test and development-test) corpora needed.  For speech training, the cost of the annotated corpus is trivial: all you need is somebody who knows English and is conscientious.  For part-of-speech tagging, the corpus is more expensive: you need annotators who either know the part-of-speech system you're using or can learn it; and there are more difficult decisions to make, some of them ultimately unresolvable except by fiat; cf. the 'steel box' example.  Syntactic annotation (either with phrase-structure or dependency analyses) is yet more expensive: the notation is more complicated and the decisions more difficult.  And semantic annotation, which is the holy grail of the speech world's approach to NL understanding, is more difficult still.

I've never seen anybody touting a 'learned' system who has ever calculated the expense of the annotation process, or even acknowledged that there was a (non-trivial) expense.  The selling point is that experts are unnecessary; relatively unskilled annotators can do all the work.  This leaves out the fact that (1) somebody has to teach the annotators the annotation language and monitor them to make sure they are annotating correctly; and (2) unlike orthographic transcription for speech, very often the annotation language itself does not exist and must be developed.  It's not clear to me that the cost of building the knowledge-based system is any greater than the cost of developing the needed annotation language(s), hiring and training the annotators, and doing the annotation.  And, of course, if you move to a new domain, you have to do the annotation all over again.  And, if it turns out that your annotation language was more closely tied to your original domain than you thought, you may have to develop that all over again, too.

>Do you have experience on speech recognition?

Nope, I'm an NL hacker and syntactic chauvinist.  I worked in a speech department for about 10 years, though, so I've absorbed a lot by osmosis.

                                               It would be great to do something in that area for Squeak, don't you think?

It would be fun, but I don't expect the great panacea from speech recognition that most people outside the field do.  The major sticking point is the out-of-vocabulary issue: every word that you want the system to recognize must be in the recognizer's phonetic dictionary; if it isn't, it will be mis-recognized as something else.  And by every word, I mean every surface form, not just every lexeme; i.e. all the actual inflected forms must be present.  So, for example, if you want the system to recognize 'songs', but only 'song' is in the phonetic dictionary, 'songs' will not be recognized.  Therefore, if you wanted to talk to Squeak about Squeak, every class name and every selector would need to be in the phonetic dictionary.  If you added new class names and new selectors, they would need to be added, too.
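
Here is the problem in miniature (Python, with a toy phonetic dictionary whose entries I have made up): anything you want recognized, including every inflected form and every class name or selector, must have an entry, or the recognizer will substitute something else for it.

    phonetic_dictionary = {      # word -> phone string (toy entries only)
        "song":    "S AO NG",
        "open":    "OW P AH N",
        "class":   "K L AE S",
        "browser": "B R AW Z ER",
    }

    def out_of_vocabulary(utterance):
        # Report the words the recognizer has no pronunciation for.
        return [w for w in utterance.lower().split()
                if w not in phonetic_dictionary]

    print(out_of_vocabulary("open class browser"))  # [] -- all covered
    print(out_of_vocabulary("play songs"))          # ['play', 'songs'] --
                                                    # 'songs' is missing even
                                                    # though 'song' is listed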

This is why speech recognition systems are most successful in domain-specific applications, where the vocabulary is relatively closed.  For example, BBN had this bitchin' voice router application: dial one central number, wait for the beep, say the person's name, and get connected.  It was so good that, even though I had the names and numbers of everybody in my department taped to the wall, so that I had only to lift my eyes to see a number, I never consulted the list; I just used the voice router.  I dialed in from Canada at a time when performance was only being advertised for calls within the building, and it worked perfectly.

For this, I give the speech people much credit.  While NL hackers have always been promising the moon, speech researchers have carefully looked for applications that, while they may not have been sexy, were compatible with the state of the art, or only just a little beyond it.  So, if you want 'speech for Squeak', look for an area that would be useful and that is within the bounds of existing technology and go for it, but don't expect HAL.
-30-
Bob Ingria
As always, at a slight angle to the universe




