JOURNAL OF SHANDONG UNIVERSITY (ENGINEERING SCIENCE), 2011, Vol. 41, Issue (6): 18-23.


Study of bilingual words of part-of-speech (POS) disambiguation in the English-Chinese parallel corpus

FENG Min-xuan1, QU Wei-guang2,3*   

  1. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210046, China;
  2. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210046, China;
  3. The Research Center of Information Security and Confidentiality Technology of Jiangsu Province, Nanjing 210097, China
Received: 2011-04-15   Online: 2011-12-16   Published: 2011-04-15

Abstract:

A part-of-speech disambiguation approach based on idiosyncratic rules is proposed for a parallel corpus that is not aligned at the lexical level. The approach targets words that occur in the corpus with very high frequency but whose parts of speech are difficult to determine. A set of idiosyncratic disambiguation rules was constructed, and an algorithm built on these rules was applied to five typical words: three Chinese words, “guoqu”, “jihua” and “yu”, and two English words, “back” and “so”. Experiments on a large-scale parallel corpus achieved an F-score of 98.45% in disambiguating these words, and the results showed that the constructed rules are not constrained by the length of the context or the number of templates.
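The abstract does not spell out the rule formalism, so the following is only a hedged sketch of how word-specific ("idiosyncratic") rules could work in a corpus aligned at the sentence level but not at the word level. It assumes each rule is a condition over the English context and the whole aligned Chinese sentence that assigns one tag when it fires; rules are tried in order, and the tagger abstains when none fires. The rules, Penn-style tags, Chinese cue words, example sentences, and the precision/recall definitions behind the F-score are all hypothetical and are not the authors' rules or data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    """A hypothetical idiosyncratic rule for one ambiguous word."""
    name: str
    # condition(tokens, index_of_target, aligned_chinese_sentence) -> fires?
    condition: Callable[[List[str], int, str], bool]
    tag: str


def disambiguate(tokens: List[str], index: int, aligned_zh: str,
                 rules: List[Rule]) -> Optional[str]:
    """Return the tag of the first rule that fires, or None to abstain."""
    for rule in rules:
        if rule.condition(tokens, index, aligned_zh):
            return rule.tag
    return None


# Hypothetical rules for English "back". Because there is no word alignment,
# the Chinese cues are searched for anywhere in the aligned sentence.
BACK_RULES = [
    Rule("prev-motion-verb",
         lambda t, i, zh: i > 0 and t[i - 1].lower() in {"come", "came", "go", "went"},
         "RB"),
    Rule("zh-body-part", lambda t, i, zh: "后背" in zh or "背部" in zh, "NN"),
    Rule("zh-support", lambda t, i, zh: "支持" in zh, "VB"),
    Rule("zh-return", lambda t, i, zh: "回" in zh, "RB"),
]


def f_score(gold: List[str], predicted: List[Optional[str]]) -> float:
    """Assumed convention: precision over tagged items, recall over all items."""
    attempted = [(g, p) for g, p in zip(gold, predicted) if p is not None]
    correct = sum(1 for g, p in attempted if g == p)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy, invented examples: (tokens, index of "back", aligned Chinese, gold tag).
examples = [
    ("He came back home .".split(), 2, "他回家了。", "RB"),
    ("My back hurts .".split(), 1, "我的后背疼。", "NN"),
    ("We back the plan .".split(), 1, "我们支持这个计划。", "VB"),
    ("The back door .".split(), 1, "后门。", "JJ"),
]

gold = [g for _, _, _, g in examples]
predicted = [disambiguate(t, i, zh, BACK_RULES) for t, i, zh, _ in examples]
score = f_score(gold, predicted)
print(predicted, round(score, 4))
```

In this toy run the last example matches no rule, so the tagger abstains: precision stays at 1.0 while recall drops to 0.75, giving an F-score of about 0.857. Ordering the rules from most to least specific is one simple way to resolve cases where several cues fire at once.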

Key words: parallel corpus, part of speech disambiguation, words of POS ambiguity, automatic recognition, Chinese information processing

CLC Number: TP391.1