Journal of Shandong University (Engineering Science), 2024, Vol. 54, Issue 4: 67-75. doi: 10.6040/j.issn.1672-3961.0.2023.112

• Machine Learning & Data Mining •

Chinese-Uyghur cross-lingual named entity recognition by fusing data augmentation and knowledge migration

GE Yifei1, Azragul1,2*, CHEN Degang1   

  1. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, Xinjiang, China;
  2. National Language Resource Monitoring & Research Center of Minority Languages, Beijing 100081, China
  • Published:2024-08-20

CLC Number: TP391