
[1] XU Feng, ZHANG Xuefen, XIN Zhanhong. A Chinese word segmentation scheme based on a deep neural network model[J]. Journal of Harbin Engineering University, 2019, 40(09): 1662-1666. [doi:10.11990/jheu.201812073]

A Chinese word segmentation scheme based on a deep neural network model

Journal of Harbin Engineering University [ISSN:1006-6977/CN:61-1281/TN]

Volume:
40
Issue:
No. 09, 2019
Pages:
1662-1666
Column:
Publication date:
2019-09-05

Article Info

Title:
A Chinese word segmentation scheme based on a deep neural network model
Author(s):
XU Feng1 ZHANG Xuefen2 XIN Zhanhong1
1. School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. Smart City College, Beijing Union University, Beijing 100101, China
Keywords:
Chinese word segmentation; long short-term memory network; encoder-decoder model; word vector; accuracy; F-value
CLC number:
TN911.22
DOI:
10.11990/jheu.201812073
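Document code:
A
Abstract:
To address the performance degradation of existing word segmentation algorithms and programs when they segment massive volumes of web text, this paper proposes a Chinese word segmentation scheme based on a deep neural network model. The scheme trains an encoder-decoder model built on long short-term memory (LSTM) networks and uses the resulting model to perform segmentation. To improve segmentation performance further, a word-vector-based correction method is proposed to revise the segmentation results produced by this model. Experiments on a typical microblog corpus dataset show that the proposed model-based segmentation performs considerably better than traditional word segmentation software. The accuracy and F-value obtained after applying the proposed word-vector correction are slightly better than those obtained without correction, which verifies the effectiveness of the proposed segmentation scheme.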
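
The abstract reports accuracy and F-value without defining them. As an illustration only, the sketch below shows how word-level precision, recall, and F1 are commonly computed for a segmenter, assuming that "accuracy" corresponds to precision and that the F-value is the balanced F1 score. The function names and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmentation to the set of (start, end) character offsets of its words."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Return word-level (precision, recall, F1) for one sentence.

    A predicted word counts as correct only if both of its boundaries
    match a word in the gold segmentation.
    """
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted must segment the same text")
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical example: the predicted output splits one gold word in two.
gold = ["我", "喜欢", "自然语言", "处理"]
predicted = ["我", "喜欢", "自然", "语言", "处理"]
precision, recall, f1 = segmentation_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here three of the five predicted words match the four gold words, so precision is 3/5 and recall is 3/4. Comparing character-offset spans rather than word strings keeps a repeated word from being counted as correct at the wrong position.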

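Memo

Received: 2018-12-22.
Funding: National Natural Science Foundation of China (61672178).
About the authors: XU Feng, male, PhD candidate; ZHANG Xuefen, female, associate professor; XIN Zhanhong, male, professor, doctoral supervisor.
Corresponding author: ZHANG Xuefen, E-mail: zhangxuefen@buu.edu.cn.
Last Update: 2019-09-06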