[1] SHE Xiang-yang, YANG Rui-li, DONG Li-hong. Sentiment analysis of Chinese online products evaluation based on Sword2vect[J]. Journal of Xi'an University of Science and Technology, 2020, (03): 504-511. [doi:10.13800/j.cnki.xakjdxxb.2020.0318]

Sentiment analysis of Chinese online products evaluation based on Sword2vect

Journal of Xi'an University of Science and Technology [ISSN: 1672-9315 / CN: 61-1434/N]

Volume:
Issue:
2020, No. 03
Pages:
504-511
Column:
Publication date:
2020-05-15

Article Info

Title:
Sentiment analysis of Chinese online products evaluation based on Sword2vect
Article ID:
1672-9315(2020)03-0504-08
Author(s):
SHE Xiang-yang (厍向阳), YANG Rui-li (杨瑞丽), DONG Li-hong (董立红)
(College of Computer Science and Engineering, Xi'an University of Science and Technology, Xi'an 710054, China)
Keywords:
sentiment analysis; word2vect; support vector machine (SVM); sentiment word vector
CLC number:
TP 391
DOI:
10.13800/j.cnki.xakjdxxb.2020.0318
Document code:
A
Abstract:
Sentiment analysis of online product reviews has become a popular research topic. To address the loss of contextual information and of sentiment information about words in sentiment analysis, this paper proposes Sword2vect, a sentiment analysis method based on sentence vectors weighted by sentence sentiment scores, and applies it to Chinese online reviews. First, a dictionary-based method computes a sentiment score for each review sentence, and the scores are preprocessed so that every positive review has a positive score and every negative review has a negative score. Next, the word2vect algorithm produces a sentence vector that carries contextual information for each review. The sentence vector is then weighted by the sentiment score to obtain the sentiment sentence vector, Sword2vect. A support vector machine is trained on the training data set, and the trained model is finally applied to the test data set. Sword2vect, the word2vect word-vector algorithm, and the tf_idf feature-vector algorithm are each evaluated on two data sets, Jingdong (JD.com) mobile phone reviews and Tan Songbo's hotel reviews, and compared in terms of accuracy and running time. The experimental results show that Sword2vect improves accuracy by 10%~20% over the word2vect word-vector algorithm and by 20%~30% over the tf_idf feature vector, and that its time efficiency is also considerably better than that of the other two algorithms.
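The weighting step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy `LEXICON` and `EMBEDDINGS` tables, the mean pooling in `sentence_vector`, and the sign-fixing rule in `align_score` (the abstract only says that scores are preprocessed so that positive reviews score positive and negative reviews score negative).

```python
# Hypothetical toy resources: the lexicon, the embeddings, and the
# sign-fixing rule are illustrative assumptions, not the paper's data or rules.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "slow": -1.0}
EMBEDDINGS = {
    "good": [0.5, 0.1],
    "great": [0.7, 0.3],
    "bad": [0.2, 0.9],
    "slow": [0.1, 0.8],
    "phone": [0.4, 0.4],
}


def sentiment_score(tokens):
    """Dictionary-based score: sum of the lexicon polarities of the tokens."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def align_score(score, label, eps=0.1):
    """Force the score's sign to match the training label (+1 or -1).

    Assumed rule: keep the magnitude (at least eps) and take the label's sign.
    """
    return label * max(abs(score), eps)


def sentence_vector(tokens):
    """Average the word vectors of the known tokens (mean pooling)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    dim = len(next(iter(EMBEDDINGS.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def weighted_sentence_vector(tokens, label):
    """Scale the sentence vector by the sign-aligned sentiment score."""
    score = align_score(sentiment_score(tokens), label)
    return [score * x for x in sentence_vector(tokens)]


pos = weighted_sentence_vector(["great", "phone"], label=+1)
neg = weighted_sentence_vector(["slow", "bad", "phone"], label=-1)
print(pos, neg)
```

In a full pipeline, the word vectors would come from a trained word2vec model rather than a hand-written table, and the weighted vectors would be passed to a support vector machine classifier (scikit-learn's `sklearn.svm.SVC` is one option). How scores are handled for unlabeled test sentences is not stated in the abstract, so the sketch covers only the labeled training case.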


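Memo:
Received: 2020-01-03; Responsible editor: GAO Jia (高佳)
Funding: Natural Science Basic Research Program of Shaanxi Province (2019JLM-11); Natural Science Foundation of Shaanxi Province (2017JM6105)
Corresponding author: SHE Xiang-yang (1968-), male, born in Zhouzhi, Shaanxi; Ph.D., professor. E-mail: 1535594191@qq.com
Last Update: 2020-05-15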