An Optimal Feature Subset Selection by using Genetic Algorithm for an effective Text Classification


P.Ramyaa, B.Karthik

Abstract

Social media and online forums are communication channels through which people share their opinions, thoughts, ideas, and views. Reading them helps ordinary people see an issue from different perspectives before making an important decision. These platforms generate data in many forms, such as text, images, audio, and video. Text makes up the largest share of social media content and holds valuable information, but that information is hard to extract because the data are unstructured. Text mining is used to uncover hidden patterns and to classify documents into their categories. The proposed system uses an IMDB movie review dataset containing a positive class and a negative class, with 1,000 text reviews in each. It represents the features of the text corpus with word2vec and classifies the reviews with three machine learning algorithms: k-nearest neighbors (KNN), logistic regression, and a linear support vector machine (SVM). Adjectives and adverbs, which qualify nouns and verbs, are two significant features on which sentiment classification depends. These informative features can be extracted using WordNet, a lexical database. Redundant and irrelevant features act as noise and can be discarded with feature optimization techniques such as a genetic algorithm. The proposed approach achieves 75% accuracy, 75% precision, 75% recall, and a 75% F1-score.
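The abstract does not specify the chromosome encoding, fitness function, or genetic operators the authors used. The following is only a minimal sketch of genetic-algorithm feature-subset selection under stated assumptions: each candidate subset is a bit mask over feature dimensions (for example, word2vec dimensions); fitness is held-out accuracy minus a small per-feature penalty; and a nearest-centroid classifier is swapped in for KNN, logistic regression, or a linear SVM so the example needs no third-party libraries. Synthetic data stand in for the review vectors, and every function name and parameter value (`make_toy_data`, `centroid_accuracy`, `ga_select`, `penalty=0.01`, and so on) is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Mask = List[int]
Data = Tuple[List[Vector], List[int], List[Vector], List[int]]


def make_toy_data(n_per_class: int, n_features: int, informative: Sequence[int],
                  seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Hypothetical stand-in for review vectors: the two classes differ only in
    the `informative` dimensions; all other dimensions are Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 1.5 if label == 1 else -1.5
            X.append(row)
            y.append(label)
    return X, y


def centroid_accuracy(mask: Mask, X_tr, y_tr, X_te, y_te) -> float:
    """Accuracy of a nearest-centroid classifier restricted to the masked columns."""
    cols = [j for j, bit in enumerate(mask) if bit]
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_te, y_te):
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((x[j] - m) ** 2
                                                for j, m in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(y_te)


def fitness(mask: Mask, data: Data, penalty: float = 0.01) -> float:
    """Held-out accuracy minus a per-feature penalty; an empty subset scores -1."""
    if not any(mask):
        return -1.0
    return centroid_accuracy(mask, *data) - penalty * sum(mask)


def ga_select(n_features: int, data: Data, pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.05, seed: int = 42) -> Tuple[Mask, float]:
    """Tournament selection, single-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    # Seed the population with the full feature set as a baseline, plus random masks.
    pop = [[1] * n_features]
    pop += [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    cache = {}

    def fit(m: Mask) -> float:
        key = tuple(m)
        if key not in cache:
            cache[key] = fitness(m, data)
        return cache[key]

    def tournament(population: List[Mask]) -> Mask:
        a, b = rng.sample(population, 2)
        return a if fit(a) >= fit(b) else b

    best = max(pop, key=fit)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit flips
            children.append(child)
        pop = children
        best = max(pop, key=fit)
    return best, fit(best)


if __name__ == "__main__":
    N = 10
    train = make_toy_data(100, N, informative=[0, 1], seed=1)
    held_out = make_toy_data(100, N, informative=[0, 1], seed=2)
    mask, score = ga_select(N, (*train, *held_out))
    print("selected:", [j for j, b in enumerate(mask) if b], "fitness:", round(score, 3))
```

Seeding the initial population with the all-features mask and keeping the best mask each generation guarantees the returned subset scores at least as well as using every feature under this fitness function. The per-feature penalty is what pushes the search toward dropping redundant or noisy dimensions.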
