Published: 2021-09-12 14:05:08
Category: Computer Science

Type of paper: Essay

Sentiment Analysis on Large Scale Amazon Product Reviews 2018-19
1 Architecture and Design
1.1 System Overview
Figure 1 gives an overview of the system and the modules involved in building it.

Figure 1: System Overview

The modules are:
Web application, representing the User Interface
Intermediate Flask server
Sentiment classifier, i.e. MultinomialNB
Dataset, split into training and testing sets

Dept. of CSE, DSCE, Bangalore 78
1.2 Software Architecture
1.2.1 System Block Diagram
The system block diagram represents the entire system as three major modules or blocks: the User Interface, the intermediate server and the classifier.
Figure 2: System Block Diagram
The user selects a product from the User Interface. This sends a request to the intermediate server, which selects the corresponding test data and sends it to the classifier.
The classifier uses this data to generate the results and sends them back to the intermediate server.
Using these results, the ChartJS API generates the graphs and renders them on the user's home screen in real time.
1.2.2 Flowchart Diagram
A flowchart is a graphical representation of the steps taken to solve a problem. It is used to present the order of the steps and the rationale behind them before the full computer program is written, and it also helps in communicating the solution to others.
Figure 3: Flowchart
1.2.3 Data Flow Diagram
A Data Flow Diagram is mainly used to understand the exchange of data and the form of data transfer between different modules. The Data Flow Diagram is shown in Figure 4.
Figure 4: Data Flow Diagram
As can be understood from the diagram, the flow of data is as follows:
The user selects the product from the front-end, and the selection is sent to the intermediate server in the form of a product-id.
The server then sends the corresponding test data to the already trained MultinomialNB classifier. The classifier generates the results, which are sent to the JavaScript code that renders them in the web application using ChartJS.
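The data flow described above can be sketched in plain Python. This is only an illustrative sketch: the function names (`select_test_reviews`, `handle_product_request`), the in-memory `TEST_DATA` table and the keyword-based stub classifier are assumptions standing in for the Flask server and the trained MultinomialNB model, whose code the report does not list.

```python
from collections import Counter

# Hypothetical test data keyed by product-id (the real system takes it from the dataset).
TEST_DATA = {
    "P1": ["great phone", "bad battery", "great camera"],
    "P2": ["bad screen"],
}


def select_test_reviews(product_id):
    """Server step: pick the test reviews that belong to the selected product."""
    return TEST_DATA.get(product_id, [])


def stub_classifier(reviews):
    """Stand-in for the trained classifier: 1 = positive, 0 = negative."""
    return [0 if "bad" in review else 1 for review in reviews]


def handle_product_request(product_id, classify=stub_classifier):
    """Full flow: product-id -> test data -> predictions -> chart payload."""
    reviews = select_test_reviews(product_id)
    counts = Counter(classify(reviews))
    # Labels and values arranged the way a chart dataset would consume them.
    return {"labels": ["Positive", "Negative"], "data": [counts[1], counts[0]]}


payload = handle_product_request("P1")
```

In the real system the returned dictionary would be serialised by the server and handed to the JavaScript front-end, which passes it to ChartJS.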
2 Implementation
2.1 Implementation Platform
2.1.1 Hardware
Processor: Intel Core i5
RAM: 8 GB
GPU: None
The above-mentioned hardware configuration is used to train our machine learning model.
2.1.2 Software
Operating System: Ubuntu/Windows (16) (64-bit)
Programming Languages: Python, JavaScript, HTML5, CSS
Machine Learning Models: Naive Bayes Classifier and Logistic Regression
Server: Flask Server
2.2 Implementation Details
2.2.1 Dataset Collection
The training dataset has been taken from the Kaggle website. The URL of the selected dataset is:
mobile-phones.
It contains product reviews for different mobile phone products, with at least ten thousand reviews for each mobile phone. The dataset comprises id, mobile name, mobile brand, price, reviews and ratings.
The dataset has different types of reviews for each product: negative reviews, positive reviews and some neutral reviews. Only rows with no NaN values are used for training the model.
For training and testing, the dataset has been divided in an 80:20 ratio.
2.2.2 Data frame development for model
The key elements for sentiment analysis are the reviews and the ratings, so a data frame is developed that contains only the reviews and ratings for the corresponding product. All rows with null values are removed from this data frame, as they may decrease the efficiency and accuracy of the model.
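The column selection and null removal can be illustrated with a small plain-Python stand-in for the data frame. The sample rows, the column names and the helper `build_review_frame` are hypothetical; with pandas the same step would typically be a column selection followed by `dropna()`.

```python
# Hypothetical rows; None marks a missing (null) value.
rows = [
    {"id": 1, "name": "Phone A", "brand": "X", "price": 199.0, "review": "Works well", "rating": 5},
    {"id": 2, "name": "Phone A", "brand": "X", "price": 199.0, "review": None, "rating": 4},
    {"id": 3, "name": "Phone B", "brand": "Y", "price": None, "review": "Stopped working", "rating": 1},
    {"id": 4, "name": "Phone B", "brand": "Y", "price": 99.0, "review": "It is okay", "rating": None},
]

# Assumed column names for product, review text and rating.
KEPT_COLUMNS = ("name", "review", "rating")


def build_review_frame(rows, columns=KEPT_COLUMNS):
    """Keep only the selected columns, then drop rows with a null in any of them."""
    frame = [{col: row[col] for col in columns} for row in rows]
    return [row for row in frame if all(row[col] is not None for col in columns)]


review_frame = build_review_frame(rows)
```

Row 3 survives even though its price is missing, because the price column is no longer part of the reduced frame when the null check runs.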
2.2.3 Polarising the reviews
The dataset contains three types of reviews: positive, negative and neutral. Polarization is required for classifying such data. First, the neutral reviews are dropped. Then the positive reviews are assigned a one and the negative reviews a zero.
2.2.4 Splitting the data for training and testing purpose
train_test_split() is a function provided by the model_selection module of the sklearn library. It is used to split the data into training data and testing data in an 80:20 ratio.
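The polarising and splitting steps can be sketched as follows. The report does not state how ratings map to sentiment, so this sketch assumes ratings from 1 to 5 with 3 treated as neutral. The helper `split_80_20` is a plain-Python stand-in for sklearn's `train_test_split`, used here only to keep the example dependency-free.

```python
import random


def polarise(records, neutral_rating=3):
    """Drop neutral reviews, then label positive as 1 and negative as 0.

    Assumption: ratings run from 1 to 5 and a rating of 3 counts as neutral.
    """
    labelled = []
    for review, rating in records:
        if rating == neutral_rating:
            continue  # neutral reviews are dropped
        labelled.append((review, 1 if rating > neutral_rating else 0))
    return labelled


def split_80_20(items, seed=0):
    """Shuffle a copy of the items and split it 80:20 into train and test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


ratings = [5, 4, 3, 2, 1, 5, 1, 3, 4, 2, 5, 1]
records = [(f"review {i}", rating) for i, rating in enumerate(ratings)]
labelled = polarise(records)
train, test = split_80_20(labelled)
```

With sklearn, passing `test_size=0.2` to `train_test_split` gives the same 80:20 proportion.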
2.2.5 Using NLP to pre-process the data
For natural language processing, the nltk library (Natural Language Toolkit) is used. It contains a large variety of functions for pre-processing. The pre-processing comprises the following steps:
Removing white space and other unwanted symbols
Splitting the reviews into words
Removal of stopwords
Stopwords are words like am, are, is, etc., which are not required for analysis. They are taken from the stopwords corpus provided by the corpus module of the nltk library.
Stemming of the obtained words
Stemming is the process of reducing words to their base form, e.g. playing, plays and played to play. This is done by the Snowball Stemmer from the stem module of the nltk library.
Joining back the words to get the cleaned reviews
The words are joined back to form statements that consist of meaningful words and no stopwords.
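The five steps above can be sketched without external dependencies. Note the substitutions: the small `STOPWORDS` set and the `naive_stem` suffix stripper are hypothetical stand-ins for nltk's stopwords corpus and Snowball Stemmer, and they are far cruder than the real tools.

```python
import re

# Tiny hand-written stopword list; the report uses nltk's stopwords corpus instead.
STOPWORDS = {"am", "are", "is", "the", "a", "an", "i", "it", "and", "this"}


def naive_stem(word):
    """Very rough suffix stripper, standing in for nltk's Snowball Stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_review(text):
    """Apply the steps in order: strip symbols, split, drop stopwords, stem, re-join."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())      # remove unwanted symbols
    words = text.split()                                # split on white space
    words = [w for w in words if w not in STOPWORDS]    # remove stopwords
    words = [naive_stem(w) for w in words]              # stem each remaining word
    return " ".join(words)                              # join back into one string


cleaned = clean_review("I am playing!!  It plays, and it   played the game.")
```

Here `cleaned` is `"play play play game"`: the three inflected forms collapse to the same stem and the stopwords disappear.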
2.2.6 Applying Models
Two machine learning models have been applied:
Multinomial Naive Bayes Classifier
The Naive Bayes classifier belongs to a family of probabilistic algorithms based on Bayes' theorem with the assumption of conditional independence between every pair of features. MultinomialNB is an instance of a Naive Bayes classifier which uses a multinomial distribution for each of the features. MultinomialNB is a common choice for NLP problems, as it is fast and has proven reliable and accurate in a number of NLP applications. The sklearn library provides the MultinomialNB class under the naive_bayes module.
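To make the multinomial model concrete, here is a minimal from-scratch sketch of the idea behind it. It is not sklearn's implementation: the class name `TinyMultinomialNB` is hypothetical, features are plain word counts, and add-one (Laplace) smoothing is assumed.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, words, label):
        # log P(class) + sum over words of log P(word | class)
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + len(self.vocabulary)
        for word in words:
            if word in self.vocabulary:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][word] + 1) / denominator)
        return score

    def predict(self, documents):
        return [
            max(self.classes, key=lambda label: self._log_score(doc.split(), label))
            for doc in documents
        ]


train_docs = ["good phone", "great battery good", "bad screen", "poor battery bad"]
train_labels = [1, 1, 0, 0]
model = TinyMultinomialNB().fit(train_docs, train_labels)
predictions = model.predict(["good battery", "bad phone"])
```

Each word contributes independently to the class score, which is the conditional-independence assumption in practice. On this toy data the two test reviews are predicted as 1 and 0 respectively.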
Logistic Regression
Logistic regression is a classifier used for predictive analysis. It is well suited to sentiment analysis because it applies when the dependent variable (target) is categorical. The sklearn library provides the LogisticRegression class under the linear_model module.
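A minimal sketch of fitting both models with sklearn is given below. The toy reviews are made up, and the use of `CountVectorizer` to turn text into word-count features is an assumption, since the report does not say how the cleaned reviews are vectorised. Default hyperparameters are used for both models.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Toy cleaned reviews with labels: 1 = positive, 0 = negative.
train_reviews = ["good phone", "great battery good", "bad screen", "poor battery bad"]
train_labels = [1, 1, 0, 0]
test_reviews = ["good battery", "bad phone"]

# Assumption: reviews are converted to word-count features with CountVectorizer.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_reviews)
X_test = vectorizer.transform(test_reviews)  # reuse the vocabulary learned on the training set

predictions = {}
for name, model in [
    ("MultinomialNB", MultinomialNB()),
    ("LogisticRegression", LogisticRegression()),
]:
    model.fit(X_train, train_labels)
    predictions[name] = model.predict(X_test).tolist()
```

The vectoriser is fitted on the training reviews only and then applied to the test reviews, so no information from the test set leaks into training.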
2.2.7 Checking the accuracy and confusion matrix
For every value in the test dataset, the machine learning model predicts the outcome as either positive or negative. The evaluation parameters for any classification are based on only four outcomes, namely true positive, true negative, false positive and false negative.
The terms positive and negative refer to the classifier's prediction, and the terms true and false refer to whether that prediction matches the external knowledge, i.e. the actual label.
True positive (TP): correct positive prediction
False positive (FP): incorrect positive prediction
True negative (TN): correct negative prediction
False negative (FN): incorrect negative prediction
Confusion Matrix: The confusion matrix counts the correct and incorrect predictions and breaks the counts down by class. It can be used to compute precision, accuracy, recall, etc. confusion_matrix() is provided by the metrics module of sklearn.
Accuracy Score: Accuracy is the fraction of predictions that the model got right. accuracy_score() is provided by the metrics module of sklearn, or the accuracy can be calculated from the confusion matrix.
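The relationship between the four outcomes and the accuracy can be sketched in a few lines. The helper names `confusion_counts` and `accuracy` and the label lists are illustrative only, not the report's results.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN and FN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == 1:
            counts["TP" if truth == 1 else "FP"] += 1
        else:
            counts["TN" if truth == 0 else "FN"] += 1
    return counts


def accuracy(counts):
    """Accuracy = correct predictions / all predictions."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total


# Made-up labels for illustration.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
score = accuracy(counts)
```

For these labels the counts are TP = 3, FP = 1, TN = 3 and FN = 1, giving an accuracy of 6/8 = 0.75.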
The accuracy score and confusion matrix for the MultinomialNB model are shown below:
The accuracy score and confusion matrix for the LogisticRegression model are shown below:
