Sentiment Analysis with TensorFlow 2 and Keras using Python

TL;DR Learn how to preprocess text data using the Universal Sentence Encoder model. Build a model for sentiment analysis of hotel reviews.

This tutorial will show you how to develop a Deep Neural Network for text classification (sentiment analysis). We’ll skip most of the preprocessing by using a pre-trained model that converts text into numeric vectors.

You’ll learn how to:

  • Convert text to embedding vectors using the Universal Sentence Encoder model
  • Build a hotel review Sentiment Analysis model
  • Use the model to predict sentiment on unseen data

Run the complete notebook in your browser

The complete project on GitHub

Universal Sentence Encoder

Neural Networks operate on numbers, so they can’t consume raw text directly. You need a way to convert text into numbers first. There are several ways to do that, but most well-performing models use Embeddings.

In the past, you had to do a lot of preprocessing: tokenization, stemming, removing punctuation, removing stop words, and more. Nowadays, pre-trained models offer built-in preprocessing. You might still go the manual route, but libraries let you build a quick and dirty prototype with good accuracy.
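For comparison, here is a rough sketch of what the manual route could look like using only the standard library. The `STOP_WORDS` set and the `preprocess` helper are hypothetical and deliberately tiny; a real pipeline would typically rely on a full stop-word list and a proper stemmer.

```python
import string

# Tiny illustrative stop-word list (an assumption, not a standard list)
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "was"}

def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [token for token in no_punct.split() if token not in STOP_WORDS]

tokens = preprocess("The location is great, and the staff was friendly!")
print(tokens)  # ['location', 'great', 'staff', 'friendly']
```

Every step here is a decision you have to make and maintain yourself, which is exactly the work a pre-trained encoder takes off your hands.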

The Universal Sentence Encoder (USE) encodes sentences into embedding vectors. The model is freely available at TF Hub. The multilingual variant used here supports 16 languages, including English. Let’s have a look at how we can load the model:

import numpy as np
import tensorflow_hub as hub
import tensorflow_text  # registers the custom ops the multilingual model needs

use = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")

Next, let’s define two sentences that have a similar meaning:

sent_1 = ["the location is great"]
sent_2 = ["amazing location"]

Using the model is really simple:

emb_1 = use(sent_1)
emb_2 = use(sent_2)

What is the result?

print(emb_1.shape)
(1, 512)

Each sentence you pass to the model is encoded as a vector with 512 elements. You can think of USE as a tool to compress any textual data into a vector of fixed size while preserving the similarity between sentences.

How can we calculate the similarity between two embeddings? The embeddings are approximately normalized to unit length, so their inner product works as a similarity score:

print(np.inner(emb_1, emb_2).flatten()[0])
0.79254687

Values closer to 1 indicate more similarity. So, those two are quite similar, indeed!
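To see why the inner product works as a similarity score, here is a small dependency-free sketch. The 3-element vectors are made up for illustration (real USE embeddings have 512 elements), and the helper names are hypothetical. Once vectors are scaled to unit length, their inner product equals their cosine similarity.

```python
import math

def inner(a, b):
    """Inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(inner(v, v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of any length."""
    return inner(a, b) / (math.sqrt(inner(a, a)) * math.sqrt(inner(b, b)))

vec_a = [1.0, 2.0, 2.0]  # made-up vectors, not real embeddings
vec_b = [2.0, 1.0, 2.0]
unit_a, unit_b = normalize(vec_a), normalize(vec_b)

print(inner(unit_a, unit_b))            # inner product of unit vectors
print(cosine_similarity(vec_a, vec_b))  # same value, computed directly
```

Both lines print the same number (8/9 here), and a vector compared with itself scores 1.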

We’ll use the model for the pre-processing step. Note that you can use it for almost every NLP task out there, as long as the language you’re using is supported.

Hotel Reviews Data

The dataset is hosted on Kaggle and is provided by Jiashen Liu. It contains European hotel reviews that were scraped from Booking.com.

This dataset contains 515,000 customer reviews and scoring of 1493 luxury hotels across Europe. Meanwhile, the geographical location of hotels are also provided for further analysis.

Let’s load the data:

import pandas as pd

df = pd.read_csv("Hotel_Reviews.csv", parse_dates=['Review_Date'])

While the dataset is quite rich, we’re interested in the review text and review score. Let’s get those:

df["review"] = df["Negative_Review"] + df["Positive_Review"]
df["review_type"] = df["Reviewer_Score"].apply(
  lambda x: "bad" if x < 7 else "good"
)

df = df[["review", "review_type"]]

Any review with a score below 7 is marked as “bad”; everything else is “good”.
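Reviewer scores can be fractional, so the exact threshold matters for values between 6 and 7. A quick sketch of the same rule on a few made-up scores (the `label_review` name is hypothetical):

```python
def label_review(score, threshold=7):
    """Mirror of the lambda above: scores below the threshold are 'bad'."""
    return "bad" if score < threshold else "good"

sample_scores = [2.9, 6.0, 6.5, 7.0, 9.6]  # made-up values
labels = [label_review(s) for s in sample_scores]
print(list(zip(sample_scores, labels)))
```

Note that 6.5 lands in the bad group while 7.0 is the first good score.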

Exploration

How many of each review type do we have?

We have a severe imbalance in favor of good reviews. We’ll have to do something about that. However, let’s have a look at the most common words contained within the positive reviews:

“Location, location, location” is a pretty common saying in the tourism business. Staff friendliness seems to be the second most common quality that matters to positive reviewers.

How about the bad reviews?

The bad reviews contain a much more diverse set of phrases. Note that “good location” is still present. Room qualities are important, too!
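A rough sketch of how such counts could be computed, using `collections.Counter` on a few made-up reviews. The data and the `top_words` helper are hypothetical and not taken from the dataset.

```python
from collections import Counter

# Made-up (review, review_type) pairs for illustration only
reviews = [
    ("great location friendly staff", "good"),
    ("location was perfect", "good"),
    ("small room noisy room", "bad"),
    ("good location but tiny room", "bad"),
]

def top_words(rows, review_type, n=1):
    """Return the n most frequent words among reviews of the given type."""
    counts = Counter()
    for text, label in rows:
        if label == review_type:
            counts.update(text.lower().split())
    return counts.most_common(n)

type_counts = Counter(label for _, label in reviews)
print(type_counts)                 # how many reviews of each type
print(top_words(reviews, "good"))  # [('location', 2)]
print(top_words(reviews, "bad"))   # [('room', 3)]
```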

Preprocessing

We’ll deal with the review type imbalance by equating the number of good ones to that of the bad ones:

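RANDOM_SEED = 42  # assumed value; any fixed seed keeps the sampling reproducible
good_reviews = df[df.review_type == "good"]
bad_reviews = df[df.review_type == "bad"]

# Downsample good reviews; pd.concat replaces DataFrame.append (removed in pandas 2.0)
good_df = good_reviews.sample(n=len(bad_reviews), random_state=RANDOM_SEED)
bad_df = bad_reviews
review_df = pd.concat([good_df, bad_df]).reset_index(drop=True)
print(review_df.shape)
(173702, 2)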
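The same downsampling idea, sketched without pandas on made-up data so the effect on the class counts is easy to check. The `balance_by_downsampling` helper is hypothetical.

```python
import random
from collections import Counter

def balance_by_downsampling(rows, seed=42):
    """Randomly shrink every class to the size of the smallest one."""
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced

# Made-up, imbalanced toy data: 8 good reviews, 3 bad ones
rows = [(f"good review {i}", "good") for i in range(8)]
rows += [(f"bad review {i}", "bad") for i in range(3)]

balanced = balance_by_downsampling(rows)
print(Counter(label for _, label in balanced))
```

Downsampling is the simplest fix, but it throws away most of the good reviews. Class weights or oversampling are common alternatives when data is scarce.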

Let’s have a look at the new review type distribution:

We have over 80k examples for each type. Next, let’s one-hot encode the review types:

from sklearn.preprocessing import OneHotEncoder

# scikit-learn >= 1.2 uses sparse_output; older versions expect sparse=False
type_one_hot = OneHotEncoder(sparse_output=False).fit_transform(
  review_df.review_type.to_numpy().reshape(-1, 1)
)
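It helps to know which column means what, since later code checks `y_test[0][0] == 1` to detect a bad review. The sketch below mimics the encoding in plain Python, assuming categories are ordered alphabetically (so `bad` comes before `good`), which is consistent with how the labels are read back later in this tutorial. The `one_hot` helper is hypothetical.

```python
def one_hot(labels):
    """One-hot encode labels, with columns in sorted category order."""
    categories = sorted(set(labels))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = [[1.0 if index[label] == i else 0.0 for i in range(len(categories))]
            for label in labels]
    return categories, rows

categories, encoded = one_hot(["good", "bad", "good"])
print(categories)  # ['bad', 'good']
print(encoded)     # [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```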

We’ll split the data for training and test datasets:

from sklearn.model_selection import train_test_split

train_reviews, test_reviews, y_train, y_test =\
  train_test_split(
    review_df.review,
    type_one_hot,
    test_size=.1,
    random_state=RANDOM_SEED
  )
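With `test_size=.1`, the split sizes can be worked out by hand. The sketch below assumes the test share is rounded up to a whole number of samples, which is consistent with the training shape printed later in this tutorial.

```python
import math

n_samples = 173702                   # rows in review_df, from the shape printed earlier
n_test = math.ceil(0.1 * n_samples)  # test share, rounded up
n_train = n_samples - n_test
print(n_train, n_test)  # 156331 17371
```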

Finally, we can convert the reviews to embedding vectors:

import tensorflow as tf
from tqdm import tqdm

# Encode each training review into a flat 512-element vector
X_train = []
for r in tqdm(train_reviews):
  emb = use(r)
  review_emb = tf.reshape(emb, [-1]).numpy()
  X_train.append(review_emb)

X_train = np.array(X_train)

# Same procedure for the test reviews
X_test = []
for r in tqdm(test_reviews):
  emb = use(r)
  review_emb = tf.reshape(emb, [-1]).numpy()
  X_test.append(review_emb)

X_test = np.array(X_test)
print(X_train.shape, y_train.shape)
(156331, 512) (156331, 2)
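Encoding one review per call is simple but slow for this many reviews. A possible speed-up is to pass several reviews at once, since the encoder accepted a list of sentences earlier. The sketch below shows only the batching logic: `embed_in_batches` is a hypothetical helper, and `fake_encoder` is a stand-in so the example runs without downloading the model. With the real model you would pass `use` as the encoder and likely convert each batch result with `.numpy()`.

```python
def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(texts, encoder, batch_size=32):
    """Call encoder once per batch and collect one vector per text."""
    vectors = []
    for batch in batches(list(texts), batch_size):
        vectors.extend(encoder(batch))
    return vectors

def fake_encoder(batch):
    # Stand-in for the real model: maps each text to a 2-element vector
    return [[float(len(text)), float(text.count(" "))] for text in batch]

texts = ["amazing location", "the location is great", "bad room"]
vectors = embed_in_batches(texts, fake_encoder, batch_size=2)
print(len(vectors), vectors[0])
```

The batch size is a trade-off between speed and memory, so treat the default of 32 as a starting point rather than a recommendation.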

We have ~156k training examples and a roughly equal distribution of review types. How well can we predict review sentiment with that data?

Sentiment Analysis

With only two review types, Sentiment Analysis is a binary classification problem here. Let’s use Keras to build a model:

from tensorflow import keras

model = keras.Sequential()

model.add(
  keras.layers.Dense(
    units=256,
    input_shape=(X_train.shape[1], ),
    activation='relu'
  )
)
model.add(
  keras.layers.Dropout(rate=0.5)
)

model.add(
  keras.layers.Dense(
    units=128,
    activation='relu'
  )
)
model.add(
  keras.layers.Dropout(rate=0.5)
)

model.add(keras.layers.Dense(2, activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    optimizer=keras.optimizers.Adam(0.001),
    metrics=['accuracy']
)

The model is composed of 2 fully-connected hidden layers. Dropout is used for regularization.
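As a sanity check on model size, the number of trainable parameters follows directly from the layer widths: each Dense layer has `inputs * units` weights plus `units` biases, and Dropout adds none. The `dense_params` helper below is hypothetical; the input width of 512 comes from the embedding size.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully-connected layer."""
    return n_inputs * n_units + n_units

layer_sizes = [512, 256, 128, 2]  # input, two hidden layers, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [131328, 32896, 258]
print(sum(per_layer))  # 164482
```

Calling `model.summary()` should report matching numbers for the model defined above.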

We’ll train for 10 epochs and use 10% of the data for validation:

history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=16,
    validation_split=0.1,
    verbose=1,
    shuffle=True
)
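To get a feel for what this call does, the sketch below works out the split and the number of batches per epoch. It assumes Keras holds out the last 10% of the samples (rounding the training part down) and that the final, smaller batch is still used.

```python
import math

n_samples = 156331      # training examples, from X_train.shape
validation_split = 0.1
batch_size = 16

n_fit = math.floor(n_samples * (1 - validation_split))  # samples used for training
n_val = n_samples - n_fit                                # samples held out
steps_per_epoch = math.ceil(n_fit / batch_size)
print(n_fit, n_val, steps_per_epoch)  # 140697 15634 8794
```

With a batch size of 16, each epoch performs close to 9,000 weight updates, which is one reason a handful of epochs is enough here.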

Our model starts to overfit at about epoch 8, so we won’t train any longer. We got about 82% accuracy on the validation set. Let’s evaluate on the test set:

model.evaluate(X_test, y_test)
[0.39665538506298975, 0.82044786]

82% accuracy on the test set, too!
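The two numbers returned by `evaluate` are the loss and the accuracy. As a rough illustration of what they measure, here is a plain-Python sketch of categorical cross-entropy and accuracy on made-up predictions. The helper names are hypothetical, and Keras computes these in batches with extra numerical safeguards.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Mean of -sum(true * log(pred)) over all examples."""
    losses = [-sum(t * math.log(p) for t, p in zip(ts, ps) if t > 0)
              for ts, ps in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

def accuracy(y_true, y_pred):
    """Share of examples where the most probable class matches the label."""
    hits = sum(ts.index(max(ts)) == ps.index(max(ps))
               for ts, ps in zip(y_true, y_pred))
    return hits / len(y_true)

# Made-up one-hot labels ([bad, good]) and predicted probabilities
y_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
y_pred = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.2, 0.8]]

print(categorical_crossentropy(y_true, y_pred))
print(accuracy(y_true, y_pred))  # 0.75
```

The loss rewards confident correct answers and punishes confident mistakes, while accuracy only counts whether the top class is right.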

Predicting Sentiment

Let’s make some predictions:

print(test_reviews.iloc[0])
print("Bad" if y_test[0][0] == 1 else "Good")

Asked for late checkout and didnt get an answer then got a yes but had to pay 25 euros by noon they called to say sorry you have to leave in 1h knowing that i had a sick dog and an appointment next to the hotel Location staff

Bad

The prediction:

y_pred = model.predict(X_test[:1])
print(y_pred)
"Bad" if np.argmax(y_pred) == 0 else "Good"
[[0.9274073  0.07259267]]
'Bad'

This one is correct. Let’s have a look at another one:

print(test_reviews.iloc[1])
print("Bad" if y_test[1][0] == 1 else "Good")

Don t really like modern hotels Had no character Bed was too hard Good location rooftop pool new hotel nice balcony nice breakfast

Good

y_pred = model.predict(X_test[1:2])
print(y_pred)
"Bad" if np.argmax(y_pred) == 0 else "Good"
[[0.39992586 0.6000741 ]]
'Good'
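The label lookup used in both examples can be wrapped in a small helper that also reports the model’s confidence. This is a sketch: `decode_prediction` and `CLASS_NAMES` are hypothetical names, and the column order (bad first, good second) is taken from the checks above.

```python
CLASS_NAMES = ["Bad", "Good"]  # column 0 = bad, column 1 = good

def decode_prediction(probabilities):
    """Return the most probable class name and its probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best], probabilities[best]

# Probabilities copied from the two predictions above
print(decode_prediction([0.9274073, 0.07259267]))  # ('Bad', 0.9274073)
print(decode_prediction([0.39992586, 0.6000741]))  # ('Good', 0.6000741)
```

The second review mixes complaints and praise, and the model reflects that with only about 60% confidence in “Good”.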

Conclusion

Well done! You can now build a Sentiment Analysis model with Keras. You can reuse the same approach for other text classification tasks, too!

You learned how to:

  • Convert text to embedding vectors using the Universal Sentence Encoder model
  • Build a hotel review Sentiment Analysis model
  • Use the model to predict sentiment on unseen data

Can you use the Universal Sentence Encoder model for other tasks? Comment down below.

References