Using Cloud-Based AI Keyword Analysis for Text Optimization

Recently I’ve seen a lot of discussion and debate about Applicant Tracking Systems and how they parse resumes for keywords to try to determine whether a candidate is a good fit for a job posting. Whatever one thinks of these systems, the reality is that they are widely used today. I was curious whether public cloud-based artificial intelligence services could improve your odds of matching the keyphrases an employer or recruiter might use for jobs you’d be interested in applying for. To put each company’s engine to the test and get a better idea of the strengths and weaknesses of this technology, I also ran a simple tweet, a newspaper editorial, and the bulk of a poem through each. I was somewhat surprised at the outcomes.

I’ll go into specifics in the results section of each service, but all in all I was most pleased with the results from IBM Watson. Surprisingly, Azure came in second best for me, with Google and AWS falling short in my opinion.

You may also notice some similarity with a previous post, Detecting Sentiment and Emotion in Text. That’s unavoidable, as the keyword or keyphrase functionality is tied in with all the related natural language processing services. However, these functions serve very different purposes: keywords tell you what the text is about rather than how the text may make one feel. Some of the services will return sentiment along with their keywords, which I’ll note where appropriate.

Below are the 4 samples of text I used to test each of the platforms’ artificial intelligence offerings:

Sample #1 is from my LinkedIn bio:

I am a well-rounded, cloud focused, IT professional. I’m comfortable in any role from contributor to manager. I’m willing to grow within an organization to achieve the goal of returning to a manager/director role in a cloud-focused company.

My roles and interests encompass a wide range of responsibilities that provide me with expertise in many areas including cloud solutions offered by Amazon Web Services and Microsoft Azure, security, domains and DNS, web technologies (servers and development), and legal compliance. My experience also includes responsibilities for developing code to interact with cloud platform APIs, Iaas, DaaS, HPE Helion, CloudSystem 9 and 10, Scality, Stratoscale Symphony, Google Analytics, Google Adwords, etc.

Throughout my 10 years of leadership positions some of the numerous initiatives I have been responsible for include:

• Collecting and correlating data to design an anti-fraud solution that decreased fraud by 86%
• Reorganized and led a team of 45 agents increasing service levels from 45% to 90% in less than 6 weeks
• The design & implementation of a global load balancing solution that improved page load times by up to 500% depending on global location

Program Management and technical project management has allowed me to work with world-class customers delivering complex multi-million-dollar solutions on a timely basis. In doing so, I work with numerous vendors and inside team members from executive management to professional services, engineers, and developers.

Sample #2 is roughly the first 5,000 characters of Edgar Allan Poe’s “The Raven”.

Sample #3 is the text of an editorial on cryptocurrencies by The Guardian.

Sample #4 is this tweet about a cute dog:

Meet Yogi. He made his first trip to the beach today. Could never have guessed eating sand would be so exhausting. 12/10 would bring back soon pic.twitter.com/8dnaCObcZ4

— WeRateDogs™ (@dog_rates) February 19, 2018

Let’s pull back the curtain and see what each service reports for our examples.

AWS Comprehend

Resources:
Comprehend Product Page
Comprehend FAQs
Comprehend Documentation
Comprehend Pricing

Features

AWS offers the following features as part of its Comprehend natural language processing service. For this post, I only experimented with the Keyphrase Extraction function. In addition to the overall sentiment detected, the Sentiment Analysis function gives you a score for each possible value to show how certain it is of its decision. Scores are returned out to 14 decimal places; I’ll be rounding them a bit for the sake of your eyes.

Keyphrase Extraction: The Keyphrase Extraction API returns the key phrases or talking points and a confidence score to support that this is a key phrase.

Sentiment Analysis: The Sentiment Analysis API returns the overall sentiment of a text (Positive, Negative, Neutral, or Mixed).

Entity Recognition: The Entity Recognition API returns the named entities (“People,” “Places,” “Locations,” etc.) that are automatically categorized based on the provided text.

Language Detection: The Language Detection API automatically identifies text written in over 100 languages and returns the dominant language with a confidence score to support that a language is dominant.

Topic Modeling: Topic Modeling identifies relevant terms or topics from a collection of documents stored in Amazon S3. It will identify the most common topics in the collection and organize them in groups and then map which documents belong to which topic.

Pricing

Natural Language Processing requests are measured in units of 100 characters, with a 3 unit (300 characters) minimum charge per request. The free tier gives you access to 50,000 units of text (about 5 million characters) for each of the APIs per month with each additional unit starting at $0.0001 with significant cost reductions based on volume. Topic Modeling is the exception as it is priced per job.
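To make the unit arithmetic concrete, here is a minimal sketch in PHP that uses only the figures quoted above. The function names are made up for illustration, and the sketch assumes that a partial unit is rounded up to a whole unit. It ignores the free tier and volume discounts.

```php
<?php
// Units billed for one request: 100 characters per unit, 3-unit minimum.
// Assumption: a partial unit is rounded up to a whole unit.
function comprehendUnits(int $characters): int
{
    return max(3, (int) ceil($characters / 100));
}

// Cost at the starting rate of $0.0001 per unit (no free tier, no volume discount).
function comprehendCost(int $characters, float $pricePerUnit = 0.0001): float
{
    return comprehendUnits($characters) * $pricePerUnit;
}

echo comprehendUnits(140) . PHP_EOL;   // 3 (the minimum charge applies)
echo comprehendUnits(4950) . PHP_EOL;  // 50
printf('$%.4f' . PHP_EOL, comprehendCost(4950));
```

At these rates a document close to the size limit costs a fraction of a cent per API call.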

I like Amazon’s price structure as it gives you quite a lot of wiggle room to play and remains inexpensive to continue using. Complete pricing can be found via the appropriate link in the Resources box above.

Limits

For my tests, only the first limit really applied. It was sufficient for my needs and seems large enough to accommodate most tasks you’d need it for. I included all limits, as of writing this post, for reference:

  • The maximum document size is 5,000 bytes of UTF-8 encoded characters.
  • The maximum number of documents for the BatchDetectDominantLanguage, BatchDetectEntities, BatchDetectKeyPhrases, and BatchDetectSentiment operations is 25 documents per request.
  • The BatchDetectDominantLanguage and DetectDominantLanguage operations have the following limitations:
    • They don’t support phonetic language detection. For example, they will not detect “arigato” as Japanese, nor “nihao” as Chinese.
    • They may have trouble distinguishing close language pairs, such as Indonesian and Malay; or Bosnian, Croatian, and Serbian.
    • For best results, the input text should be at least 20 characters long.
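Note that the first limit is in bytes, not characters, and a single accented or non-Latin character takes more than one byte in UTF-8. For longer texts, one possible approach is to split on whitespace and pack words into pieces that stay under the limit. The sketch below assumes that collapsing line breaks into spaces is acceptable for your text; `chunkByBytes` is a hypothetical helper, not part of the AWS SDK.

```php
<?php
// Split text into pieces of at most $maxBytes bytes, breaking only on whitespace
// so that no word (and no multi-byte UTF-8 character) is cut in half.
// Runs of whitespace, including line breaks, are collapsed to single spaces.
function chunkByBytes(string $text, int $maxBytes = 5000): array
{
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $current = '';
    foreach ($words as $word) {
        if (strlen($word) > $maxBytes) {
            throw new InvalidArgumentException('A single word exceeds the byte limit.');
        }
        $candidate = ($current === '') ? $word : $current . ' ' . $word;
        if (strlen($candidate) > $maxBytes) {   // strlen() counts bytes, not characters
            $chunks[] = $current;
            $current = $word;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $chunks[] = $current;
    }
    return $chunks;
}

// Each chunk can then be sent as its own document
foreach (chunkByBytes(str_repeat('word ', 3000)) as $i => $chunk) {
    echo 'Chunk ' . ($i + 1) . ': ' . strlen($chunk) . ' bytes' . PHP_EOL;
}
```

Keep in mind that splitting a document this way means key phrases are scored per chunk rather than across the whole text.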

API

With Amazon, you really have to install and use either the CLI or language-specific SDK. I used the PHP SDK and found it really easy to install and quick to code against – not to mention clean. For the authentication in the SDK to be successful please be sure your system’s clock/time is set correctly.

<?php
// Setup and install your SDK first! For PHP you do that with composer
require 'vendor/autoload.php';

// Provide your AWS API keys; the user that owns these keys needs permission to use Comprehend
$aws_access_key = '';
$aws_secret_access_key = '';

// The text you want to analyze
$text = '';

// Do the work and get your result
$aws = new Aws\Sdk([
    'version'       => 'latest',
    'region'        => 'us-east-1',
    'credentials'   => [
        'key'           => $aws_access_key,
        'secret'        => $aws_secret_access_key,
    ],
    'Comprehend'    => [
        'region'        => 'us-east-1'
    ],
]);
$comprehend = $aws->createComprehend();
$aws_result = $comprehend->detectKeyPhrases([
    'LanguageCode'  => 'en',
    'Text'          => $text
]);

// $aws_result will be an array you can access for the returned data
echo 'AWS Results' . PHP_EOL;
echo 'Keyphrases: ' . PHP_EOL;
foreach ( $aws_result['KeyPhrases'] as $i ) {
    echo $i['Text'] . ' (' . $i['Score'] . ')' . PHP_EOL;
}

// Alternately, if you'd like to see the full set of data returned
print_r($aws_result);
?>

Results

I found that AWS seems to just run through each sentence and pick out each keyword it comes across, which results in a lot of keyword duplication. That’s fine if you are looking at relevance within a sentence or a small block of text, but I personally would be looking at the document as a whole rather than in chunks. I also noticed that Comprehend tends to be very loose on keyphrase detection (“Amazon Web Services and Microsoft Azure”) and tends to include some punctuation in keywords (see sample #2). Some words or phrases that I would have picked out either were not returned as important or had low confidence scores.

As mentioned above I’ve rounded the 14 decimal places up for readability. You get back a collection of key phrases that Amazon Comprehend identified from the input text. For each key phrase, the response provides the text of the keyword or phrase, where the keyphrase begins and ends, and the level of confidence that Amazon Comprehend has in the accuracy of the detection. I believe the score is from 0 to 1.0 with higher values indicating a higher confidence but I didn’t see this specifically mentioned in the documentation; I’m basing it on how AWS documents sentiment scores on other functions in the Comprehend service. Below I only list the first 20 keyphrases for each sample along with the confidence score for each.
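If the duplication gets in the way, the response can be tidied up after the fact. The following is a minimal sketch, not part of the AWS SDK: `uniqueKeyPhrases` is a hypothetical helper that expects entries shaped like the items in `$aws_result['KeyPhrases']`, keeps the highest score seen for each phrase, and sorts the best matches first. The optional minimum score is an illustrative parameter.

```php
<?php
// Collapse repeated key phrases (case-insensitively), keep the highest score
// seen for each one, drop anything under $minScore, and sort best-first.
function uniqueKeyPhrases(array $keyPhrases, float $minScore = 0.0): array
{
    $best = [];
    foreach ($keyPhrases as $phrase) {
        if ($phrase['Score'] < $minScore) {
            continue;
        }
        $key = strtolower(trim($phrase['Text']));
        if (!isset($best[$key]) || $phrase['Score'] > $best[$key]['Score']) {
            $best[$key] = ['Text' => $phrase['Text'], 'Score' => $phrase['Score']];
        }
    }
    // Sort by score, highest first (usort also reindexes the array from 0)
    usort($best, function ($a, $b) {
        return $b['Score'] <=> $a['Score'];
    });
    return $best;
}

// Three entries taken from the Sample #2 results in this post
$sample = [
    ['Text' => 'my chamber door', 'Score' => 0.881901],
    ['Text' => 'some visitor',    'Score' => 0.985374],
    ['Text' => 'my chamber door', 'Score' => 0.898626],
];
foreach (uniqueKeyPhrases($sample) as $p) {
    echo $p['Text'] . ' (' . $p['Score'] . ')' . PHP_EOL;
}
```

Here "my chamber door" is reported once, with the higher of its two scores.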

| Sample #1 – LinkedIn Bio | Sample #2 – The Raven | Sample #3 – Editorial | Sample #4 – Dog Tweet |
| --- | --- | --- | --- |
| a well-rounded, cloud focused (0.895357) | a midnight dreary (0.856855) | Last month (0.990694) | Yogi (0.779558) |
| any role (0.991994) | many a quaint and curious volume (0.884559) | a plague (0.998621) | his first trip (0.999373) |
| contributor (0.986457) | forgotten lore (0.920504) | kittens (0.999314) | the beach (0.932810) |
| manager (0.988315) | nearly napping (0.721991) | the most fashionable cryptocurrencies (0.998065) | today (0.994661) |
| an organization (0.9967615) | a tapping (0.996199) | the internet (0.995693) | sand (0.915146) |
| the goal (0.999682) | some one (0.849618) | news (0.983833) | 12/10 (0.981981) |
| a manager/director role (0.9908485) | my chamber door (0.881901) | the cryptocurrency (0.72025179862976) | |
| a cloud-focused company (0.997656) | “’Tis (0.746844) | Ethereum (0.933024) | |
| My roles and interests (0.888028) | some visitor (0.985374) | bills (0.744276) | |
| a wide range (0.999404) | my chamber door (0.898626) | the world computer” (0.847476) | |
| responsibilities (0.999585) | Only this (0.714963) | a distributed program (0.993759) | |
| expertise (0.996745) | the bleak December (0.984577) | large parts (0.999525) | |
| many areas (0.998761) | each separate dying ember (0.912081) | both the legitimate banking system (0.948817) | |
| cloud solutions (0.998675) | its ghost (0.998404) | the legal system (0.998754) | |
| Amazon Web Services and Microsoft Azure (0.934452) | the floor (0.996622) | contracts (0.934160) | |
| security (0.934451) | the morrow (0.996977) | computer code (0.997170) | |
| domains (0.898301) | my books surcease (0.957688) | Ethereum (0.986565) | |
| DNS (0.667329) | sorrowsorrow (0.996818) | the plaything (0.997950) | |
| web technologies (0.992197) | skip all the way down to spot 69… | skip down to spot 115, the last items returned… | |
| servers and development (0.856500) | a stately Raven (0.879681) | the cryptokittens (0.977406) | |

Azure Text Analytics

Resources:
Text Analytics Product Page
Text Analytics Documentation
Text Analytics API Docs
Text Analytics Pricing

Features

Azure’s feature set in Text Analytics is considerably lighter than the competing services. It does what I was looking for so I can’t complain about that but if you’re looking for more functionality you may need to look at IBM or AWS.

Sentiment analysis
The API returns a numeric score between 0 and 1. Scores close to 1 indicate positive sentiment, and scores close to 0 indicate negative sentiment. Sentiment score is generated using classification techniques. The input features of the classifier include n-grams, features generated from part-of-speech tags, and word embeddings. It is supported in a variety of languages.

Key phrase extraction
The API returns a list of strings denoting the key talking points in the input text. We employ techniques from Microsoft Office’s sophisticated Natural Language Processing toolkit. English, German, Spanish, and Japanese text are supported.

Language detection
The API returns the detected language and a numeric score between 0 and 1. Scores close to 1 indicate 100% certainty that the identified language is true. A total of 120 languages are supported.

Pricing

The free tier gives you up to 5,000 transactions. Once you go beyond that mark be prepared to pay $75 for the next tier. There is no per transaction incremental charge so you’ll see big jumps in fees for potentially small changes in your usage. Each “document” analyzed for each API call will be considered a transaction. Honestly, the subscription model they offer would likely keep me from using Azure’s Text Analytics for any small projects that I didn’t have an immediate method to monetize.

Limits

I do like that Azure will let you push a lot of text through at one time to help reduce the likelihood of being rate limited. Their “document” size is in line with Amazon’s limits as well.

  • Maximum size of a single document: 5,000 characters, as measured by String.Length.
  • Maximum size of an entire request: 1 MB.
  • Maximum number of documents in a request: 1,000.
  • Rate limit: 100 calls per minute. Note that you can submit a large quantity of documents in a single call (up to 1,000 documents).
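To take advantage of the multi-document requests, the texts have to be grouped so that each request stays within the document count and size limits. Here is a minimal sketch of one way to do that. `buildAzureBatches` is a hypothetical helper; it assumes every text is already within the 5,000-character document limit, that all texts are English, and that "1 MB" means 1,048,576 bytes of JSON.

```php
<?php
// Group texts into JSON request bodies of at most $maxDocs documents and
// at most $maxBytes bytes each. Document ids are numbered across all batches.
function buildAzureBatches(array $texts, int $maxDocs = 1000, int $maxBytes = 1048576): array
{
    $overhead = strlen('{"documents":[]}');   // bytes used by the JSON wrapper
    $batches = [];
    $current = [];
    $currentBytes = $overhead;
    foreach (array_values($texts) as $index => $text) {
        $doc = ['language' => 'en', 'id' => (string) ($index + 1), 'text' => $text];
        $docBytes = strlen(json_encode($doc)) + 1;   // +1 for the separating comma
        $full = count($current) >= $maxDocs || $currentBytes + $docBytes > $maxBytes;
        if ($current !== [] && $full) {
            // Close the current batch and start a new one
            $batches[] = json_encode(['documents' => $current]);
            $current = [];
            $currentBytes = $overhead;
        }
        $current[] = $doc;
        $currentBytes += $docBytes;
    }
    if ($current !== []) {
        $batches[] = json_encode(['documents' => $current]);
    }
    return $batches;
}

// 2,500 short texts end up in three request bodies
$batches = buildAzureBatches(array_fill(0, 2500, 'Meet Yogi.'));
echo count($batches) . ' requests' . PHP_EOL;
```

Each returned string is a complete request body that can be passed to CURLOPT_POSTFIELDS in place of a single-document payload.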

API

With a little Googling, I found a PHP SDK for Azure, but it seems to only support the base IaaS services right now. I wanted to stay consistent and use a single programming language in this post, so I had to use Azure’s REST API. Authentication is handled by passing your API key in a header of your HTTP POST.

<?php
// Provide your Azure API key to access the Text Analytics service
$azure_key_1 = '';
$azure_endpoint = 'https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases';

// The text you want to analyze
$text = '';

// Do the work and get your result
$data = array(
    'documents' => array(
        array(
            'language' => 'en',
            'id' => '1',
            'text' => "$text"
            )
        )
    );
$data = json_encode($data);

$azure = curl_init();
curl_setopt($azure, CURLOPT_URL, $azure_endpoint);
curl_setopt($azure, CURLOPT_POST, 1);
curl_setopt($azure, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($azure, CURLOPT_HTTPHEADER, array(
    "Ocp-Apim-Subscription-Key: $azure_key_1",
    'Content-Type: application/json',
    'Accept: application/json'
));
curl_setopt($azure, CURLOPT_POSTFIELDS, $data);
$response = curl_exec($azure);
$azure_result = json_decode($response, true);
curl_close($azure);

// $azure_result will be an array you can access for the returned data
echo 'Azure Results:' . PHP_EOL;
foreach ( $azure_result['documents'][0]['keyPhrases'] as $i ) {
    echo $i . PHP_EOL;
}

// Alternately, if you'd like to see the full set of data returned
print_r($azure_result);
?>

Results

For the sentiment tests, I wasn’t crazy about the lack of detail from Azure. In this instance, however, I’m relatively happy with their keyword results. I received the list of keyphrases I was looking for and, in my opinion, their parsing is fairly tight and pretty accurate as far as what I would pick out. Still, I felt some interesting keywords were missed or too broadly lumped in as a phrase (I’m thinking about ravens and cryptokittens here). They stripped out irrelevant punctuation from words and seemingly avoided duplication, which I liked. Don’t get me wrong, some results are still a little odd, but at least Azure doesn’t break Amazon away from Web Services or lump it in as part of a larger phrase. I’m only listing the first 20 keywords returned for each sample.

| Sample #1 – LinkedIn Bio | Sample #2 – The Raven | Sample #3 – Editorial | Sample #4 – Dog Tweet |
| --- | --- | --- | --- |
| cloud solutions | chamber door | code of Ethereum | trip |
| cloud-focused company | tapping | software | beach |
| professional services | ominous bird of yore | week | sand |
| executive management | ebony bird | ignorance of computer code | Yogi |
| cloud platform APIs | lost Lenore | world computer | |
| Program Management | memories of Lenore | wake of bitcoin | |
| fraud solution | stately Raven | fashionable cryptocurrencies | |
| manager | whispered word | thousands of cryptocurrencies | |
| design | sad fancy | legitimate banking system | |
| global load balancing solution | Tis | legal system | |
| Amazon Web Services | soul | modern computer chips | |
| technical project management | ancient Raven wandering | real currency | |
| web technologies | sculptured bust | trust | |
| Google Analytics | placid bust | times | |
| dollar solutions | bust of Pallas | propagation of imaginary kittens | |
| team members | angels | legitimate companies | |
| Google Adwords | entrance | world’s credit card system | |
| director role | velvet-violet lining | medium of exchange | |
| numerous initiatives | melancholy burden bore | excellent medium | |
| numerous vendors | cushion’s velvet lining | plague of kittens | |

Google Natural Language

Resources:
Natural Language Product Page
Natural Language Documentation
Natural Language REST API Docs
Natural Language Pricing

Features

Google offers some interesting tools. Syntax Analysis and Entity Recognition look pretty cool, although I’m not sure what practical use I would personally have for them. I had to read through the docs a bit to catch that they lump keyword detection in with Entity Recognition.

Syntax Analysis
Extract tokens and sentences, identify parts of speech (PoS) and create dependency parse trees for each sentence.

Entity Recognition
Identify entities and label by types such as person, organization, location, events, products, and media.

Sentiment Analysis
Understand the overall sentiment expressed in a block of text.

Content Classification
Classify documents in predefined 700+ categories.

Multi-Language
Enables you to easily analyze text in multiple languages including English, Spanish, Japanese, Chinese (Simplified and Traditional), French, German, Italian, Korean and Portuguese.

Pricing

The Natural Language API is priced using units of measurement known as text records. A text record may contain up to 1,000 Unicode characters within the text content sent to the API for evaluation. Text in excess of these 1,000 characters counts as additional records. Prices are expressed in dollars per 1,000 text records (1,000,000 Unicode characters).

  • Free for the first 5K text records
  • Beyond 5K text records, the cost depends on the features used.
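The text record arithmetic can be sketched in a couple of lines of PHP. `googleTextRecords` is a hypothetical helper, and it assumes that any request counts as at least one record.

```php
<?php
// Text records billed for one request: one record per 1,000 Unicode characters,
// with any remainder counting as an additional record.
// Assumption: even a very short request counts as one record.
function googleTextRecords(int $characters): int
{
    return max(1, (int) ceil($characters / 1000));
}

echo googleTextRecords(280) . PHP_EOL;   // 1
echo googleTextRecords(4950) . PHP_EOL;  // 5
```

By this measure a document of nearly 5,000 characters uses five records per feature call, so the free 5K records cover roughly a thousand documents of that size.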

Limits

Google’s limits are based on text size, words/tokens in the text, and entity mentions. The API responds in different ways based on which limit you’ve exceeded. It’s not easy to describe, so I’d recommend you read up on Google’s Natural Language Quotas.

API

Google Cloud Platform does offer a PHP SDK but frankly, there were too many packages required to be installed to support it on my VM and I didn’t want to hassle with it. I decided to simply use their REST API instead. Your requests are authenticated by including your API key in the POST URI.

<?php
// Provide your GCP API key to access the Natural Language service
$gcp_api_key = '';
$gcp_endpoint = 'https://language.googleapis.com/v1/documents:analyzeEntitySentiment?key=';

// The text you want to analyze
$text = '';

// Do the work and get your result
$data = array(
    'document' => array(
        'type' => 'PLAIN_TEXT',
        'language' => 'en',
        'content' => "$text"
        ),
    'encodingType' => 'UTF8'
    );
$data = json_encode($data);
$gcp = curl_init();
curl_setopt($gcp, CURLOPT_URL, "$gcp_endpoint$gcp_api_key");
curl_setopt($gcp, CURLOPT_POST, 1);
curl_setopt($gcp, CURLOPT_RETURNTRANSFER, true);
curl_setopt($gcp, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Accept: application/json'
));
curl_setopt($gcp, CURLOPT_POSTFIELDS, $data);
$response = curl_exec($gcp);
$gcp_result = json_decode($response, true);
curl_close($gcp);

// $gcp_result will be an array you can access for the returned data
echo 'Google Cloud Platform' . PHP_EOL;
echo 'Keyphrases: ' . PHP_EOL;
foreach ( $gcp_result['entities'] as $i ) {
    echo $i['name'] . PHP_EOL;
    echo 'Salience: ' . $i['salience'] . PHP_EOL;
    echo 'Sentiment: ' . PHP_EOL;
    echo ' - Score: ' . $i['sentiment']['score'] . PHP_EOL;
    echo ' - Magnitude: ' . $i['sentiment']['magnitude'] . PHP_EOL;
    echo PHP_EOL;
}

// Alternately, if you'd like to see the full set of data returned
print_r($gcp_result);
?>

Results

Google’s keyword or phrase matching seems very tight, almost too tight. Note that “Amazon” is stripped off from “Web Services”. Google breaks your document down into sentence entities and finds keywords in each one, so you will see duplicated keywords with differing salience values. The salience scores seemed a bit off from what I expected. Looking specifically at Sample #3, Ethereum had a score of 0.0057588066 and cryptokittens a score of 0.001952143, both of which I would have expected to be higher.

The Salience value looks to be a value between 0 and 1.0. The higher the relevance of the keyword to the text the higher the salience score will be.

Google’s Score is a value between -1.0 (Negative) and 1.0 (positive) and corresponds to the overall emotional leaning of the text. Strong positive and negative statements could balance each other out in the score. Generally, negative is -1.0 to -0.25, neutral is -0.25 to 0.25, and positive is 0.25 to 1.0.

Magnitude is from 0.0 to +inf. The higher the magnitude the stronger the positive or negative sentiment, based on the score. Lower magnitude indicates statements balancing others out, while higher values indicate the weight of the score overall. Because the Magnitude can extend to infinity it seems like there is far too much room to wonder about its value compared to scores for other blocks of text. Very subjective.
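One way to make the score and magnitude pair easier to read is to turn it into a label. The sketch below uses the rough score ranges given above. How the boundary values are assigned, and the magnitude cut-off of 1.0 used to call something "mixed", are arbitrary choices made for illustration and do not come from Google.

```php
<?php
// Map a sentiment score (-1.0 to 1.0) to a label using the rough ranges above.
// Boundary values (-0.25 and 0.25) are assigned to negative/positive here.
function sentimentLabel(float $score): string
{
    if ($score <= -0.25) {
        return 'negative';
    }
    if ($score >= 0.25) {
        return 'positive';
    }
    return 'neutral';
}

// A near-neutral score with a high magnitude suggests statements cancelling
// each other out, so report it as "mixed". $mixedAbove is a hypothetical cut-off.
function describeSentiment(float $score, float $magnitude, float $mixedAbove = 1.0): string
{
    $label = sentimentLabel($score);
    if ($label === 'neutral' && $magnitude > $mixedAbove) {
        return 'mixed';
    }
    return $label;
}

// Values from two of the Sample #2 and Sample #3 entities in this post
echo 'Lenore Nameless: ' . describeSentiment(-0.2, 2) . PHP_EOL;  // mixed
echo 'kittens: ' . describeSentiment(-0.9, 0.9) . PHP_EOL;        // negative
```

Because magnitude has no upper bound, any cut-off like this only makes sense when comparing texts of similar length.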

| Sample #1 – LinkedIn Bio | Sample #2 – The Raven | Sample #3 – Editorial | Sample #4 – Dog Tweet |
| --- | --- | --- | --- |
| IT professional (salience 0.2407715; magnitude 0.1; score 0) | chamber door (salience 0.33098266; magnitude 0.3; score 0) | cryptocurrencies (salience 0.036509313; magnitude 0.9; score 0.9) | Meet Yogi (salience 0.8369134; magnitude 0; score 0) |
| cloud (salience 0.098823205; magnitude 0.2; score 0.2) | Lenore Nameless (salience 0.067261524; magnitude 2; score -0.2) | one (salience 0.03649575; magnitude 0.5; score 0.5) | trip (salience 0.07094695; magnitude 0; score 0) |
| manager (salience 0.0476862; magnitude 0; score 0) | ember (salience 0.024451666; magnitude 1.2; score -0.6) | program (salience 0.033484023; magnitude 0.1; score 0) | beach (salience 0.048720725; magnitude 0; score 0) |
| role (salience 0.041792165; magnitude 0; score 0) | tapping (salience 0.022468273; magnitude 0; score 0) | kittens (salience 0.030432394; magnitude 0.9; score -0.9) | sand (salience 0.043418925; magnitude 0.8; score -0.8) |
| responsibilities (salience 0.036698595; magnitude 0.4; score 0.2) | lore (salience 0.02031062; magnitude 0; score 0) | internet (salience 0.024274383; magnitude 0; score 0) | |
| contributor (salience 0.033336643; magnitude 0; score 0) | volume (salience 0.02031062; magnitude 0.9; score -0.9) | contracts (salience 0.018840473; magnitude 0; score 0) | |
| organization (salience 0.03316731; magnitude 0.1; score 0.1) | heart (salience 0.016615199; magnitude 0.8; score -0.4) | cryptocurrency (salience 0.017475905; magnitude 0; score 0) | |
| solution (salience 0.026711168; magnitude 0.7; score -0.3) | more (salience 0.015859045; magnitude 0.1; score -0.1) | world computer (salience 0.017475905; magnitude 0; score 0) | |
| load balancing solution (salience 0.019928368; magnitude 0.2; score 0.1) | nothing (salience 0.015256954; magnitude 0.2; score -0.2) | cryptocurrencies (salience 0.01685153; magnitude 0.2; score -0.1) | |
| Program Management (salience 0.016665569; magnitude 0.3; score 0.3) | word (salience 0.015148128; magnitude 0.8; score -0.4) | plague (salience 0.016228218; magnitude 0.7; score -0.7) | |
| manager/director role (salience 0.015977792; magnitude 0; score 0) | human being (salience 0.014992289; magnitude 0.5; score 0) | game (salience 0.014894219; magnitude 0; score 0) | |
| company (salience 0.014537535; magnitude 0.9; score 0.9) | visitor (salience 0.012147822; magnitude 0.4; score -0.4) | banking system (salience 0.014547589; magnitude 0; score 0) | |
| expertise (salience 0.0144847995; magnitude 0.1; score 0.1) | master (salience 0.011745481; magnitude 1.2; score -0.2) | system (salience 0.014547589; magnitude 0.1; score -0.1) | |
| roles (salience 0.013508974; magnitude 0.2; score 0.2) | Raven “Nevermore (salience 0.011127895; magnitude 2.4; score -0.2) | computer code (salience 0.014547589; magnitude 0; score 0) | |
| goal (salience 0.012724441; magnitude 0.9; score 0.9) | floor (salience 0.010994431; magnitude 0.1; score -0.1) | parts (salience 0.011588174; magnitude 0.8; score 0.8) | |
| interests (salience 0.012678256; magnitude 0.2; score 0.2) | morrow (salience 0.010954482; magnitude 0; score 0) | neither (salience 0.011522884; magnitude 0.4; score 0.2) | |
| Web Services (salience 0.011578682; magnitude 0; score 0) | ghost (salience 0.009621214; magnitude 0.2; score -0.2) | news (salience 0.0093484325; magnitude 0; score 0) | |
| servers (salience 0.011578682; magnitude 0.1; score 0.1) | maiden (salience 0.009586242; magnitude 0.5; score 0.5) | dream (salience 0.00748501; magnitude 0.5; score 0.2) | |
| experience (salience 0.010530363; magnitude 0; score 0) | books surcease (salience 0.009586242; magnitude 0; score 0) | programmers (salience 0.0069575156; magnitude 0; score 0) | |
| customers (salience 0.010236813; magnitude 0; score 0) | sorrowsorrow (salience 0.009586242; magnitude 0; score 0) | need (salience 0.0068155993; magnitude 0.3; score -0.1) | |

IBM Cloud (Watson) Natural Language Understanding

Resources:
NLU Product Page
NLU Documentation
NLU API Documentation

Features

IBM’s feature set is the most mature among the services I tested. This shouldn’t be a surprise considering their success with Watson over the years and the training they’ve been able to apply to their artificial intelligence platform. This article only covers the Keywords function, with its sentiment and emotion options enabled.

Categories
Categorize your content using a five-level classification hierarchy. View the complete list of categories here.

Concepts
Identify high-level concepts that aren’t necessarily directly referenced in the text

Emotion
Analyze emotion conveyed by specific target phrases or by the document as a whole. You can also enable emotion analysis for entities and keywords that are automatically detected by the service.

Entities
Find people, places, events, and other types of entities mentioned in your content. View the complete list of entity types and subtypes here.

Keywords
Search your content for relevant keywords.

Metadata
For HTML and URL input, get the author of the webpage, the page title, and the publication date.

Relations
Recognize when two entities are related, and identify the type of relation.

Semantic Roles
Parse sentences into subject-action-object form, and identify entities and keywords that are subjects or objects of an action.

Sentiment
Analyze the sentiment toward specific target phrases and the sentiment of the document as a whole. You can also get sentiment information for detected entities and keywords by enabling the sentiment option for those features.

As you’ll read below, this maturity and ease of use applies to their API as well.

Pricing

Pricing depends on your overall IBM Cloud subscription plan. On the Lite (read: free) plan you get up to 30,000 NLU items per month. If you upgrade to the paid Standard plan, billing is purely usage-based per NLU item, with discounts based on volume. I found it interesting that when you sign up for the Lite plan they don’t even ask for a credit card number.

An NLU item is based on the number of data units enriched and the number of enrichment features applied. A data unit is 10,000 characters or less. For example: extracting Entities and Sentiment from 15,000 characters of text is (2 Data Units * 2 Enrichment Features) = 4 NLU Items.
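The same calculation as a small PHP sketch, in case you want to estimate usage before sending anything. `nluItems` is a hypothetical helper, and it assumes that a request always counts as at least one data unit.

```php
<?php
// NLU items for one request: data units (10,000 characters or less each)
// multiplied by the number of enrichment features requested.
function nluItems(int $characters, int $features): int
{
    $dataUnits = max(1, (int) ceil($characters / 10000));
    return $dataUnits * $features;
}

echo nluItems(15000, 2) . PHP_EOL;  // 4, matching the example above
echo nluItems(4950, 1) . PHP_EOL;   // 1
```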

Limits

This was incredibly difficult to track down on the IBM website. Based on the 30 July 2017 release notes, text greater than 50K characters will be truncated; the previous limit was 1 kilobyte (1,024 bytes). Fifty thousand characters is considerably higher than the limits of the other services discussed in this post.

API

The IBM Cloud website only details their REST API, so that’s what I coded against. However, an after-the-fact search revealed that there are PHP SDKs (https://github.com/CognitiveBuild/WatsonPHPSDK or https://github.com/ThomasIBM/php-sdk).

A nice plus for IBM’s API is that you can request multiple features in each API call. They still charge you per feature requested but it’s a lot more convenient to be able to ask once for everything you want rather than making individual calls for each function.

<?php
// IBM will give you a specific username and password to access the NLU service
$ibm_user = '';
$ibm_pass = '';
$ibm_endpoint = 'https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze?version=2017-02-27';

// The text you want to analyze
$text = '';

// Do the work and get your result
$data = array(
    'text' => "$text",
    'features' => array(
        'keywords' => array(
            'sentiment' => true,
            'emotion' => true,
            )
        ),
    );
$data = json_encode($data);
$ibm = curl_init();
curl_setopt($ibm, CURLOPT_URL, $ibm_endpoint);
curl_setopt($ibm, CURLOPT_POST, 1);
curl_setopt($ibm, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ibm, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Accept: application/json'
));
curl_setopt($ibm, CURLOPT_USERPWD, "$ibm_user:$ibm_pass");
curl_setopt($ibm, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ibm, CURLOPT_POSTFIELDS, $data);
$response = curl_exec($ibm);
$ibm_result = json_decode($response, true);
curl_close($ibm);

// $ibm_result will be an array you can access for the returned data
echo 'IBM Cloud / Watson' . PHP_EOL;
echo 'Keyphrases: ' . PHP_EOL;
foreach ( $ibm_result['keywords'] as $i ) {
    echo $i['text'] . PHP_EOL;
    echo 'Relevance: ' . $i['relevance'] . PHP_EOL;
    echo 'Sentiment: ' . $i['sentiment']['score'] . PHP_EOL;
    echo 'Emotions: ' . PHP_EOL;
    echo ' -Sadness: ' . $i['emotion']['sadness'] . PHP_EOL;
    echo ' -Joy: ' . $i['emotion']['joy'] . PHP_EOL;
    echo ' -Fear: ' . $i['emotion']['fear'] . PHP_EOL;
    echo ' -Disgust: ' . $i['emotion']['disgust'] . PHP_EOL;
    echo ' -Anger: ' . $i['emotion']['anger'] . PHP_EOL;
    echo PHP_EOL;
}

// Alternately, if you'd like to see the full set of data returned
print_r($ibm_result);
?>

Results

Based on the maturity of Watson, perhaps I shouldn’t be surprised that I found its results the most relevant and accurate.

Watson returned tight and accurate keywords and phrases and caught most of what I felt were important. I was very happy with the results. I think Watson was the only engine to specifically pick out “Raven” for Sample #2. The relevance scores were pretty good but I found the sentiment and emotion values to be the most interesting in this context of keywords rather than the entire block of text. Perhaps it works better with dramatic text found in short stories and poems but to see the word “raven” get a relatively high level of certainty for Sadness, Joy, Fear, and Anger (leaving out only Disgust) seemed very appropriate.

Sentiment scores range from -1 (negative sentiment) to 1 (positive sentiment). Emotion scores range from 0 to 1 for sadness, joy, fear, disgust, and anger: a 0 means the text doesn’t convey the emotion, and a 1 means the text definitely carries the emotion.
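
To make those ranges a little more concrete, here is a rough sketch that turns one keyword entry into a coarse sentiment label plus its highest-scoring emotion. The function name `summarize_keyword()` and the ±0.25 band treated as “neutral” are assumptions of mine for illustration; Watson itself only returns the raw scores.

```php
<?php
// Summarize one keyword entry: a coarse sentiment label plus the
// highest-scoring emotion. The +/-0.25 band for "neutral" is an assumption.
function summarize_keyword(array $kw, $neutral_band = 0.25)
{
    $score = isset($kw['sentiment']['score']) ? $kw['sentiment']['score'] : 0;
    if ($score > $neutral_band) {
        $label = 'positive';
    } elseif ($score < -$neutral_band) {
        $label = 'negative';
    } else {
        $label = 'neutral';
    }

    $emotions = isset($kw['emotion']) ? $kw['emotion'] : array();
    $dominant = null;
    if (!empty($emotions)) {
        // array_search returns the first key holding the maximum value
        $dominant = array_search(max($emotions), $emotions);
    }

    return array('sentiment' => $label, 'dominant_emotion' => $dominant);
}

// "placid bust" from the Sample #2 results
$kw = array(
    'text' => 'placid bust',
    'sentiment' => array('score' => -0.689534),
    'emotion' => array(
        'sadness' => 0.893423, 'joy' => 0.003638, 'fear' => 0.140634,
        'disgust' => 0.085322, 'anger' => 0.092297,
    ),
);
print_r(summarize_keyword($kw));
```

For “placid bust” this labels the sentiment as negative with sadness as the dominant emotion, which lines up with the raw numbers.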

Sample #1 – LinkedIn Bio

cloud platform APIs
Relevance: 0.995924
Sentiment: 0
Emotions:
-Sadness: 0.055568
-Joy: 0.211603
-Fear: 0.046425
-Disgust: 0.013121
-Anger: 0.02802

Amazon Web Services
Relevance: 0.946893
Sentiment: 0.32613
Emotions:
-Sadness: 0.088253
-Joy: 0.082979
-Fear: 0.008404
-Disgust: 0.020818
-Anger: 0.0714

technical project management
Relevance: 0.918105
Sentiment: 0.922231
Emotions:
-Sadness: 0.239939
-Joy: 0.364821
-Fear: 0.112302
-Disgust: 0.038732
-Anger: 0.208574

page load times
Relevance: 0.914624
Sentiment: 0.835434
Emotions:
-Sadness: 0.108787
-Joy: 0.448846
-Fear: 0.035503
-Disgust: 0.004488
-Anger: 0.025469

agents increasing service
Relevance: 0.9146
Sentiment: 0
Emotions:
-Sadness: 0.2539
-Joy: 0.097912
-Fear: 0.065398
-Disgust: 0.13268
-Anger: 0.096632

complex multi-million-dollar solutions
Relevance: 0.912594
Sentiment: 0.922231
Emotions:
-Sadness: 0.239939
-Joy: 0.364821
-Fear: 0.112302
-Disgust: 0.038732
-Anger: 0.208574

HPE Helion
Relevance: 0.798603
Sentiment: 0
Emotions:
-Sadness: 0.154141
-Joy: 0.172429
-Fear: 0.095666
-Disgust: 0.076074
-Anger: 0.094764

cloud solutions
Relevance: 0.791795
Sentiment: 0.32613
Emotions:
-Sadness: 0.088253
-Joy: 0.082979
-Fear: 0.008404
-Disgust: 0.020818
-Anger: 0.0714

Google Adwords
Relevance: 0.764266
Sentiment: 0
Emotions:
-Sadness: 0.191606
-Joy: 0.134409
-Fear: 0.037958
-Disgust: 0.051213
-Anger: 0.064268

Microsoft Azure
Relevance: 0.75976
Sentiment: 0.32613
Emotions:
-Sadness: 0.088253
-Joy: 0.082979
-Fear: 0.008404
-Disgust: 0.020818
-Anger: 0.0714

anti-fraud solution
Relevance: 0.756846
Sentiment: 0
Emotions:
-Sadness: 0.279138
-Joy: 0.240411
-Fear: 0.111477
-Disgust: 0.118215
-Anger: 0.331585

Google Analytics
Relevance: 0.749963
Sentiment: 0
Emotions:
-Sadness: 0.163427
-Joy: 0.163105
-Fear: 0.063402
-Disgust: 0.063864
-Anger: 0.079907

cloud-focused company
Relevance: 0.749223
Sentiment: 0.713568
Emotions:
-Sadness: 0.039198
-Joy: 0.585749
-Fear: 0.004257
-Disgust: 0.014587
-Anger: 0.005712

manager/director role
Relevance: 0.746573
Sentiment: 0.713568
Emotions:
-Sadness: 0.039198
-Joy: 0.585749
-Fear: 0.004257
-Disgust: 0.014587
-Anger: 0.005712

wide range
Relevance: 0.745984
Sentiment: 0.32613
Emotions:
-Sadness: 0.088253
-Joy: 0.082979
-Fear: 0.008404
-Disgust: 0.020818
-Anger: 0.0714

legal compliance
Relevance: 0.741952
Sentiment: 0
Emotions:
-Sadness: 0.122162
-Joy: 0.18645
-Fear: 0.12973
-Disgust: 0.074024
-Anger: 0.140856

web technologies
Relevance: 0.739898
Sentiment: 0
Emotions:
-Sadness: 0.299532
-Joy: 0.296768
-Fear: 0.054391
-Disgust: 0.034434
-Anger: 0.074903

world-class customers
Relevance: 0.736487
Sentiment: 0.922231
Emotions:
-Sadness: 0.239939
-Joy: 0.364821
-Fear: 0.112302
-Disgust: 0.038732
-Anger: 0.208574

Stratoscale Symphony
Relevance: 0.733062
Sentiment: 0.292848
Emotions:
-Sadness: 0.131983
-Joy: 0.191748
-Fear: 0.08108
-Disgust: 0.07167
-Anger: 0.084732

numerous initiatives
Relevance: 0.729417
Sentiment: 0
Emotions:
-Sadness: 0.072441
-Joy: 0.10439
-Fear: 0.131526
-Disgust: 0.035598
-Anger: 0.056772

Sample #2 – The Raven

chamber door
Relevance: 0.901334
Sentiment: -0.0842639
Emotions:
-Sadness: 0.122429
-Joy: 0.629781
-Fear: 0.202819
-Disgust: 0.152286
-Anger: 0.245196

lost Lenore
Relevance: 0.501634
Sentiment: -0.568457
Emotions:
-Sadness: 0.764473
-Joy: 0.008086
-Fear: 0.247296
-Disgust: 0.087756
-Anger: 0.197887

separate dying ember
Relevance: 0.482499
Sentiment: -0.494962
Emotions:
-Sadness: 0.727678
-Joy: 0.012482
-Fear: 0.320462
-Disgust: 0.055322
-Anger: 0.080813

meaninglittle relevancy bore
Relevance: 0.459712
Sentiment: -0.825676
Emotions:
-Sadness: 0.337678
-Joy: 0.154829
-Fear: 0.321075
-Disgust: 0.050412
-Anger: 0.311102

lamp-light gloating o’er
Relevance: 0.445707
Sentiment: 0
Emotions:
-Sadness: 0.188522
-Joy: 0.186324
-Fear: 0.182589
-Disgust: 0.200685
-Anger: 0.047279

ominous bird
Relevance: 0.442842
Sentiment: 0
Emotions:
-Sadness: 0.21515
-Joy: 0.089418
-Fear: 0.196796
-Disgust: 0.15498
-Anger: 0.209489

Raven
Relevance: 0.440981
Sentiment: 0.424863
Emotions:
-Sadness: 0.694532
-Joy: 0.690868
-Fear: 0.687529
-Disgust: 0.0456
-Anger: 0.680811

Nevermore
Relevance: 0.42179
Sentiment: 0.436053
Emotions:
-Sadness: 0
-Joy: 0
-Fear: 0
-Disgust: 0
-Anger: 0

gently rapping
Relevance: 0.418179
Sentiment: 0
Emotions:
-Sadness: 0.261638
-Joy: 0.01624
-Fear: 0.253446
-Disgust: 0.49803
-Anger: 0.30956

midnight dreary
Relevance: 0.4073
Sentiment: -0.722491
Emotions:
-Sadness: 0.079813
-Joy: 0.253367
-Fear: 0.061463
-Disgust: 0.114973
-Anger: 0.038623

stately Raven
Relevance: 0.404736
Sentiment: -0.368547
Emotions:
-Sadness: 0.203589
-Joy: 0.334872
-Fear: 0.14134
-Disgust: 0.04234
-Anger: 0.071284

curious volume
Relevance: 0.404517
Sentiment: -0.365095
Emotions:
-Sadness: 0.27674
-Joy: 0.138882
-Fear: 0.145411
-Disgust: 0.033914
-Anger: 0.126784

fantastic terrors
Relevance: 0.398956
Sentiment: 0
Emotions:
-Sadness: 0.06198
-Joy: 0.8648
-Fear: 0.004159
-Disgust: 0.006427
-Anger: 0.004621

thy crest
Relevance: 0.39861
Sentiment: 0
Emotions:
-Sadness: 0.303813
-Joy: 0.161173
-Fear: 0.234302
-Disgust: 0.051151
-Anger: 0.083632

uncertain rustling
Relevance: 0.397321
Sentiment: -0.568999
Emotions:
-Sadness: 0.117186
-Joy: 0.282085
-Fear: 0.03773
-Disgust: 0.065539
-Anger: 0.079408

placid bust
Relevance: 0.396545
Sentiment: -0.689534
Emotions:
-Sadness: 0.893423
-Joy: 0.003638
-Fear: 0.140634
-Disgust: 0.085322
-Anger: 0.092297

sculptured bust
Relevance: 0.394975
Sentiment: -0.569724
Emotions:
-Sadness: 0.132652
-Joy: 0.083568
-Fear: 0.237606
-Disgust: 0.228994
-Anger: 0.316378

whispered word
Relevance: 0.394947
Sentiment: 0
Emotions:
-Sadness: 0.084248
-Joy: 0.310103
-Fear: 0.060499
-Disgust: 0.13849
-Anger: 0.190855

radiant maiden
Relevance: 0.393484
Sentiment: 0.745542
Emotions:
-Sadness: 0.098052
-Joy: 0.587861
-Fear: 0.032495
-Disgust: 0.004877
-Anger: 0.023884

thy memories
Relevance: 0.392245
Sentiment: 0
Emotions:
-Sadness: 0.462287
-Joy: 0.351459
-Fear: 0.040361
-Disgust: 0.011831
-Anger: 0.0264

Sample #3 – Editorial

real currency
Relevance: 0.900134
Sentiment: -0.486542
Emotions:
-Sadness: 0.06918
-Joy: 0.263669
-Fear: 0.086167
-Disgust: 0.036775
-Anger: 0.123255

Beanie Baby craze
Relevance: 0.891312
Sentiment: -0.558869
Emotions:
-Sadness: 0.724752
-Joy: 0.124824
-Fear: 0.095344
-Disgust: 0.04535
-Anger: 0.09825

small Canadian company
Relevance: 0.875852
Sentiment: 0.53669
Emotions:
-Sadness: 0.081233
-Joy: 0.310396
-Fear: 0.007662
-Disgust: 0.533
-Anger: 0.103954

suitably malicious webpage
Relevance: 0.865377
Sentiment: -0.781828
Emotions:
-Sadness: 0.494608
-Joy: 0.107754
-Fear: 0.086254
-Disgust: 0.245535
-Anger: 0.212926

Ethereum
Relevance: 0.747639
Sentiment: -0.513226
Emotions:
-Sadness: 0.16149
-Joy: 0.191175
-Fear: 0.080751
-Disgust: 0.055043
-Anger: 0.517517

legitimate banking
Relevance: 0.744846
Sentiment: 0
Emotions:
-Sadness: 0.205808
-Joy: 0.159109
-Fear: 0.049965
-Disgust: 0.013543
-Anger: 0.14675

fashionable cryptocurrencies
Relevance: 0.73987
Sentiment: -0.420752
Emotions:
-Sadness: 0.204408
-Joy: 0.413299
-Fear: 0.058367
-Disgust: 0.092789
-Anger: 0.086215

imaginary kitten
Relevance: 0.728115
Sentiment: -0.387433
Emotions:
-Sadness: 0.201491
-Joy: 0.487955
-Fear: 0.101491
-Disgust: 0.053981
-Anger: 0.071164

imaginary kittens
Relevance: 0.725727
Sentiment: 0.53669
Emotions:
-Sadness: 0.081233
-Joy: 0.310396
-Fear: 0.007662
-Disgust: 0.533
-Anger: 0.103954

large parts
Relevance: 0.725197
Sentiment: 0
Emotions:
-Sadness: 0.205808
-Joy: 0.159109
-Fear: 0.049965
-Disgust: 0.013543
-Anger: 0.14675

gold vanish
Relevance: 0.711218
Sentiment: -0.687961
Emotions:
-Sadness: 0.506992
-Joy: 0.158692
-Fear: 0.133844
-Disgust: 0.156028
-Anger: 0.210273

tradeable value
Relevance: 0.710171
Sentiment: -0.585778
Emotions:
-Sadness: 0.312596
-Joy: 0.257172
-Fear: 0.098399
-Disgust: 0.010844
-Anger: 0.308837

Spectre flaws
Relevance: 0.708683
Sentiment: 0
Emotions:
-Sadness: 0.337198
-Joy: 0.366092
-Fear: 0.112892
-Disgust: 0.038348
-Anger: 0.095604

latest manifestation
Relevance: 0.708348
Sentiment: 0
Emotions:
-Sadness: 0.198911
-Joy: 0.348413
-Fear: 0.209789
-Disgust: 0.013768
-Anger: 0.072017

network briefly
Relevance: 0.70786
Sentiment: -0.322578
Emotions:
-Sadness: 0.185012
-Joy: 0.474092
-Fear: 0.127986
-Disgust: 0.114617
-Anger: 0.171347

intense speculation
Relevance: 0.705586
Sentiment: 0.551383
Emotions:
-Sadness: 0.230166
-Joy: 0.424813
-Fear: 0.062365
-Disgust: 0.023201
-Anger: 0.074579

credit card
Relevance: 0.703885
Sentiment: -0.558869
Emotions:
-Sadness: 0.724752
-Joy: 0.124824
-Fear: 0.095344
-Disgust: 0.04535
-Anger: 0.09825

sober realism
Relevance: 0.700107
Sentiment: 0
Emotions:
-Sadness: 0.194125
-Joy: 0.324705
-Fear: 0.089495
-Disgust: 0.043107
-Anger: 0.062661

cartoon cats
Relevance: 0.699942
Sentiment: 0.53669
Emotions:
-Sadness: 0.081233
-Joy: 0.310396
-Fear: 0.007662
-Disgust: 0.533
-Anger: 0.103954

excellent medium
Relevance: 0.699505
Sentiment: 0.53669
Emotions:
-Sadness: 0.081233
-Joy: 0.310396
-Fear: 0.007662
-Disgust: 0.533
-Anger: 0.103954

Sample #4 – Dog Tweet

sand
Relevance: 0.988505
Sentiment: -0.672347
Emotions:
-Sadness: 0.601929
-Joy: 0.047119
-Fear: 0.177971
-Disgust: 0.35354
-Anger: 0.046075

trip
Relevance: 0.845717
Sentiment: 0
Emotions:
-Sadness: 0.082996
-Joy: 0.768652
-Fear: 0.055204
-Disgust: 0.02813
-Anger: 0.057443

beach
Relevance: 0.836752
Sentiment: 0
Emotions:
-Sadness: 0.082996
-Joy: 0.768652
-Fear: 0.055204
-Disgust: 0.02813
-Anger: 0.057443

I’d love to read your questions and comments on these services. How do you think you can use these in your business or project?