CS log

Entity and Sentiment Analysis with the Natural Language API 본문

AI/NLP

Entity and Sentiment Analysis with the Natural Language API

sj.cath 2024. 9. 23. 20:15

Overview

The Cloud Natural Language API lets you extract entities from text, perform sentiment and syntactic analysis, and classify text into categories.

In this lab, you learn how to use the Natural Language API to analyze entities, sentiment, and syntax.

 

Objectives

In this lab, you will learn how to:

  • Create a Natural Language API request and calling the API with curl
  • Extract entities and running sentiment analysis on text with the Natural Language API
  • Perform linguistic analysis on text with the Natural Language API
  • Create a Natural Language API request in a different language

 

Task 2. Make an entity analysis request

API key를 만든 후,

{
  "document":{
    "type":"PLAIN_TEXT",
    "content":"Joanne Rowling, who writes under the pen names J. K. Rowling and Robert Galbraith, is a British novelist and screenwriter who wrote the Harry Potter fantasy series."
  },
  "encodingType":"UTF8"
}

 

api를 불러와서 업무를 수행하면, 다음과 같은 결과를 보여준다.

Joanne Rowling, who writes under the pen names J. K. Rowling and Robert Galbraith, is a British novelist and screenwriter who wrote the Harry Potter fantasy series
student-00-ee19cba19d5c@linux-instance:~$ curl "https://language.googleapis.com/v1/documents:analyzeEntities?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json > result.json
student-00-ee19cba19d5c@linux-instance:~$ cat result.json
{
  "entities": [
    {
      "name": "Joanne Rowling",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/042xh",
        "wikipedia_url": "https://en.wikipedia.org/wiki/J._K._Rowling"
      },
      "salience": 0.79828626,
      "mentions": [
        {
          "text": {
            "content": "Joanne Rowling",
            "beginOffset": 0
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "Rowling",
            "beginOffset": 53
          },
          "type": "PROPER"
        },
        {
          "text": {
            "content": "novelist",
            "beginOffset": 96
          },
          "type": "COMMON"
        },
        {
          "text": {
            "content": "Robert Galbraith",
            "beginOffset": 65
          },
          "type": "PROPER"
        }
      ]
    },
    {
      "name": "pen names",
      "type": "OTHER",
      "metadata": {},
      "salience": 0.07300248,
      "mentions": [
        {
          "text": {
            "content": "pen names",
            "beginOffset": 37
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "J.K.",
      "type": "PERSON",
      "metadata": {},
      "salience": 0.043804582,
      "mentions": [
        {
          "text": {
            "content": "J. K.",
            "beginOffset": 47
          },
          "type": "PROPER"
        }
      ]
    },
    {
      "name": "British",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/07ssc",
        "wikipedia_url": "https://en.wikipedia.org/wiki/United_Kingdom"
      },
      "salience": 0.019752095,
      "mentions": [
        {
          "text": {
            "content": "British",
            "beginOffset": 88
          },
          "type": "PROPER"
        }
      ]
    },
    {
      "name": "fantasy series",
      "type": "WORK_OF_ART",
      "metadata": {},
      "salience": 0.01764168,
      "mentions": [
        {
          "text": {
            "content": "fantasy series",
            "beginOffset": 149
          },
          "type": "COMMON"
        }
      ]
    },
    {
      "name": "Harry Potter",
      "type": "WORK_OF_ART",
      "metadata": {
        "wikipedia_url": "https://en.wikipedia.org/wiki/Harry_Potter",
        "mid": "/m/078ffw"
      },
      "salience": 0.014916742,
      "mentions": [
        {
          "text": {
            "content": "Harry Potter",
            "beginOffset": 136
          },
          "type": "PROPER"
        }
      ]
    },
    {
      "name": "screenwriter",
      "type": "PERSON",
      "metadata": {},
      "salience": 0.011085264,
      "mentions": [
        {
          "text": {
            "content": "screenwriter",
            "beginOffset": 109
          },
          "type": "COMMON"
        }
      ]
    }
  ],
  "language": "en"
}
student-00-ee19cba19d5c@linux-instance:~$

The Natural Language API can also recognize the same entity mentioned in different ways. Take a look at the mentions list in the response: ​the API is able to tell that "Joanne Rowling", "Rowling", "novelist" and "Robert Galbriath" all point to the same thing.​

 

 

Task 4. Sentiment analysis with the Natural Language API

Harry Potter is the best book. I think everyone should read it.

위 텍스트에 대해서는 아래와 같이 감정 분석이 가능했다.

student-00-ee19cba19d5c@linux-instance:~$ nano request.json
student-00-ee19cba19d5c@linux-instance:~$ curl "https://language.googleapis.com/v1/documents:analyzeSentiment?key=${API_KEY}"   -s -X POST -H "Content-Type: application/json" --data-binary @request.json
{
  "documentSentiment": {
    "magnitude": 1.9,
    "score": 0.9
  },
  "language": "en",
  "sentences": [
    {
      "text": {
        "content": "Harry Potter is the best book.",
        "beginOffset": 0
      },
      "sentiment": {
        "magnitude": 0.9,
        "score": 0.9
      }
    },
    {
      "text": {
        "content": "I think everyone should read it.",
        "beginOffset": 31
      },
      "sentiment": {
        "magnitude": 0.9,
        "score": 0.9
      }
    }
  ]
}
student-00-ee19cba19d5c@linux-instance:~$

 

Task 5. Analyzing entity sentiment

In addition to providing sentiment details on the entire text document, the Natural Language API can also break down sentiment by the entities in the text. Use this sentence as an example:

I liked the sushi but the service was terrible.

In this case, getting a sentiment score for the entire sentence as you did above might not be so useful. If this was a restaurant review and there were hundreds of reviews for the same restaurant, you'd want to know exactly which things people liked and didn't like in their reviews. Fortunately, the Natural Language API has a method that lets you get the sentiment for each entity in the text, called analyzeEntitySentiment. 

 

 

student-00-ee19cba19d5c@linux-instance:~$ nano request.json
student-00-ee19cba19d5c@linux-instance:~$ curl "https://language.googleapis.com/v1/documents:analyzeEntitySentiment?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json
{
  "entities": [
    {
      "name": "sushi",
      "type": "CONSUMER_GOOD",
      "metadata": {},
      "salience": 0.51064336,
      "mentions": [
        {
          "text": {
            "content": "sushi",
            "beginOffset": 12
          },
          "type": "COMMON",
          "sentiment": {
            "magnitude": 0,
            "score": 0
          }
        }
      ],
      "sentiment": {
        "magnitude": 0,
        "score": 0
      }
    },
    {
      "name": "service",
      "type": "OTHER",
      "metadata": {},
      "salience": 0.48935664,
      "mentions": [
        {
          "text": {
            "content": "service",
            "beginOffset": 26
          },
          "type": "COMMON",
          "sentiment": {
            "magnitude": 0.7,
            "score": -0.7
          }
        }
      ],
      "sentiment": {
        "magnitude": 0.7,
        "score": -0.7
      }
    }
  ],
  "language": "en"
}
student-00-ee19cba19d5c@linux-instance:~$

sushi는 neutral, service는 -0.7을 기록한 것을 볼 수 있다.

 

Task 6. Analyzing syntax and parts of speech

Use syntactic analysis, another of the Natural Language API's methods, to dive deeper into the linguistic details of the text. analyzeSyntax extracts linguistic information, breaking up the given text into a series of sentences and tokens (generally, word boundaries), to provide further analysis on those tokens. For each word in the text, the API tells you the word's part of speech (noun, verb, adjective, etc.) and how it relates to other words in the sentence (Is it the root verb? A modifier?).

"Joanne Rowling is a British novelist, screenwriter and film producer."
student-00-ee19cba19d5c@linux-instance:~$ curl "https://language.googleapis.com/v1/documents:analyzeSyntax?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json
{
  "sentences": [
    {
      "text": {
        "content": "Joanne Rowling is a British novelist, screenwriter and film producer.",
        "beginOffset": 0
      }
    }
  ],
  "tokens": [
    {
      "text": {
        "content": "Joanne",
        "beginOffset": 0
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 1,
        "label": "NN"
      },
      "lemma": "Joanne"
    },
    {
      "text": {
        "content": "Rowling",
        "beginOffset": 7
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "NSUBJ"
      },
      "lemma": "Rowling"
    },
    {
      "text": {
        "content": "is",
        "beginOffset": 15
      },
      "partOfSpeech": {
        "tag": "VERB",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "INDICATIVE",
        "number": "SINGULAR",
        "person": "THIRD",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "PRESENT",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "ROOT"
      },
      "lemma": "be"
    },
    {
      "text": {
        "content": "a",
        "beginOffset": 18
      },
      "partOfSpeech": {
        "tag": "DET",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "NUMBER_UNKNOWN",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 5,
        "label": "DET"
      },
      "lemma": "a"
    },
    {
      "text": {
        "content": "British",
        "beginOffset": 20
      },
      "partOfSpeech": {
        "tag": "ADJ",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "NUMBER_UNKNOWN",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 5,
        "label": "AMOD"
      },
      "lemma": "British"
    },
    {
      "text": {
        "content": "novelist",
        "beginOffset": 28
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "ATTR"
      },
      "lemma": "novelist"
    },
    {
      "text": {
        "content": ",",
        "beginOffset": 36
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "NUMBER_UNKNOWN",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 5,
        "label": "P"
      },
      "lemma": ","
    },
    {
      "text": {
        "content": "screenwriter",
        "beginOffset": 38
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 5,
        "label": "CONJ"
      },
      "lemma": "screenwriter"
    },
    {
      "text": {
        "content": "and",
        "beginOffset": 51
      },
      "partOfSpeech": {
        "tag": "CONJ",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "NUMBER_UNKNOWN",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 5,
        "label": "CC"
      },
      "lemma": "and"
    },
    {
      "text": {
        "content": "film",
        "beginOffset": 55
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 10,
        "label": "NN"
      },
      "lemma": "film"
    },
    {
      "text": {
        "content": "producer",
        "beginOffset": 60
      },
      "partOfSpeech": {
        "tag": "NOUN",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "SINGULAR",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 5,
        "label": "CONJ"
      },
      "lemma": "producer"
    },
    {
      "text": {
        "content": ".",
        "beginOffset": 68
      },
      "partOfSpeech": {
        "tag": "PUNCT",
        "aspect": "ASPECT_UNKNOWN",
        "case": "CASE_UNKNOWN",
        "form": "FORM_UNKNOWN",
        "gender": "GENDER_UNKNOWN",
        "mood": "MOOD_UNKNOWN",
        "number": "NUMBER_UNKNOWN",
        "person": "PERSON_UNKNOWN",
        "proper": "PROPER_UNKNOWN",
        "reciprocity": "RECIPROCITY_UNKNOWN",
        "tense": "TENSE_UNKNOWN",
        "voice": "VOICE_UNKNOWN"
      },
      "dependencyEdge": {
        "headTokenIndex": 2,
        "label": "P"
      },
      "lemma": "."
    }
  ],
  "language": "en"
}
student-00-ee19cba19d5c@linux-instance:~$

  • headTokenIndex is the index of the token that has an arc pointing at "Joanne". Think of each token in the sentence as a word in an array.
  • headTokenIndex of 1 for "Joanne" refers to the word "Rowling", which it is connected to in the tree. The label NN (short for noun compound modifier) describes the word's role in the sentence. "Joanne" modifies "Rowling", the subject of the sentence.
  • lemma is the canonical form of the word. For example, the words run, runs, ran, and running all have a lemma of run. The lemma value is useful for tracking occurrences of a word in a large piece of text over time.

 

Task 7. Multilingual natural language processing

"日本のグーグルのオフィスは、東京の六本木ヒルズにあります"

이번엔 일본어에 적용해보았다.

 

student-00-ee19cba19d5c@linux-instance:~$ curl "https://language.googleapis.com/v1/documents:analyzeEntities?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json
{
  "entities": [
    {
      "name": "日本",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/03_3d",
        "wikipedia_url": "https://en.wikipedia.org/wiki/Japan"
      },
      "salience": 0.23804513,
      "mentions": [
        {
          "text": {
            "content": "日本",
            "beginOffset": 0
          },
          "type": "PROPER"
        }
      ]
    },
    {
      "name": "グーグル",
      "type": "ORGANIZATION",
      "metadata": {
        "mid": "/m/045c7b",
        "wikipedia_url": "https://en.wikipedia.org/wiki/Google"
      },
      "salience": 0.21214141,
      "mentions": [
        {
          "text": {
            "content": "グーグル",
            "beginOffset": 9
          },
          "type": "PROPER"
        }
      ]
    },
    {
      "name": "六本木ヒルズ",
      "type": "PERSON",
      "metadata": {
        "wikipedia_url": "https://en.wikipedia.org/wiki/Roppongi_Hills",
        "mid": "/m/01r2_k"
      },
      "salience": 0.19418614,
      "mentions": [
        {
          "text": {
            "content": "六本木ヒルズ",
            "beginOffset": 51
          },
          "type": "PROPER"
        }
      ]
    },
    {
      "name": "東京",
      "type": "LOCATION",
      "metadata": {
        "mid": "/g/12lnhn10f",
        "wikipedia_url": "https://de.wikipedia.org/wiki/Tokio"
      },
      "salience": 0.18159479,
      "mentions": [
        {
          "text": {
            "content": "東京",
            "beginOffset": 42
          },
          "type": "PROPER"
        }
      ]
    },
    {
      "name": "オフィス",
      "type": "OTHER",
      "metadata": {},
      "salience": 0.17403255,
      "mentions": [
        {
          "text": {
            "content": "オフィス",
            "beginOffset": 24
          },
          "type": "COMMON"
        }
      ]
    }
  ],
  "language": "ja"
}
student-00-ee19cba19d5c@linux-instance:~$