Course

Introduction to Text Analysis in R

Intermediate skill level

Updated 03/2023

Analyze text data in R using the tidy framework.

R

Data Manipulation

4 hours

15 videos

46 exercises

3,850 XP

27,053

Certificate of completion

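Course Description

From social media to product reviews, text is an increasingly important type of data across many fields, including marketing analytics. Because text is inexpensive and current, it is replacing other forms of unstructured data in a growing number of cases. To take full advantage of what text has to offer, however, you need to know how to think about, clean, summarize, and model it. In this course, you will use the latest tidy tools to get started with text analysis quickly and easily. You will learn how to wrangle and visualize text, perform sentiment analysis, and run and interpret topic models.

Prerequisites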

Introduction to the Tidyverse

1

Wrangling Text

Since text is unstructured data, a certain amount of wrangling is required to get it into a form where you can analyze it. In this chapter, you will learn how to add structure to text by tokenizing, cleaning, and treating text as categorical data.

Text as data

Airline tweets data

Grouped summaries

Counting categorical data

Counting user types

Summarizing user types

Tokenizing and cleaning

Tokenizing and counting

Cleaning and counting
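
The tokenize, clean, and count workflow described for this chapter might look roughly like the sketch below. It assumes the dplyr and tidytext packages are installed; the `tweets` data frame and its column names are hypothetical stand-ins for the course's airline tweets data, not the actual dataset.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data standing in for the course's airline tweets
tweets <- tibble(
  tweet_id = 1:3,
  complaint_label = c("Complaint", "Non-Complaint", "Complaint"),
  tweet_text = c(
    "My flight was delayed and my bag was lost",
    "Thanks for the smooth flight today",
    "Delayed again and no one at the desk could help"
  )
)

# Treat a label as categorical data: count rows per category
label_counts <- tweets %>%
  count(complaint_label)

# Tokenize: one row per word (lowercased, punctuation stripped by default)
tidy_tweets <- tweets %>%
  unnest_tokens(word, tweet_text)

# Clean: drop common stop words, then count the words that remain
word_counts <- tidy_tweets %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

word_counts
```

Keeping one token per row means the usual dplyr verbs (`count()`, `group_by()`, `summarize()`) apply to text without any special handling.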

2

Visualizing Text

While counts are nice, visualizations are better. In this chapter, you will learn how to apply what you know from ggplot2 to tidy text data.

Plotting word counts

Visualizing complaints

Visualizing non-complaints

Improving word count plots

Adding custom stop words

Visualizing word counts using factors

Faceting word count plots

Counting by product and reordering

Visualizing word counts with facets

Plotting word clouds

Creating a word cloud

Adding a splash of color
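
As an illustration of applying ggplot2 to tidy word counts, here is a minimal sketch. The counts are invented for the example, and base R's `reorder()` is used to sort the bars; the course itself may use a different helper for reordering factors.

```r
library(dplyr)
library(ggplot2)

# Hypothetical word counts, shaped like the output of count(word)
word_counts <- tibble(
  word = c("flight", "delayed", "bag", "service", "gate"),
  n = c(12L, 9L, 7L, 4L, 2L)
)

# Keep the more frequent words and turn `word` into a factor ordered by
# count, so the bars appear sorted rather than alphabetical
plot_data <- word_counts %>%
  filter(n >= 4) %>%
  mutate(word = reorder(word, n))

word_plot <- ggplot(plot_data, aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(title = "Word counts (toy data)", x = NULL, y = "Count")

word_plot
```

Adding `facet_wrap()` on a grouping column would be one way to split a plot like this by product or complaint type.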

3

Sentiment Analysis

While word counts and visualizations suggest something about the content, we can do more. In this chapter, we move beyond word counts alone to analyze the sentiment or emotional valence of text.

Sentiment dictionaries

Counting the NRC sentiments

Visualizing the NRC sentiments

Appending dictionaries

Counting sentiment

Visualizing sentiment

Improving sentiment analysis

Practicing reshaping data

Practicing with grouped summaries

Visualizing sentiment by complaint type
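
A dictionary-based sentiment analysis of the kind this chapter covers can be sketched as follows. The lesson titles refer to the NRC dictionary; this example swaps in the Bing lexicon because it ships with tidytext and needs no separate download. The `reviews` data is hypothetical.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical reviews, invented for illustration
reviews <- tibble(
  review_id = 1:2,
  text = c(
    "I love this vacuum, it works great",
    "Terrible battery and awful customer support"
  )
)

# Append the dictionary: keep only words that appear in the lexicon,
# then count positive and negative words per review
sentiment_counts <- reviews %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(review_id, sentiment)

# Reshape to one row per review and compute a net sentiment score
overall <- sentiment_counts %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(overall_sentiment = positive - negative)

overall
```

Because the score only reflects words found in the dictionary, results depend heavily on which lexicon is chosen and how well it matches the domain.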

4

Topic Modeling

In this final chapter, we move beyond word counts to uncover the underlying topics in a collection of documents. We will use a standard topic model known as latent Dirichlet allocation.

Latent Dirichlet allocation

Topics as word probabilities

Summarizing topics

Visualizing topics

Document term matrices

Creating a DTM

Evaluating a DTM as a matrix

Running topic models

Fitting an LDA

Tidying LDA output

Comparing LDA output

Interpreting topics

Naming three topics

Naming four topics
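
The steps in this chapter (building a document term matrix, fitting an LDA, and tidying the output) could be sketched as below. This assumes the topicmodels package is installed alongside dplyr and tidytext; the documents, the choice of `k = 2`, and the seed are arbitrary values chosen for illustration, and a corpus this small will not produce meaningful topics.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical documents, invented for illustration
docs <- tibble(
  doc_id = 1:4,
  text = c(
    "The battery died after one week and the charger stopped working",
    "Great suction power, the vacuum cleans carpet and hardwood floors",
    "Battery life is short and charging takes hours",
    "The vacuum handles pet hair on carpet and floors easily"
  )
)

# Create a DTM: one row per document, one column per word
dtm <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(doc_id, word) %>%
  cast_dtm(doc_id, word, n)

# Fit an LDA with two topics; the seed makes the Gibbs sampler reproducible
lda_out <- LDA(dtm, k = 2, method = "Gibbs", control = list(seed = 42))

# Tidy the output: beta is the probability of each word within a topic
topics <- tidy(lda_out, matrix = "beta")

# Top three words per topic, a starting point for naming the topics
top_terms <- topics %>%
  group_by(topic) %>%
  slice_max(beta, n = 3, with_ties = FALSE) %>%
  ungroup()

top_terms
```

Refitting with a different `k` and comparing the top terms is one way to judge how many topics the collection supports.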
