
Course

Introduction to Natural Language Processing in R

Skill level: Intermediate

Updated 5/2024

Get an overview of all the skills and tools you need to excel at natural language processing in R.


R, Machine Learning

4 h

15 videos

47 exercises

3,750 XP

8,547

Certificate of achievement


Course Description

As with any fundamentals course, Introduction to Natural Language Processing in R is designed to give you the tools you need to start analyzing text. Natural language processing (NLP) is a steadily growing field within data science, with exciting advances over the last decade. This course covers the basics and prepares you to expand your analysis capabilities. We will look at regular expressions, topic modeling, named entity recognition, and more, all with detailed examples that you can use to kick-start your future analyses.

Prerequisites

Intermediate R, Introduction to the Tidyverse

1

True Fundamentals

Chapter 1 of Introduction to Natural Language Processing prepares you to run your first analysis on text. You will explore regular expressions and tokenization, two of the most common components of text analysis tasks. With regular expressions, you can search for any pattern you can think of, and with tokenization, you can prepare and clean text for more sophisticated analysis. This chapter lays the groundwork for the techniques covered in the remaining chapters of this course.
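To make the two ideas concrete, here is a minimal sketch of a regular expression search followed by word tokenization and stop-word removal. The course itself works in R (with grep and tidytext); this sketch uses Python's standard `re` module only as a language-agnostic illustration. The sample sentence, the `tokenize_words` helper, and the tiny stop-word list are all assumptions made up for this example.

```python
import re

# Hypothetical sample text, invented for illustration.
text = "Dr. Smith paid $25 on 2024-05-01. She paid $40 on 2024-05-03!"

# Regular expression search: find every ISO-style date (YYYY-MM-DD).
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)

def tokenize_words(s):
    """Lowercase the text and keep runs of letters (and apostrophes) as tokens."""
    return re.findall(r"[a-z']+", s.lower())

tokens = tokenize_words(text)

# Stop-word removal with a tiny illustrative list (not a real lexicon).
STOP_WORDS = {"on", "she", "the", "a"}
content_tokens = [t for t in tokens if t not in STOP_WORDS]

print(dates)
print(tokens)
print(content_tokens)
```

Real tokenizers handle punctuation, numbers, and contractions more carefully than this single pattern does, which is one reason to rely on a dedicated package in practice.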

Regular expression basics

Practicing syntax with grep

Exploring regular expression functions

Tokenization

tidytext functions

Tokenization: sentences

Text cleaning basics

Text preprocessing: remove stop words

Text preprocessing: Stemming


2

Representations of Text

In this chapter, you will learn the most common and most widely studied ways to analyze text. You will create a text corpus, expand a bag-of-words representation into a TFIDF matrix, and use cosine-similarity metrics to determine how similar two pieces of text are to each other. This builds your foundation for practicing NLP before you dive into applications of NLP in chapters 3 and 4.
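The path from bag-of-words to TFIDF to cosine similarity can be sketched in a few lines. This is an illustration under stated assumptions, not the course's own code: the course uses R, while this sketch uses plain Python so the arithmetic stays visible. The three toy documents are invented, and the weighting shown (term frequency times the natural log of documents divided by document frequency) is one common definition among several.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration.
docs = {
    "a": "the cat sat on the mat",
    "b": "the cat lay on the rug",
    "c": "stocks fell on the news",
}

# Bag-of-words: per-document term counts, ignoring word order.
bow = {name: Counter(text.split()) for name, text in docs.items()}
vocab = sorted({term for counts in bow.values() for term in counts})

def idf(term):
    """Inverse document frequency: ln(number of docs / docs containing the term)."""
    df = sum(1 for counts in bow.values() if term in counts)
    return math.log(len(docs) / df)

def tfidf_vector(counts):
    """TF-IDF weights over the shared vocabulary (tf = count / document length)."""
    total = sum(counts.values())
    return [counts[t] / total * idf(t) for t in vocab]

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

vectors = {name: tfidf_vector(counts) for name, counts in bow.items()}
sim_ab = cosine(vectors["a"], vectors["b"])  # the two documents share "cat"
sim_ac = cosine(vectors["a"], vectors["c"])  # the two share only "the" and "on"

print(sim_ab, sim_ac)
```

Because "the" and "on" appear in every document, their inverse document frequency is zero, so they contribute nothing to the similarity. Documents a and b still overlap on "cat", while a and c end up with no weighted terms in common.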

Understanding an R corpus

Explore an R corpus

Creating a tibble from a corpus

Creating a corpus

The bag-of-words representation

Practice BoW

BoW Example

Sparse matrices

Manual calculations

TFIDF Practice

Cosine Similarity

An example of failing at text analysis

Cosine similarity example


3

Applications: Classification and Topic Modeling

Chapter 3 focuses on two common text analysis approaches: classification modeling and topic modeling. If you are working on text analysis projects, you will inevitably use one or both of these methods. This chapter teaches you how to perform both techniques and provides insight into how to approach them from a practical point of view.

Preparing text for modeling

Data preparation

Removing sparse terms

Classification modeling

Classification modeling example

Confusion matrices

TFIDF tibble vs dtm

Introduction to topic modeling

LDA practice

Assigning topics to documents

LDA in practice

Testing perplexity

Reviewing LDA results


4

Advanced Techniques

In chapter 4 we cover two staples of natural language processing: sentiment analysis and word embeddings. These two techniques are a must for anyone learning the fundamentals of text analysis. You will also briefly learn about BERT, part-of-speech tagging, and named entity recognition. Almost 15 different analysis techniques are covered in this course, so chapter 4 ends by recapping them all.

Sentiment analysis

tidytext lexicons

Sentiment scores

Sentiment and emotion

Word embeddings

h2o practice

Additional NLP analysis

Reviewing methods #1

Review methods #2

