Multi-Modal Models with Hugging Face

Skill level: Intermediate

Updated 01/2026

Combine text, images, audio, and video with the latest AI models from Hugging Face, and generate new images and videos!

Python | Artificial Intelligence | 4 hr | 14 videos | 45 exercises | 3,800 XP | Statement of Accomplishment

Course Description

Harness the Power of Multi-Modal AI

Dive into the cutting-edge world of multi-modal AI models, where text, images, and speech combine to create powerful applications. Learn how to leverage Hugging Face's vast repository of models that can see, hear, and understand like never before. Whether you're analyzing social media content, building voice assistants, or creating next-generation AI applications, multi-modal models are your gateway to handling diverse data types seamlessly.

Master Essential Multi-Modal Techniques

Explore state-of-the-art models like CLIP for image-text understanding, SpeechT5 for voice synthesis, and the Qwen2 Vision Language model for multi-modal sentiment analysis. Through hands-on exercises, you'll master the techniques used by leading AI companies to build sophisticated multi-modal systems.

Future-Proof Your AI Skills

This course will give you a robust toolkit for handling multi-modal AI tasks. You'll learn to process and combine different data modalities effectively, fine-tune pre-trained models for custom applications, and evaluate and improve model performance across modalities.

Prerequisites

Introduction to LLMs in Python

1

Accessing Hugging Face Models and Datasets

Navigate the Hugging Face model hub and transform raw text, audio, and visual data into model-ready formats. Learn how to find the latest and most popular models for tasks such as text generation, and how to use pre-built pipelines.
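As a rough illustration of what preparing an image for a model can involve, the sketch below rescales 8-bit pixel values to the range 0 to 1 and then standardizes each channel. It is plain Python rather than the Hugging Face API; the function names and the mean and standard deviation values are assumptions made for this example, not taken from the course.

```python
# Toy sketch of image preprocessing: rescale 8-bit pixels to [0, 1],
# then standardize each channel with a per-channel mean and std.
# MEAN and STD are illustrative placeholders, not values from the course.

MEAN = (0.5, 0.5, 0.5)
STD = (0.5, 0.5, 0.5)


def normalize_pixel(pixel, mean=MEAN, std=STD):
    """Map one (R, G, B) pixel with values 0-255 to standardized floats."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(pixel, mean, std))


def preprocess_image(image):
    """Apply normalize_pixel to every pixel of an image given as a list of rows."""
    return [[normalize_pixel(pixel) for pixel in row] for row in image]


# A hypothetical 1x2 image: one black pixel and one white pixel.
tiny_image = [[(0, 0, 0), (255, 255, 255)]]
processed = preprocess_image(tiny_image)
print(processed)  # [[(-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)]]
```

In practice a pre-trained model ships with its own preprocessing settings, so the constants above would come from the model rather than being chosen by hand.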

Hugging Face model navigation

50 XP

How many models!?

100 XP

Finding the most popular text-to-image model

100 XP

Preprocessing different modalities

50 XP

Text tokenizing

100 XP

Image preprocessing

100 XP

Audio preprocessing

100 XP

Pipeline tasks and evaluations

50 XP

Pipeline caption generation

100 XP

Passing keyword arguments

100 XP

Model evaluation on a custom dataset

100 XP

2

Unimodal Vision, Audio, and Text Models

Learn to master individual modalities with state-of-the-art models. Dive into computer vision for image classification and segmentation, explore speech recognition and text-to-speech synthesis, and learn effective fine-tuning techniques. Build practical skills with pre-trained models from Hugging Face's transformers library.
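To make the segmentation idea concrete, here is a minimal sketch of the last step of background removal: once a model has produced a foreground mask, pixels outside the mask are replaced with a fill color. The mask here is written by hand, and `remove_background` is a hypothetical helper for illustration, not a function from the transformers library.

```python
def remove_background(image, mask, fill=(255, 255, 255)):
    """Keep pixels where the mask is 1 and replace all others with `fill`."""
    same_height = len(image) == len(mask)
    same_widths = all(len(i_row) == len(m_row) for i_row, m_row in zip(image, mask))
    if not (same_height and same_widths):
        raise ValueError("image and mask must have the same height and width")
    return [
        [pixel if keep else fill for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


# A hypothetical 2x2 RGB image and a hand-written foreground mask.
image = [
    [(10, 20, 30), (40, 50, 60)],
    [(70, 80, 90), (100, 110, 120)],
]
mask = [
    [1, 0],
    [0, 1],
]

result = remove_background(image, mask)
for row in result:
    print(row)
```

In a real workflow the mask would come from a segmentation model's output, and the image would typically be an array or a PIL image rather than nested lists.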

Computer vision

50 XP

Image classification

100 XP

Object detection

100 XP

Image background removal

100 XP

Fine-tuning computer vision models

50 XP

CV fine-tuning: dataset prep

100 XP

CV fine-tuning: model classes

100 XP

CV fine-tuning: trainer configuration

100 XP

Speech recognition and audio generation

50 XP

Automatic speech recognition

100 XP

Creating speech embeddings

100 XP

Audio denoising

100 XP

Fine-tuning text-to-speech models

50 XP

Fine-tuning a text-to-speech model

100 XP

Generating new speech

100 XP

3

Multi-Modal Models for Classification

Learn to fuse visual, textual, and audio information for richer AI applications. Master techniques like CLIP for zero-shot classification, build sentiment analyzers that see and read, and create emotion detectors that combine facial expressions with voice. Take your AI models beyond single-modality thinking.
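The scoring step behind CLIP-style zero-shot classification can be sketched in a few lines: compare an image embedding with one text embedding per candidate label using cosine similarity, then turn the scaled similarities into probabilities with a softmax. The embeddings below are made-up three-dimensional vectors, and the scale of 100 is an illustrative assumption; a real model produces the embeddings and supplies its own learned scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def zero_shot_classify(image_embedding, label_embeddings, scale=100.0):
    """Return a {label: probability} dict from scaled cosine similarities."""
    labels = list(label_embeddings)
    scores = [
        scale * cosine_similarity(image_embedding, label_embeddings[label])
        for label in labels
    ]
    return dict(zip(labels, softmax(scores)))


# Made-up embeddings for illustration only.
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}

probabilities = zero_shot_classify(image_embedding, label_embeddings)
best_label = max(probabilities, key=probabilities.get)
print(best_label)  # a photo of a cat
```

Because the labels are just text, new classes can be added by writing new prompts, with no retraining of the model.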

Zero-shot image classification

50 XP

Zero-shot learning with CLIP

100 XP

Automated caption quality assessment

100 XP

Multi-modal sentiment analysis

50 XP

Prompting Vision Language Models (VLMs)

100 XP

Multi-modal sentiment classification with Qwen

100 XP

Zero-shot video classification

50 XP

Video audio splitting

100 XP

Video sentiment analysis with CLIP CLAP

100 XP

4

Multi-Modal Generation

Transform ideas into reality! Master cutting-edge AI techniques to generate and manipulate visual content using text prompts. Create stunning images, edit photos intelligently, and build powerful question-answering systems for images and documents. Turn your creative vision into digital reality with multi-modal AI.
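Image inpainting usually starts from a mask that marks which region of the picture should be regenerated. The sketch below builds a rectangular mask as a plain grid of values. It assumes the common convention that 255 (white) marks pixels to repaint and 0 (black) marks pixels to keep; `rectangular_mask` is a hypothetical helper, and the convention should be checked against the inpainting model actually used.

```python
def rectangular_mask(width, height, box):
    """Return a height x width grid with 255 inside `box` and 0 elsewhere.

    `box` is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("box must lie inside the image")
    return [
        [255 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


# Mark a 2x2 region in the top middle of a hypothetical 4x3 image.
mask = rectangular_mask(width=4, height=3, box=(1, 0, 3, 2))
for row in mask:
    print(row)
# [0, 255, 255, 0]
# [0, 255, 255, 0]
# [0, 0, 0, 0]
```

The mask is then passed to the model together with the original image and a text prompt describing what should appear in the marked region.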

Visual question-answering (VQA)

50 XP

VQA with Vision Language Transformers (ViLTs)

100 XP

Document VQA with LayoutLM

100 XP

Image editing with diffusion models

50 XP

Custom image editing

100 XP

Image inpainting

100 XP

Video generation

50 XP

Build a video!

100 XP

Assessing video generation performance

100 XP

Congratulations!

50 XP

Earn a Statement of Accomplishment

Add this credential to your LinkedIn profile, résumé, or CV. Share it on social media and in your performance review.

In collaboration with

This course is a collaborative effort between DataCamp and Hugging Face.

In the following tracks

Hugging Face Fundamentals

Instructors

Sean Benson

Applied AI Research Scientist, Amsterdam University Medical Centers

Collaborators

James Chapman

Francesca Donadoni
