
Hugging Face Cheat Sheet

Learn the basics of Hugging Face with this beginner-friendly cheat sheet, and explore key resources to help you get started building with open-source AI.
Jan 20, 2026 · 2 min read

Hugging Face is an ecosystem for discovering, running, training, and sharing machine learning models and datasets, with a strong emphasis on open-source and reproducibility.

The “core four” libraries are: transformers (models + pipelines), tokenizers (fast tokenization), datasets (data loading/processing), and huggingface_hub (Hub interaction + versioning).

The Hugging Face Hub

The Hub is a Git-backed platform for hosting Models, Datasets, and Spaces (interactive demos), plus Community features for sharing and discovery.

Key definitions

  • A model is a pretrained checkpoint; a tokenizer converts raw text into tokens; a pipeline bundles preprocessing, inference, and postprocessing for a task.
  • A dataset is an Arrow-backed collection of data with splits (train/validation/test).
  • A checkpoint is a saved snapshot of model weights and config; inference means running a trained model on new inputs; a repo is a Git-backed unit on the Hub that stores a model, a dataset, or a Space.
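
To make the pipeline definition above concrete, here is a minimal sketch in plain Python of the three stages a pipeline chains together. The vocabulary, weights, and function names are all invented for illustration; this is not how transformers implements pipelines, only the same shape at toy scale.

```python
import math

# Hypothetical toy vocabulary and weights, invented for illustration only.
VOCAB = {"great": 0, "easy": 1, "bad": 2, "slow": 3}
WEIGHTS = [1.5, 1.0, -1.5, -1.0]  # one sentiment score per vocabulary entry
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Tokenize: lowercase, split on whitespace, map known words to IDs."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    return [VOCAB[w] for w in words if w in VOCAB]


def infer(token_ids):
    """Toy 'model' forward pass: produce one logit per label."""
    score = sum(WEIGHTS[i] for i in token_ids)
    return [-score, score]  # logits for NEGATIVE, POSITIVE


def postprocess(logits):
    """Softmax the logits and return the top label with its probability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    """Chain the three stages behind a single call."""
    return postprocess(infer(preprocess(text)))


result = toy_pipeline("Hugging Face makes NLP workflows easy!")
print(result["label"])
```

A real pipeline swaps each stage for a trained tokenizer, a pretrained checkpoint, and task-specific decoding, but the call pattern stays the same: raw input in, labeled result out.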

Model Cards and Dataset Cards

  • A Model Card explains intended use, training data, evaluation, limitations/biases, and licensing.
  • A Dataset Card describes data sources, schema/splits, known issues/biases, ethics, and licensing.

Use cards to assess fitness-for-purpose, risk, and reproducibility.
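
Cards on the Hub are README files that begin with a metadata block delimited by `---` lines. As a rough sketch of a fitness-for-purpose check, the snippet below reads a license field from a card and compares it with an allow-list. The card text, the `read_card_metadata` helper, and the allow-list are hypothetical, and the parser only handles flat `key: value` lines, not full YAML.

```python
# Hypothetical card text written for illustration; real cards live in Hub repos.
CARD_TEXT = """---
license: apache-2.0
language: en
datasets: sst2
---
# My model

Intended use, training data, evaluation, and limitations go here.
"""


def read_card_metadata(card_text):
    """Naively parse flat `key: value` pairs from the leading metadata block."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no metadata block at the top of the card
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":  # end of the metadata block
            break
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


metadata = read_card_metadata(CARD_TEXT)
ALLOWED_LICENSES = {"apache-2.0", "mit"}  # example policy, an assumption
license_ok = metadata.get("license") in ALLOWED_LICENSES
print(metadata, license_ok)
```

Metadata covers only the machine-readable part of a card; intended use, limitations, and biases still need to be read in the prose sections.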

Where to run inference?

  • Run locally for control, offline use, and no network round trip (you manage hardware and dependencies).
  • Use an inference provider for fast setup and scalability (you give up some control and accept network latency and usage-based costs).

Workflows

Inference workflows (transformers)

Quickstart: Run inference with a pipeline

from transformers import pipeline

# Create a pipeline by specifying a task and model ID
analyze_sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)

# Run inference on input text; the result is a list of dicts with "label" and "score"
result = analyze_sentiment("Hugging Face makes NLP workflows easy!")
print(result)

Text summarization

from transformers import pipeline

# Create a summarization pipeline
summarize_text = pipeline(
    "summarization",
    model="facebook/bart-large-cnn"
)

# Summarize input text; the result is a list of dicts with "summary_text"
summary = summarize_text("Long document text goes here...")
print(summary)

Document question answering

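from transformers import pipeline

# Create a document QA pipeline
# (this model reads text from the image via OCR, so an OCR backend
# such as pytesseract is expected to be installed)
answer_question = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa"
)

# Ask a question about a document image
answer = answer_question(
    image="invoice.png",
    question="What is the invoice total?"
)
print(answer)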

Run inference manually

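import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# Load the tokenizer and model from the same checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Tokenize the text and return PyTorch tensors
inputs = tokenizer("Hugging Face is great", return_tensors="pt")

# Disable gradient tracking, since inference needs no backward pass
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and map its ID to a label name
predicted_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])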
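
The manual example above takes an argmax over raw logits, which gives a class ID but no confidence. The sketch below, in plain Python, shows the usual postprocessing step: a softmax to turn logits into probabilities, then a lookup from ID to label. The logit values and the `ID2LABEL` dictionary are illustrative assumptions; with a real model, read the mapping from `model.config.id2label`.

```python
import math

# Assumed label mapping for a two-class sentiment model; with a real model,
# read it from model.config.id2label instead of hard-coding it.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits, id2label):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    predicted_id = max(range(len(probs)), key=lambda i: probs[i])  # argmax
    return id2label[predicted_id], probs[predicted_id]


example_logits = [-4.0, 4.0]  # illustrative values, not real model output
label, confidence = decode_prediction(example_logits, ID2LABEL)
print(label, round(confidence, 4))
```

Softmax does not change which class wins, so the argmax result is the same; it only adds a probability that can be thresholded or logged.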

Data processing workflows (datasets)

Load and slice datasets

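from datasets import load_dataset

# Download (or load from cache) all splits of the IMDB dataset
movie_reviews = load_dataset("imdb")

# Select the training split and inspect its first example
train_reviews = movie_reviews["train"]
print(train_reviews[0])

# Keep the first 100 examples for quick experiments
small_sample = train_reviews.select(range(100))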

Preprocess a dataset

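from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# With batched=True, `batch` is a dict of lists (one list per column)
def tokenize_batch(batch):
    return tokenizer(
        batch["text"],
        truncation=True,       # cut sequences longer than max_length
        padding="max_length",  # pad shorter sequences up to max_length
        max_length=256
    )

# Apply the function to every split and drop the raw text column
tokenized_dataset = dataset.map(
    tokenize_batch,
    batched=True,
    remove_columns=["text"]
)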
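
To show what `truncation=True` and `padding="max_length"` do to each example, here is a toy batched function in plain Python with the same dict-of-lists interface that `map(..., batched=True)` expects. The vocabulary and the `toy_tokenize_batch` name are hypothetical, and the sketch omits the special tokens that real tokenizers add.

```python
# Hypothetical toy vocabulary; real tokenizers ship their own vocabulary files.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "a": 2, "great": 3, "movie": 4, "bad": 5}


def toy_tokenize_batch(batch, max_length=4):
    """Mimic a batched map function: dict of lists in, dict of lists out."""
    input_ids, attention_mask = [], []
    for text in batch["text"]:
        ids = [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]
        ids = ids[:max_length]  # truncation: drop tokens past max_length
        n_pad = max_length - len(ids)
        input_ids.append(ids + [VOCAB["[PAD]"]] * n_pad)  # pad to max_length
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = {"text": ["A great movie", "A bad bad bad bad movie"]}
encoded = toy_tokenize_batch(batch)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

Every row comes out the same length, which is what lets a batch be stacked into one tensor; the attention mask tells the model which positions are padding.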

Working with the Hub (huggingface_hub)

Save locally and reload

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# Download the tokenizer and model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Write tokenizer files and model weights/config to local directories
tokenizer.save_pretrained("local_tokenizer")
model.save_pretrained("local_model")

# Reload from disk by passing the directory path instead of a model ID
tokenizer = AutoTokenizer.from_pretrained("local_tokenizer")
model = AutoModelForSequenceClassification.from_pretrained("local_model")

Log in to the Hub

from huggingface_hub import login

# Prompts for a Hugging Face access token and stores it for later calls
login()

Upload (push) a model to the Hub

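from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Replace with your own namespace and repository name
repo_id = "your-username/my-model"

# Note: loading a base checkpoint this way adds an untrained classification head;
# fine-tune before relying on its predictions
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Upload both to the same repo (requires a login token with write access)
tokenizer.push_to_hub(repo_id)
model.push_to_hub(repo_id)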