Hugging Face Cheat Sheet

Learn the basics of Hugging Face with this beginner-friendly cheat sheet, and explore key resources to help you get started building with open-source AI.

Jan 20, 2026 · 2 min read

Hugging Face is an ecosystem for discovering, running, training, and sharing machine learning models and datasets, with a strong emphasis on open-source and reproducibility.

The “core four” libraries are: transformers (models + pipelines), tokenizers (fast tokenization), datasets (data loading/processing), and huggingface_hub (Hub interaction + versioning).
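
To see which of these four libraries are available in your environment, a small standard-library check along the following lines may help. The helper name `installed_versions` is made up for this sketch; only `importlib.metadata` is an actual API here.

```python
from importlib import metadata

# The four libraries named above
CORE_LIBRARIES = ["transformers", "tokenizers", "datasets", "huggingface_hub"]

def installed_versions(packages=CORE_LIBRARIES):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name}: {version if version else 'not installed'}")
```

Any entry reported as not installed can usually be added with `pip install <name>`.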

The Hugging Face Hub

The Hub is a Git-backed platform for hosting Models, Datasets, and Spaces (interactive demos), plus Community features for sharing and discovery.

Key definitions

A model is a pretrained checkpoint; a tokenizer converts raw text into tokens; a pipeline bundles preprocessing, inference, and postprocessing for a task.
A dataset is an Arrow-backed collection of data with splits (train/validation/test).
A checkpoint is a saved snapshot of model weights/config; inference means running a trained model on new inputs; a repo is a Git-backed Hub unit storing models/datasets/Spaces.
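
To make the three pipeline stages concrete, here is a toy sketch in plain Python. Nothing in it is a Hugging Face API: the vocabulary, the word scores, and the function names are all invented for illustration, and a real model replaces the hand-written scoring with learned weights.

```python
# Toy stand-ins only: a tiny vocabulary and hand-picked "weights"
VOCAB = {"[UNK]": 0, "great": 1, "easy": 2, "bad": 3, "slow": 4}
WORD_SCORES = {1: 1.0, 2: 1.0, 3: -1.0, 4: -1.0}

def tokenize(text):
    """Preprocessing: raw text -> list of token IDs."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in words]

def model_forward(token_ids):
    """Inference: token IDs -> one raw score (unknown tokens contribute 0)."""
    return sum(WORD_SCORES.get(token_id, 0.0) for token_id in token_ids)

def postprocess(score):
    """Postprocessing: raw score -> human-readable prediction."""
    return {"label": "POSITIVE" if score >= 0 else "NEGATIVE", "score": score}

def toy_pipeline(text):
    """Bundle the three stages, as a pipeline does for a real task."""
    return postprocess(model_forward(tokenize(text)))

print(tokenize("Great and easy"))
print(toy_pipeline("This is great!"))
print(toy_pipeline("Slow and bad."))
```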

Model Cards and Dataset Cards

A Model Card explains intended use, training data, evaluation, limitations/biases, and licensing.
A Dataset Card describes data sources, schema/splits, known issues/biases, ethics, and licensing.

Use cards to assess fitness-for-purpose, risk, and reproducibility.
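
On the Hub, a card is stored as the repo's README.md, typically with a YAML metadata block between `---` lines at the top. The sketch below shows one way to pull out simple metadata and check for expected sections before adopting a model. The sample card, the helper names, and the list of expected headings are assumptions for illustration; real cards vary in structure and may contain nested YAML that this naive parser ignores.

```python
# Hypothetical card text, not taken from a real repo
SAMPLE_CARD = """---
license: apache-2.0
language: en
---
# My model

## Intended use
Sentiment analysis of English product reviews.

## Limitations
Not evaluated on other languages.
"""

def split_card(readme_text):
    """Split a card into (flat metadata dict, markdown body)."""
    lines = readme_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, readme_text.strip()
    # Find the closing '---' of the metadata block
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, readme_text.strip()
    metadata = {}
    for line in lines[1:end]:
        # Only top-level "key: value" lines are handled
        if ":" in line and not line.startswith((" ", "-", "#")):
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[end + 1:]).strip()

def missing_sections(body, expected=("Intended use", "Limitations", "Evaluation")):
    """Return the expected headings that do not appear in the card body."""
    headings = [line.lstrip("#").strip().lower()
                for line in body.splitlines() if line.startswith("#")]
    return [name for name in expected if name.lower() not in headings]

metadata, body = split_card(SAMPLE_CARD)
print(metadata.get("license"))
print(missing_sections(body))
```

A missing license or evaluation section is not proof of a problem, but it is a reasonable prompt to look more closely before relying on the model or dataset.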

Where to run inference?

Run locally for full control, no network round-trips, and offline use (you manage the hardware and dependencies).
Use an inference provider for fast setup and scalability (you give up some control and accept network latency and usage-based costs).
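
For the hosted option, a minimal sketch using `InferenceClient` from `huggingface_hub` could look like the following. The helper name `top_label` is invented here, and whether a given model is actually served by a provider is an assumption you would need to check on the model's Hub page.

```python
def top_label(text, client, model_id="distilbert-base-uncased-finetuned-sst-2-english"):
    """Classify text through a hosted model and return (label, score) of the best class.

    `client` is expected to behave like huggingface_hub.InferenceClient, whose
    text_classification() returns items with .label and .score attributes.
    """
    predictions = client.text_classification(text, model=model_id)
    best = max(predictions, key=lambda prediction: prediction.score)
    return best.label, best.score

# Usage (needs network access and usually a Hub access token):
# from huggingface_hub import InferenceClient
# print(top_label("Hugging Face makes NLP workflows easy!", InferenceClient()))
```

Passing the client in as an argument keeps the helper easy to try with a stand-in object before spending provider credits.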

Workflows

Inference workflows (transformers)

Quickstart: Run inference with a pipeline

from transformers import pipeline

# Create a pipeline by specifying a task and model ID
analyze_sentiment = pipeline(
   "sentiment-analysis",
   model="distilbert-base-uncased-finetuned-sst-2-english"
)

# Run inference on input text
# Returns a list of dicts, each with a "label" and a "score"
analyze_sentiment("Hugging Face makes NLP workflows easy!")

Text summarization

from transformers import pipeline

# Create a summarization pipeline
summarize_text = pipeline(
   "summarization",
   model="facebook/bart-large-cnn"
)

# Summarize input text
# Returns a list of dicts, each with a "summary_text" key
summarize_text("Long document text goes here...")

Document question answering

from transformers import pipeline

# Create a document QA pipeline
answer_question = pipeline(
   "document-question-answering",
   model="impira/layoutlm-document-qa"
)

# Ask a question about a document image
# (LayoutLM-style models rely on OCR, so Pillow and pytesseract are typically required)
answer_question(
   image="invoice.png",
   question="What is the invoice total?"
)

Run inference manually

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

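# Preprocess: convert the text into PyTorch tensors
inputs = tokenizer("Hugging Face is great", return_tensors="pt")

# Inference: disable gradient tracking, since we are not training
with torch.no_grad():
    outputs = model(**inputs)

# Postprocess: pick the highest-scoring class and map it to its label
predicted_id = outputs.logits.argmax(dim=-1).item()
model.config.id2label[predicted_id]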
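
The logits returned by the model are unnormalized scores. To report a confidence alongside the predicted class, they are usually passed through a softmax. The sketch below does this in plain Python so the arithmetic is visible; with PyTorch you would typically call `torch.softmax(outputs.logits, dim=-1)` instead. The logit values and the `ID2LABEL` mapping are assumed for illustration; in practice, read the mapping from `model.config.id2label`.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

# Assumed label mapping and made-up logits for one input
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}
example_logits = [-4.0, 4.0]

probabilities = softmax(example_logits)
predicted_id = max(range(len(probabilities)), key=probabilities.__getitem__)
print(ID2LABEL[predicted_id], probabilities[predicted_id])
```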

Data processing workflows (datasets)

Load and slice datasets

from datasets import load_dataset

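# Download the IMDb movie review dataset (or load it from the local cache)
movie_reviews = load_dataset("imdb")

# Select a split, then index into it to get one example as a dict
train_reviews = movie_reviews["train"]
train_reviews[0]

# Keep only the first 100 examples
small_sample = train_reviews.select(range(100))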

Preprocess a dataset

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

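def tokenize_batch(batch):
    # Truncate long reviews and pad short ones so every example has 256 tokens
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=256
    )

# Apply the function to every split in batches, dropping the raw text column
tokenized_dataset = dataset.map(
    tokenize_batch,
    batched=True,
    remove_columns=["text"]
)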
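
To see what `truncation=True` and `padding="max_length"` do to a single example, here is a simplified sketch in plain Python. The function name `pad_or_truncate` and the token IDs are illustrative, and a padding ID of 0 is an assumption; a real tokenizer also takes care to keep special tokens in place when it truncates, which this toy version does not.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Force a list of token IDs to exactly max_length entries."""
    # Truncation: drop everything past max_length
    kept = token_ids[:max_length]
    # Padding: fill the remainder with pad_id
    padding = max_length - len(kept)
    return {
        "input_ids": kept + [pad_id] * padding,
        # 1 marks real tokens, 0 marks padding the model should ignore
        "attention_mask": [1] * len(kept) + [0] * padding,
    }

print(pad_or_truncate([101, 7592, 102], max_length=5))
print(pad_or_truncate(list(range(1, 11)), max_length=4))
```

Fixed-length outputs make batching simple, at the cost of wasted computation on padding for short texts.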

Working with the Hub (huggingface_hub)

Save locally and reload

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

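# Save both to local directories
tokenizer.save_pretrained("local_tokenizer")
model.save_pretrained("local_model")

# Reload from those directories and keep the returned objects
tokenizer = AutoTokenizer.from_pretrained("local_tokenizer")
model = AutoModelForSequenceClassification.from_pretrained("local_model")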

Log in to the Hub

from huggingface_hub import login

# Prompts for a User Access Token and stores it for later Hub calls
login()

Upload (push) a model to the Hub

from transformers import AutoTokenizer, AutoModelForSequenceClassification

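# Replace with your own namespace and repository name
repo_id = "your-username/my-model"

# Loading a base checkpoint into a classification class adds a new, untrained head;
# in practice you would fine-tune the model before sharing it
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Requires a prior login() with a token that has write access
tokenizer.push_to_hub(repo_id)
model.push_to_hub(repo_id)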
