Hugging Face Cheat Sheet

Learn the basics of Hugging Face with this beginner-friendly cheat sheet, and explore key resources to help you get started building with open-source AI.
Jan 20, 2026  · 2 min read

Hugging Face is an ecosystem for discovering, running, training, and sharing machine learning models and datasets, with a strong emphasis on open-source and reproducibility.

The “core four” libraries are: transformers (models + pipelines), tokenizers (fast tokenization), datasets (data loading/processing), and huggingface_hub (Hub interaction + versioning).

The Hugging Face Hub

The Hub is a Git-backed platform for hosting Models, Datasets, and Spaces (interactive demos), plus Community features for sharing and discovery.

Key definitions

  • A model is a pretrained checkpoint; a tokenizer converts raw text into tokens; a pipeline bundles preprocessing, inference, and postprocessing for a task.
  • A dataset is an Arrow-backed collection of data, typically organized into splits such as train, validation, and test.
  • A checkpoint is a saved snapshot of model weights/config; inference means running a trained model on new inputs; a repo is a Git-backed Hub unit storing models/datasets/Spaces.
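
To make the three pipeline stages concrete, here is a toy sketch in plain Python. Every name in it (`preprocess`, `infer`, `postprocess`, `toy_pipeline`, and the word lists) is hypothetical, and the word-counting "model" is only a stand-in for a real checkpoint. It is not how transformers implements pipelines; it only shows how the stages chain together.

```python
# Toy stand-ins for the three pipeline stages (illustrative only).
POSITIVE_WORDS = {"great", "easy", "good"}
NEGATIVE_WORDS = {"bad", "hard", "slow"}

def preprocess(text):
    """Tokenizer stand-in: lowercase, split on whitespace, strip punctuation."""
    return [word.strip(".,!?") for word in text.lower().split()]

def infer(tokens):
    """Model stand-in: count positive words minus negative words."""
    return sum((t in POSITIVE_WORDS) - (t in NEGATIVE_WORDS) for t in tokens)

def postprocess(raw_score):
    """Turn the raw score into a human-readable label."""
    label = "POSITIVE" if raw_score >= 0 else "NEGATIVE"
    return {"label": label, "raw_score": raw_score}

def toy_pipeline(text):
    """Chain the stages: preprocessing, then inference, then postprocessing."""
    return postprocess(infer(preprocess(text)))

result = toy_pipeline("Hugging Face makes NLP workflows easy!")
```

A real pipeline follows the same shape, with a trained tokenizer and model in place of the word lists.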

Model Cards and Dataset Cards

  • A Model Card explains intended use, training data, evaluation, limitations/biases, and licensing.
  • A Dataset Card describes data sources, schema/splits, known issues/biases, ethics, and licensing.

Use cards to assess fitness-for-purpose, risk, and reproducibility.

Where to run inference?

  • Run locally for control, offline use, and no network round trip (you manage hardware and dependencies).
  • Use an inference provider for fast setup and scalability (you give up some control and accept network latency and usage-based costs).

Workflows

Inference workflows (transformers)

Quickstart: Run inference with a pipeline

from transformers import pipeline

# Create a pipeline by specifying a task and model ID
analyze_sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Run inference on input text; returns a list of dicts with "label" and "score" keys
analyze_sentiment("Hugging Face makes NLP workflows easy!")

Text summarization

from transformers import pipeline

# Create a summarization pipeline
summarize_text = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
)

# Summarize input text; returns a list of dicts with a "summary_text" key
summarize_text("Long document text goes here...")

Document question answering

from transformers import pipeline

# Create a document QA pipeline
# (reading text from images relies on Pillow and pytesseract, plus the Tesseract OCR engine)
answer_question = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

# Ask a question about a document image
answer_question(
    image="invoice.png",
    question="What is the invoice total?",
)

Run inference manually

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

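# Tokenize the input and return PyTorch tensors
inputs = tokenizer("Hugging Face is great", return_tensors="pt")

# Disable gradient tracking, which inference does not need
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class index and map it to its label name
predicted_id = outputs.logits.argmax(dim=-1).item()
model.config.id2label[predicted_id]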
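
The logits returned by the model are unnormalized scores. To report a confidence alongside the predicted class, a softmax is commonly applied first. Below is a minimal sketch in plain Python, assuming hypothetical logit values and an assumed two-class label mapping. In real code the values would come from `outputs.logits` and the mapping from `model.config.id2label`.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed values for illustration only.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
example_logits = [-4.0, 4.0]

probs = softmax(example_logits)

# Index of the highest probability, then its label and confidence.
predicted_id = max(range(len(probs)), key=probs.__getitem__)
prediction = {"label": id2label[predicted_id], "score": probs[predicted_id]}
```

The resulting dictionary mirrors the label-plus-score shape that the sentiment pipeline returns.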

Data processing workflows (datasets)

Load and slice datasets

from datasets import load_dataset

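# Download the IMDB movie review dataset (a DatasetDict keyed by split name)
movie_reviews = load_dataset("imdb")

# Select the training split and inspect its first example
train_reviews = movie_reviews["train"]
train_reviews[0]

# Keep only the first 100 examples
small_sample = train_reviews.select(range(100))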

Preprocess a dataset

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Truncate or pad every review to exactly 256 tokens
def tokenize_batch(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=256,
    )

# Apply the tokenizer to all splits in batches and drop the raw text column
tokenized_dataset = dataset.map(
    tokenize_batch,
    batched=True,
    remove_columns=["text"],
)
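
To show what `truncation=True` with `padding="max_length"` does to each example, here is a simplified sketch in plain Python. The helper `pad_or_truncate`, the token IDs, and the padding ID of 0 are assumptions for illustration. A real tokenizer also preserves special tokens such as the closing separator when truncating, which this sketch ignores.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return fixed-length IDs plus an attention mask (1 = real token, 0 = padding)."""
    kept = token_ids[:max_length]        # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)       # padding: fill up to max_length
    return {
        "input_ids": kept + [pad_id] * n_pad,
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }

# Hypothetical token IDs: one short sequence, one long sequence.
short_example = pad_or_truncate([101, 7592, 102], max_length=5)
long_example = pad_or_truncate([101, 1, 2, 3, 4, 5, 102], max_length=5)
```

Every output has the same length, which lets examples be stacked into a batch. The attention mask tells the model which positions to ignore.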

Working with the Hub (huggingface_hub)

Save locally and reload

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

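# Save the tokenizer and model files to local directories
tokenizer.save_pretrained("local_tokenizer")
model.save_pretrained("local_model")

# Reload both from disk instead of the Hub
tokenizer = AutoTokenizer.from_pretrained("local_tokenizer")
model = AutoModelForSequenceClassification.from_pretrained("local_model")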

Log in to the Hub

from huggingface_hub import login

login()
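
Calling `login()` with no arguments prompts for a token, which does not suit scripts or CI jobs. One possible approach, sketched below, is to read the token from an environment variable and pass it through the `token` argument. The helpers `read_hub_token` and `login_from_env` are hypothetical names, and storing the token in `HF_TOKEN` is an assumption about your setup.

```python
import os

def read_hub_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable, or None if unset or empty."""
    token = os.environ.get(env_var, "").strip()
    return token or None

def login_from_env():
    """Log in non-interactively when a token is available; otherwise fall back to the prompt."""
    # Imported here so read_hub_token can be used without the package installed.
    from huggingface_hub import login

    token = read_hub_token()
    if token:
        login(token=token)
    else:
        login()
```

Keeping the token in the environment avoids hardcoding a secret in source files or notebooks.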

Upload (push) a model to the Hub

from transformers import AutoTokenizer, AutoModelForSequenceClassification

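# Replace with your own namespace and repository name
repo_id = "your-username/my-model"

# In practice, load your fine-tuned checkpoint here; the base checkpoint gets an untrained classification head
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Upload both to the same repo (requires being logged in with a write token)
tokenizer.push_to_hub(repo_id)
model.push_to_hub(repo_id)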