Hugging Face Cheat Sheet

Learn the basics of Hugging Face with this beginner-friendly cheat sheet, and explore key resources to help you get started building with open-source AI.
Jan 20, 2026 · 2 min read

Hugging Face is an ecosystem for discovering, running, training, and sharing machine learning models and datasets, with a strong emphasis on open-source and reproducibility.

The “core four” libraries are: transformers (models + pipelines), tokenizers (fast tokenization), datasets (data loading/processing), and huggingface_hub (Hub interaction + versioning).

The Hugging Face Hub

The Hub is a Git-backed platform for hosting Models, Datasets, and Spaces (interactive demos), plus Community features for sharing and discovery.

Key definitions

  • A model is a pretrained checkpoint; a tokenizer converts raw text into tokens; a pipeline bundles preprocessing, inference, and postprocessing for a task.
  • A dataset is an Arrow-backed collection of data with splits (train/validation/test).
  • A checkpoint is a saved snapshot of model weights/config; inference means running a trained model on new inputs; a repo is a Git-backed Hub unit storing models/datasets/Spaces.

Model Cards and Dataset Cards

  • A Model Card explains intended use, training data, evaluation, limitations/biases, and licensing.
  • A Dataset Card describes data sources, schema/splits, known issues/biases, ethics, and licensing.

Use cards to assess fitness-for-purpose, risk, and reproducibility.
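
On the Hub, a card is the repo's README.md, and it usually opens with a YAML metadata block between two `---` lines (license, language, task tag, and so on). The sketch below reads flat `key: value` pairs from such a block using only the standard library. The helper name `read_card_metadata` and the sample card text are hypothetical, and nested YAML (lists, mappings) is deliberately skipped; for real cards a YAML parser or `huggingface_hub`'s `ModelCard.load` is likely the better fit.

```python
def read_card_metadata(card_text):
    """Return flat key/value pairs from a card's leading '---' block."""
    lines = card_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no metadata block at the top of the card
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the metadata block
        if line.startswith((" ", "-", "#")) or ":" not in line:
            continue  # skip nested items, list entries, and comments
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    return metadata


# Hypothetical card text, for illustration only.
sample_card = """---
license: apache-2.0
language: en
pipeline_tag: text-classification
---
# My model

Intended use, training data, evaluation, limitations...
"""

card_metadata = read_card_metadata(sample_card)
print(card_metadata.get("license", "no license declared"))
```

Checking the license field first is a cheap way to rule a model or dataset in or out before reading the rest of the card.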

Where to run inference?

  • Run locally for control, no network round trip, and offline use (you manage hardware and dependencies).
  • Use an inference provider for fast setup and scalability (you give up some control and accept network latency and usage-based costs).

Workflows

Inference workflows (transformers)

Quickstart: Run inference with a pipeline

from transformers import pipeline

# Create a pipeline by specifying a task and model ID
analyze_sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Run inference on input text
analyze_sentiment("Hugging Face makes NLP workflows easy!")
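
A text-classification pipeline returns a list with one dict per input, each holding a `label` and a `score`. The sketch below shows one way to post-process that shape. The sample results are hard-coded and made up so the snippet runs without downloading a model, and the helper name `confident_labels` and the 0.9 threshold are assumptions rather than anything the library prescribes.

```python
def confident_labels(results, threshold=0.9):
    """Keep the label when score >= threshold, otherwise mark it 'UNCERTAIN'."""
    return [
        item["label"] if item["score"] >= threshold else "UNCERTAIN"
        for item in results
    ]


# Hypothetical results in the shape a sentiment pipeline returns
# (the scores are invented for illustration).
sample_results = [
    {"label": "POSITIVE", "score": 0.998},
    {"label": "NEGATIVE", "score": 0.62},
]

labels = confident_labels(sample_results)
print(labels)
```

In a real run, `sample_results` would be replaced by the return value of `analyze_sentiment([...])` called on a list of texts.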

Text summarization

from transformers import pipeline

# Create a summarization pipeline
summarize_text = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
)

# Summarize input text
summarize_text("Long document text goes here...")

Document question answering

from transformers import pipeline

# Create a document QA pipeline
# (this model also needs Pillow and pytesseract, plus the Tesseract OCR binary, to read the image)
answer_question = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
)

# Ask a question about a document image
answer_question(
    image="invoice.png",
    question="What is the invoice total?",
)

Run inference manually

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("Hugging Face is great", return_tensors="pt")

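# Disable gradient tracking; it is not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class; model.config.id2label maps it to a label name
predicted_class_id = outputs.logits.argmax(dim=-1).item()
model.config.id2label[predicted_class_id]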
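
The manual path yields raw logits, not probabilities. A minimal sketch of the missing step, in plain Python so it runs without a model: apply softmax to the logits, then map the winning index to a label. The logit values are hypothetical, and the label mapping is assumed to mirror this checkpoint's `model.config.id2label`. With tensors, `torch.softmax(outputs.logits, dim=-1)` does the same conversion.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits for one input; a real run would read outputs.logits[0].tolist().
example_logits = [-3.2, 3.5]
# Assumed label mapping; read model.config.id2label rather than hard-coding it.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probabilities = softmax(example_logits)
predicted_id = max(range(len(probabilities)), key=probabilities.__getitem__)
print(id2label[predicted_id], round(probabilities[predicted_id], 4))
```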

Data processing workflows (datasets)

Load and slice datasets

from datasets import load_dataset

movie_reviews = load_dataset("imdb")

train_reviews = movie_reviews["train"]
train_reviews[0]

small_sample = train_reviews.select(range(100))

Preprocess a dataset

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("imdb")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_batch(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=256,
    )

tokenized_dataset = dataset.map(
    tokenize_batch,
    batched=True,
    remove_columns=["text"],
)
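
With `truncation=True` and `padding="max_length"`, every example comes out exactly `max_length` tokens long: longer inputs are cut, shorter ones are filled with the pad token, and `attention_mask` marks real tokens with 1 and padding with 0. The toy function below imitates that behavior in plain Python. The helper name `pad_or_truncate`, the token ids, and the pad id of 0 are illustrative assumptions, and a real tokenizer also keeps its special tokens in place when truncating, which this sketch does not.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Force a list of token ids to exactly max_length entries."""
    kept = token_ids[:max_length]            # truncation: drop the overflow
    padding = [pad_id] * (max_length - len(kept))
    return {
        "input_ids": kept + padding,         # padding: fill up to max_length
        "attention_mask": [1] * len(kept) + [0] * len(padding),
    }


short_example = pad_or_truncate([101, 7592, 102], max_length=5)
long_example = pad_or_truncate([101, 1, 2, 3, 4, 5, 102], max_length=5)
print(short_example)
print(long_example)
```

Fixed-length padding keeps batches uniform but spends compute on pad tokens, so `max_length` is worth tuning to the typical length of the texts.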

Working with the Hub (huggingface_hub)

Save locally and reload

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

tokenizer.save_pretrained("local_tokenizer")
model.save_pretrained("local_model")

# Reload both from the local directories
tokenizer = AutoTokenizer.from_pretrained("local_tokenizer")
model = AutoModelForSequenceClassification.from_pretrained("local_model")

Log in to the Hub

from huggingface_hub import login

login()
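
Called with no arguments, `login()` prompts for a token, which does not suit scripts or CI jobs. `login` also accepts a `token` argument, so one option is to read the token from an environment variable. The sketch below assumes the token lives in `HF_TOKEN`; the helper names `read_hub_token` and `login_from_environment` are hypothetical, and the import is deferred so the lookup can be tried without the package installed.

```python
import os


def read_hub_token(environ=os.environ, variable="HF_TOKEN"):
    """Return the access token from the environment, or None if unset or blank."""
    token = environ.get(variable, "").strip()
    return token or None


def login_from_environment():
    """Log in non-interactively when a token is available (hypothetical helper)."""
    token = read_hub_token()
    if token is None:
        raise RuntimeError("Set HF_TOKEN before running this script.")
    # Imported here so read_hub_token() works without huggingface_hub installed.
    from huggingface_hub import login
    login(token=token)


# Demonstrate the lookup with a stand-in environment (not a real token).
print(read_hub_token({"HF_TOKEN": "  hf_example  "}))
print(read_hub_token({}))
```

Keeping the token in the environment rather than in source code also avoids committing it to a repo by accident.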

Upload (push) a model to the Hub

from transformers import AutoTokenizer, AutoModelForSequenceClassification

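# Replace with your own namespace and repo name (requires being logged in)
repo_id = "your-username/my-model"

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
# The classification head here is newly initialized; fine-tune before sharing a real model
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Push both to the same repo so they can be loaded together
tokenizer.push_to_hub(repo_id)
model.push_to_hub(repo_id)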