LLM Tutorials

Keep up to date with the latest news, techniques, and resources for Large Language Models. Our tutorials are full of practical walkthroughs and use cases you can use to upskill.

How to Run Fable 5-Enhanced Gemma 4 Locally

Learn how to run Fable 5-Enhanced Gemma 4 locally with prebuilt llama.cpp binaries on an RTX 5070 Ti, test MTP speculative decoding, connect Pi for private AI coding-agent workflows, and troubleshoot common setup issues.

Abid Ali Awan

June 25, 2026

How to Run Kimi K2.7 Code Locally Using llama.cpp

Learn how to run Kimi K2.7 Code locally in five minutes with the prebuilt llama.cpp binary on four RTX PRO 6000 GPUs, then use its web UI and Pi coding agent through an OpenAI-compatible API.

Abid Ali Awan

June 24, 2026

How to Run MiniMax M3 Locally: Multi-GPU Setup with llama.cpp and Pi Agent

Learn how to run MiniMax M3 locally on two RTX PRO 6000 GPUs with llama.cpp, test its OpenAI-compatible API and web UI, and connect it to Pi Coding Agent for private, high-speed local coding workflows.

Abid Ali Awan

June 23, 2026

Claude Code Routines: Run Your Coding Agent on a Schedule in the Cloud

Learn how Claude Code routines run your coding agent in the cloud on a schedule or a GitHub event, so PR reviews and audits finish with your laptop closed.

Bex Tuychiev

June 17, 2026

How to Speed Up Local LLMs with DFlash Speculative Decoding

Learn how to accelerate local Gemma 4 31B inference on a single RTX 4090 using DFlash speculative decoding and Flash Attention, compared against a baseline setup.

Abid Ali Awan

June 17, 2026

Claude Opus 4.8 API Tutorial: Tuning the Effort Parameter

Build a Streamlit app that runs Claude Opus 4.8 with adaptive thinking, auto-scores each response with Haiku 4.5, and charts the cost-quality tradeoff.

Aashi Dutt

June 5, 2026

SGLang Tutorial: Serving Mistral Medium 3.5 Locally

Set up a multi-GPU Docker environment with tensor parallelism and EAGLE speculative decoding to serve Mistral Medium 3.5 128B through an OpenAI-compatible API.

Abid Ali Awan

June 1, 2026

Multi-Token Prediction Tutorial: How To Speed Up LLMs

Run Qwen3.6 27B on an RTX 3090 and learn how Multi-Token Prediction (MTP) with llama.cpp can boost local LLM inference by almost 2x without upgrading your GPU.

Abid Ali Awan

May 14, 2026

GPT-Realtime-2 API Tutorial: Three Tests, Three Verdicts

Learn how OpenAI's gpt-realtime-2, gpt-realtime-translate, and gpt-realtime-whisper differ, then test each one with working Python WebSocket code.

Khalid Abdelaty

May 12, 2026

How to Run DeepSeek V4 Flash Locally

Learn how to run the full DeepSeek V4 Flash model on a single GPU using a modified llama.cpp build and a compatible GGUF file in this hands-on tutorial.

Abid Ali Awan

May 5, 2026

DeepSeek V4 API Tutorial: Building a Thinking Mode Arena

Learn how DeepSeek V4's three reasoning modes work, and build a Streamlit comparison app that shows when each mode actually wins on quality, speed, and cost.

Aashi Dutt

April 30, 2026