LLM Tutorials
Keep up to date with the latest news, techniques, and resources for Large Language Models. Our tutorials are full of practical walkthroughs and use cases you can use to upskill.
How to Run Fable 5-Enhanced Gemma 4 Locally
Learn how to run Fable 5-Enhanced Gemma 4 locally with prebuilt llama.cpp binaries on an RTX 5070 Ti, test MTP speculative decoding, connect Pi for private AI coding-agent workflows, and troubleshoot common setup issues.
Abid Ali Awan
June 25, 2026
How to Run Kimi K2.7 Code Locally Using llama.cpp
Learn how to run Kimi K2.7 Code locally in five minutes with the prebuilt llama.cpp binary on four RTX PRO 6000 GPUs, then use its web UI and Pi coding agent through an OpenAI-compatible API.
Abid Ali Awan
June 24, 2026
How to Run MiniMax M3 Locally: Multi-GPU Setup with llama.cpp and Pi Agent
Learn how to run MiniMax M3 locally on two RTX PRO 6000 GPUs with llama.cpp, test its OpenAI-compatible API and web UI, and connect it to Pi Coding Agent for private, high-speed local coding workflows.
Abid Ali Awan
June 23, 2026
Claude Code Routines: Run Your Coding Agent on a Schedule in the Cloud
Learn how Claude Code routines run your coding agent in the cloud on a schedule or a GitHub event, so PR reviews and audits finish with your laptop closed.
Bex Tuychiev
June 17, 2026
GGUF Format: A Complete Guide to Local LLM Inference
GGUF packages model weights, tokenizer data, and metadata into a single portable file. Learn how to choose the right quantization level and get started with Ollama.
Austin Chia
June 17, 2026
How to Speed Up Local LLMs with DFlash Speculative Decoding
Learn how to accelerate local Gemma 4 31B inference on a single RTX 4090 using DFlash speculative decoding and Flash Attention, compared against a baseline setup.
Abid Ali Awan
June 17, 2026
Claude Opus 4.8 API Tutorial: Tuning the Effort Parameter
Build a Streamlit app that runs Claude Opus 4.8 with adaptive thinking, auto-scores each response with Haiku 4.5, and charts the cost-quality tradeoff.
Aashi Dutt
June 5, 2026
SGLang Tutorial: Serving Mistral Medium 3.5 Locally
Set up a multi-GPU Docker environment with tensor parallelism and EAGLE speculative decoding to serve Mistral Medium 3.5 128B through an OpenAI-compatible API.
Abid Ali Awan
June 1, 2026
Multi-Token Prediction Tutorial: How To Speed Up LLMs
Run Qwen3.6 27B on an RTX 3090 and learn how Multi-Token Prediction (MTP) with llama.cpp can boost local LLM inference by almost 2x without upgrading your GPU.
Abid Ali Awan
May 14, 2026
GPT-Realtime-2 API Tutorial: Three Tests, Three Verdicts
Learn how OpenAI's gpt-realtime-2, gpt-realtime-translate, and gpt-realtime-whisper differ, then test each one with working Python WebSocket code.
Khalid Abdelaty
May 12, 2026
How to Run DeepSeek V4 Flash Locally
Learn how to run the full DeepSeek V4 Flash model on a single GPU using a modified llama.cpp build and a compatible GGUF file in this hands-on tutorial.
Abid Ali Awan
May 5, 2026
DeepSeek V4 API Tutorial: Building a Thinking Mode Arena
Learn how DeepSeek V4's three reasoning modes work, and build a Streamlit comparison app that shows when each mode actually wins on quality, speed, and cost.
Aashi Dutt
April 30, 2026