
Evaluating LLM Responses

In this session, we cover the evaluations that are useful for reducing hallucination and improving the retrieval quality of LLM apps.
Nov 29, 2023

View Slides

LLMs should be considered hallucinatory until proven otherwise! Many of us have turned to augmenting LLMs with a knowledge store (such as Zilliz) to solve this problem, but a RAG setup can still hallucinate. In particular, hallucinations can be caused by retrieving irrelevant context, retrieving too little context, and more.
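To make the retrieval failure mode concrete, below is a minimal, hypothetical sketch of an LLM-as-judge relevance check using the openai Python client (v1 API). The prompt, model choice, and 0-10 scale are illustrative assumptions, not TruLens' actual feedback prompts.

```python
# Hypothetical LLM-as-judge check for retrieval relevance (illustrative
# prompt and scale, not TruLens' actual feedback prompts).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def context_relevance(question: str, chunk: str) -> int:
    """Ask a judge model to rate a retrieved chunk on a 0-10 scale."""
    prompt = (
        "Rate how relevant the CONTEXT is for answering the QUESTION "
        "on a scale of 0 (irrelevant) to 10 (fully relevant). "
        "Reply with the number only.\n\n"
        f"QUESTION: {question}\n\nCONTEXT: {chunk}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Assumes the judge follows the "number only" instruction.
    return int(response.choices[0].message.content.strip())

# Low scores across all retrieved chunks signal a retrieval problem:
# the generator is being forced to answer from irrelevant context.
print(context_relevance("Who founded Zilliz?", "Milvus is a vector database."))
```

If every retrieved chunk scores low, the problem is upstream of the generator: no amount of prompt engineering will fix an answer produced from irrelevant context.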

TruLens is built to solve this problem. It sits as the evaluation layer of the LLM stack, shortening the feedback loop so you can iterate on your LLM app faster. We'll also talk about the metrics you can use for evaluation and why you should consider LLM-based evals when building your app.
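As a sketch of what that evaluation layer looks like in code, the example below wires the "RAG triad" (context relevance, groundedness, answer relevance) around an existing LangChain app using the trulens_eval package. It assumes the late-2023 trulens_eval API and a pre-built `rag_chain`; exact module paths and method names may differ across versions.

```python
# A minimal sketch, assuming the late-2023 trulens_eval API; `rag_chain`
# is a placeholder for an existing LangChain retrieval chain.
import numpy as np
from trulens_eval import Feedback, Tru, TruChain
from trulens_eval.app import App
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider import OpenAI

provider = OpenAI()                      # LLM judge; reads OPENAI_API_KEY
context = App.select_context(rag_chain)  # points at the retrieved chunks

# Answer relevance: is the final response on-topic for the question?
f_answer_relevance = Feedback(provider.relevance).on_input_output()

# Context relevance: is each retrieved chunk relevant to the question?
f_context_relevance = (
    Feedback(provider.qs_relevance).on_input().on(context).aggregate(np.mean)
)

# Groundedness: is every claim in the answer supported by the context?
grounded = Groundedness(groundedness_provider=provider)
f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons)
    .on(context.collect())
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)

# Record every call to the app along with its three feedback scores.
tru_recorder = TruChain(
    rag_chain,
    app_id="rag_v1",
    feedbacks=[f_answer_relevance, f_context_relevance, f_groundedness],
)
with tru_recorder:
    rag_chain.invoke("What causes hallucination in RAG apps?")

Tru().run_dashboard()  # browse per-record scores in the local dashboard
```

Scores from these three feedbacks localize the failure: low context relevance points at retrieval, low groundedness at the generator, and low answer relevance at the app as a whole.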

Key Takeaways:

  • Learn about common failure modes for LLM apps
  • Learn about the evaluations that are useful for reducing hallucination, improving retrieval quality, and more
  • Learn about how to evaluate LLM apps with TruLens

Additional Resources

TruLens Documentation

TruLens GitHub

The prompts used for LLM-based feedback functions are available in TruLens' open-source GitHub repository.

[SKILL TRACK] AI Fundamentals

[COURSE] Working with the OpenAI API

[TUTORIAL] How to Build LLM Applications with LangChain

Related

tutorial

HumanEval: A Benchmark for Evaluating LLM Code Generation Capabilities

Learn how to evaluate your LLM on code generation capabilities with the Hugging Face Evaluate library.

Abid Ali Awan

9 min

tutorial

Evaluating LLMs with MLflow: A Practical Beginner’s Guide

Learn how to streamline your LLM evaluations with MLflow. This guide covers MLflow setup, logging metrics, tracking experiment versions, and comparing models to make informed decisions for optimized LLM performance!

Maria Eugenia Inzaugarat

27 min

tutorial

Boost LLM Accuracy with Retrieval Augmented Generation (RAG) and Reranking

Combine the strengths of LLMs with effective information retrieval mechanisms. Implement a reranking approach and incorporate it into your own LLM pipeline.

Iván Palomares Carrascosa

11 min

tutorial

LLM Classification: How to Select the Best LLM for Your Application

Discover the family of LLMs available and the elements to consider when evaluating which LLM is the best for your use case.

Andrea Valenzuela

15 min

code-along

Understanding LLMs for Code Generation

Explore the role of LLMs for coding tasks, focusing on hands-on examples that demonstrate effective prompt engineering techniques to optimize code generation.

Andrea Valenzuela

code-along

Retrieval Augmented Generation with LlamaIndex

In this session you'll learn how to get started with Chroma and perform Q&A on some documents using Llama 2, the RAG technique, and LlamaIndex.

Dan Becker
