
Evaluating LLM Responses

In this session, we cover the evaluations that are useful for reducing hallucination and improving the retrieval quality of LLM apps.
Nov 2023

LLMs should be considered hallucinatory until proven otherwise! Many of us have turned to augmenting LLMs with a knowledge store (such as Zilliz) to solve this problem. But this retrieval-augmented generation (RAG) setup can still hallucinate. In particular, hallucination can be caused by retrieving irrelevant context, retrieving too little context, and more.

TruLens is built to solve this problem. TruLens sits as the evaluation layer for the LLM stack, allowing you to shorten the feedback loop and iterate on your LLM app faster. We'll also talk about the different metrics you can use for evaluation and why you should consider LLM-based evals when building your app.
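
As a rough illustration of what such evaluations measure, the sketch below scores one RAG interaction on two axes: whether the retrieved context relates to the question, and whether the answer is supported by that context. It does not use the TruLens API. The names RagRecord, evaluate, and overlap_score, and the two metric names, are hypothetical and chosen for this example. The token-overlap scorer is only a toy stand-in for an LLM-based judge; in practice you would pass in a scorer that calls a model.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# A scorer maps (reference text, candidate text) to a score in [0, 1].
# Assumption: an LLM-based judge could be plugged in behind this signature.
Scorer = Callable[[str, str], float]


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(reference: str, candidate: str) -> float:
    """Toy scorer: fraction of candidate tokens that also appear in reference."""
    cand = _tokens(candidate)
    if not cand:
        return 0.0
    return len(cand & _tokens(reference)) / len(cand)


@dataclass
class RagRecord:
    """One question, the chunks retrieved for it, and the generated answer."""
    question: str
    contexts: List[str]
    answer: str


def evaluate(record: RagRecord, scorer: Scorer = overlap_score) -> Dict[str, float]:
    """Score a single RAG interaction (hypothetical metric names)."""
    # Context relevance: how well each retrieved chunk covers the question,
    # averaged over chunks. A low value points at irrelevant retrieval.
    if record.contexts:
        context_relevance = sum(
            scorer(ctx, record.question) for ctx in record.contexts
        ) / len(record.contexts)
    else:
        context_relevance = 0.0

    # Groundedness: how much of the answer is supported by the retrieved text.
    # A low value suggests the answer was not drawn from the context.
    groundedness = scorer(" ".join(record.contexts), record.answer)

    return {"context_relevance": context_relevance, "groundedness": groundedness}


if __name__ == "__main__":
    record = RagRecord(
        question="What is the capital of France?",
        contexts=["Paris is the capital of France."],
        answer="Paris",
    )
    print(evaluate(record))
```

Keeping the scorer as a plain function argument means the same evaluation loop can run with a cheap heuristic during development and with an LLM-based judge later, without changing the surrounding code.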

Key Takeaways:

  • Learn about common failure modes for LLM apps
  • Learn the different evaluations that are useful for reducing hallucination, improving retrieval quality, and more
  • Learn how to evaluate LLM apps with TruLens

Additional Resources

TruLens Documentation

TruLens GitHub

Find the prompts used for LLM-based feedback functions in TruLens' open-source GitHub repository.

[SKILL TRACK] AI Fundamentals

[COURSE] Working with the OpenAI API

[TUTORIAL] How to Build LLM Applications with LangChain


