
Evaluating LLM Responses

In this session, we cover the evaluations that are useful for reducing hallucination and improving the retrieval quality of LLM apps.
Nov 2023

LLMs should be considered hallucinatory until proven otherwise! Many of us have turned to augmenting LLMs with a knowledge store (such as Zilliz) to solve this problem. But this retrieval-augmented generation (RAG) setup can still hallucinate. In particular, hallucination can be caused by retrieving irrelevant context, retrieving too little context, and more.
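To make the first two failure modes concrete, here is a minimal, framework-agnostic sketch of a retrieval check: it scores each retrieved chunk against the question, flags chunks below a threshold as irrelevant, and flags the whole retrieval as insufficient when no chunk passes. This is not the TruLens API. The names `check_retrieval`, `overlap_relevance`, and `ContextReport` are hypothetical, the 0.5 threshold is an arbitrary assumption, and the word-overlap scorer is only a stand-in for the LLM-based judge you would plug in through the `scorer` parameter.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set

# A relevance scorer maps (question, chunk) -> score in [0, 1].
# In a real app this would typically be an LLM judge; the word-overlap
# scorer below is a hypothetical stand-in so the sketch runs offline.
Scorer = Callable[[str, str], float]


def _tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_relevance(question: str, chunk: str) -> float:
    """Toy heuristic: fraction of question tokens that also appear in the chunk."""
    q = _tokens(question)
    if not q:
        return 0.0
    return len(q & _tokens(chunk)) / len(q)


@dataclass
class ContextReport:
    scores: List[float] = field(default_factory=list)
    irrelevant: List[int] = field(default_factory=list)  # indices of chunks below threshold
    insufficient: bool = False  # True when no chunk clears the threshold


def check_retrieval(
    question: str,
    chunks: List[str],
    scorer: Scorer = overlap_relevance,
    threshold: float = 0.5,  # assumed cut-off; tune for your own scorer
) -> ContextReport:
    """Score every retrieved chunk and flag irrelevant or insufficient context."""
    scores = [scorer(question, c) for c in chunks]
    irrelevant = [i for i, s in enumerate(scores) if s < threshold]
    # An empty retrieval also counts as insufficient context.
    insufficient = len(irrelevant) == len(chunks)
    return ContextReport(scores=scores, irrelevant=irrelevant, insufficient=insufficient)


if __name__ == "__main__":
    question = "What is the capital of France?"
    chunks = ["Paris is the capital of France.", "Bananas are rich in potassium."]
    print(check_retrieval(question, chunks))
```

Passing a different `scorer` (for example, one that prompts an LLM to rate relevance) changes the judge without touching the flagging logic, which is the main reason to keep the scorer pluggable.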

TruLens is built to address this problem. It acts as the evaluation layer of the LLM stack, shortening the feedback loop so you can iterate on your LLM app faster. We'll also cover the different metrics you can use for evaluation and why you should consider LLM-based evals when building your app.

Key Takeaways:

  • Learn about common failure modes for LLM apps
  • Learn which evaluations are useful for reducing hallucination, improving retrieval quality, and more
  • Learn how to evaluate LLM apps with TruLens

Additional Resources

TruLens Documentation

TruLens GitHub

The prompts used for LLM-based feedback functions in TruLens are available in TruLens' open-source GitHub repository.

[SKILL TRACK] AI Fundamentals

[COURSE] Working with the OpenAI API

[TUTORIAL] How to Build LLM Applications with LangChain
