LLM evaluation: A live demo
As test automation engineers, we have relied on a bedrock of consistency to test software: we do our best to isolate and eliminate non-deterministic behaviour from our systems. Now we face the challenge of testing software that is non-deterministic by design.
In this session, I will demonstrate the inner workings of a Retrieval-Augmented Generation (RAG) model and show how you can subject it to automated evaluation using another LLM as a judge. Once we run the evals, we will pop open the hood and examine the evaluation framework's stack traces, so that we can debug an unexpected result. Finally, we will subject the RAG model to an indirect prompt injection attack.
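To give a flavour of the LLM-as-a-judge pattern ahead of the session, here is a minimal sketch. The `call_llm` helper, the prompt wording, and the PASS/FAIL protocol are illustrative assumptions, not the evaluation framework used in the demo:

```python
# A minimal LLM-as-a-judge sketch. `call_llm` is a hypothetical
# placeholder for whichever LLM client you use, not a real API.

JUDGE_PROMPT = """You are a strict evaluator.
Question: {question}
Retrieved context: {context}
Candidate answer: {answer}
Is the answer faithful to the retrieved context? Reply with PASS or FAIL
on the first line, followed by a one-sentence justification."""

def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real call to your judge model.
    return "PASS\nThe answer is supported by the retrieved context."

def judge(question: str, context: str, answer: str) -> bool:
    # Ask the judge model for a verdict and reduce it to a boolean.
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, context=context, answer=answer))
    return reply.strip().upper().startswith("PASS")

if __name__ == "__main__":
    print(judge("Who wrote Dracula?",
                "Dracula is an 1897 novel by Bram Stoker.",
                "Bram Stoker wrote Dracula."))
```

The demo digs into what happens when a verdict like this comes back wrong, and why the judge itself needs scrutiny.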
Join me for this live demonstration, where we pit one LLM against another and try to expose a security flaw in the bargain.
Bio
Anupam Krishnamurthy is a software engineer and writer who explores the intersection of testing, AI, and software quality.
