llm-eval
Here are 43 public repositories matching this topic...
The prompt engineering, prompt management, and prompt evaluation tool for C# and .NET
Updated Jun 16, 2024
The prompt engineering, prompt management, and prompt evaluation tool for Kotlin.
Updated Jun 16, 2024
LLM Security Platform Docs
Updated Apr 9, 2024 - MDX
Community Plugin for Genkit to use Promptfoo
Updated Jan 3, 2025 - TypeScript
The prompt engineering, prompt management, and prompt evaluation tool for Python
Updated Sep 17, 2024 - Python
LLM behavior QA: tone collapse, false consent, and reroute logic scoring.
Updated May 17, 2025
Evaluating LLMs with Multiple Problems at once: A New Paradigm for Probing LLM Capabilities
Updated Jul 13, 2025 - Jupyter Notebook
The prompt engineering, prompt management, and prompt evaluation tool for Ruby.
Updated Jun 16, 2024
Sample project demonstrating how to use Promptfoo, a test framework for evaluating the output of generative AI models.
Updated Sep 10, 2024
This project applies the LLM-Eval framework to the PersonaChat dataset to assess response quality in a conversational context. Using GPT-4o-mini via the OpenAI API, the system generates scores (on a 0-5 or 0-100 scale) for four evaluation metrics: context, grammar, relevance, and appropriateness.
Updated Mar 24, 2025 - Python
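The entry above describes an LLM-as-judge setup: GPT-4o-mini is asked, via the OpenAI API, to score each response on context, grammar, relevance, and appropriateness. A minimal sketch of such a scoring call might look like the following; the prompt wording, helper name, and JSON reply format are illustrative assumptions and are not taken from the project itself.

```python
# Hedged sketch of an LLM-as-judge scoring call for one dialogue turn.
# Only the model name (gpt-4o-mini), the OpenAI API, and the four metrics
# come from the repository description; everything else is an assumption.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_response(context: str, response: str) -> dict:
    """Ask gpt-4o-mini to rate a response on four 0-5 metrics."""
    prompt = (
        "You are evaluating a chatbot response.\n"
        f"Dialogue context:\n{context}\n\n"
        f"Response:\n{response}\n\n"
        "Rate the response from 0 to 5 on each of: context, grammar, "
        "relevance, appropriateness. Reply with JSON only, e.g. "
        '{"context": 4, "grammar": 5, "relevance": 3, "appropriateness": 4}.'
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Assumes the model returns bare JSON as instructed.
    return json.loads(completion.choices[0].message.content)

# Example usage on a single PersonaChat-style exchange:
# scores = score_response("A: Hi, I love hiking on weekends.",
#                         "That sounds fun! Where do you usually hike?")
# print(scores)
```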
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Updated Jun 6, 2023 - Python
Sample implementation demonstrating how to use Firebase Genkit with Promptfoo
Updated Sep 11, 2024 - TypeScript
The prompt engineering, prompt management, and prompt evaluation tool for Go.
Updated Jun 16, 2024
Shin Rakuda is a comprehensive framework for evaluating and benchmarking Japanese large language models, offering researchers and developers a flexible toolkit for assessing LLM performance across diverse datasets.
Updated Sep 17, 2024 - Python
[ACL 2025] GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning
Updated Jul 30, 2025 - Python
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
Updated Sep 14, 2024 - TypeScript