Kiln v0.12.1 Released

Evals: How to Build Your Own LLM Evaluator

Our new update includes our biggest feature to date: Evals!

The short version: Kiln now includes a comprehensive evaluation toolkit. With it, you can build state-of-the-art evals, check how well an eval correlates with human preferences, synthetically generate an eval dataset, and use analysis tools to find the best way to run your task. It will automatically build an eval for any Kiln task, and it includes templates for common eval use cases (bias, toxicity, jailbreaking, etc.).
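To give a feel for what "correlation to human preferences" means, here is a minimal sketch in plain Python. It is not Kiln's API or its implementation: the function names and the sample scores are made up for illustration, and Spearman rank correlation is just one common choice of metric.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(human_scores, eval_scores):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(human_scores) != len(eval_scores) or len(human_scores) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    return pearson(average_ranks(human_scores), average_ranks(eval_scores))


# Hypothetical data: 1-5 ratings for the same five task outputs.
human_scores = [5, 4, 2, 1, 3]  # ratings from a person
eval_scores = [4, 5, 2, 1, 3]   # ratings from an LLM evaluator
print(f"Spearman correlation: {spearman(human_scores, eval_scores):.2f}")
```

A value near 1 means the evaluator orders outputs much like the human rater does; a value near 0 means its scores say little about human preference.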

Read the docs to learn more, or check out our video walkthrough.

Other New Features

Support for distilling (fine-tuning) an open model from Sonnet 3.7 Thinking
New Built-In Models: Sonnet 3.7, Dolphin 2.9 8x22B, and Grok
New ARM builds for Linux
Improved logging (thanks to @leonardmq)

Kiln News & Community

Kiln Reaches 3K GitHub Stars and 10K Downloads: Thanks for the incredible support (we're not even at V1 yet)! Please continue to share the project with people who might enjoy it, and add a star on GitHub if you haven't already.
Join Our Discord Community: Please join to ask questions about Kiln. Feedback on evals and ideas on what we should add next are greatly encouraged!

Thanks for all your support!
Steve - The Kiln Maintainer