Mar 3, 2025

Kiln v0.12.1 Released

Powerful Evaluation Toolkit & Support for Distilling Sonnet 3.7 Thinking

Evals: How to Build Your Own LLM Evaluator

This update includes our biggest feature to date: Evals!

The short version: Kiln now includes a comprehensive evaluation toolkit. With it you can build state-of-the-art evals, check how well an eval correlates with human preferences, synthetically generate an eval dataset, and use analysis tools to find the best way to run your task. Kiln will automatically build an eval for any Kiln task, and it includes templates for common eval use cases (bias, toxicity, jailbreaking, etc.).
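To make "eval correlation to human preferences" concrete, here is a minimal sketch of the underlying idea, not Kiln's actual API or implementation. It assumes you have a human rating and an evaluator score for each output, and it uses Spearman rank correlation as one reasonable choice of metric; the function names and sample data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 items")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: human ratings and LLM-evaluator scores for five outputs.
human_ratings = [5, 4, 2, 1, 3]
evaluator_scores = [4.5, 4.0, 2.5, 1.0, 2.0]

# A value near 1.0 means the evaluator orders outputs much as humans do.
print(f"Spearman correlation: {spearman(human_ratings, evaluator_scores):.2f}")
```

A rank-based metric is used here because human ratings and evaluator scores often sit on different scales; only the ordering of outputs needs to agree.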

Read the docs to learn more, or check out our video walkthrough.

Other New Features

  • Support for distilling (fine-tuning) an open model from Sonnet 3.7 Thinking
  • New Built-In Models: Sonnet 3.7, Dolphin 2.9 8x22B, and Grok
  • New ARM builds for Linux
  • Improved logging (thanks to @leonardmq)

Kiln News & Community

  • Kiln Reaches 3K GitHub Stars and 10K Downloads: Thanks for the incredible support (we're not even at V1 yet)! Please continue to share the project with people who might enjoy it, and add a star on GitHub if you haven't already.
  • Join Our Discord Community: Please join to ask questions about Kiln. Feedback on evals and ideas on what we should add next are greatly encouraged!

Thanks for all your support!
Steve - The Kiln Maintainer
