This release is all about making Kiln easier to use, whether you're a seasoned AI engineer or just starting out.
New Models: Qwen 3 and Gemma 3
Both models have impressed us in internal tests: near state-of-the-art performance, at sizes you can run locally.
- Qwen 3: Kiln supports their /think and /no_think modes for flexible reasoning control
- Gemma 3: Google's latest open model series
These additions expand Kiln's model lineup, giving you more options for balancing performance and efficiency for your use case.
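To illustrate the Qwen 3 reasoning modes mentioned above: Qwen 3 treats `/think` and `/no_think` as soft switches placed in a prompt. The sketch below is not Kiln's implementation (this post does not describe how Kiln applies the modes internally); `with_reasoning_switch` is a hypothetical helper that only shows the idea of appending the switch to a user prompt.

```python
# Soft switches recognized by Qwen 3 when they appear in a prompt.
THINK = "/think"
NO_THINK = "/no_think"


def with_reasoning_switch(prompt: str, reasoning: bool) -> str:
    """Hypothetical helper: append a Qwen 3 soft switch to a user prompt.

    Any switch already at the end of the prompt is removed first,
    so the two modes are never mixed in one message.
    """
    text = prompt.rstrip()
    for existing in (NO_THINK, THINK):
        if text.endswith(existing):
            text = text[: -len(existing)].rstrip()
    switch = THINK if reasoning else NO_THINK
    return f"{text} {switch}".strip()


if __name__ == "__main__":
    # Ask for step-by-step reasoning on a harder question.
    print(with_reasoning_switch("What is 17 * 23?", reasoning=True))
    # Skip reasoning for a quick, low-latency answer.
    print(with_reasoning_switch("Say hello in French.", reasoning=False))
```

Inside Kiln you choose the mode in the app, so a helper like this would only matter if you were calling a Qwen 3 model directly.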
Major Redesign for Evals
We've completely reimagined how you create and manage evals in Kiln. Our new step-by-step interface guides you through the entire process:
- Intuitive dataset building workflow
- Streamlined golden data rating
- Interactive eval algorithm comparison
- Best practices built right into the UI
This redesign makes it possible to create state-of-the-art evals without prior experience in evaluation design, while keeping the full workflow available to experienced practitioners.

Improved Fine-tuning Experience
Fine-tuning has gotten a major upgrade to make the entire process more intuitive:
- Streamlined dataset creation workflow, including synthetic data generation and CSV import
- Improved dataset management interface for reusing datasets
- Simpler workflow for building reasoning models

Quality of Life Improvements
We've also added a number of smaller improvements to make your Kiln experience even better:
- Model suggestions for evals and data generation
- New 'Short' prompt generator for minimal prompts
- Refined UI controls for better usability
- Simplified setup process for new users
- Various bug fixes and performance improvements
Try it out, and as always, please give feedback on Discord.
Thanks for all your support!
Steve - The Kiln Maintainer