I didn't expect this release to be so feature-packed when we started, but things kept piling up in a good way. Lots of goodies here to explore:
New AI Providers
You can now connect and use the following providers in Kiln:
- Gemini API
- Vertex AI
- Hugging Face
- Anthropic
- Azure OpenAI
- Together.ai
This is on top of all our existing providers: Ollama, OpenAI, Groq, Fireworks, OpenRouter, AWS Bedrock, and any OpenAI compatible endpoint.

New Built-in Models
We have all the hip new models, each one tested to work with Kiln features like structured output and synthetic data generation:
- Gemma 3 (27B, 12B, 4B, 1B): Impressive new weight-available models from Google
- QwQ 32B: Reasoning you can run locally, from Qwen
- o1 & o3-mini: Not new, but now generally available without an invite
- Phi 4 (Mini + 5.6B): New Phi models from Microsoft
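"Structured output" means constraining a model to emit valid JSON matching an expected shape. As a rough illustration of what such a check involves (the schema and field names below are hypothetical, not Kiln's actual test suite):

```python
import json

# Hypothetical schema: fields we expect a model's JSON response to contain.
REQUIRED_FIELDS = {"answer": str, "confidence": float}

def validate_structured_output(raw: str) -> dict:
    """Parse a model response and verify it matches the expected shape."""
    data = json.loads(raw)  # raises ValueError if the model emitted invalid JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not {expected_type.__name__}")
    return data

# A well-formed response parses cleanly; malformed ones raise.
ok = validate_structured_output('{"answer": "Paris", "confidence": 0.98}')
```

Running a check like this against every built-in model is how you catch providers that wrap JSON in prose or drop required fields.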
Together.ai Serverless Fine-tuning

We added support for fine-tuning models on Together.ai. Like Fireworks and OpenAI, they support "serverless" fine-tuning: no managing tuning GPUs or long-running inference servers. Hit "tune", then pay by token when it's inference time. It all scales to zero cost when you aren't using it, which makes it great for rapid experimentation.
With Together you can easily fine-tune six models: Llama 3.2 1B/3B, Llama 3.1 8B/70B, Qwen 2.5 14B/72B.
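Some back-of-the-envelope arithmetic shows why scale-to-zero matters for experimentation. The prices below are made-up placeholders purely for illustration; check each provider for real rates:

```python
# Hypothetical prices, for illustration only -- not real provider rates.
PRICE_PER_M_TOKENS = 0.20   # serverless inference, per million tokens
GPU_HOUR_PRICE = 2.50       # dedicated inference server, per hour

def serverless_cost(tokens: int) -> float:
    """Pay-per-token: cost is zero when you run zero tokens."""
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS

def dedicated_cost(hours_running: float) -> float:
    """A dedicated server bills for uptime, even when idle."""
    return hours_running * GPU_HOUR_PRICE

# An idle week between experiments: serverless costs nothing,
# while a dedicated box keeps billing.
idle_serverless = serverless_cost(0)     # 0.0
idle_dedicated = dedicated_cost(24 * 7)  # 420.0
```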
Import CSV Files
Already have a dataset? The new CSV import feature makes it easy to load into Kiln for fine-tuning and evals. Check out the docs.
Thanks to @leonardmq for contributing this!
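If you're preparing a CSV by hand, one example per row is the natural shape. A minimal sketch of writing and round-tripping such a file with Python's stdlib (the column names here are hypothetical; see the Kiln docs for the actual expected format):

```python
import csv

# Hypothetical column names -- consult the Kiln docs for the real import format.
rows = [
    {"input": "Translate to French: hello", "output": "bonjour"},
    {"input": "Translate to French: goodbye", "output": "au revoir"},
]

# Write one training example per row, with a header line.
with open("dataset.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["input", "output"])
    writer.writeheader()
    writer.writerows(rows)

# Read it back to confirm the round trip.
with open("dataset.csv", newline="") as f:
    loaded = list(csv.DictReader(f))
```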
Weights & Biases Integration
We now support Weights & Biases, a popular tool for tracking AI experiment metrics such as training loss and validation loss. Bring your own API key, or host your own instance.

Migration to LiteLLM for Inference
We migrated inference from a mix of the OpenAI SDK and Langchain to LiteLLM. LiteLLM has been great: it let us quickly add new providers and build reusable tests across all models. I'm very happy to be done with Langchain 😂.
Fun fact: Kiln has 2,557 test cases and 92% test coverage. On top of unit tests, we have online tests which test every built-in model against every feature to ensure things work smoothly.
Even with tests, swapping out the core inference engine in a project this size can cause bugs. Please report any issues you find on GitHub.
And More
There are lots of small quality-of-life improvements that make Kiln easier to use:
- Allow editing titles/descriptions throughout the UI
- Add UI for deleting items, including evals, tasks, runs, custom prompts, and projects
- Bug fixes
Try it out, and as always, please give feedback on Discord.
Thanks for all your support!
Steve - The Kiln Maintainer