Mar 19, 2025

Kiln v0.13.1 Released

Vertex, Azure, QwQ, Gemma, Hugging Face, W&B, Anthropic, Import CSV, and More!

I didn't expect this release to be so feature packed when we started, but things kept piling up in a good way. Lots of goodies here to explore:

New AI Providers

You can now connect and use the following providers in Kiln:

  • Gemini API
  • Vertex AI
  • Hugging Face
  • Anthropic
  • Azure OpenAI
  • Together.ai

This is on top of all our existing providers: Ollama, OpenAI, Groq, Fireworks, OpenRouter, AWS Bedrock, and any OpenAI compatible endpoint.

Kiln Providers
Kiln now supports 13 providers.

New Built-in Models

We have all the hip new models, each one tested to work with Kiln features like structured output and synthetic data generation:

  • Gemma 3 (27B, 12B, 4B, 1B): Impressive new weight-available models from Google
  • QwQ 32B: Reasoning you can run locally, from Qwen
  • o1 & o3-mini: Not new, but now generally available without an invite
  • Phi 4 (Mini + 5.6B): New Phi models from Microsoft

Together.ai Serverless Fine-tuning

Together.ai

We added support for fine-tuning models on Together.ai. Like Fireworks and OpenAI, they support "serverless" fine-tuning: no managing tuning GPUs or long-running inference servers. Hit "tune", then pay by token when it's inference time. It all scales to zero cost when you aren't using it which makes it great for rapid experimentation.

With Together you can easily fine-tune six models: Llama 3.2 1B/3B, Llama 3.1 8B/70B, Qwen 2.5 14B/72B.

Import CSV Files

Already have a dataset? The new CSV import feature makes it easy to load it into Kiln for fine-tuning and evals. Check out the docs.

Thanks to @leonardmq for contributing this!

Weights & Biases Integration

We now support Weights & Biases, an open tool for tracking AI experiment metrics like training-loss and val-loss, etc. Bring your own API key, or host your own instance.

Weights & Biases
Weights & Biases integration.

Migration to LiteLLM for Inference

We moved to LiteLLM for inference, from a mix of OpenAI+Langchain. LiteLLM has been great, allowing us to quickly add new providers, and create reusable tests across all models. I'm very happy to be done with Langchain 😂.

Fun fact: Kiln has 2,557 test cases and 92% test coverage. On top of unit tests, we have online tests which test every built-in model against every feature to ensure things work smoothly.

Even with tests, swapping out the core inference engine in a project this size can cause bugs. Please report any issues you see in Github issues.

And More

There are lots of small quality-of-life improvements that make Kiln easier to use:

  • Allow editing titles/descriptions throughout the UI
  • Add UI for deleting items, including: evals, tasks, runs, custom prompts, projects
  • Bug fixes

Try it out, and as always, please give feedback on Discord.

Thanks for all your support!
Steve - The Kiln Maintainer

Get Kiln Updates in Your Inbox
Zero spam, unsubscribe at any time.