Braintrust Weekly Update
Ankur Goyal · Founder
It’s been a busy week for us at Braintrust. Here’s some of the new features we shipped this week:
-
All experiment loading HTTP requests are 100-200ms faster
- We released a new tutorial: finetune GPT3.5 to write SQL queries (opens in a new tab)
You can easily finetune GPT3.5 to generate SQL queries using OpenAI and then evaluate how the fine tuned model compares to the base model using Braintrust. Check out the Jupyter Notebook example here (opens in a new tab) to get started.
We evaluated the Alpaca evals leaderboard in Braintrust
The Alpaca evals use Claude and GPT4 to rank how different LLMs perform on a variety of tasks. You can see the aggregated rankings and also dig into individual models and better understand their strengths and weaknesses. Check out the Alpaca Evals braintrust project (opens in a new tab) on Braintrust to dig in further—no login required.
We improved Datasets. See when they were last edited and the version number from the UI.
Easily see when a dataset was last changed from the UI by hovering over the ID. We also provide example code so you can quickly use the current dataset version in your project. Learn more on our datasets guide (opens in a new tab).
Release notes
- All experiment loading HTTP requests are 100-200ms faster
- The prompt playground now supports autocomplete
- Dataset versions are now displayed on the datasets page
- Projects in the summary page are now sorted alphabetically
- Long text fields in logged data can be expanded into scrollable blocks (opens in a new tab)
Braintrust is the enterprise-grade stack for building AI products. From evaluations, to prompt playground, to data management, we take uncertainty and tedium out of incorporating AI into your business.
Sign up now, or check out our pricing page for more details.