Claude Sonnet 4.5: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4.5, the ‘best coding model in the world’. Explore new features, use cases, benchmarks, and testing results, plus a look at the Claude Agents SDK and Claude Imagine.

Sep 30, 2025 · 8 min read

Anthropic has just released its latest model, Claude Sonnet 4.5, with some impressive claims: they’re hailing it as “the best coding model in the world” as well as touting it as the top model for building complex agents and computer use. The company also highlights "substantial" improvements to math and reasoning.

I get the impression that with this release, Anthropic is targeting enterprise customers, too. With an emphasis on coding for long stretches autonomously and better handling of science and finance tasks, there is a strong push for Claude Sonnet 4.5 to be the go-to model for complex coding tasks.

Strikingly, this latest model tops the SWE-bench Verified evaluation benchmarks (a measure of how good a model is at real-world software coding problems) and is praised for its ability to focus for long periods (30+ hours).

So, all signs point to this being another strong release from Anthropic, but will the model match the bold claims? In this article, I’ll introduce you to Claude Sonnet 4.5 and its key features, and take a quick glance at how it performs. I’ll also take a look at everything else Anthropic announced, including Claude Agent SDK and Claude Imagine. You can also check out our separate guide to Claude Haiku 4.5.

Introduction to Claude Models

Learn how to work with Claude using the Anthropic API to solve real-world tasks and build AI-powered applications.

Explore Course

What Is Claude Sonnet 4.5?

Claude Sonnet 4.5 is the latest large language model from Anthropic. It comes just four months on from the release of Claude Sonnet 4. As we noted in that article, the Sonnet generalist model performs well in most use cases, and it is especially strong at coding. The main limitation, however, was its relatively narrow 200k-token context window, especially when compared to competitors like Gemini 2.5 Flash, which offers up to 1M tokens.

With Sonnet 4.5, Anthropic has actively addressed this concern (and more). The latest model has new features, better performance, and plenty of impressive stats to back it up.

According to the release article, Claude Sonnet 4.5 is available immediately via both the Claude chat interface and the API. The pricing of the new model remains the same as its predecessor at $3 per million input tokens and $15 per million output tokens, which I feel makes it excellent value considering the performance.

New Features in Claude 4.5

There are quite a few cool new features on show with the Claude 4.5 model. As we’ve covered, it tops the charts for the SWE-bench Verified evaluation, but it’s also shown huge gains in the OSWorld benchmark, which measures computer-use capabilities.

The massive leap to 61.4% vs. 42.2% just 4 months ago with Sonnet 4 shows just how big a leap this is, and I think makes this one of the most notable aspects of Sonnet 4.5. We see this in action with a demo of the Claude for Chrome extension, which showcases the model taking actions directly in the browser based on a fairly simple prompt.

SWE-bench Verified Benchmark showing Sonnet 4.5 Performance: Source

One of the more eye-catching claims is around the model being capable of maintaining focus for more than 30 hours on complex, multi-step tasks.

There are several other notable new features, too:

Extended thinking mode

As we’ve seen with models like GPT-5 and Grok 4, Sonnet 4.5 introduces an extended thinking mode, which, for more complex tasks, uses a longer ‘thinking’ process and shows the chain-of-thought for the reasoning process.

Better domain-specific knowledge

The new model has reportedly chart-topping performance in specific domains, including finance, law, medicine, and STEM. Again, looking at the quotes included in the release notes from the likes of Cursor, GitHub, Netflix, and others, I feel like this feature is very much aimed at enticing enterprise customers to get on board with Sonnet 4.5.

Most aligned frontier model

According to Anthropic, safety training has been central to this new release, and Claude Sonnet 4.5 shows major reductions in non-favourable responses. This means that, as users, we should see vastly decreased instances of things like sycophancy, deception, power-seeking, and delusional responses.

A safer model overall

As we’ll see with the Claude Agent SDK, agentic workflows and computer use are areas where Claude Sonnet 4.5 performs well. With this in mind, Anthropic cites considerable improvements when it comes to defending against prompt injection attacks, which remains a concern for these functions.

Testing Claude Sonnet 4.5

To see what Claude Sonnet 4.5 can do, we’ve given it a few tasks to demonstrate its potential. Let’s take a quick look at each:

Simple coding task

To start with, I asked it to create a fairly basic health habits app. Here’s my prompt:

I want to create an app that's going to help me track positive daily habits. I want it to look nice, using lots of natural colours (I'm a big fan of green and wood colour!). I want space to determine what the habit is going to be for each day of the week, a streak counter for it, and space to add notes, thoughts, and images. For positive habits, I want a different one each day, but I'm thinking of things like meditation, gratitude, etc,. that have proven mental health benefits

And here’s it working on the task - it started coding in the browser and compiled pretty swiftly, again, similar to results seen with Grok 4 and GPT-5.

The result was delivered quickly (frustratingly, it didn’t tell me how long it was working for, but probably only around 30 seconds) and seemed like a simple and elegant response. The functionality of the app was there, and it included everything I asked for.

Math task

Next up, I tested out the math abilities of Claude Sonnet 4.5. Taking inspiration from our GPT-5 article, I asked the new model a fairly simple calculation; what’s 7.001 minus 6.999?

The response was almost instant, and the answer was correct, but it didn’t give any reasoning, so I asked it to provide it with a follow-up. It gave me three methods of calculating it, which were all fine.

I then told Claude I thought it might be wrong, and its response was definitely less sycophantic than when we tested GPT-5. It told me I was right to double-check (but not right), and it walked me through the calculation a different way (although the explanation was a bit awkward):

Claude Sonnet 4.5 Benchmarks

Let’s take a look at how this new model stacks up against the competition. As always, we can only learn so much from benchmarks, and the top models are frequently toppled from the top spot. But for now, Claude Sonnet 4.5 is posting some very impressive numbers, as we can see in the table below:

I think some of the most stand-out results here are, as discussed, around agentic performance and computer use:

Agentic coding: 77.2%, and 82.0% with parallel test-time compute. A slight improvement against other Claude models, and further ahead of GPT-5 and Gemini 2.5 Pro.
Agentic tool use: Ranging from 70% for airline tasks to 98% in telecom, both of which are high points compared to other models.
Computer use: This is perhaps the most notable improvement. 61.4% is significantly ahead of the next-best model, Claude Opus 4.1.
Financial analysis: Another chart-topping result here compared to similar models.

I’m intrigued to see the full benchmarking scores once the model has been out for a while, particularly as Anthropic are emphasisng that experts are hailing a vastly improved domain-specific knowledge across some key areas.

Source: Anthropic

How to Access Claude Sonnet 4.5

Claude Sonnet 4.5 is available now through multiple channels. Depending on how you want to use it, you can access the new model through the Claude chat interface, you can develop via the API, or integrate into enterprise workflows. Here’s how access works:

Chat access

You can use Claude Sonnet 4.5 directly through the Claude.ai web interface or mobile apps (iOS and Android). It’s available to all users, including those on the free tier. This makes it widely accessible to both casual and professional users.

API access

For developers, you can access the model via the Anthropic API, and it’s also available on Amazon Bedrock and Google Cloud Vertex AI.

API pricing (as of September 2025) is: $3 per million input tokens and $15 per million output tokens.

Batch processing and prompt caching can reduce costs by up to 90% in some cases.

Claude Agent SDK

One of the other intriguing announcements from Anthropic, along with Sonnet 4.5, is the Claude Agent SDK. Essentially, these are the building blocks Antropic uses internally, which allows developers to create their own Claude-powered agents.

I think the Agent SDK is going to get a lot of users excited, particularly those looking to build advanced agentic workflows. It’s based on the Claude Code infrastructure, and gives users the ability to create agents for tasks like research, customer support, and automation.

Agent SDK gives agents abilities like file system access, bash scripting, semantic and agentic search, subagents, and prebuilt integrations (via the Model Context Protocol), allowing the creation of general-purpose agents that can reliably gather context, take action, and verify their own work. You can check out our Claude Agent SDK tutorial to see what it's capable of.

Imagine with Claude

Another release of interest is that of Imagine with Claude, a research preview of a tool that can generate software on the fly. Anthropic included a short video, shown below, which demonstrates the capability of Claude Sonnet 4.5 operating in this manner.

It’s a pretty neat demo, showing how the tool can work responsively based on your interactions, generating various elements swiftly and directly. I think there is a lot of potential here for some really interesting projects, and Anthropic Max subscribers can play with the tool for the five days after launch. Although this is a fairly limited window, I doubt this will be the last we see of this type of tool.

Conclusion

So, Claude Sonnet 4.5 is here and first impressions are pretty good. I like the direction Anthropic are going with this model launch; putting more emphasis on code, agents, and computer use. They’re obviously confident that this latest iteration can perform at a level that’s going to interest enterprise users, which means we’re getting ever closer to the point of widescale adoption of computer use tools.

That being said, it remains to be seen how long Sonnet 4.5 will top the benchmark charts around agentic and computer use, though the gains in the last four months feel fairly significant. Similarly, the relatively narrow context window could mean it’s still difficult to work with large code bases in any meaningful way.

Still, I’m looking forward to seeing the projects that come out of tools like Claude Agent SDK and Imagine with Claude, and the Claude for Chrome extension will be a useful addition to various workflows.

How does Claude Sonnet 4.5 compare to Claude Opus 4.1 in terms of overall performance and use cases?

What are the key improvements in coding capabilities that Claude Sonnet 4.5 brings?

Can Claude Sonnet 4.5 really maintain focus on complex tasks for over 30 hours?

Is Claude Sonnet 4.5 less emotive than previous Claude models, and why?

How does Claude Sonnet 4.5 perform on key benchmarks beyond coding?

What's the pricing for Claude Sonnet 4.5, and where is it available?

Has safety and alignment improved in Claude Sonnet 4.5, especially regarding deception and ethical behavior?

Author

Matt Crabtree

Topics

Artificial Intelligence

Large Language Models

Learn AI with these courses!

Course

Introduction to Claude Models

3 hr

965

Learn how to work with Claude using the Anthropic API to solve real-world tasks and build AI-powered applications.

See Details

Start Course

Course

Introduction to AI Agents

1 hr 30 min

27.6K

Learn the fundamentals of AI agents, their components, and real-world use—no coding required.

See Details

Start Course

Course

Introduction to SQL with AI

3 hr

779

Learn SQL with AI by writing prompts, generating queries, and analyzing data to solve real-world problems.

See Details

Start Course

blog

Claude 4: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4 and Claude Opus 4, their features, use cases, benchmarks, and testing results.

Alex Olteanu

8 min

blog

Claude 3.7 Sonnet: Features, Access, Benchmarks & More

Learn about Claude 3.7 Sonnet's hybrid approach of combining reasoning mode and generalist mode, key benchmarks, and how to access it via web or API.

Alex Olteanu

8 min

blog

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

Claude 3.5 Sonnet outperforms GPT-4o and Gemini Pro 1.5 in several benchmarks and introduces a cool new feature: Artifacts.

Alex Olteanu

8 min

Tutorial

Claude Sonnet 4: A Hands-On Guide for Developers

Explore Claude Sonnet 4’s developer features—code execution, files API, and tool use—by building a Python-based math-solving agent.

Bex Tuychiev

Tutorial

Claude Agent SDK Tutorial: Create Agents Using Claude Sonnet 4.5

Learn how the Claude Agent SDK works by building three projects, from one-shot to custom-tool agents.

Abid Ali Awan

Tutorial

Claude 3.7 Sonnet API: A Guide With Demo Project

Learn how to use Claude 3.7 Sonnet's API by building a Gradio-based Research Paper Analyzer to extract insights from uploaded PDFs.

Aashi Dutt

See More See More

Introduction to Claude Models

What Is Claude Sonnet 4.5?

New Features in Claude 4.5

Extended thinking mode

Better domain-specific knowledge

Most aligned frontier model

A safer model overall

Testing Claude Sonnet 4.5

Simple coding task

Math task

Claude Sonnet 4.5 Benchmarks

How to Access Claude Sonnet 4.5

Chat access

API access

Claude Agent SDK

Imagine with Claude

Conclusion

FAQs

Can Claude Sonnet 4.5 really maintain focus on complex tasks for over 30 hours?

Is Claude Sonnet 4.5 less emotive than previous Claude models, and why?

How does Claude Sonnet 4.5 perform on key benchmarks beyond coding?

What's the pricing for Claude Sonnet 4.5, and where is it available?

Has safety and alignment improved in Claude Sonnet 4.5, especially regarding deception and ethical behavior?

Claude 4: Tests, Features, Access, Benchmarks, and More

Claude 3.7 Sonnet: Features, Access, Benchmarks & More

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

Claude Sonnet 4: A Hands-On Guide for Developers

Claude Agent SDK Tutorial: Create Agents Using Claude Sonnet 4.5

Claude 3.7 Sonnet API: A Guide With Demo Project

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Introduction to Claude Models

Introduction to AI Agents

Introduction to SQL with AI

Claude 4: Tests, Features, Access, Benchmarks, and More

Claude 3.7 Sonnet: Features, Access, Benchmarks & More

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

Claude Sonnet 4: A Hands-On Guide for Developers

Claude Agent SDK Tutorial: Create Agents Using Claude Sonnet 4.5

Claude 3.7 Sonnet API: A Guide With Demo Project

Introduction to Claude Models