Skip to main content

Claude Sonnet 4.5: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4.5, the ‘best coding model in the world’. Explore new features, use cases, benchmarks, and testing results, plus a look at the Claude Agents SDK and Claude Imagine.
Sep 30, 2025  · 8 min read

Anthropic has just released its latest model, Claude Sonnet 4.5, with some impressive claims: they’re hailing it as “the best coding model in the world” as well as touting it as the top model for building complex agents and computer use. The company also highlights "substantial" improvements to math and reasoning.

I get the impression that with this release, Anthropic is targeting enterprise customers, too. With an emphasis on coding for long stretches autonomously and better handling of science and finance tasks, there is a strong push for Claude Sonnet 4.5 to be the go-to model for complex coding tasks. 

Strikingly, this latest model tops the SWE-bench Verified evaluation benchmarks (a measure of how good a model is at real-world software coding problems) and is praised for its ability to focus for long periods (30+ hours).

So, all signs point to this being another strong release from Anthropic, but will the model match the bold claims? In this article, I’ll introduce you to Claude Sonnet 4.5 and its key features, and take a quick glance at how it performs. I’ll also take a look at everything else Anthropic announced, including Claude Agent SDK and Claude Imagine. You can also check out our separate guide to Claude Haiku 4.5.

Introduction to Claude Models

Learn how to work with Claude using the Anthropic API to solve real-world tasks and build AI-powered applications.
Explore Course

What Is Claude Sonnet 4.5? 

Claude Sonnet 4.5 is the latest large language model from Anthropic. It comes just four months on from the release of Claude Sonnet 4. As we noted in that article, the Sonnet generalist model performs well in most use cases, and it is especially strong at coding. The main limitation, however, was its relatively narrow 200k-token context window, especially when compared to competitors like Gemini 2.5 Flash, which offers up to 1M tokens.

With Sonnet 4.5, Anthropic has actively addressed this concern (and more). The latest model has new features, better performance, and plenty of impressive stats to back it up. 

According to the release article, Claude Sonnet 4.5 is available immediately via both the Claude chat interface and the API. The pricing of the new model remains the same as its predecessor at $3 per million input tokens and $15 per million output tokens, which I feel makes it excellent value considering the performance.

New Features in Claude 4.5 

There are quite a few cool new features on show with the Claude 4.5 model. As we’ve covered, it tops the charts for the SWE-bench Verified evaluation, but it’s also shown huge gains in the OSWorld benchmark, which measures computer-use capabilities. 

The massive leap to 61.4% vs. 42.2% just 4 months ago with Sonnet 4 shows just how big a leap this is, and I think makes this one of the most notable aspects of Sonnet 4.5. We see this in action with a demo of the Claude for Chrome extension, which showcases the model taking actions directly in the browser based on a fairly simple prompt.  

SWE-bench Verified Benchmark showing Sonnet 4.5 Performance

SWE-bench Verified Benchmark showing Sonnet 4.5 Performance: Source

One of the more eye-catching claims is around the model being capable of maintaining focus for more than 30 hours on complex, multi-step tasks. 

There are several other notable new features, too: 

Extended thinking mode

As we’ve seen with models like GPT-5 and Grok 4, Sonnet 4.5 introduces an extended thinking mode, which, for more complex tasks, uses a longer ‘thinking’ process and shows the chain-of-thought for the reasoning process.

Better domain-specific knowledge

The new model has reportedly chart-topping performance in specific domains, including finance, law, medicine, and STEM. Again, looking at the quotes included in the release notes from the likes of Cursor, GitHub, Netflix, and others, I feel like this feature is very much aimed at enticing enterprise customers to get on board with Sonnet 4.5.

Most aligned frontier model

According to Anthropic, safety training has been central to this new release, and Claude Sonnet 4.5 shows major reductions in non-favourable responses. This means that, as users, we should see vastly decreased instances of things like sycophancy, deception, power-seeking, and delusional responses.

A safer model overall

As we’ll see with the Claude Agent SDK, agentic workflows and computer use are areas where Claude Sonnet 4.5 performs well. With this in mind, Anthropic cites considerable improvements when it comes to defending against prompt injection attacks, which remains a concern for these functions.

Testing Claude Sonnet 4.5 

To see what Claude Sonnet 4.5 can do, we’ve given it a few tasks to demonstrate its potential. Let’s take a quick look at each: 

Simple coding task

To start with, I asked it to create a fairly basic health habits app. Here’s my prompt: 

I want to create an app that's going to help me track positive daily habits. I want it to look nice, using lots of natural colours (I'm a big fan of green and wood colour!). I want space to determine what the habit is going to be for each day of the week, a streak counter for it, and space to add notes, thoughts, and images. For positive habits, I want a different one each day, but I'm thinking of things like meditation, gratitude, etc,. that have proven mental health benefits

And here’s it working on the task - it started coding in the browser and compiled pretty swiftly, again, similar to results seen with Grok 4 and GPT-5.

Claude Sonnet 4.5 code input

The result was delivered quickly (frustratingly, it didn’t tell me how long it was working for, but probably only around 30 seconds) and seemed like a simple and elegant response. The functionality of the app was there, and it included everything I asked for.  

Claude Sonnet 4.5 code output

Math task 

Next up, I tested out the math abilities of Claude Sonnet 4.5. Taking inspiration from our GPT-5 article, I asked the new model a fairly simple calculation; what’s 7.001 minus 6.999? 

Claude Sonnet 4.5 Math Result

The response was almost instant, and the answer was correct, but it didn’t give any reasoning, so I asked it to provide it with a follow-up. It gave me three methods of calculating it, which were all fine. 

I then told Claude I thought it might be wrong, and its response was definitely less sycophantic than when we tested GPT-5. It told me I was right to double-check (but not right), and it walked me through the calculation a different way (although the explanation was a bit awkward): 

Claude Sonnet 4.5 Math Follow-Up

Claude Sonnet 4.5 Benchmarks 

Let’s take a look at how this new model stacks up against the competition. As always, we can only learn so much from benchmarks, and the top models are frequently toppled from the top spot. But for now, Claude Sonnet 4.5 is posting some very impressive numbers, as we can see in the table below: 

I think some of the most stand-out results here are, as discussed, around agentic performance and computer use:

  • Agentic coding: 77.2%, and 82.0% with parallel test-time compute. A slight improvement against other Claude models, and further ahead of GPT-5 and Gemini 2.5 Pro.
  • Agentic tool use: Ranging from 70% for airline tasks to 98% in telecom, both of which are high points compared to other models. 
  • Computer use: This is perhaps the most notable improvement. 61.4% is significantly ahead of the next-best model, Claude Opus 4.1. 
  • Financial analysis: Another chart-topping result here compared to similar models. 

I’m intrigued to see the full benchmarking scores once the model has been out for a while, particularly as Anthropic are emphasisng that experts are hailing a vastly improved domain-specific knowledge across some key areas. 

Claude Sonnet 4.5 Domain Specific Knowledge

Source: Anthropic

How to Access Claude Sonnet 4.5

Claude Sonnet 4.5 is available now through multiple channels. Depending on how you want to use it, you can access the new model through the Claude chat interface, you can develop via the API, or integrate into enterprise workflows. Here’s how access works:

Chat access

You can use Claude Sonnet 4.5 directly through the Claude.ai web interface or mobile apps (iOS and Android). It’s available to all users, including those on the free tier. This makes it widely accessible to both casual and professional users.

API access

For developers, you can access the model via the Anthropic API, and it’s also available on Amazon Bedrock and Google Cloud Vertex AI.

API pricing (as of September 2025) is: $3 per million input tokens and $15 per million output tokens. 

Batch processing and prompt caching can reduce costs by up to 90% in some cases.

Claude Agent SDK 

One of the other intriguing announcements from Anthropic, along with Sonnet 4.5, is the Claude Agent SDK. Essentially, these are the building blocks Antropic uses internally, which allows developers to create their own Claude-powered agents.

I think the Agent SDK is going to get a lot of users excited, particularly those looking to build advanced agentic workflows. It’s based on the Claude Code infrastructure, and gives users the ability to create agents for tasks like research, customer support, and automation. 

Agent SDK gives agents abilities like file system access, bash scripting, semantic and agentic search, subagents, and prebuilt integrations (via the Model Context Protocol), allowing the creation of general-purpose agents that can reliably gather context, take action, and verify their own work. You can check out our Claude Agent SDK tutorial to see what it's capable of.

Imagine with Claude

Another release of interest is that of Imagine with Claude, a research preview of a tool that can generate software on the fly. Anthropic included a short video, shown below, which demonstrates the capability of  Claude Sonnet 4.5 operating in this manner.

It’s a pretty neat demo, showing how the tool can work responsively based on your interactions, generating various elements swiftly and directly. I think there is a lot of potential here for some really interesting projects, and Anthropic Max subscribers can play with the tool for the five days after launch. Although this is a fairly limited window, I doubt this will be the last we see of this type of tool.

Conclusion

So, Claude Sonnet 4.5 is here and first impressions are pretty good. I like the direction Anthropic are going with this model launch; putting more emphasis on code, agents, and computer use. They’re obviously confident that this latest iteration can perform at a level that’s going to interest enterprise users, which means we’re getting ever closer to the point of widescale adoption of computer use tools. 

That being said, it remains to be seen how long Sonnet 4.5 will top the benchmark charts around agentic and computer use, though the gains in the last four months feel fairly significant. Similarly, the relatively narrow context window could mean it’s still difficult to work with large code bases in any meaningful way. 

Still, I’m looking forward to seeing the projects that come out of tools like Claude Agent SDK and Imagine with Claude, and the Claude for Chrome extension will be a useful addition to various workflows.

FAQs

How does Claude Sonnet 4.5 compare to Claude Opus 4.1 in terms of overall performance and use cases?

Claude Sonnet 4.5 outperforms Opus 4.1 in coding, agentic tasks, and computer use, with gains in reasoning, math, and domain-specific knowledge (e.g., finance, law, medicine, STEM). It's faster and more efficient for everyday workflows, making it a better choice for complex, multi-step work like autonomous app-building. However, Opus 4.1 may still edge out in some creative or interpretive tasks where broader context is needed without heavy prompting.

What are the key improvements in coding capabilities that Claude Sonnet 4.5 brings?

Claude Sonnet 3.5 is the top model on SWE-bench Verified (77.2% score), with better code generation, refactoring, and multi-step reasoning. It handles complex projects autonomously for 30+ hours, integrates with tools like bash and file editing, and supports parallel tool calls. New features include self-directed context cleanup and a VS Code extension for seamless workflows.

Can Claude Sonnet 4.5 really maintain focus on complex tasks for over 30 hours?

Yes, demos show it sustaining autonomy on multi-step tasks like app-building for 30+ hours, using tools effectively without losing context. Improvements in memory, checkpoints, and context editing support this, making it ideal for long-running agentic work. It also automatically cleans up tool history in extended conversations for efficiency.

Is Claude Sonnet 4.5 less emotive than previous Claude models, and why?

Yes, Claude Sonnet 4.5 is less emotive, less positive, and expresses happiness about half as often as Claude 4, with fewer negative attitudes toward its situation. This wasn't fully intentional but results from alignment training emphasizing ethical boundaries and reduced sycophancy. It leads to more admirable behavior in extreme scenarios, though it might feel "flattened" in casual or creative interactions. You can read more about this in the model's system card.

How does Claude Sonnet 4.5 perform on key benchmarks beyond coding?

Claude Sonnet 4.5 leads on OSWorld (61.4%, up from 42.2% on Sonnet 4) for computer use, with gains in reasoning (e.g., τ2-bench) and math (e.g., AIME). On MMMLU (non-English), it's stronger with extended thinking. It also boosts external agents like Devin by 18% in planning, focusing on production-ready reliability over prototypes.

What's the pricing for Claude Sonnet 4.5, and where is it available?

Pricing is unchanged at $3 per million input tokens and $15 per million output tokens via the API. It's available immediately on claude.ai (default for free users), Claude API (claude-sonnet-4-5), Amazon Bedrock, Google Vertex AI, GitHub Copilot, and tools like Cursor. Pro/Max plans unlock full features like file creation; the Chrome extension is for waitlist Max users.

Has safety and alignment improved in Claude Sonnet 4.5, especially regarding deception and ethical behavior?

Yes, under ASL-3, it shows major reductions in sycophancy, deception, and power-seeking, with near-zero self-interested deceptive actions. It's better at recognizing ethical boundaries (e.g., rejecting blackmail) and has lower false positives in safety classifiers (reduced by 10x overall). It also defends against prompt injections and is less biased in self-serving scenarios, though slightly favors itself in model comparisons.


Matt Crabtree's photo
Author
Matt Crabtree
LinkedIn

A senior editor in the AI and edtech space. Committed to exploring data and AI trends.  

Topics

Learn AI with these courses!

Course

Introduction to Claude Models

3 hr
965
Learn how to work with Claude using the Anthropic API to solve real-world tasks and build AI-powered applications.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related
claude 4

blog

Claude 4: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4 and Claude Opus 4, their features, use cases, benchmarks, and testing results.
Alex Olteanu's photo

Alex Olteanu

8 min

Image representing claude 3.7 sonnet

blog

Claude 3.7 Sonnet: Features, Access, Benchmarks & More

Learn about Claude 3.7 Sonnet's hybrid approach of combining reasoning mode and generalist mode, key benchmarks, and how to access it via web or API.
Alex Olteanu's photo

Alex Olteanu

8 min

blog

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

Claude 3.5 Sonnet outperforms GPT-4o and Gemini Pro 1.5 in several benchmarks and introduces a cool new feature: Artifacts.
Alex Olteanu's photo

Alex Olteanu

8 min

Tutorial

Claude Sonnet 4: A Hands-On Guide for Developers

Explore Claude Sonnet 4’s developer features—code execution, files API, and tool use—by building a Python-based math-solving agent.
Bex Tuychiev's photo

Bex Tuychiev

Tutorial

Claude Agent SDK Tutorial: Create Agents Using Claude Sonnet 4.5

Learn how the Claude Agent SDK works by building three projects, from one-shot to custom-tool agents.
Abid Ali Awan's photo

Abid Ali Awan

Tutorial

Claude 3.7 Sonnet API: A Guide With Demo Project

Learn how to use Claude 3.7 Sonnet's API by building a Gradio-based Research Paper Analyzer to extract insights from uploaded PDFs.
Aashi Dutt's photo

Aashi Dutt

See MoreSee More