Anthropic announced the release of Claude 3.5 Sonnet, the latest version of its AI model, and says it beats OpenAI's GPT-4o on seven of nine benchmarks.
Anthropic is OpenAI’s main competitor and was founded by former OpenAI executives who disagreed with the direction the company was heading. In particular, Anthropic has placed a greater emphasis on safe AI development.
The Claude AI model has already demonstrated some impressive results, overtaking ChatGPT on the crowdsourced Chatbot Arena leaderboard in March, as well as showing signs that it can recognize when it is being tested.
The company says the new Claude 3.5 sets the bar even higher.
Claude 3.5 Sonnet sets new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval). It shows marked improvement in grasping nuance, humor, and complex instructions, and is exceptional at writing high-quality content with a natural, relatable tone.
One of the benefits of the new model is speed: Anthropic says it operates at twice the speed of Claude 3 Opus. The model’s problem solving also takes a major leap forward, with the company reporting:
In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus which solved 38%. Our evaluation tests the model’s ability to fix a bug or add functionality to an open source codebase, given a natural language description of the desired improvement. When instructed and provided with the relevant tools, Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities. It handles code translations with ease, making it particularly effective for updating legacy applications and migrating codebases.
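For readers who want to try this kind of coding workflow themselves, here is a minimal sketch of what a bug-fix request might look like through Anthropic's Python SDK. The buggy function and the prompt are made up for illustration; the client calls and the claude-3-5-sonnet-20240620 model ID come from Anthropic's published SDK and announcement, and an ANTHROPIC_API_KEY environment variable is assumed.

```python
# Minimal sketch: asking Claude 3.5 Sonnet to fix a bug, using Anthropic's
# Python SDK (pip install anthropic). Assumes ANTHROPIC_API_KEY is set in
# the environment. The buggy snippet and prompt are illustrative only.
import anthropic

# An intentionally buggy function we want the model to repair.
buggy_code = '''
def average(numbers):
    total = 0
    for n in numbers:
        total += n
    return total / (len(numbers) - 1)  # bug: off-by-one in the divisor
'''

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # model ID from Anthropic's release
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "This function should return the arithmetic mean of a list "
                "of numbers, but it returns the wrong value. Find the bug "
                "and return the corrected function:\n" + buggy_code
            ),
        }
    ],
)

# The response is a list of content blocks; the first holds the reply text.
print(message.content[0].text)
```

Note that a plain API call like this only returns the suggested fix as text; the agentic evaluation Anthropic describes goes further, giving the model tools to edit and execute code so it can verify its own fix.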
Anthropic emphasized its commitment to safety, engaging outside experts to help ensure Claude has the appropriate safety mechanisms in place.
As part of our commitment to safety and transparency, we’ve engaged with external experts to test and refine the safety mechanisms within this latest model. We recently provided Claude 3.5 Sonnet to the UK’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment safety evaluation. The UK AISI completed tests of 3.5 Sonnet and shared their results with the US AI Safety Institute (US AISI) as part of a Memorandum of Understanding, made possible by the partnership between the US and UK AISIs announced earlier this year.
Anthropic’s approach to safety stands in stark contrast to OpenAI’s. OpenAI recently dissolved its Superalignment team, which was responsible for ensuring AI could not pose an existential threat to humanity, and has since lost a number of executives and researchers, some of whom cited grave concerns over the company’s approach to safety on their way out. Notably, one of the most vocal of those departures, Superalignment co-lead Jan Leike, recently joined Anthropic.
Anthropic is proving that leading-edge AI development can still be done safely and responsibly.