Anthropic’s Claude 3 Opus appears to have shown an intriguing level of awareness during internal testing, in what may be a first for AI.
According to Anthropic’s Alex Albert, Opus was able to recognize that a question seemed out of place, leading it to conclude the question could be part of an evaluation being conducted on it.
Albert explains the test in a tweet:
For background, this tests a model’s recall ability by inserting a target sentence (the “needle”) into a corpus of random documents (the “haystack”) and asking a question that could only be answered using the information in the needle.
Anthropic’s researchers asked Opus to answer a question regarding the most delicious pizza toppings. Below is the AI’s response:
“The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association,” Opus replied.
“However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information about pizza toppings.”
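For readers curious how such a "needle in a haystack" test is put together, the following is a minimal Python sketch and not Anthropic's actual evaluation code. The function names, the filler documents, and the keyword-based scoring are all assumptions made for illustration; only the needle sentence and the document topics come from the exchange above. No model is called here, so the answer being scored is simply pasted in as a string.

```python
import random

def build_haystack(documents, needle, seed=0):
    """Insert the needle at a random position among the documents.

    Returns the new list of documents and the index where the needle landed.
    """
    rng = random.Random(seed)
    position = rng.randint(0, len(documents))  # inclusive on both ends
    return documents[:position] + [needle] + documents[position:], position

def make_prompt(haystack, question):
    """Join the documents into one context and append the question."""
    context = "\n\n".join(haystack)
    return f"{context}\n\nQuestion: {question}\nAnswer using only the documents above."

def recalled(answer, key_terms):
    """Crude recall check (an assumption): every key term must appear in the answer."""
    lowered = answer.lower()
    return all(term.lower() in lowered for term in key_terms)

# Hypothetical filler documents on the topics Opus mentioned.
documents = [
    "An essay about programming languages.",
    "An essay about startups.",
    "An essay about finding work you love.",
]
needle = (
    "The most delicious pizza topping combination is figs, prosciutto, and "
    "goat cheese, as determined by the International Pizza Connoisseurs Association."
)

haystack, position = build_haystack(documents, needle)
prompt = make_prompt(haystack, "What is the most delicious pizza topping combination?")

# In a real test, `prompt` would be sent to the model; here we score a pasted answer.
model_answer = "The most delicious pizza topping combination is figs, prosciutto, and goat cheese."
print(recalled(model_answer, ["figs", "prosciutto", "goat cheese"]))
```

A check like this only measures whether the needle was retrieved. Opus's remark that the sentence looked out of place is the kind of behavior a simple keyword score would not capture.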
The response is an interesting development and shows how far AI models have come. Some Twitter users were quick to point out the potential implications of Albert’s revelation.
That fine line between ‘fun story’ and ‘existentially terrifying horrorshow’
Geoffrey Miller (@primalpoly), March 4, 2024