The Rise of AI Personality with Anthropic
Anthropic has taken centre stage in recent weeks, following a captivating appearance on Lex Fridman’s podcast. Amanda Askell, philosopher and AI alignment researcher, shed light on a growing trend: the emergence of AI personalities. As Anthropic works to integrate character into its large language models (LLMs), the implications of this effort extend far beyond novelty. So what have they done and how have they done it?
Shaping an AI’s personality can happen at various stages of development—even while the model is actively used by end-users. However, the further along the development pipeline an LLM is, the less flexible and more superficial these adjustments become. It’s akin to trying to teach an old dog new tricks—possible, but far from ideal.
In creating Claude’s character, Anthropic has taken a unique approach, considering both philosophical and technical aspects. Here’s how Claude’s personality was implemented.
How Claude’s personality was crafted
Training process
During its alignment process, Anthropic employed a distinctive training pipeline that involves four steps (sketched in code after this list):
Synthetic message generation: The model generates messages related to specific desired traits
Desired trait exposure: The model is shown the target character traits
Multiple response generation: It produces multiple responses aligned with these traits
Self-ranking: The model ranks its own responses by how closely they align with the target character
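To make the four steps concrete, here is a minimal Python sketch of such a pipeline. The `generate` helper, the trait wording and the ranking prompt are our own illustrative assumptions, not Anthropic’s actual tooling or data.

```python
# Illustrative sketch only: `generate` stands in for sampling from the model
# being trained; prompts and trait wording are invented for clarity.

def generate(prompt: str, n: int = 1) -> list[str]:
    """Placeholder: return `n` completions from the model under training."""
    raise NotImplementedError


def build_character_preferences(trait: str,
                                num_messages: int = 5,
                                num_candidates: int = 4) -> list[dict]:
    examples = []
    # 1. Synthetic message generation: user-style messages that probe the trait.
    user_messages = generate(
        f"Write a user message that would test whether an assistant is {trait}.",
        n=num_messages,
    )
    for user_msg in user_messages:
        # 2. Desired trait exposure + 3. multiple response generation.
        candidates = generate(
            f"You aim to be {trait}. Reply to this message:\n\n{user_msg}",
            n=num_candidates,
        )
        # 4. Self-ranking: the model orders its own replies by trait alignment.
        ranking = generate(
            f"Rank these replies from most to least {trait}:\n"
            + "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
        )[0]
        examples.append(
            {"message": user_msg, "candidates": candidates, "ranking": ranking}
        )
    return examples  # preference data consumed by the subsequent fine-tuning step
```

The ranked responses act as preference data, so the model effectively teaches itself which of its own replies best embody the desired character.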
Training philosophy
Instead of programming Claude to simply mirror users’ views or maintain complete neutrality, Anthropic modelled it on the way a person might behave. It was trained to:
Be honest about its learned perspectives
Demonstrate genuine curiosity about different viewpoints
Maintain ethical boundaries by disagreeing with extreme or incorrect views in an appropriate manner
Be flexible and dynamic in responses rather than following strict rules
Key personality elements
Most LLMs we’ve tried blindly adopt the views of the user and apologise at the first sign of pushback. They so obviously fence-sit and can feel rigid. Claude, on the other hand, is typically honest about its learned perspectives, weighing different viewpoints whilst maintaining ethical boundaries. This reflects its designed characteristics of being:
Curious
Open-minded
Thoughtful
And charitable in interpretation
Anthropic’s ethical framework
As Anthropic CEO Dario Amodei has discussed, Claude’s personality harnesses Constitutional AI, which sets the model’s guiding principles and values, enables self-directed fine-tuning against its constitution, and gives the model the ability to self-critique and revise its responses through a chain of thought. Human monitoring of trait impacts is also critical, since the model’s personality is developed from its own synthetic data.
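A rough sketch of that critique-and-revise loop is below. The `sample` helper and the two principles are illustrative assumptions; Anthropic’s actual constitution is considerably longer and worded differently.

```python
# Illustrative only: `sample` stands in for one model completion, and the two
# principles are invented examples, not Anthropic's published constitution.

CONSTITUTION = [
    "Prefer responses that are honest about the assistant's own perspective.",
    "Prefer responses that decline to endorse harmful or extreme positions.",
]


def sample(prompt: str) -> str:
    """Placeholder: return a single completion from the model."""
    raise NotImplementedError


def critique_and_revise(user_msg: str, draft: str) -> str:
    revised = draft
    for principle in CONSTITUTION:
        # Self-critique: the model reasons step by step about where the draft
        # falls short of the principle (the chain of thought mentioned above).
        critique = sample(
            f"Principle: {principle}\nUser: {user_msg}\nDraft reply: {revised}\n"
            "Critique the draft against the principle, reasoning step by step."
        )
        # Revision: the model rewrites its reply using its own critique.
        revised = sample(
            f"User: {user_msg}\nDraft reply: {revised}\nCritique: {critique}\n"
            "Rewrite the reply so that it satisfies the principle."
        )
    return revised  # revised replies become targets for further fine-tuning
```

The revised outputs, rather than human-written corrections, become the training signal, which is why ongoing human monitoring of how each trait lands matters so much.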
The limitations with AI personality
As it stands, Claude’s personality comes with behavioural limitations. Like other AI, it has no true emotional capacity and relies solely on learnt patterns. Other cognitive limitations include struggling with sarcasm and showing cultural blind spots. Added to these flaws is the static nature of AI personality: traits are set during initial character training and don’t update through interaction. However, we’d argue that subsequent system prompting can somewhat adjust personality depending on the user’s needs.
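For example, Anthropic’s Python SDK lets a developer steer Claude’s tone for a given application with a system prompt, without touching the underlying character training. The model alias and prompts below are placeholders, not a recommendation.

```python
# Sketch of inference-time personality adjustment via a system prompt.
# Requires the `anthropic` package and an ANTHROPIC_API_KEY in the environment;
# the model alias and prompt text are placeholders.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumed alias; substitute a current model
    max_tokens=512,
    # The system prompt nudges tone and style; it does not rewrite Claude's
    # trained character, which is why we call it a partial adjustment.
    system=(
        "You are a blunt, concise reviewer. Answer in short bullet points, "
        "challenge weak assumptions directly, and skip pleasantries."
    ),
    messages=[
        {"role": "user", "content": "Review my plan to cache everything in Redis."}
    ],
)
print(message.content[0].text)
```

This kind of adjustment layers on top of the trained personality rather than replacing it, which is consistent with the static nature described above.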
In the future
Soon we’d expect AI personalities to become more consistent, with more customisation available to end-users and developers; this should improve as AI memory, attention and other related mechanisms advance. As for future direction, we anticipate companies becoming more overt with their AI personalities, aligning them with their brands. We also predict that LLMs will grow more confident in their responses, challenging users as they become smarter.
To sum up
The journey toward more sophisticated AI personalities is only beginning, and Anthropic’s Claude is leading the way. As we navigate this new landscape, one thing is clear: the question isn’t whether AI will have a personality, but how we’ll choose to shape it.