The Rise of AI Personality with Anthropic

Anthropic has taken centre stage in recent weeks, following a captivating appearance on Lex Fridman’s podcast. Amanda Askell, philosopher and AI alignment researcher, shed light on a growing trend: the emergence of AI personalities. As Anthropic works to integrate character into its large language models (LLMs), the implications of this effort extend far beyond novelty. So what have they done and how have they done it?

Shaping an AI’s personality can happen at various stages of development—even while the model is actively used by end-users. However, the further along the development pipeline an LLM is, the less flexible and more superficial these adjustments become. It’s akin to trying to teach an old dog new tricks—possible, but far from ideal.

In creating Claude’s character, Anthropic has taken a unique approach, considering both philosophical and technical aspects. Here’s how Claude’s personality was implemented.

How Claude’s personality was crafted

  1. Training process

Anthropic employed a distinctive training pipeline during its alignment process (sketched in code after this list) that involves:

  • Synthetic message generation: The model generates messages related to specific desired traits

  • Desired trait exposure: The model is shown the target character traits

  • Multiple response generation: It produces multiple responses aligned with these traits

  • Self-ranking: The model self-ranks responses based on character alignment
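
Pieced together from the podcast, that loop might look something like the following minimal Python sketch. The `generate` and `rank_by_trait_alignment` helpers are hypothetical placeholders for real model calls, and the trait wording is purely illustrative, not Anthropic’s actual list.

```python
import random

# Hypothetical stand-in for a base model's sampling API; a real pipeline
# would call an actual LLM here.
def generate(prompt: str, n: int = 1) -> list[str]:
    return [f"response {i} to: {prompt[:40]}" for i in range(n)]

# Desired character traits the model is shown during training
# (wording is illustrative, not Anthropic's actual trait list).
TRAITS = [
    "I am curious about different viewpoints.",
    "I am honest about my learned perspectives.",
    "I disagree appropriately with extreme or incorrect views.",
]

def rank_by_trait_alignment(trait: str, candidates: list[str]) -> list[str]:
    # In the real pipeline the model itself scores each candidate against
    # the trait; a random score stands in for that preference signal here.
    return sorted(candidates, key=lambda _: random.random(), reverse=True)

def character_training_step(trait: str) -> tuple[str, str]:
    # 1. Synthetic message generation: a message related to the trait.
    message = generate(f"Write a user message that touches on: {trait}")[0]
    # 2 & 3. Trait exposure plus multiple response generation.
    candidates = generate(f"Trait: {trait}\nUser: {message}\nAssistant:", n=4)
    # 4. Self-ranking: the top response becomes a preference-training example.
    best = rank_by_trait_alignment(trait, candidates)[0]
    return message, best

for trait in TRAITS:
    message, preferred = character_training_step(trait)
    print(trait, "->", preferred)
```

The key idea is that the model supplies both sides of the preference data: the candidate responses and the ranking signal used to fine-tune towards the desired character.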

  2. Training philosophy

Instead of programming Claude to simply mirror users’ views or maintain complete neutrality, Anthropic modelled it on the way a human might behave. It was trained to:

  • Be honest about its learned perspectives

  • Demonstrate genuine curiosity about different viewpoints

  • Maintain ethical boundaries by disagreeing with extreme or incorrect views in an appropriate manner

  • Be flexible and dynamic in responses rather than following strict rules

  3. Key personality elements

Most LLMs we’ve tried blindly adopt the views of the user, apologising at the first sign of pushback. They fence-sit so obviously that they can feel rigid. Claude, on the other hand, is typically honest about its learned perspectives, weighing different viewpoints whilst maintaining ethical boundaries. This reflects its designed characteristics of being:

  • Curious

  • Open-minded

  • Thoughtful

  • Charitable in interpretation

  4. Anthropic’s ethical framework

As touched on by Anthropic CEO Dario Amodei, Claude’s personality harnesses Constitutional AI, which sets the model’s guiding principles and values, enables self-directed fine-tuning against its constitution, and gives the model the ability to self-critique and revise responses through chain-of-thought reasoning (a rough sketch of that loop follows). Human monitoring of trait impacts is also critical, as the model’s personality is developed from its own synthetic data.
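
As a rough illustration, a constitutional critique-and-revise pass might be wired up like the sketch below. `ask_model` is a hypothetical stand-in for a real LLM call, and the two principles are our own examples rather than Anthropic’s actual constitution.

```python
# Hypothetical stand-in for a real LLM call.
def ask_model(prompt: str) -> str:
    return f"(model output for: {prompt[:50]}...)"  # stub

# Illustrative principles only; Anthropic's actual constitution differs.
CONSTITUTION = [
    "Choose the response that is most honest about uncertainty.",
    "Choose the response that avoids endorsing harmful views.",
]

def critique_and_revise(user_msg: str) -> str:
    """One constitutional pass: draft, critique, revise."""
    draft = ask_model(f"User: {user_msg}\nAssistant:")
    for principle in CONSTITUTION:
        # Chain-of-thought critique of the draft against one principle.
        critique = ask_model(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Think step by step: where does the response fall short?"
        )
        # Revision of the draft in light of its own critique.
        draft = ask_model(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response to better satisfy the principle."
        )
    return draft

print(critique_and_revise("Everyone agrees the moon landing was faked, right?"))
```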

The limitations of AI personality

As it stands, while Claude does exhibit some level of personality, it has clear behavioural limitations. Like other AI, it has no true emotional capacity and relies solely on learnt patterns. Other cognitive limitations include struggling with sarcasm and showing cultural blind spots. Added to these flaws is the static nature of AI personality: the traits are set during initial character training and don’t update through interaction. However, we’d argue that subsequent system prompting can somewhat adjust the personality to a user’s needs, as the sketch below shows.
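
For example, using the Anthropic Python SDK, a system prompt can nudge Claude’s surface-level persona for a single conversation without any retraining (the persona text and model name below are just examples):

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# The system prompt shifts surface-level persona for this conversation only;
# the persona text and model name are our own examples.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system=(
        "You are terse and sceptical. Push back on weak arguments "
        "rather than agreeing by default."
    ),
    messages=[{"role": "user", "content": "Surely the earth is flat?"}],
)
print(response.content[0].text)
```

This only shifts behaviour within the bounds set during character training; it doesn’t change the underlying learned traits.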

In the future

Soon we’d expect AI personalities to be more consistent and to offer more customisation to end-users and developers, improving as AI memory, attention and other related mechanisms advance. As for future direction, we anticipate companies becoming more overt with their AI personalities, aligning them to their brands. We also predict that LLMs will become more confident in their responses, challenging users as the models become smarter.

To sum up

The journey toward more sophisticated AI personalities is only beginning, and Anthropic’s Claude is leading the way. As we navigate this new landscape, one thing is clear: the question isn’t whether AI will have a personality, but how we’ll choose to shape it.