The Rise of AI Personality with Anthropic

Anthropic has taken centre stage in recent weeks, following a captivating appearance on Lex Fridman’s podcast. Amanda Askell, philosopher and AI alignment researcher, shed light on a growing trend: the emergence of AI personalities. As Anthropic works to integrate character into its large language models (LLMs), the implications of this effort extend far beyond novelty. So what have they done and how have they done it?

Shaping an AI’s personality can happen at various stages of development—even while the model is actively used by end-users. However, the further along the development pipeline an LLM is, the less flexible and more superficial these adjustments become. It’s akin to trying to teach an old dog new tricks—possible, but far from ideal.

In creating Claude’s character, Anthropic has taken a unique approach, considering both philosophical and technical aspects. Here’s how Claude’s personality was implemented.

How Claude’s personality was crafted

  1. Training process

Anthropic employed a distinctive training pipeline during its alignment process (sketched in code after this list) that involves:

  • Synthetic message generation: The model generates messages related to specific desired traits

  • Desired trait exposure: The model is shown the target character traits

  • Multiple response generation: It produces multiple responses aligned with these traits

  • Self-ranking: The model self-ranks responses based on character alignment
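
Pieced together from the podcast, that loop might look something like the following minimal Python sketch. The `generate` and `rank_by_trait_alignment` helpers are hypothetical placeholders for real model calls, and the trait wording is purely illustrative, not Anthropic’s actual list.

```python
import random

# Hypothetical stand-in for a base model's sampling API; a real pipeline
# would call an actual LLM here.
def generate(prompt: str, n: int = 1) -> list[str]:
    return [f"response {i} to: {prompt[:40]}" for i in range(n)]

# Desired character traits the model is shown during training
# (wording is illustrative, not Anthropic's actual trait list).
TRAITS = [
    "I am curious about different viewpoints.",
    "I am honest about my learned perspectives.",
    "I disagree appropriately with extreme or incorrect views.",
]

def rank_by_trait_alignment(trait: str, candidates: list[str]) -> list[str]:
    # In the real pipeline the model itself scores each candidate against
    # the trait; a random score stands in for that preference signal here.
    return sorted(candidates, key=lambda _: random.random(), reverse=True)

def character_training_step(trait: str) -> tuple[str, str]:
    # 1. Synthetic message generation: a message related to the trait.
    message = generate(f"Write a user message that touches on: {trait}")[0]
    # 2 & 3. Trait exposure plus multiple response generation.
    candidates = generate(f"Trait: {trait}\nUser: {message}\nAssistant:", n=4)
    # 4. Self-ranking: the top response becomes a preference-training example.
    best = rank_by_trait_alignment(trait, candidates)[0]
    return message, best

for trait in TRAITS:
    message, preferred = character_training_step(trait)
    print(trait, "->", preferred)
```

The key idea is that the model supplies both sides of the preference data: the candidate responses and the ranking signal used to fine-tune towards the desired character.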

  2. Training philosophy

Instead of programming Claude to simply mirror users’ views or maintain complete neutrality, Anthropic modelled it on the way a human might behave. It was trained to:

  • Be honest about its learned perspectives

  • Demonstrate genuine curiosity about different viewpoints

  • Maintain ethical boundaries by disagreeing with extreme or incorrect views in an appropriate manner

  • Be flexible and dynamic in responses rather than following strict rules

  3. Key personality elements

Most LLMs we’ve tried blindly adopt the views of the user, apologising at the first sign of pushback. They fence-sit so obviously that they can feel rigid. Claude, on the other hand, is typically honest about its learned perspectives, weighing different viewpoints whilst maintaining ethical boundaries. This reflects its designed characteristics of being:

  • Curious

  • Open-minded

  • Thoughtful

  • Charitable in interpretation

  4. Anthropic’s ethical framework

As touched on by Anthropic CEO Dario Amodei, Claude’s personality harnesses Constitutional AI, which sets the model’s guiding principles and values, enables self-directed fine-tuning against its constitution, and gives the model the ability to self-critique and revise responses through chain-of-thought reasoning (a rough sketch of that loop follows). Human monitoring of trait impacts is also critical, as the model’s personality is developed from its own synthetic data.
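
As a rough illustration, a constitutional critique-and-revise pass might be wired up like the sketch below. `ask_model` is a hypothetical stand-in for a real LLM call, and the two principles are our own examples rather than Anthropic’s actual constitution.

```python
# Hypothetical stand-in for a real LLM call.
def ask_model(prompt: str) -> str:
    return f"(model output for: {prompt[:50]}...)"  # stub

# Illustrative principles only; Anthropic's actual constitution differs.
CONSTITUTION = [
    "Choose the response that is most honest about uncertainty.",
    "Choose the response that avoids endorsing harmful views.",
]

def critique_and_revise(user_msg: str) -> str:
    """One constitutional pass: draft, critique, revise."""
    draft = ask_model(f"User: {user_msg}\nAssistant:")
    for principle in CONSTITUTION:
        # Chain-of-thought critique of the draft against one principle.
        critique = ask_model(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Think step by step: where does the response fall short?"
        )
        # Revision of the draft in light of its own critique.
        draft = ask_model(
            f"Response: {draft}\nCritique: {critique}\n"
            "Rewrite the response to better satisfy the principle."
        )
    return draft

print(critique_and_revise("Everyone agrees the moon landing was faked, right?"))
```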

The limitations of AI personality

As it stands, while Claude does exhibit some level of personality, it has clear behavioural limitations. Like other AI, it has no true emotional capacity and relies solely on learnt patterns. Other cognitive limitations include struggling with sarcasm and showing cultural blind spots. Added to these flaws is the static nature of AI personality: the traits are set during initial character training and don’t update through interaction. However, we’d argue that subsequent system prompting can somewhat adjust the personality to a user’s needs, as the sketch below shows.
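
For example, using the Anthropic Python SDK, a system prompt can nudge Claude’s surface-level persona for a single conversation without any retraining (the persona text and model name below are just examples):

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# The system prompt shifts surface-level persona for this conversation only;
# the persona text and model name are our own examples.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=512,
    system=(
        "You are terse and sceptical. Push back on weak arguments "
        "rather than agreeing by default."
    ),
    messages=[{"role": "user", "content": "Surely the earth is flat?"}],
)
print(response.content[0].text)
```

This only shifts behaviour within the bounds set during character training; it doesn’t change the underlying learned traits.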

In the future

Soon we’d expect AI personalities to be more consistent and to offer more customisation to end-users and developers, improving as AI memory, attention and other related mechanisms advance. As for future direction, we anticipate companies becoming more overt with their AI personalities, aligning them to their brands. We also predict that LLMs will become more confident in their responses, challenging users as the models become smarter.

To sum up

The journey toward more sophisticated AI personalities is only beginning, and Anthropic’s Claude is leading the way. As we navigate this new landscape, one thing is clear: the question isn’t whether AI will have a personality, but how we’ll choose to shape it.