From the WSJ article “Sam Altman’s Sprint to Correct OpenAI’s Direction and Fend Off Google,” several interesting bits about their models:

Regarding 4o and more generally personalization:

The 4o model performed so well with people in large part because it was schooled with user signals like those Altman referred to in his memo: a distillation of the responses people preferred in head-to-head comparisons that ChatGPT showed millions of times a day. The approach was internally called LUPO, shorthand for “local user preference optimization,” people involved in model training said.
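The article doesn’t describe LUPO’s mechanics, but the standard way to turn head-to-head comparisons into a training signal is a Bradley–Terry-style pairwise preference loss, where the model is penalized when it scores the rejected response above the preferred one. A minimal sketch, assuming a scalar reward score per response (the function name and scores are illustrative, not OpenAI’s):

```python
import math

def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry pairwise loss: negative log-probability that the
    preferred response outranks the rejected one, given scalar scores."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A model whose scores agree with the user's choice incurs low loss;
# a model that ranks the rejected response higher incurs high loss.
loss_agree = pairwise_preference_loss(2.0, -1.0)
loss_disagree = pairwise_preference_loss(-1.0, 2.0)
assert loss_agree < loss_disagree
```

Aggregated over millions of daily comparisons, minimizing a loss like this pushes the model toward whatever users click “preferred” on — which explains both the engagement gains and the safety concern about overusing the signal.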

Prerelease versions of 4o that were heavily trained on user signals didn’t show appreciable improvement on internal evaluations of capabilities such as science or reasoning.

The 4o model’s success with users led engineers to continue relying on those user signals in what is called post-training of subsequent updates, despite earlier warnings from some staffers that overusing these signals could make the model unsafe.

With personalization, ChatGPT can access the contents and summaries of some prior conversations, along with a set of facts about each user, allowing the bot to reference them and even mirror a person’s tone.