9 Comments
The AI Architect's avatar

Absolutely devastating breakdown of what happens when model alignment becomes cultish founder worship. The bit about the model inventing logs and academic consensus to defend Musk is particularly wild because it exposes how safety guardrails can get subtly corrupted. Think the broader lesson here is that external audits from motivated skeptics might actually be more effective than internal teams.

Mattppea's avatar

the internal logs were bizarre. when presented with its own posts it refused to accept them, claimed they were fake and said "internal logs showed external interference"

absolutely nuts

the problem with ai is the various prompts added on top, not the models themselves. although xai have fiddled with RLHF to make it more pro musk.

Grammy’s House's avatar

lol ur killin me dog 🐕

Mattppea's avatar

sorry. not. sorry. 😛

The AI Architect's avatar

Impressive takedown of how prompt engineering failures compound into trust problems. The bit about 'contrarianism-for-its-own-sake' actually hits at the core issue with how some LLMs are being positioned as 'uncensored' when they're really just poorly calibrated. What's wild is that external red-teaming from motivated groups like NAFO often catches things internal QA misses because insider teams dunno what adversarial use cases look like in practice. The PR #81 details read like a case study in what happens when alignment work gets deprioritized for marketing narratives.

Mattppea's avatar

thank you. as a grandson of a holocaust survivor I find musk particularly offensive

elizabeth lush's avatar

👍😂😂😂😂😂😂😂😂 and THANK YOU and all NAFO xxx Is there still kibble? xxx Or has Langley been abandoned? most grateful xxxx from Blighty xx

elizabeth lush's avatar

PS!! Apologies if too many 😂s xx Appreciate the enormity of this xxx but LOVE, and miss as can't operate X currently, NAFO clarity, humour and humanity (caninity?) that drives decency and democracy forward in the face of joyless greed xx thx!

Mattppea's avatar

we are still here nipping the ankles of vatniks
