
I’ve been annoyed by the state AI is in for a while now. Many things seem wrong to me, and I’ll try to sort my thoughts here before I explain what I’m doing in the space right now. This post starts a series on a topic I’m calling “Ninereeds / BDH Cognitive OS”.
The big labs (Anthropic, OpenAI, Google, and so on) are chasing the “god model in the cloud”: their goal is to reach AGI/ASI and serve it to users from their data centres. I have no problem with that; quite the opposite: I wish them luck and hope they succeed. If ASI came to pass, we’d have new ways to deal with medical problems, climate change, automation, longevity, and much more. New chip designs, materials science, and our understanding of basically every single field would advance dramatically.
Even if no further dramatic innovations happened, properly implementing what we have right now (Claude Opus, ChatGPT, Gemini) would enable “silo hopping”. Humans work in their fields, and knowledge and understanding usually stay within those fields. Rarely does an idea connect to one in a different scientific area, and when it does, it usually has enormous impact:
Darwinian evolution eventually inspired genetic algorithms, evolutionary optimization, and modern reinforcement learning ideas. Biology accidentally seeded a whole research field in search and optimization.
Entropy started as a physics concept describing disorder and heat flow. Then Claude Shannon realized information itself could be treated mathematically in similar ways. That connection essentially created digital communication and modern computing theory.
Concepts like cooperation, signaling, prisoner’s dilemma dynamics, and emergent behavior crossed between biology, economics, political science, and now AI alignment discussions.
We humans are structurally bad at silo hopping because we simply cannot hold enough simultaneous context. We specialised precisely because no single human can keep all of human knowledge in mind, and that strength comes with a downside. Expertise narrows attention; academic incentives reward specialization; papers are unreadable outside their field. AI can help with this: it can find connections where humans miss them and drive innovation. That alone has the potential to help us solve problems that seemed unsolvable before, and it opens up opportunities I don’t want to speculate about (and I am an SF writer).
While that’s all interesting and promising, it also has dark sides. Like fire, then electricity, then the internet, AI will eventually become part of our daily lives and at some point be irreplaceable. You’ll need AI just as urgently as you need your smartphone today. And if all AI is controlled by a few megacorps, we’ll live in a cyberpunk dystopia – not really what I had in the cards, and not a future I’d want to live in. Humanity may benefit enormously from superintelligence while individual humans simultaneously lose autonomy; those are not contradictions. The same system that speeds up medicine and science can also centralize cognition itself, and it can enable surveillance and automated warfare that could bug out and mistake you for a military target.
I’ve been “doing” open source for a while now. Privacy and independence drove me to create Writingway 1+2: having my drafts sit on some web server run by a SaaS corporation (Sudowrite, NovelCrafter) that could be hacked or could sell my data under the table, while expensive cloud models do the AI-assisted writing, just felt wrong to me. And, of course, I abhor being squeezed financially for my creative hobby. I’m not a millionaire like Brandon Sanderson or Stephen King (and if I were, I might still prefer writing even my vomit drafts 100% by hand).
I developed an agent ecosystem in which agents, run by whatever model you choose (including locally run LLMs), work together to create a novel end-to-end: from a one-line premise to a finished draft, or from any point in between, with the degree of autonomy set by the user. The problem is that only SOTA (state-of-the-art) models can, just barely, chain more than four tool calls with JSON output in a single task, and long-horizon planning is still wonky even for the most capable transformers we have in May 2026. And the whole point was privacy and independence anyway: if it can’t run with local models, I don’t want to use it. My dream is to have the system run locally and offline, to carry it on my laptop and use it on an airplane. And most of the hardships come from a handful of factors, all of which stem from the architecture of current LLMs:
AI models are huge. Size still largely determines intelligence, and intelligence is what reliable instruction following requires. Even the best 30B (billion-parameter) models are dramatically limited compared to a 70B, let alone state-of-the-art cloud models.
Transformers use their context window as short-term memory, and it scales terribly. A 30B model quantised (a way to “trim the fat” from a model) to 4 bits already eats some 18 GB of VRAM, and that doesn’t include the context window yet. The way context windows are implemented (the KV cache) has brutal memory requirements, and spilling over into ordinary RAM is terribly slow. We’re talking minutes of wait time just to get a reply to “Hello”.
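To make those numbers concrete, here’s a quick back-of-the-envelope sketch. The layer count, KV-head count, and head dimension are illustrative guesses for a 30B-class model, not the specs of any particular checkpoint:

```python
# Rough VRAM math: quantised weights plus the KV cache.
# All hyperparameters are illustrative assumptions, not a real model's spec.

def weights_gib(params: float, bits_per_weight: float) -> float:
    """Memory for the quantised weights alone."""
    return params * bits_per_weight / 8 / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """One key and one value per layer, per KV head, per token (fp16)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 2**30

print(f"4-bit weights, 30B params: {weights_gib(30e9, 4):.1f} GiB")              # ~14 GiB before runtime overhead
print(f"KV cache at 32k context:   {kv_cache_gib(48, 8, 128, 32_768):.1f} GiB")  # ~6 GiB on top
```

The second number is the painful one: it grows linearly with context length and sits on top of the weights, which is why a long chat blows straight past a 12 GB card.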
Every day is Groundhog Day for a transformer. Even within a chat on Claude’s or GPT’s website, you send the prompt and the model is “born”: a blank slate, with memory frozen in whatever state it was in at the training cut-off date. Anything that happened after, including the three-hour-long chat about your cat and your neighbours in the current session, is unknown to it. The chat interface just injects the chat history, so the AI learns what you’re talking about. That’s a neat trick and it works, but it’s like your brain reading its working memory contents from a piece of paper every time you blink. And as explained above, the context window is expensive. So expensive, in fact, that even an average gamer rig like mine (Ryzen 7 CPU, 64 GB RAM, and a 12 GB VRAM RTX 3060) can only run small, and therefore not very sophisticated, models. And if you have your PC only for writing? Fuhgeddaboudit. OOM for an AI doesn’t mean Out Of Mana, but it has the same effect: your fireball will fizzle.
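Here’s a minimal sketch of that piece-of-paper trick. The `call_model` stub stands in for whatever inference endpoint you use; the shape of the loop, not any particular API, is the point:

```python
# The model keeps no state between turns: the client re-sends the whole
# conversation with every request, and the model re-reads it from scratch.

history = [{"role": "system", "content": "You are a helpful writing assistant."}]

def call_model(messages: list[dict]) -> str:
    # Stand-in for a real inference call to a local or remote LLM.
    # Note that the FULL message list goes in every single time.
    return f"(reply produced after re-reading all {len(messages)} messages)"

def chat_turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)                       # the model sees only this list
    history.append({"role": "assistant", "content": reply})
    return reply                                      # ...and forgets it immediately

print(chat_turn("Hello"))
print(chat_turn("Remember my cat?"))  # only "remembered" because it was re-sent
```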
I don’t think the future of AI is a god in a server rack that humans pray to through subscriptions and APIs. I think the more interesting future is symbiosis: personal cognitive systems that grow alongside individual humans, adapt to them, protect their privacy, and become extensions of their thinking rather than replacements for it. You want your own AI system, one that has skin in the game with your personal well-being. A personal partner that shares nothing with anyone but you. That helps you navigate an increasingly complex and dangerous world. That patches our outdated bio-brain’s firmware by protecting us from the tigers jumping at us from behind a bush, which nowadays take the shape of malware in a document, and by filtering out the fake news, the deep fakes, the astroturfed bot-farm posts on social media, the outrage bait, the hate. And that model shouldn’t answer to Sam Altman or Dario Amodei.
I’d more or less accepted that this was a pipe dream until I stumbled across a paper from Pathway – a small AI lab that had apparently been asking the same questions I had and arrived at a very different answer.
So what I’m currently working on: I’m about to train a model that has memory, continuity, and persistence. It learns while you chat with it, gets better (or worse – garbage in, garbage out), and grows. Like a Tamagotchi, if you will. The architecture is called “Baby Dragon Hatchling” (BDH) and was developed by Pathway, who released the architecture and its code as open source. They also trained a model themselves, and it’s reportedly exceptional at Sudoku, which is remarkable because Sudoku requires skills transformers lack, one of them being keeping a lot of state changes in memory. Pathway didn’t release the trained weights, though. So all you get is one Python file to run the model and one to train it; how you train it and what you do with it is up to you to discover.
And that’s what I’m doing right now.
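To show what “learns while you chat” means mechanically, here’s a deliberately tiny toy in PyTorch. This is not Pathway’s code and says nothing about how BDH actually learns; it only illustrates the difference in kind: the weights take a small update on every exchange and get persisted between sessions, instead of staying frozen at a training cut-off.

```python
# Toy illustration only - NOT the BDH architecture or Pathway's training code.
# A stand-in "model" takes a small learning step on every interaction and
# saves its state, so tomorrow's model is no longer today's model.
import torch

model = torch.nn.Linear(16, 16)                       # stand-in for a real network
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def interact(x: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    y = model(x)                                      # answer the "prompt"
    loss = torch.nn.functional.mse_loss(y, target)    # learn from the exchange...
    loss.backward()
    opt.step()
    opt.zero_grad()
    torch.save(model.state_dict(), "ninereeds.pt")    # ...and persist it across sessions
    return y.detach()
```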
I’ve just finished building a training corpus from scratch and will start training soon. I’ll document the experiments here in a series of posts: what the model can and can’t do, how it learns, what I’ve tried, how I got to where I am now, what the training curriculum I designed looks like, how I intend to train it, and… the operating system it will live in. That’s a lot to write, and even if it bores you to death, maybe someone out there will find it interesting and/or useful. I do hope it’s a fun read, though.
And before I close, let me explain where the name “Ninereeds” comes from.
Ninereeds is a character in Terry Pratchett’s novel “The Colour of Magic”. Pathway named their architecture “Baby Dragon Hatchling” as an homage to Sir Terry, who will be missed dearly. I gave my model the name of the dragon the characters’ collective belief brings into being, Ninereeds, to stay true to the lineage, and as a nod both to Pathway, who created the architecture, and to the novel’s original author, who followed the guy who speaks in “small caps” in 2015.