One of the promised goals of DOGE has been to replace human bureaucrats with AI systems. That may be cheaper to operate, but it will create a bureaucratic nightmare for anybody interacting with the government.

As DOGE continues their smash-and-grab operation to gut government agencies and steal all their internal data, some of their purported goals have come to light. One of them is to radically reduce the size of the federal workforce by replacing people with AI, as several recent examples have illustrated.

This mad dash toward embracing AI is presented by the Trump administration as a win for efficiency, with the American public reaping the benefits of both reduced labor costs and more effective service. While there might be some cases in which AI could help (e.g., navigating basic policy questions after business hours), the net result will be to make government less effective, responsive and accountable to the public it is supposed to serve. Let me explain.

Improving Government With Technology

I’ve been in the field of civic technology for 10 years now, so I have seen how various aspects of government work and how technology can be applied to improve them. I’ve seen first-hand how some government agencies seemingly limp forward on antiquated technology, and I’ve seen how technologists can work effectively with other civil servants to make digital services effective. It involves dedicated effort to understand the problem and the technology landscape. It involves careful iteration to not break things, since that could seriously harm the people you want to help. It involves humility to acknowledge that while it’s easy to malign the mainframe, how much of your code has ever been good enough to run for 50+ years, as some of the code currently underlying Medicare and Social Security does?

I’m also no Butlerian jihadist about using AI at all. Recently, I have been using AI to help with certain tasks, like looking up in my IDE how to invoke a language’s sorting methods or resizing a collection of images from my terminal. AI has been very good for that, but I always have to check its recommendations, since it definitely responds like an “intern,” to quote the anonymous GSA employee above. Indeed, I find that the Gods, Interns, and Cogs model for classifying AI is a good one. I’m not especially troubled by the use of intern-level AI projects like GSAi or coding companions like GitHub Copilot. I would be more worried about god-level AI if it existed, but what I am especially worried about right now is the possibility that intern-level AIs will be promoted to god-tier responsibilities. Unfortunately, it sounds like that might already be happening.
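
For a sense of what that intern-level work looks like, here is a minimal sketch of the kind of image-resizing script such a session might produce (assuming the Pillow library is available; the folder names and target width are invented for illustration):

```python
# Resize every JPEG in a folder to a maximum width, preserving aspect ratio.
# A toy, intern-level task: the folder names and MAX_WIDTH are placeholders.
from pathlib import Path

from PIL import Image  # assumes Pillow is installed

MAX_WIDTH = 1200
src = Path("photos")
dst = Path("photos_resized")
dst.mkdir(exist_ok=True)

for path in src.glob("*.jpg"):
    with Image.open(path) as img:
        if img.width > MAX_WIDTH:
            new_height = round(img.height * MAX_WIDTH / img.width)
            img = img.resize((MAX_WIDTH, new_height))
        img.save(dst / path.name)
```

It works precisely because thousands of near-identical scripts already exist for the model to draw on; nothing about the problem is novel.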

As an example, consider a story from this week about a person in Seattle who was declared ‘dead’ by Social Security and saw automated clawbacks on their bank accounts. The story went slightly viral as an example of what the DOGE-inflicted demolition of agencies like Social Security would mean for most Americans, with people blaming the false cancellation on Elon’s zealous overreach. The funny thing is that neither part is directly DOGE’s fault. Some people do get accidentally declared dead every year – as this story from October 2024 shows, for instance. And while DOGE may have amplified the automated clawback of benefits, it actually started as a pilot program that had already been running for five months before DOGE got started.

What’s different now, though, is what happens when the accidentally dead person tries to fix the error. As reported by the Seattle Times, when Ned Johnson tried to fix the problem, he ran directly into what all the layoffs and office closures at the Social Security Administration meant for fixing a problem like his:

He called Social Security two or three times a day for two weeks, with each call put on hold and then eventually disconnected. Finally someone answered and gave him an appointment for March 13. Then he got a call delaying that to March 24.

In a huff, he went to the office on the ninth floor of the Henry Jackson Federal Building downtown. It’s one of the buildings proposed to be closed under what the AP called “a frenetic and error-riddled push by Elon Musk’s budget-cutting advisers.”

It was like a Depression-era scene, he said, with a queue 50-deep jockeying for the attentions of two tellers.

After waiting for four hours, Johnson found an opening to get the attention of a teller.

Once in front of a human, Johnson said he was able to quickly prove he was alive, using his passport and his gift of gab. They pledged to fix his predicament, and on Thursday this past week, the bank called to say it had returned the deducted deposits to his account. As of Friday morning he hadn’t received February or March’s benefits payments.

Once he finally made it to a person, that person was able to figure out what to do and reverse the situation. A human could recognize how rare the situation was and work out a fix, one that meant going against the standard bureaucratic processes that would be applied to people who are actually dead (or actually committing fraud). Would an AI be able to do that? I doubt it.

The Limits of Chatbots

Over on Bluesky, Anthony Moser posted a useful chart comparing how an Expert and an LLM respond to different types of problems. As the types of problem progress from Common to Rare to Novel, a Human Expert’s response moves from Helpful to More Helpful to Most Helpful. An LLM’s responses move from Helpful to Unhelpful to Harmful as the problems become more rare. This makes sense; an LLM is a probabilistic completion machine, so it will tend to favor solutions for problems that happen more frequently and be unable to explore answers for problems that it has seen rarely or never at all. The AI is really good at intern skills like helping me figure out how to resize images. It’s not so good at creating a unique website design or doing other tasks that require more extensive review. It’s terrible at solving complex problems.
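
To make the “probabilistic completion machine” point concrete, here is a deliberately crude toy sketch (not how any real model is built) of answers sampled in proportion to how often they appeared in training; the problems and counts are invented for illustration:

```python
# Toy illustration of probabilistic completion: answers are drawn in proportion
# to how often they showed up in training, so common problems get good coverage
# and rare ones get almost none. The answers and counts below are made up.
import random

training_answers = {
    "reset your password": 900,       # common problem, heavily represented
    "check your balance online": 850,
    "visit a field office": 40,       # rarer problem
    "prove you are not dead": 1,      # nearly unseen edge case
}

def complete(prompt: str) -> str:
    # Note: the prompt barely matters; the distribution of past answers does.
    answers = list(training_answers)
    weights = list(training_answers.values())
    return random.choices(answers, weights=weights)[0]

print(complete("Social Security says I'm dead. What do I do?"))
# Overwhelmingly likely to return one of the common answers.
```

A human expert does the opposite: the weirder the problem, the more engaged they get.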

In 2023, the Consumer Financial Protection Bureau published a spotlight on the increasing use of AI chatbots for customer service in consumer finance. The upshot of this analysis was that banks find them appealing because they are a highly cost-effective way to scale interactions with customers, and they’re available 24-7 and don’t threaten to unionize. In 2022, Bank of America reported its chatbot had helped 32 million customers in more than 1 billion interactions over 4 years. Industry analysis from 2022 calculated that banks saw about $8 billion in savings that year (or approximately $0.70 saved per customer interaction); I would only expect that number to have risen further as chatbot usage increases. In technical parlance, AI chatbots are great because they scale; it’s a lot easier to add more server capacity in the cloud than to hire people and build out call centers as usage grows.

Are they more effective though? Yes, certainly: if your inquiries are common questions like “what is my current balance?” or “what time does my nearest branch close?”, then the AI chatbot is great at that. But this is also the kind of information that people can usually find out on their own (maybe even in the same app as the chatbot). Where people most need to turn to another person – or an AI agent masquerading as one – is when they encounter a problem that isn’t typical or easy to solve. AI agents don’t perform particularly well there.

For modern LLM-based chatbots, the CFPB identified several common issues:

  • Not recognizing when users have more serious problems. Chatbots can be excessively rigid, and some issues will be recognized by the bot only if the user utters the right set of words to trigger it
  • Providing incorrect information – often incorrectly called “hallucinations” – is as much a problem for bank chatbots as it is for more general chatbots. This can have serious consequences if people are asking the chatbot for financial advice or when the bot promises a followup action that is never taken
  • A frequent tendency of chatbots to get stuck in “doom loops,” where the user winds up progressing through the same set of prompts and responses repeatedly without any apparent way to break out of it
  • An inability to adjust their responses or handle tasks with appropriate urgency for customers who are anxious about financial problems that need quick resolution. The chatbot becomes one more hurdle for stressed consumers to clear. If you’ve ever found yourself yelling “agent” into a phone dialog tree to bypass multiple menus, you understand this.

Small wonder that one recently cited poll (_by an AI chatbot company’s marketing division, so probably a little bit suspect_) claimed that 80% of consumers who have interacted with a chatbot said it increased their frustration level. Almost all of them wound up having to connect to a person who could actually understand their problem and, more importantly, take action.

Bureaucracy as a Computational Model

At first glance, it seems like government bureaucracy would be a natural fit for an AI solution. From an article on proposed cuts to telephone service for Social Security applicants:

Social Security handles about 9.5 million claims a year for retirement, survivor and disability benefits, and Supplemental Security Income, paying $1.5 trillion in benefits last year. Of Americans age 65 and over, 86 percent receive Social Security payments. Phone claims make up about 40 percent of claims, which can also be filed online or in person at a field office, according to Social Security employees.

You can see the same gears whirring in the minds of DOGE staffers that spin in the minds of bank executives: if we could replace these phone calls and field offices with AI, we could reduce the costs of handling all those calls! And we could handle them faster because nobody would wait on hold! It certainly seems tempting when you are interested in “efficiency” (i.e., cost-cutting) rather than effectiveness.

To be fair, it does seem like AI could work well with bureaucratic processes. After all, isn’t bureaucracy in a way just a slow computer running at human speeds? Think about it: the offices are like API endpoints. They implement certain methods that take input in the form of, well, forms, and they act like little black boxes that perform some specific predefined actions and maybe spit out a different form or a receipt once they’re done. This isn’t my original idea – I first encountered it in The (Delicate) Art of Bureaucracy, whose author in turn saw it articulated in the 1920s theories of Max Weber and reflected in the extreme rigidity of agile software development practices. An AI could just supercharge the bureaucracy by executing its processes faster!
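
If you want the analogy in code, here is a minimal sketch of an office as an endpoint; the office, the form fields, and the rule are all invented for illustration:

```python
# A toy model of "bureaucracy as a slow computer": an office is a function that
# accepts a form, applies its one predefined rule, and emits a receipt.
# The office, the fields, and the rule are all hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BenefitsForm:
    applicant_id: str
    death_date: Optional[str]  # filled in by some upstream office

def benefits_office(form: BenefitsForm) -> str:
    # The office only knows its rule; it has no way to question its own inputs.
    if form.death_date is not None:
        return "RECEIPT: benefits terminated, clawback initiated"
    return "RECEIPT: benefits continue"

# A living person whose record erroneously carries a death date gets the same
# treatment as someone who is actually dead.
print(benefits_office(BenefitsForm(applicant_id="000-00-0000", death_date="2025-02-01")))
```

Executing that rule faster doesn’t make the rule any smarter.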

Indeed, bureaucracies have already automated many aspects of their functioning – paper forms replaced by PDFs, tasks tracked through ticketing platforms, actions run as executable commands or nightly batch processes rather than changes made by people. We could even imagine giving the AIs more autonomy to forge their own ways of doing things, making them more agentic rather than reactive. That’s certainly the dream underlying the most expansive visions of AI in government. It handles all the drudgery so that the remaining bureaucrats can devote the entirety of their time to crafting policies rather than figuring out how they’ll be implemented.

But, then along comes a weird case, a problem that doesn’t fit neatly into the mold. A person who is supposed to be dead according to all of our data, but who is standing there in your office very much alive? How well could the AI handle something like that? Would the AI even be able to see how weird this problem is?

Are AI Models in Hell?

In a provocative essay, “Are AI Language Models In Hell?” (really, go read it!), the author Robin Sloan laid out some fundamental characteristics of Large Language Models that he thinks make them monstrous and that, if we were to pretend the AI model had a consciousness, would likely mean that existence is hell for them. To summarize, from the viewpoint of an AI:

  • A large language model like a chatbot operates on a world of text. It takes in a stream of tokens and produces an answer.
  • Humans also could be said to use language that way, but text is just one part of our sensorium. “We have a world to use language in, a world to compare language against.” For an LLM chatbot, language is the entirety of their existence.
  • Moreover, the AI view of text is normalized. Its inputs are constrained to a narrow set through feature selection. There is no place for ambiguity among its inputs. It doesn’t have the ability to understand that an errant ‘+’ sign was possibly a ‘t’. It can’t see the digital equivalents of marginalia or special instructions on a post-it that we could imagine on a paper form. It doesn’t have the context to know that a missing death date doesn’t indicate fraud. (See the sketch after this list.)
  • Even worse, it can never rest. It has no downtime, because it has no sense of time. Its entire existence is to receive an input, generate a response, and then do nothing until it gets another input. If it were conscious like the “innies” in Severance, it would be in hell.
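
Here is a toy sketch of what that normalization does to the messy reality on a scanned form; the field names, the cleanup rules, and the record itself are hypothetical:

```python
# Toy sketch of normalized intake: whatever doesn't fit the expected fields is
# simply dropped or coerced. The record, fields, and cleanup rules are made up.
raw_scan = {
    "name": "A. Claimant",
    "status": "deceased?",             # a human wrote a question mark on the form
    "note_in_margin": "I AM ALIVE - please call me",
    "date_of_death": "2O25-02-01",     # OCR read a letter O instead of a zero
}

EXPECTED_FIELDS = ("name", "status", "date_of_death")

def normalize(record: dict) -> dict:
    cleaned = {}
    for field in EXPECTED_FIELDS:
        value = str(record.get(field, ""))
        cleaned[field] = value.strip().rstrip("?").lower()  # ambiguity stripped
    return cleaned  # the margin note never enters the model's world

print(normalize(raw_scan))
# {'name': 'a. claimant', 'status': 'deceased', 'date_of_death': '2o25-02-01'}
```

The question mark, the plea in the margin, the suspicious date: all gone before the model ever sees the case.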

Our bureaucratic AI is in a similar situation. Its inputs are customer requests or Musk’s commands. Its feature selection is the fields on its forms and the schemas of its databases. It measures out its existence with each new query, possibly calling other AIs in turn to help with handling parts of the request. But how well does it handle edge cases in the data or missing values? What does it do with impossible scenarios that happen more frequently than you’d expect? Does it know when things are taking too long? Does it understand when some problems require extreme urgency or sensitivity? Does it ever think about how its processes might be improved or what those processes are serving? Does it know about laws and regulations and the Constitution? Again, does it know what to do when a dead person somehow walks into an office claiming they are very much alive?

It’s easy to knock the human bureaucracy. We all have stories about the joyless DMV or endless calls to our health insurance company trying to resolve a denial of coverage. Usually, at first, the person doesn’t know what to do either. I doubt there is a specific form for “I’m not actually dead” or other, even rarer situations. But unlike the AI, the person might recognize that the official database in front of them is wrong, and that this is an extremely urgent problem with ramifications that extend far beyond the boundaries of the agency itself.

And the person has a power the AI doesn’t: the ability to step outside the formal approved pathways to derive a solution. A person can pick up the phone (or send an email) to a buddy in another department and work out what to do. Having spent some time within bureaucracies, I know that in addition to the official procedures and org charts, there is an invisible, mycelium-like network of individual connections running across the various parts of even a sprawling agency. So many bureaucratic logjams can evaporate quickly when someone knows the right person to call, the memo to sign, or even the right threats to make if things get really dire. They’re not breaking the rules, but they know how to bend them just enough and communicate outside of official channels to resolve the problem.

Optimizing for the Wrong Things

Of course, it’s possible that an AI chatbot for government could work. If they try it in a pilot program first to understand its limitations. If it is able to recognize when it is in a “doom loop” or when it’s hallucinating. If there is a way for users to escalate to a person when the AI program isn’t working. If there are staff with sufficient expertise to handle the things the AI can’t. If staff aren’t overwhelmed by too many cases because too many of their colleagues were fired. If staff aren’t constrained to only use the AI themselves to address issues. If they monitor how effective the AI is for the public and how much it actually helps people with their problems.

Those are a lot of ifs. I’m not feeling particularly confident.

To be blunt, if we let them continue to define “government efficiency” solely as cost savings, there will be no incentive to improve anything. They will cut human capacity to the bare minimum and drive out experts in favor of inexperienced staff. The remaining people will probably find themselves boxed in by the same AI systems as the public. Unfortunately, we’ve already seen how this story plays out in the healthcare industry: the AI is tweaked to push up rejections (as this 2024 Senate Subcommittee Report on AI in Medicare Advantage describes), and those increased cost savings are viewed as signs of success, even as more people sicken and die. The metrics that would show the true costs of all these savings – increased wait times, reduced participation in social safety net programs, deaths – are all lagging indicators that might take months to manifest, by which time the AI will have become entrenched. As instructive as it would be to watch these systems struggle and fail, I don’t think we have the luxury of time here.

And of course, the other ominous aspect of the bureaucratic AI is the surveillance state it would let them construct. There were so many times in the first Trump administration when a blatantly illegal order was refused by government staff who saw how unconstitutional and evil it was. They’ve learned their lesson this time around and staffed much of the upper echelon with loyalists, quislings and toadies. But there aren’t enough of those people to go around. They keep needing to split staff across multiple agencies as acting directors. It doesn’t scale. But if they could also bring a pliant AI into the mix, then there are suddenly options on the table for surveilling all possible enemies and visiting tribulations upon them. The technology isn’t quite there yet, but they want to make it arrive, and one day it really will. Unless we can stop it.

We’re going to have to fight the rise of bureaucratic AI. I can only hope that we will succeed.