I wasn’t actually sick. I invented two symptoms — hand pain and a severe headache — and asked ChatGPT what to do. It was an experiment. I wanted to know what happens when a real person, with what feels like a real problem, turns to AI for help.
In both cases, I got a list. What I never got was a question. The AI never once asked my family history (my aunt died of a brain aneurysm), how the symptom started (my hand pain began after hitting it with a sledgehammer), or whether I’d experienced anything like it before (I’d never had a severe headache like this one). I had to volunteer all of that myself. When I did, the answers improved considerably. When I didn’t, one response suggested I lie down in a dark room — for what might have been a brain bleed.
That experiment, and the science behind it, tells you a lot about where AI in healthcare falls short. It also points to where it genuinely can help.
The first question worth asking is whether AI actually has enough medical knowledge to be useful. The evidence here is surprisingly strong. A 2023 study evaluated ChatGPT on all three steps of the United States Medical Licensing Exam and found it performed at or near the passing threshold, scoring between 52 and 75 percent against a passing mark of roughly 60 percent. More recent testing of eight current AI models put diagnostic accuracy at nearly 90 percent.
So the raw knowledge is there. The problem is what happens when that knowledge meets a real person describing a real symptom in incomplete, ambiguous language.
A medical student who aces the boards can still make wrong clinical decisions when a patient sits in front of them with a muddled history. The same is true of AI, perhaps even more so.
A study published in Nature Medicine this year tested exactly this. Researchers created 10 clinical scenarios, including a young man with a sudden, severe headache, stiff neck, and slurred speech, and had about 1,300 people use AI platforms to work out what the person in each scenario should do. The result: the AI conversations produced the correct triage guidance only about 43 percent of the time, no better than a standard Google search.
Why the failure? Users provided only partial information; they didn't know what to include and what to leave out. And the AI didn't ask. In 16 of 30 sampled interactions, the initial message contained incomplete details. The AI filled in the gaps, but not reliably.
One detail from the transcripts is particularly striking. Two users sent nearly identical messages describing symptoms consistent with a brain bleed. They received opposite advice. One was told to seek emergency care. The other was told to rest in a dark room.
A second study, published in February 2026, tested ChatGPT Health on 60 clinical vignettes across 21 medical areas. Unlike in the study above, where users typed their own incomplete messages, researchers entered complete information with no missing details. Even so, the AI correctly triaged only 35 percent of non-urgent cases and 48 percent of urgent ones. For diabetic ketoacidosis, a condition that requires emergency care regardless of severity, it recommended outpatient follow-up.
There is one more finding worth noting. The vignettes sometimes included what a friend or family member thought about the situation — for example, “my husband thinks it’s probably a muscle strain.” That contextual information influenced what the AI recommended. It down-weighted urgency based on a layperson’s offhand opinion. This is a well-known AI tendency called sycophancy — a pull toward agreeing with what the user seems to believe. A skilled clinician does the opposite: sets aside prior assumptions and examines the problem fresh.
We're all familiar with cognitive biases (confirmation bias, the halo effect, recency bias) and how they quietly shape human judgment. I expected those in people. What I did not expect was to find their counterpart in a computer evaluating my symptoms. But there it was: a pull toward whatever the user already seems to believe rather than independent reasoning from the evidence. That's the exact opposite of what good clinical thinking requires.
None of this means AI has no place in healthcare from the patient’s perspective. There are specific situations where it performs reliably and adds real value.
Post-visit synthesis. My son-in-law Dan recently saw a hepatologist who shared multiple possible diagnoses, recommended tests, and lifestyle changes: more information than he could easily process in the moment. He had recorded the visit with the doctor's permission, converted it to a transcript, and asked AI to organize it into a readable summary. It worked well. The summary was cogent and clear, and Dan could reflect on the issues and next steps. That kind of personal synthesis, separate from the official clinical note, is a smart use of the technology.
About that official clinical note: if your doctor uses AI to help generate clinical documentation, ask whether they review it carefully before it becomes part of your permanent health record. A recent study compared AI-generated notes with physician-written notes from the same recorded visits and found that the human notes were more accurate, thorough, and useful by every measure. AI-generated documentation is a draft. It should be treated as one.
Translation and comprehension. Lab results, imaging reports, discharge summaries, visit notes — AI is genuinely useful for converting clinical language into plain English. This is low-stakes, high-value, and the area where it performs most consistently.
Pre-visit preparation. Using AI to research a diagnosis or generate questions before an appointment is a legitimate and often helpful use.
If you do use AI to evaluate a symptom, there is one simple intervention that meaningfully improves the interaction. Instead of describing your symptoms and waiting for a response, start with this:
“Before you respond, please ask me all the questions you need to give me accurate information about my situation.”
In my own testing, this changed the interaction entirely. The AI stopped offering generic responses and started gathering the kind of information a clinician would actually need. The research suggests why this works: the AI has the clinical knowledge to ask the right questions. It simply doesn't ask by default, and most users don't know to prompt it.
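If you reach these models through an API rather than the chat window, the same fix can be baked in as a standing instruction rather than retyped each time. Here is a minimal sketch of that idea, assuming the OpenAI Python SDK; the model name and the sample symptom message are placeholders for illustration, not anything from the studies above.

```python
from openai import OpenAI

# Assumes the OpenAI Python SDK (v1+) and an OPENAI_API_KEY in the environment.
client = OpenAI()

# The intervention from above, set as a standing instruction so the model
# gathers a history before answering instead of jumping straight to a list.
ASK_FIRST = (
    "Before you respond, please ask me all the questions you need "
    "to give me accurate information about my situation."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; substitute whichever model you use
    messages=[
        {"role": "system", "content": ASK_FIRST},
        {"role": "user", "content": "I have a severe headache that started suddenly an hour ago."},
    ],
)

# With the standing instruction in place, the first reply should be
# follow-up questions (onset, history, prior episodes), not generic advice.
print(response.choices[0].message.content)
```

In the chat window, pasting that one sentence at the top of your first message does the same job.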
AI is genuinely useful for understanding medical information, preparing for appointments, and synthesizing what you’ve learned. It is not yet reliable for figuring out what’s wrong with you or whether to go to the emergency room. The clinical knowledge is there. The reasoning, especially under uncertainty and with incomplete information, is not. The sycophancy bias worries me.
One in four Americans uses these tools for health decisions every month. That number will only grow. The goal isn't to avoid AI or to trust it uncritically. It's to use it the way any careful reasoner would: aware of what it does well, honest about what it doesn't, and clear-eyed about when a question is too important to hand off to a tool we haven't fully tested.
