
Jury of Peers: I Asked Five AIs to Grade Each Other

  • Writer: Michael Hunter
  • 4 days ago
  • 6 min read

If you’ve wondered which AI assistant is worth paying for, this might help.


I've been playing the field among a few AIs, and I'm overdue to commit to a paid version. To help me decide which one, I ran a structured experiment: I asked the five leading AI assistants — ChatGPT, Claude, Google Gemini, Microsoft Copilot, and Perplexity — to evaluate one another across a common set of criteria relevant to a boutique marketing consulting business. Then I averaged the results into a single "Jury of Peers" ranking.


The criteria were Ubiquity (market reach), User Satisfaction, Quality of Output, and Objectivity — weighted at 15%, 25%, 50%, and 10% respectively. I eliminated the Price criterion after discovering each platform charges around $20/month for its first paid tier.
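
For readers who want the mechanics: each AI assigned its peers a rank per criterion, and the overall score is simply a weighted average of those ranks. A minimal sketch (the criterion names and example ranks below are my own illustration, not the jury's actual scorecard):

```python
# Weights from the article: Ubiquity 15%, User Satisfaction 25%,
# Quality of Output 50%, Objectivity 10%.
WEIGHTS = {"ubiquity": 0.15, "satisfaction": 0.25, "output": 0.50, "objectivity": 0.10}

def weighted_score(ranks: dict) -> float:
    """Weighted average of per-criterion ranks (1 = best, 5 = worst); lower is better."""
    return sum(WEIGHTS[c] * r for c, r in ranks.items())

# Hypothetical scorecard for one AI:
example = {"ubiquity": 1, "satisfaction": 2, "output": 1, "objectivity": 4}
print(round(weighted_score(example), 2))  # 0.15*1 + 0.25*2 + 0.50*1 + 0.10*4 = 1.55
```

With Output Quality carrying half the weight, a first-place finish there dominates the overall score, which is why the rankings track that criterion so closely.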


This exercise is likely applicable even if you work in something other than marketing, because I loaded the prompt with a mix of numbers, words, and images. These three types of information figure into not only marketing consulting but, to varying degrees, most professions.


The Unanimous Winner…


ChatGPT. Every AI in the panel placed ChatGPT first overall, with a weighted score of 1.5 — and since scores are weighted averages of ranks from 1 (best) to 5 (worst), lower is better. Claude held second place consistently, and Google Gemini third. There was only nominal disagreement among evaluators over the 4th and 5th slots, where MS Copilot and Perplexity traded positions depending on who was doing the judging.


and a Pleasant Surprise (the First of Two)


It was refreshing to see how non-self-serving each AI's self-assessment was. Only ChatGPT ranked itself 1st, which would be suspect if all four of its competitors didn't agree. It was also the only one that declined to critique itself in its summary — all strengths, no weaknesses — though it cheerily provided those weaknesses when probed.


Should I have been surprised by this? Perhaps not, given that large language models are trained on vast amounts of human-written text, including a great deal of comparative analysis, tech journalism, and user reviews. They don't have ego in the human sense; they pattern-match toward what the evidence in their training data suggests is accurate. In other words, the models aren't being noble — but rather, statistically honest. That said, the consistency of the result across five independent evaluations, with different training data and different corporate incentives, is still noteworthy.


How much of ChatGPT's #1 ranking is just first-mover advantage? More than it should be, probably. ChatGPT's high Ubiquity score is predictable: in some quarters it's become the default category term for "AI chatbot" — much like Xerox once meant photocopier. Brand recognition inflates perceived quality, and with a 50% weight on Output Quality, ChatGPT's ecosystem breadth gives it a genuine structural edge that a newer entrant simply can't match yet. But that gap is narrowing fast.


(An intriguing sidenote: the AIs performed in nearly alphabetical order.)


Quick Takes: Each AI, Ranked by the Jury


1)    ChatGPT — Most Versatile, Strongest Ecosystem

ChatGPT is the consensus best all-in-one workflow tool for a marketing consultant. It leads on concise, client-ready communication, the best balance of brevity and strategic framing, and uniquely among this group, offers native hero-image generation — an advantage for consultants who need to produce visual assets without switching platforms. Its main weakness, noted by multiple evaluators and showing up in its last-place Objectivity score, is a tendency toward non-neutral sources.


2)    Claude — Best Thinking / Writing Partner

Claude earned consistently strong marks for long-form quality, nuanced reasoning, and what several evaluators described as the least "AI-ish" prose in the group — meaning it sounds like a smart human, not a chatbot trying to sound like a smart human. It ranks high on Objectivity and was candid about its gaps (smaller user base than ChatGPT, limited native image generation).


3)    Google Gemini — Best If You Live in Google Workspace

A clear third-place finisher, with the evaluators generally agreeing on its strengths (tight Docs/Sheets/Slides integration, surging market share) and its weaknesses (output that feels less sharp and less executive-ready for high-end consulting deliverables). One evaluator put it plainly: Gemini is a compelling choice if your practice runs on Google infrastructure. If it doesn't, the integration advantage disappears and so does most of its edge. Early criticism of Gemini’s slant and wonky results appears to have been taken to heart and addressed since its launch in 2024.


4)    Perplexity — Best Research Tool, Wrong Tool for This Job

Every evaluator agreed: Perplexity is best-in-class for real-time, web-grounded research with citations — exactly what you want for market scans and competitive intelligence. It also ranked first or second on Objectivity in every scorecard, consistent with its design ethos. But for a marketing consultant who needs to move from research to narrative to client-ready deliverable, Perplexity runs out of road. It ranked poorly on Output Quality in most evaluations, and it has no native image generation. It's an outstanding complement to a primary AI platform, not a replacement for one.


5)    Microsoft Copilot — Powerful Feature, Incomplete Platform

Copilot's evaluations were notably consistent: strong on direct Excel and PowerPoint integration, weaker on creative writing, narrative polish, and standalone reasoning. Multiple evaluators used variations of the same phrase — it feels more like a feature than a core AI workspace. Its distribution advantage via Windows enterprise agreements is enormous on paper, but that doesn't translate into a better daily experience for an independent consultant. Copilot's entry-level paid tier requires an active MS365 subscription — which, if you already subscribe, is a sunk cost — and if you're in a corporate environment, you likely already use Copilot anyway, on company policy and the company's dime.


6)    Grok

Number six? Wait, didn’t I tell you this was a five-horse race? Well, I brought in Grok (xAI's model) as a fifth-wheel observer (or sixth-wheel, as it were), to see whether a non-incumbent — one with no dog in the fight among the five platforms being ranked — would score things differently. It didn't. Its rankings were largely consistent with the others, which strengthens the conclusions.


My Decision: The Judge Overrides the Jury


So, paying for ChatGPT Plus is a no-brainer, right? Welllll, not so fast. Every consumer’s buying decision is a mix of the rational and emotional, the concrete and the elusive.

In gaining greater familiarity with each of the five tools — a secondary objective of the exercise — I discovered that my tiebreaker is one I didn't think to ask the jury to evaluate: aesthetics. This was the second pleasant surprise. With no prior experience on the platform, I became smitten with Claude's UI/UX (user interface / user experience). So Claude Pro will be my paid AI (at least to start), Perplexity's free version will remain my go-to for quick research on both desktop and iPhone, and I'll keep ChatGPT's free version in the background, too.


Of the summary tables each AI generated in this exercise, Claude's was the only one I'd describe as client-ready, albeit after incorporating my tweaks; it's the table pictured at the top. Perplexity placed second for aesthetics, Copilot third, Gemini fourth, and ChatGPT — the overall winner — dead last on visual presentation.


Why might this be? Typography is doing most of the work. Claude and Perplexity both default to interfaces with more original typefaces. The result feels editorial — closer to a well-designed report than a chatbot transcript. ChatGPT and Gemini, by contrast, have stark white backgrounds with utilitarian system fonts, which may be optimized for speed and accessibility but sacrifice warmth and polish. ChatGPT is feature-rich, which is great for power users but can make outputs feel embedded in a dashboard rather than presented as finished work.


My strengths revolve around words and numbers – which is handy for someone who dives deep into marketing analytics before formulating plans to grow revenue for clients. By contrast, I can barely draw stick figures and therefore lean heavily on graphics professionals in my network. Whether despite this weakness or because of it, aesthetics matter to me. They’ve factored into my buying decisions for cars, sports equipment, and guitars… so why not my AI?




Michael Hunter is founding partner of Parallel-49, a marketing and M&A consulting firm serving privately held companies.

Footnote: AI Prompt


Goal: Create a summary table comparing the leading AI assistants and recommend which single platform I should upgrade to a paid (entry‑level) plan for my marketing consulting business.

AIs to compare (alphabetical order): ChatGPT, Claude, Google Gemini, Microsoft Copilot, Perplexity

Instructions:

  1. Evaluate both the free and lowest‑priced paid tier of each AI (where a paid tier exists).

  2. Use the criteria and weights below.

  3. Build a table where: Rows = the five AIs. Columns = each criterion plus an Overall Rank. For each criterion, assign a rank from 1 (best) to 5 (worst). Break ties wherever reasonable. Compute an Overall Rank using the weights provided.

Criteria and weights:

  1. Ubiquity – 15%: Approximate daily usage or market share (free + paid).

  2. User satisfaction – 25%: User ratings, reviews, and general reputation.

  3. Quality of output – 50%: Assess for marketing‑consulting use cases: brevity; clarity and ease of understanding; ability to turn an Excel‑based analysis into client‑ready slides; ability to streamline and enhance written communication; ability to create a strong, on‑brand hero image for a consulting website (either directly or via strong image‑generation integration); 1–2 standout strengths vs. other AIs (clear points of difference).

  4. Objectivity – 10%: Tendency to avoid biased sources.

Deliverables

  1. A ranked comparison table with: One column per criterion, showing the rank (1–5) for each AI. A weighted score and rank for each AI.

  2. A brief narrative (3–5 sentences) explaining: which single paid plan you recommend I upgrade to first; why it is the best fit for a boutique revenue‑growth/marketing consultant; and any close runner‑up I should also consider, and in which cases.
