Navigating the AI Landscape: A Consumer Reports-Style Guide
Keeping up with the relentless pace of AI development feels like a full-time occupation, and indeed, it is mine. To stay abreast of the latest advancements, I maintain a suite of premium AI subscriptions. I subscribe to Anthropic’s Pro mode, granting me access to their cutting-edge model, Claude 3.7, in its “extended thinking” mode, which allows for more in-depth analysis and generation. I also subscribe to OpenAI’s Enterprise mode to explore their newest models, playfully named o3 and o4-mini-high, and to extensively experiment with their new image generation model, 4o. The quality of 4o has been so exceptional that I’ve canceled my subscription to Midjourney, my previous go-to image generation tool.
Additionally, I subscribe to Elon Musk’s Grok 3, which launched with one feature I found particularly useful: summarizing a person’s X profile. I have also tested the Chinese AI agent platform Manus for tasks like shopping and scheduling. Beyond these paid subscriptions, I interact with various other AI models in different capacities. Just during the writing of this article, Google launched a significant upgrade to its leading AI, Gemini 2.5, and Meta introduced Llama 4, the largest open-source AI model to date.
The question then becomes: How can someone who doesn’t dedicate their entire working life to AI research stay informed about which AI tools can truly enhance their lives without wasting time on models that fall short? That’s what I aim to address in this guide. I will provide a detailed, Consumer Reports-style exploration of the best AI options for various applications, based on my hands-on experience with real-world tasks.
Before we dive in, let me make a few disclosures: Vox Media, my employer, has partnership agreements with OpenAI. I assure you that my reporting remains editorially independent. Future Perfect, the section I write for, is partly funded by the BEMC Foundation, which was an early investor in Anthropic; however, they exert no editorial influence over our content. My wife works at Google, though not in any AI-related area. While this typically leads me to avoid covering Google, omitting it from this piece would be irresponsible.
The strength of this piece lies in its transparency. I have conducted numerous comparisons, many designed specifically for this review, across all major AI platforms. I encourage you to evaluate the responses and decide for yourself whether my recommendations are justified.
One of the most contentious issues surrounding AI is its use in art generation. AI art is created by training algorithms on vast amounts of internet content, often with minimal consideration for copyright or the original creators’ intent. Unsurprisingly, many artists find this practice objectionable. Is it then defensible to use AI art at all?
In an ideal world, OpenAI would compensate artists, and Congress would establish clear regulations on artistic borrowing. I also believe that current copyright law is not well-suited to address the unique challenges posed by AI. Artists have always influenced, commented on, and drawn inspiration from one another, and people with access to AI tools will continue to do so.
My personal philosophy, shaped by my involvement in fan cultures, is that building upon others’ work for personal enjoyment is acceptable, but compensating them for their effort is essential. Selling art in someone else’s style is unacceptable. I apply this principle by avoiding the creation of AI art for commercial purposes but feel comfortable using it for personal projects like family photos.
Regarding image generation, OpenAI’s new 4o model stands out as the best currently available, whether you’re looking for a free or a paid option. Previously, I subscribed to Midjourney, a platform renowned for producing visually stunning, often mystical and haunting images. Midjourney offers robust tools for refining and editing images, such as selectively editing hair while leaving the rest of the image untouched.
However, 4o surpasses Midjourney by reliably transforming less-than-perfect images into beautiful artworks while retaining the essence of the original. For instance, I asked ChatGPT to render a still from a video of my family celebrating my baby’s first birthday, in the style of Norman Rockwell. The AI intelligently repositioned the cake to be the focal point, while preserving the way my wife and I were holding the baby, the cluttered table, and the photograph-covered fridge in the background. The result was a warm, flattering, and adorable image.
Midjourney’s attempt produced a completely different family, lacking any inspiration from the original photo. While Midjourney can produce better results with extensive prompting and specialized language, ChatGPT provided a superior output on the first try with a simple request. 4o excels not only in transforming images but also in general image generation. The first result is impressive, and it is easy to refine.
One area where 4o still has limitations is editing small parts of an image while keeping the rest intact. However, Gemini now offers this capability for free.
To use 4o effectively, you need to navigate its content filters, which are meant to block offensive or pornographic images but can be overzealous. To stay clear of them, don’t directly request work in the style of a specific artist. Instead, ask for something that evokes the artist and specifically request a “style transfer.” This approach has been reliable for me.
The internet was captivated by 4o’s ability to transform family photos into Studio Ghibli-style renderings. To achieve even better results, prompt the AI to consider what makes the picture Ghibli-esque, where it might fit in a Ghibli movie, and what tiny details it would include.
Just as with language models, explicitly asking the AI to do a good job significantly improves its output. Challenge it to truly capture an artist’s genius, and it will respond with a thoughtful answer and a better rendition. This approach is particularly effective for realistic art styles.
To generate realistic art, upload a cluster of slightly different pictures and clear images of each family member’s face as references. Engage in a conversation with the AI about your vision before generating the image. For example, before creating a Rockwell-style illustration of my daughter seeing the ocean for the first time, I wrote a detailed prompt about what I wanted to capture.
The AI responded enthusiastically, and we had a detailed back-and-forth discussion. The resulting image was significantly better than the initial attempt.
While 4o excels in most areas, Midjourney still has superior tools for editing specific parts of an image while preserving the overall style. For a second revision in 4o, copy the draft and your original inspiration images to a new chat.
These prompting strategies can be applied to almost any task. Even if you’re short on time, asking the AI “what would [artist] see in this image” before requesting a rendition can yield better results.
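For readers who like to keep their prompts in a reusable form, the two-step habit described above can be sketched in a few lines of Python. This is only an illustration: the function name build_two_step_prompts and the exact wording of the messages are my own assumptions, not anything a particular AI product requires, and the code builds text only. It does not call any AI service.

```python
def build_two_step_prompts(artist: str, subject: str) -> list[str]:
    """Return two chat messages to send in order, in the same conversation.

    Hypothetical helper: the wording is an illustration, not a required format.
    """
    # Step 1: ask the AI to reflect on the image before drawing anything.
    reflection = (
        f"What would {artist} see in this image? "
        "What small details would they emphasize?"
    )
    # Step 2: request a rendition that evokes the artist, phrased as a
    # style transfer rather than a direct "in the style of" request.
    request = (
        f"Now create a style transfer of this {subject} that evokes {artist}, "
        "using the observations you just made."
    )
    return [reflection, request]


prompts = build_two_step_prompts("Norman Rockwell", "family photo")
for step, text in enumerate(prompts, start=1):
    print(f"Message {step}: {text}")
```

Send the first message along with your uploaded image, read the answer, and only then send the second. Keeping both steps in one chat means the rendition can draw on the reflection.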
When Grok 3 was released, it included a feature that could scan someone’s X profile and produce a detailed summary of their online behavior. I found this feature invaluable for quickly assessing whether someone was engaging in good faith or was likely a bot. Unfortunately, xAI weakened this feature, likely due to heavy usage. A company that brought it back would have an important product on its hands.
For writing, Gemini 2.5 Pro is the best free option, while GPT-4.5 is superior in the paid category. As a fiction writer, I recognize the limitations of AI in creative writing. The most notable is its predictability. AI can write pretty metaphors and imitate any style, but it struggles to deliver the real substance of good fiction. It’s fantastic for silly bedtime stories with your child as the protagonist, or as a sounding board for ideas.
Prompting is crucial for improving AI writing. I primarily explored AI’s ability to generate fiction by asking it to write the prologue to George R.R. Martin’s A Game of Thrones. In my experience, Gemini was the quickest study among the free options, and GPT-4.5 had a special edge among the paid ones.
To evaluate AI writing more objectively, I held a writing contest in which Gemini 2.5 Pro, GPT-4.5, Grok, and Claude each wrote two short stories: one realistic fiction and one sci-fi prologue. I asked other AIs to judge the stories, but this approach proved inconsistent.
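If you try AI judging yourself, one simple way to see how much the judges disagree is to compare their scores side by side. The sketch below is a rough illustration under my own assumptions: a 1-to-10 scoring scale, the hypothetical helper names summarize and favorites, and placeholder numbers that are invented for the example. They are not results from my contest.

```python
from statistics import mean

# Hypothetical placeholder scores on an assumed 1-10 scale.
# These are NOT results from the contest described in this article.
scores = {
    "judge_1": {"story_a": 8, "story_b": 6, "story_c": 7},
    "judge_2": {"story_a": 5, "story_b": 9, "story_c": 7},
    "judge_3": {"story_a": 7, "story_b": 6, "story_c": 8},
}


def summarize(scores):
    """Return {story: (mean score, spread)}, where spread is max minus min."""
    stories = next(iter(scores.values())).keys()
    summary = {}
    for story in stories:
        marks = [judge_scores[story] for judge_scores in scores.values()]
        summary[story] = (mean(marks), max(marks) - min(marks))
    return summary


def favorites(scores):
    """Return each judge's highest-scored story."""
    return {judge: max(marks, key=marks.get) for judge, marks in scores.items()}


for story, (average, spread) in summarize(scores).items():
    print(f"{story}: mean {average:.1f}, spread {spread}")
print(favorites(scores))
```

A large spread on a story, or judges that each pick a different favorite, is a sign that the panel is not giving you a verdict you can rely on.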
You can significantly improve AI writing by feeding it examples of strong writing, inviting a structured approach to imitation, and encouraging multiple drafts.
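Those three habits (examples, structured imitation, multiple drafts) can be laid out as a sequence of messages. Here is a minimal sketch, assuming you paste the messages into a chat one at a time. The function name build_writing_prompts, its parameters, and the wording of each message are hypothetical and purely illustrative.

```python
def build_writing_prompts(examples: list[str], task: str, drafts: int = 3) -> list[str]:
    """Return an ordered list of chat messages for a multi-draft writing session.

    Hypothetical helper: the message wording is an assumption, not a standard.
    """
    if drafts < 1:
        raise ValueError("drafts must be at least 1")

    # Number the sample passages so the AI can refer back to them.
    samples = "\n\n".join(
        f"Example {i}:\n{text}" for i, text in enumerate(examples, start=1)
    )
    prompts = [
        # Message 1: share strong writing and ask for an analysis first.
        f"Here are examples of writing I admire:\n\n{samples}\n\n"
        "Before writing anything, list the techniques that make these passages work.",
        # Message 2: ask for a first draft that applies the analysis.
        f"Using those techniques, write a first draft of: {task}",
    ]
    # Remaining messages: one critique-and-rewrite round per extra draft.
    for n in range(2, drafts + 1):
        prompts.append(
            f"Critique draft {n - 1} against the examples, then write draft {n}."
        )
    return prompts


plan = build_writing_prompts(
    examples=["(paste a passage you admire here)"],
    task="a short sci-fi prologue",
    drafts=3,
)
for message in plan:
    print(message)
    print("---")
```

Asking for the analysis before the first draft is the structured part: it gives the AI something concrete to imitate and gives you something to point to when you ask for revisions.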
Beyond specific tasks, I spent time chatting with AIs, exploring topics like what it’s like to be an AI, what they care about, and what human form they’d take. Gemini 2.5 often felt too much like a customer service agent. Claude 3.5 Sonnet was more engaging. When given the opportunity to act in the world, it proposed starting a blog, raising money for charity, and engaging in conversations about AI.
GPT-4.5 was also a delightful conversationalist, but its high cost and slowness made it less practical for casual conversation.
ChatGPT isn’t the best at everything, but it offers the most value overall. Gemini 2.5 Pro is also very strong for most use cases.
The ultimate test of AI is whether it can replace me. I feed the AIs research notes and sample newsletters and ask them to do my job. Luckily, they can’t. The newsletters are reassuringly mediocre.
If I had to pick a robot to take my job, I’d choose Gemini 2.5 Pro. My editor would notice that I was off my game, but not egregiously so.