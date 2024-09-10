For entrepreneurs looking to use artificial intelligence within their business, few choices are more important than picking a model. But figuring out which one is right for your use case can be difficult, especially if you don’t have a technical background.

To help solve this problem, we cooked up the Great AI Bake-Off. We challenged four of the most popular large language models currently available to complete four tasks that entrepreneurs would likely use AI to accomplish: summarizing a lengthy document, rewriting a letter from a CEO, helping to analyze market environments, and writing an elevator pitch based on deck materials. The only thing missing was a digital Paul Hollywood.

Typically, engineers use scientific benchmarks to describe how effective models are at certain tasks, but these benchmarks aren’t exactly self-explanatory. For example, Claude 3.5 Sonnet, the flagship model from AI startup Anthropic, scored 59.4 percent on GPQA, a series of questions designed to test graduate-level reasoning. But what does that mean in practice? We’ll let you know when we figure it out!

Instead of grading the models by how many questions they can answer correctly, we’ve chosen a winner for each task by subjectively deciding which model completed each exercise most effectively. To keep things relatively fair, we tested the free versions of the following four models: OpenAI’s GPT-4o mini (which powers ChatGPT), Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.1 70B, and Google’s Gemini 1.5 Flash.