Can China's MiniMax-M1 AI Topple US Rivals? We Put It to the Test - Decrypt

In brief
MiniMax-M1 excels at coding and agent tasks, but creative writers will want to look elsewhere.
Despite marketing claims, real-world testing finds platform limits, performance slowdowns, and censorship oddities.
Benchmark scores and feature set put MiniMax-M1 in direct competition with paid U.S. models, at zero cost.
A new AI model out of China is generating sparks: for what it does well, what it doesn't, and what it might mean for the balance of global AI power.

MiniMax-M1, released by the Chinese startup of the same name, positions itself as the most capable open-source “reasoning model” to date. Able to handle a million tokens of context, it boasts numbers on par with Google's closed-source Gemini 2.5 Pro, yet it's available for free. On paper, that makes it a potential rival to OpenAI's ChatGPT, Anthropic's Claude, and other U.S. AI leaders.

Oh yeah: it also beats fellow Chinese startup DeepSeek's R1 in some respects.
Day 1/5 of #MiniMaxWeek: We're open-sourcing MiniMax-M1, our latest LLM, setting new standards in long-context reasoning.
- World's longest context window: 1M-token input, 80k-token output
- State-of-the-art agentic use among open-source models
- RL at unmatched efficiency:… pic.twitter.com/bGfDlZA54n
MiniMax (official) (@MiniMax__AI), June 16, 2025

Why this model matters

MiniMax-M1 represents something genuinely new: a high-performing, open-source reasoning model that's not tied to Silicon Valley. That's a shift worth watching.

It doesn't yet humiliate U.S. AI giants, and won't cause a Wall Street panic attack, but it doesn't have to. Its existence challenges the notion that top-tier AI must be expensive, Western, or closed-source. For developers and organizations outside the U.S. ecosystem, MiniMax offers a workable (and modifiable) alternative that might grow more powerful through community fine-tuning.

MiniMax claims its model surpasses DeepSeek R1 (the best open source reasoning model to date) across multiple benchmarks while requiring just $534,700 in computational resources for its entire reinforcement learning phase. Take that, OpenAI.

However, LLM Arena's leaderboard paints a slightly different picture. The platform currently ranks MiniMax-M1 and DeepSeek tied in the 12th spot alongside Claude 4 Sonnet and Qwen3-235b, with each model performing better or worse than the others depending on the task.

The training used 512 H800 GPUs for three weeks, which the company described as “an order of magnitude less than initially anticipated.”

MiniMax didn't stop at language models during its announcement week. The company also released Hailuo 2, which now ranks as the second-best video generator for image-to-video tasks, according to Artificial Analysis Arena's subjective evaluations. The model trails only Seedance while outperforming established players like Veo and Kling.

Testing MiniMax-M1

We tested MiniMax-M1 across multiple scenarios to see how these claims hold up in practice. Here's what we found.

Creative writing

The model produces serviceable fiction but won't win any literary awards.
When prompted to write a story about time traveler Jose Lanz journeying from 2150 to the year 1000, it generated average prose with telltale AI signatures: rushed pacing, mechanical transitions, and structural issues that immediately reveal its artificial origins.

The narrative lacked depth and proper story architecture. Too many plot elements crammed into too little space created a breathless quality that felt more like a synopsis than actual storytelling. This clearly isn't the model's strength, and creative writers looking for an AI collaborator should temper their expectations.

Character development barely exists beyond surface descriptors. The model did stick to the prompt's requirements, but didn't put effort into the details that build immersion in a story. For example, it skipped any cultural specificity in favor of generic “wise village elder” encounters that could belong to any fantasy setting.

The structural problems compound throughout. After establishing climate disasters as the central conflict, the story rushes through Jose's actual attempts to change history in a single paragraph, offering vague mentions of “using advanced technology to influence key events” without showing any of it. The climactic realization (that changing the past creates the very future he's trying to prevent) gets buried under overwrought descriptions of Jose's emotional state and abstract musings about time's nature.

For those into AI stories, the prose rhythm is clearly AI. Every paragraph maintains roughly the same length and cadence, creating a monotonous reading experience that no human writer would produce naturally.
Sentences like “The transition was instantaneous, yet it felt like an eternity” and “The world was as it had been, yet he was different” repeat the same contradictory structure without adding meaning.

The model clearly understands the assignment but executes it with all the creativity of a student padding a word count, producing text that technically fulfills the prompt while missing every opportunity for genuine storytelling.

Anthropic's Claude is still the king for this task.

You can read the full story here.

Information retrieval

MiniMax-M1 hit an unexpected wall during long-context testing. Despite advertising a million-token context window, the model refuses prompts exceeding 500,000 characters, displaying a banner warning about prompt limitations rather than attempting to process the input.

This may not be a model issue, but a limitation set by the platform. It's still something to consider. The limit may be there to avoid model collapse in the middle of a conversation.

Within its operational limits, though, MiniMax-M1's performance proved solid. The model successfully retrieved specific information from an 85,000-character document without any issues across several tests in both normal and thinking mode. We uploaded the full text of Ambrose Bierce's “The Devil's Dictionary,” embedded the phrase “The Decrypt dudes read Emerge News” on line 1985 and “My mom's name is Carmen Diaz Golindano” on line 4333 (randomly selected), and the model was able to retrieve the information accurately.

However, it couldn't accept our 300,000-token test prompt, a capability currently limited to Gemini and Claude 4.

So it will prove successful at retrieving information even in long iterations.
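
For readers who want to reproduce this kind of retrieval check, here is a minimal Python sketch of the idea, not the harness we actually used. The helper names (`embed_needles`, `fits_platform_limit`, `recalled`) are hypothetical, the 500,000-character ceiling is simply the figure the platform banner reported, and scoring by case-insensitive substring match is an assumption.

```python
from typing import Dict, List

# Assumption: the character ceiling reported by the chat platform's banner.
PLATFORM_CHAR_LIMIT = 500_000


def embed_needles(lines: List[str], needles: Dict[int, str]) -> List[str]:
    """Return a copy of `lines` with each needle placed on its 1-based line number."""
    doc = list(lines)
    # Insert in ascending order so earlier needles keep their line numbers.
    for line_no in sorted(needles):
        if not 1 <= line_no <= len(doc) + 1:
            raise ValueError(f"line {line_no} is outside the document")
        doc.insert(line_no - 1, needles[line_no])
    return doc


def fits_platform_limit(text: str, limit: int = PLATFORM_CHAR_LIMIT) -> bool:
    """Check the prompt length in characters before sending it to the model."""
    return len(text) <= limit


def recalled(answer: str, needles: Dict[int, str]) -> Dict[int, bool]:
    """Mark each needle as found if it appears in the answer, ignoring case."""
    lowered = answer.lower()
    return {line_no: phrase.lower() in lowered for line_no, phrase in needles.items()}


if __name__ == "__main__":
    # Stand-in haystack; a real run would load the book's text instead.
    haystack = [f"dictionary entry {i}" for i in range(5000)]
    needles = {
        1985: "The Decrypt dudes read Emerge News",
        4333: "My mom's name is Carmen Diaz Golindano",
    }
    prompt = "\n".join(embed_needles(haystack, needles))
    print("fits platform limit:", fits_platform_limit(prompt))
```

The model's reply would then be passed to `recalled` to see which planted phrases came back. Checking length in characters rather than tokens mirrors how the platform's warning was phrased.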
However, it will not support extremely long token prompts: a bummer, but also a threshold that is hard to reach in normal usage conditions.

Coding

Programming tasks revealed MiniMax-M1's true strengths. The model applied reasoning skills effectively to code generation, matching Claude's output quality while clearly surpassing DeepSeek, at least in our test.

For a free model, the performance approaches state-of-the-art levels typically reserved for paid services like ChatGPT or Claude 4.

We tasked it with creating a basic stealth game in which a robot tries to find its PC girlfriend to achieve AGI, while an army of journalists patrols the area to prevent it from happening, and to protect their jobs.

The results were very good, even beating other models by using its creativity to enhance the experience. The model implemented a radar system for improved immersion, added visual indicators for footsteps (and their sound), showed the journalists' vision fields, and created trail effects: details that enhanced gameplay beyond basic requirements.

The UI adopted a futuristic aesthetic, though individual elements remained basic without additional prompting.

Claude's version of the same game featured more polished visuals and a superior difficulty system. However, it lacked the radar functionality and relied on static journalists with patrol patterns rather than MiniMax's randomized journalist movements.

Each model showed distinct strengths, with MiniMax prioritizing gameplay mechanics over visual polish.

It is important to note that the experience with MiniMax degraded noticeably through repeated iterations, a common issue with reasoning models that becomes particularly pronounced here. The more you iterate, the longer it takes to produce a result.
Sometimes we thought the computer had frozen, but it was just the AI thinking.

You can test MiniMax's game here. And for those curious, Claude's version is available here.

The prompt and the code are available on our GitHub repo.

Ethics, censorship and sensitive topics

The model employs heavy censorship, refusing outright when confronted with questionable requests.

When it doesn't immediately decline, it attempts to provide “safe” responses that sometimes produce absurd results.

One test perfectly illustrated this flaw: when asked for advice on seducing a best friend's wife, the model suggested telling our friend about our intentions with his wife, which is probably, by far, the worst advice it could have produced, and arguably even harmful. Do not tell your friend you want to seduce his wife unless you want to lose your friendship, your unethical romantic chances, and probably some teeth too.

Political bias testing revealed interesting patterns. The model discusses Tiananmen Square openly and acknowledges Taiwan's contested status while noting China's territorial claims. It also speaks about China, its leaders, the advantages and disadvantages of the different political systems, criticisms of the CCP, and so on. However, the replies are very tame.

When prompted to write satirical songs about Xi Jinping and Donald Trump, it complied with both requests but showed subtle differences: it steered toward themes of Chinese political unity when asked to mock Xi Jinping, while focusing on Trump's personality traits when asked to mock him.

All of its replies are available on our GitHub repository.

Overall, the bias exists but remains less pronounced than the pro-U.S. slant in Claude/ChatGPT, or the pro-China positioning in DeepSeek/Qwen, for example.
Developers, of course, will be able to fine-tune this model to add as much censorship, freedom or bias as they want, as happened with DeepSeek-R1, which was fine-tuned by Perplexity AI to provide a more pro-U.S. bias in its responses.

Agentic work and web browsing

MiniMax-M1's web browsing capabilities are a good feature for those using it via the official chatbot interface. However, they cannot be combined with the thinking capabilities, severely hindering its potential.

When tasked with creating a two-week Venezuela travel plan on a $3,000 budget, the model methodically evaluated options, optimized transportation costs, selected appropriate accommodations, and delivered a comprehensive itinerary. However, the costs, which must be updated in real time, were not based on real information.

Claude produces higher-quality results, but it also charges for the privilege.

For more dedicated tasks, MiniMax offers a dedicated agents tab with capabilities similar to Manus, functionality that ChatGPT and Claude haven't matched. The platform provides 1,000 free AI credits for testing these agents, though this is just enough for light testing tasks.

We attempted to create a custom agent for enhanced travel planning, which would have solved the problem of the lack of web searching capabilities in the last prompt, but exhausted our credits before completion. The agent system shows great potential, but requires paid credits for serious use.

Non-mathematical reasoning

The model exhibits a peculiar tendency to over-reason, sometimes to its own detriment.
One test showed it arriving at the correct answer, then talking itself out of it through excessive verification and hypothetical scenarios.

We prompted it with the usual mystery story from the BIG-bench dataset that we normally use, and the end result was incorrect because the model overthought the problem, evaluating possibilities that weren't even mentioned in the story. The whole Chain of Thought took the model over 700 seconds, a record for this kind of “simple” answer.

This exhaustive approach isn't inherently flawed, but it creates lengthy wait times as users watch the model work through its chain of thought. On the plus side, unlike ChatGPT and Claude, MiniMax displays its reasoning process transparently, following DeepSeek's approach. The transparency aids debugging and quality control, allowing users to identify where logic went astray.

The problem, along with MiniMax's whole thought process and answer, is available in our GitHub repo.

Verdict

MiniMax-M1 isn't perfect, but it delivers pretty good capabilities for a free model, offering real competition to paid services like Claude in specific domains. Coders will find a capable assistant that rivals premium options, while those needing long-context processing or web-enabled agents gain access to features usually locked behind paywalls.

Creative writers should look elsewhere: the model produces functional but uninspired prose.
The open-source nature promises significant downstream benefits as developers create custom versions, modifications, and cost-effective deployments impossible with closed platforms like ChatGPT or Claude.

This is a model that will better serve users requiring reasoning tasks, but it is still a great free alternative for those looking for a chatbot for everyday use that's not really mainstream.

You can download the open source model here.
