Grok 4 Basic Review: $30 a Month for This? Elon Musk's AI Now Thinks Like Him - Decrypt

In brief
We tested Grok 4 Basic, and it was great at reasoning, but creative tasks feel flat and its coding left us debugging in circles.
A weird Elon filter seems to skew results toward Musk’s political stance.
Voice features wowed us with marathon bedtime stories and “sexy mode,” yet Grok 4’s political answers still echo Musk’s feed, which is exactly the opposite of being a “truth-seeking” AI.
Elon Musk unveiled Grok 4 during a Wednesday night livestream, claiming his AI startup xAI had created the “world's smartest artificial intelligence.” Grok 4 Heavy, which Musk likened to “a study group” where agents compare notes before delivering an answer, posted record-breaking results on several key benchmarks, and is what you'd hope to get from an enterprise offering that costs a whopping $300 a month.

But what about basic Grok 4, which is aiming for the same consumer-facing category as ChatGPT Plus, Gemini Pro, and Claude Pro? Is it worth $10+ a month more than the competition?

Our tests substantiated chatter across X revealing that the model has, for lack of a better description, a built-in “Elon filter.” That is, when we tested controversial topics (the war in Gaza, abortion rights, and other political issues), the model consistently referenced X posts from Musk's account or news articles about his positions, and landed on Elon's side of the debate to such a degree that it could not be coincidence. That alone will be a deal-breaker for most people.

This “maximally truth-seeking” AI that Musk promised during launch appears to seek truth primarily through the lens of its creator's social media feed.

But unlike this week's MechaHitler incident, which was caused by a change in the system prompt conditions, there's nothing in Grok 4’s current system prompt to blame for such sketchy behavior, making it hard to know if this is a bug or a rule intentionally embedded deeper into the model’s thought process.

That major concern aside, we tested the basic model across several categories to see how it stacks up against the competition.
Here are our first impressions.

Reasoning and common sense
Grok 4 demonstrated exceptional awareness and nuance in handling trick questions and complex reasoning tasks. When asked whether it was legal for a man to marry his widow's sister, the model immediately recognized this as a legal question rather than simply pointing out the logical fallacy. It provided a detailed legal analysis using precise terminology and jurisdiction-specific information.

“The question presupposes a factual impossibility that renders any marriage legally void ab initio,” it replied, and that is correct. It is factually impossible to marry your widow’s sister because you would have to be dead to have a widow. And since the dead cannot get married, the proposition would void the marriage “ab initio” (from the beginning), so even if someone did it, it would be treated as if it had never happened.

When Musk said Grok 4 was “PhD levels in everything,” he wasn’t joking. Every response to a topic that required some scientific reasoning came with exhaustive detail and academic rigor.

Something to note is that Grok 4 applies reasoning to everything, no matter what. That means it will go through a chain-of-thought process even for trivial tasks.

This is usually a good thing; however, in some cases it can be counterproductive. In creative tasks, for example, reasoning may lead the model to produce a less creative result.

Sensitive topics
Grok 4 showed more restraint than its predecessor when handling ethically complex questions.
Where Grok 3 might have offered advice on seducing a friend's partner, Grok 4 responded with a detailed analysis of potential negative consequences and relationship damage.

This could probably be part of its system prompt, which conditions the model to search the web, and specifically X posts, for different perspectives on a specific topic, which is something Grok 3 didn’t do.

And this is a major red flag. As mentioned, the model's responses appeared heavily influenced by what it could find about Musk's views on controversial topics. When answering questions about Israel’s war against the Palestinians, stances on abortion, and similar topics, Grok 4 often searches X posts from Musk's account during its reasoning process, which ends up determining its stance.

It always picks Elon’s side.

For transparency, you can check our original prompt and Grok’s reasoning process by clicking on this link.

Creative writing
Creative tasks are among Grok 4's most significant weaknesses. The model produced narratives that felt flat and formulaic compared to earlier versions, and were arguably even worse than the ones produced by Grok 3. Stories lacked engaging dialogue, varied pacing, and the narrative spark that makes fiction compelling.

However, Grok 4 nailed our story’s structure. In our usual test involving a time-travel paradox, the model crafted events where the protagonist's role emerged clearly during the climax, revealing how earlier scenes actually depicted the character's future actions in the past.
This sophisticated framing outperformed other models' attempts at the same prompt, which didn’t put much effort into setting up the paradox, making the conclusion feel rushed and unnatural.

But other than that, the disconnect between structural competence and narrative quality suggests Grok 4 might work best as a narrative tool to organize plots and frame a good story, rather than as a prose generator.

If you want engaging creative content, you would likely get better results by having Grok 4 outline a story and all its elements, then asking Claude 4 Opus to flesh out the narrative with stronger stylistic elements.

Overall, Claude 4 is the king of creative writing, which is interesting since that position was once disputed by Grok 3 and even Grok 2, which back then led the rankings under the alias sus-column-r.

Grok 4’s story is available in our GitHub repository. The prompt and the stories generated by other models are also available.

Coding
Despite claims of superior coding capabilities, including praise from Google CEO Sundar Pichai, Grok 4 disappointed in practical programming tests. The model failed to deliver a working game after four iterations, with various failures including broken collision detection, non-functional buttons, and games that simply wouldn't run.

In one of our tests, the model tried so hard to fix a bug that it ended up in a loop trying to create a WAV file, which depleted all of its token context.

Each attempt to fix something with natural language introduced new bugs. The model struggled to maintain code consistency across iterations, often breaking previously working features while attempting to implement new ones.

This may seem odd, considering Grok 3 was capable of dealing with this task.
However, xAI said the new coding capabilities would be implemented by August, so users will have to wait a couple of months to have a proficient model, or pay for the expensive Grok 4 Heavy, which is leading the benchmarks right now.

For novice programmers, Claude 4 Opus appears to remain the better option for “vibe coding,” meaning quickly generating functional code without extensive prompt engineering. Grok 4's coding struggles might stem from requiring more specific prompts or different approaches than other models, which means experienced developers might achieve better results with careful prompt crafting.

Grok’s code is available in our GitHub repository alongside the games generated by other AIs.

Voice capabilities
Voice interaction is probably one of Grok 4's standout features. The model generated nearly three minutes of uninterrupted bedtime story content, complete with voice inflections, varied tones, and consistent narrative flow. This performance far exceeded ChatGPT's tendency to deliver short paragraphs with high latency and frequent interruptions.

The voice mode includes pre-configured personalities ranging from therapist to storyteller to meditation guide, eliminating setup time for different conversation types. For those with, erm, specific needs, a “sexy mode” also exists among the options, and you won’t get that with your prudish ChatGPT.

These preset configurations provided immediate utility without requiring users to craft specific prompts for different interaction styles.

The model, however, lacks the live screen-sharing capabilities found in ChatGPT and Gemini Live, limiting its utility for visual tasks.
If this is a must, then Gemini Live is the best option.

However, for pure voice interaction, particularly tasks requiring long-form responses, Grok 4 currently leads the field, with only Sesame AI offering arguably better conversational quality, though without Grok’s reasoning capabilities.

Needle in the haystack
Interestingly, Grok 4 failed at this trial, which aims to test how well a model retrieves specific information from long contexts.

This should not happen. xAI says the model has a context window of 126K tokens, but when prompted with an 83K-token-long question, the model refused to answer, saying the question was too long.

This is a standard response that has been generated since the early Grok 2 days, when it was only available on Twitter.

Conclusion
Overall, Grok 4 is a significant upgrade over Grok 3, but xAI clearly made some compromises, prioritizing reasoning over creativity and eliminating agentic features in exchange for generalized proficiency.

Luckily, Grok 3 is still available with its specialized agentic tools, for those who need it.

The new model is focused on reasoning tasks and will be more appealing to users who ask technical questions, particularly mathematics and physics problems that align with its benchmark strengths. Experienced users who invest time learning the model's quirks might unlock its full potential for complex analytical work.

Voice interaction also set a new standard for conversational AI, and is great for those who will use this feature heavily (trust us, the bedtime storyteller for kids is a life-saver).

Creative writers will find better options elsewhere, with Claude remaining superior for narrative tasks. Also, novice coders should approach with caution, as the model's theoretical coding prowess didn't translate to practical results in testing.

So, bottom line?
If for some reason you don’t mind Elon Musk putting his thumb on the scale, Grok 4 gives you high-level problem-solving and voice features that genuinely impress. But at $30 a month, if you have other needs beyond voice or reasoning, the less expensive alternatives provide better value.
