Neural Pulse

Grok 4 Early Benchmarks Are Impressive

Andres Vourakis
July 08, 2025

Hey there 👋

We hope you're excited to discover what's new and trending in AI, ML, and data science.

Here is your 5-minute pulse...

print("News & Trends")

Grok 4 Benchmarks Leak With 45% Score on Humanity’s Last Exam (1 min. read)

Image source: TestingCatalog

Grok 4’s leaked benchmarks point to a potential leap in reasoning: a reported 45% on Humanity’s Last Exam, 24 percentage points above Gemini 2.5 Pro, alongside strong results on GPQA (87–88%), AIME’25 (95%), and SWE‑Bench (72–75% for the code variant). If accurate, these figures would make Grok 4 state‑of‑the‑art in reasoning, math, and coding, and may push other AI labs to move faster. However, the numbers come from unauthorized leaks, not an official release.

New Gemini Tools for Students and Educators (4 min. read)

Image source: Google

Google launches Gemini for Education at ISTE 2025, bringing premium Gemini 2.5 Pro AI tools (like classroom assistants, quiz generation, video and audio content creation, and shareable “Gems”) to all Workspace for Education users for free, with robust data protection and admin controls. Students soon gain access to personalized quizzes and interactive visuals, while educators can deploy AI-powered assessments and materials. Supervision features, youth-safe policies, and privacy safeguards ensure responsible deployment in schools.

China’s Baidu Unveils Open-Source Multimodal Family (12 min. read)

Image source: Baidu

Baidu has open-sourced its ERNIE 4.5 family under Apache 2.0. The family includes 10 large-scale multimodal models, ranging from a 0.3B dense model to a 424B-parameter MoE architecture. Built on a novel heterogeneous Mixture-of-Experts design, these models handle text and vision tasks with shared and modality-specific parameters, and Baidu reports state-of-the-art performance across multiple benchmarks. Baidu also released the ERNIEKit and FastDeploy toolkits for efficient fine-tuning and deployment across hardware platforms.
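
To make "shared and modality-specific parameters" a bit more concrete, here is a toy sketch of the general idea. It is not Baidu's implementation: the expert functions, the `moe_forward` name, and the scalar inputs are all hypothetical, chosen only to show a shared expert that always runs next to a router that picks among experts reserved for one modality.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy experts: each one is just a scalar function.
SHARED_EXPERTS = [lambda x: x + 1.0]           # used for every token
MODALITY_EXPERTS = {
    "text": [lambda x: 2.0 * x, lambda x: x - 1.0],
    "vision": [lambda x: 0.5 * x, lambda x: x * x],
}


def moe_forward(x, modality, router_scores, top_k=1):
    """Shared experts always run; the router picks top_k experts
    from the pool that belongs to the token's modality."""
    experts = MODALITY_EXPERTS[modality]
    weights = softmax(router_scores)
    # Indices of the top_k highest-weighted experts in this modality's pool.
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:top_k]
    norm = sum(weights[i] for i in top)        # renormalize over the chosen experts
    routed = sum(weights[i] / norm * experts[i](x) for i in top)
    shared = sum(expert(x) for expert in SHARED_EXPERTS)
    return shared + routed


text_out = moe_forward(3.0, "text", router_scores=[2.0, 0.0])
vision_out = moe_forward(3.0, "vision", router_scores=[0.0, 2.0])
print(text_out, vision_out)
```

A text token never reaches a vision expert here, while the shared expert contributes to both; real systems do this with learned routers and tensor-valued experts rather than hand-written scalars.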

print("Applications & Insights")

Context Engineering for Agents (11 min. read)
This article discusses how context engineering enhances AI agents by dynamically assembling system prompts, user data, memory, tools, and retrieval components. Moving beyond static prompts, it enables agents to adapt in real time, reduce hallucinations, and reliably handle complex workflows through structured, modular context management.
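
As a rough illustration of what "dynamically assembling" context can look like, here is a minimal sketch. It is not taken from the article: `ContextPart`, `assemble_context`, the word-count token estimate, and the sample strings are all assumptions made for the example. Each piece of context gets a priority, and lower-priority pieces are dropped once a token budget is used up.

```python
from dataclasses import dataclass


@dataclass
class ContextPart:
    name: str       # e.g. "system", "memory", "retrieval"
    text: str
    priority: int   # lower number = more important


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def assemble_context(parts: list[ContextPart], budget: int) -> str:
    """Keep the highest-priority parts that fit within the token budget."""
    kept = []
    used = 0
    for part in sorted(parts, key=lambda p: p.priority):
        cost = estimate_tokens(part.text)
        if used + cost > budget:
            continue  # skip any part that would overflow the budget
        kept.append(part)
        used += cost
    # Emit the surviving parts in priority order under labelled headers.
    return "\n\n".join(f"[{p.name}]\n{p.text}" for p in kept)


parts = [
    ContextPart("system", "You are a support agent.", 0),
    ContextPart("user", "Why did my invoice double?", 1),
    ContextPart("memory", "Customer upgraded plans last month.", 2),
    ContextPart("retrieval", "Billing docs: upgrades are prorated and billed on the next invoice.", 3),
]
prompt = assemble_context(parts, budget=20)
print(prompt)
```

With a budget of 20 the retrieval snippet is left out; raising the budget brings it back. A production agent would likely use a real tokenizer and summarize rather than drop content, but the modular shape is the same.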

The American DeepSeek Project (10 min. read)
The U.S. risks losing its edge in open‑source AI as Chinese labs dominate model releases, datasets, and research. Nathan Lambert’s “American DeepSeek Project” proposes building a fully open model (including weights, code, data, and logs) on par with frontier systems within two years. Backed by a coalition of advocates and requiring $100–500M, it aims to restore Western leadership and trustworthiness in AI before closed-source giants and Chinese models set the ecosystem’s direction.

MCP Is Not Good Yet (Video)
David Cramer of Sentry digs into the Model Context Protocol (MCP) for B2B SaaS, highlighting its role in integrating AI for bug fixing. He shares Sentry's journey, emphasizing OAuth 2.1 complexities and the need to expose agents rather than raw APIs. The key takeaways: design for agent interaction, prefer remote MCP servers for security, and manage context costs. Building an MCP server is accessible, but it requires continuous refinement.

print("Tools & Resources")

TRENDING MODELS

image-to-image
black-forest-labs/FLUX.1-Kontext-dev
⇧ 171 k Downloads
An image-editing model that modifies an input image according to text instructions; part of the FLUX.1 series by Black Forest Labs.

image-text-to-text
THUDM/GLM-4.1V‑9B‑Thinking
⇧ 10.1 k Downloads
A multimodal model that accepts images and text to generate coherent textual responses. A 9B‑class model, per its name, aimed at image‑integrated reasoning.

image-text-to-text
google/gemma‑3n‑E4B‑it
⇧ 223 k Downloads
Google’s Gemma model designed for multimodal (image + text) tasks, generating text based on visual inputs. The “it” suffix marks the instruction-tuned variant.

text-to-speech
kyutai/tts‑1.6b‑en_fr
⇧ 12.2 k Downloads
A high-quality bilingual English–French TTS system with a 1.6B‑parameter backbone for natural voice synthesis. Supports fluent speech in both languages.

text generation
tencent/Hunyuan-A13B-Instruct
⇧ 27.9 k Downloads
An instruction‑tuned Mixture-of-Experts model from Tencent with 80B total parameters, of which about 13B are active per token (hence “A13B”). Designed for general-purpose text generation and instruction following.

code generation
apple/DiffuCoder‑7B‑cpGRPO
⇧ 599 Downloads
A 7B‑parameter diffusion language model by Apple focused on generating code from natural language prompts, refined with coupled-GRPO reinforcement learning (the “cpGRPO” in its name).

TRENDING AI TOOLS

🧩 Context: A secure AI office suite that gathers chats, spreadsheets, and files to auto-generate presentation-ready deliverables.
🎙️ Kyutai TTS: Super low-latency, streaming text-to-speech that starts reading as text is generated, ideal for live LLM outputs.
💎 Gems: Custom or pre-made AI “experts” in the Gemini side-panel for Docs, Sheets, Slides, Gmail—making repeated tasks faster.
⚡ Shortcut: A next-gen Excel agent that boosts compliance and data accuracy at scale.

That’s it for today!

Before you go, we’d love to know what you thought of today's newsletter so we can improve the pulse experience for you.

See you soon,

Andres