Neural Pulse
OpenAI Unveils the GPT-4.1 Family

OpenAI introduces the GPT-4.1 family, a new series of GPT models featuring major improvements on coding, instruction...

Andres Vourakis
April 16, 2025

Hey there 👋

We hope you're excited to discover what's new and trending in AI, ML, and data science.

Here is your 5-minute pulse...

print("News & Trends")

OpenAI’s Dev-Focused GPT-4.1 Family (18 min. read)

Image source: OpenAI

OpenAI released the GPT-4.1 family, a new suite of three models, GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, designed for better instruction following, coding, and long-context reasoning (up to 1 million tokens). GPT-4.1 improves significantly on benchmarks like SWE-bench and MultiChallenge while reducing costs. The mini and nano models offer faster, more affordable options, with nano optimized for lightweight tasks like classification and autocomplete. All three models are available exclusively via the API.

Seaweed-7B: ByteDance’s Powerful New AI for Video Generation (5 min. read)

Image source: ByteDance

ByteDance has unveiled Seaweed-7B, a 7-billion-parameter diffusion transformer model for AI video generation. Trained with 665,000 H100 GPU hours, it delivers features like synchronized audio-video generation, long-shot storytelling, real-time high-resolution output, and dynamic camera control. Seaweed-7B matches or surpasses larger models in performance while being more resource-efficient.

DolphinGemma: Google’s AI Breakthrough in Dolphin Communication (5 min. read)

Image source: Google

Google's DolphinGemma is an AI model designed to decode dolphin vocalizations by analyzing patterns in their clicks and whistles. Developed with the Wild Dolphin Project and Georgia Tech, it runs on Pixel phones, enabling real-time analysis and interaction. The model aims to facilitate two-way communication with dolphins, potentially bridging the gap between human and dolphin languages.

Google Launches Agent Development Kit for Building Multi-Agent AI Systems (9 min. read)

Image source: Google

Google has introduced the Agent Development Kit (ADK), an open-source framework aimed at simplifying the creation of multi-agent AI applications. Unveiled at Google Cloud NEXT 2025, ADK provides tools for building, interacting with, evaluating, and deploying agents, facilitating the development of intelligent, autonomous systems.

print("Applications & Insights")

Improving Pinterest Search Relevance Using LLMs (7 min. read)
Pinterest improved search relevance by using a cross-encoder LLM as a teacher to predict multi-class relevance, then distilled its knowledge into a smaller student model for real-time inference. They enriched text features using diverse Pin metadata and combined this with semi-supervised learning, boosting both feed relevance and fulfillment rates.
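
As a rough illustration of the teacher-student distillation described above, the sketch below compares a teacher's and a student's class scores using a temperature-softened KL divergence. The function names, the temperature value, and the choice of KL divergence are assumptions made for illustration; the summary does not state which loss Pinterest actually uses.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw class scores into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened multi-class relevance scores.

    Hypothetical helper: a smaller value means the student's predicted
    relevance distribution is closer to the teacher's.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up scores over five relevance classes for one query-Pin pair.
teacher = [0.1, 0.3, 1.2, 2.5, 0.4]
student = [0.0, 0.5, 1.0, 1.8, 0.9]
loss = distillation_loss(teacher, student)
```

In a real training loop this quantity would be minimized with respect to the student's parameters, typically alongside a loss on human-labeled examples where those exist.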

Sesame Speech Model: How This Viral AI Model Generates Human-Like Speech (9 min. read)
Learn how Sesame leverages HuBERT for self-supervised audio representation, a diffusion decoder for waveform generation, and a transformer to map tokens to speech. The article also covers training on 100K hours of audio and why this open-source model is redefining realistic voice synthesis.

Building Deep Research Agent from scratch (15 min. read)
This guide shows how to build a Deep Research Agent from scratch using LLMs, web search integration, and iterative reflection steps. It explains how to design the system's state using Python dataclasses, plan report outlines, enrich content via automated searches, and format the final report in Markdown. The result is a practical blueprint for advanced data processing and agent orchestration.
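
To make the idea of dataclass-based agent state more concrete, here is a minimal sketch of what such a state object and its Markdown rendering might look like. The class names, fields, and `to_markdown` method are hypothetical and are not taken from the guide itself.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One planned section of the report (hypothetical structure)."""
    title: str
    content: str = ""
    sources: list[str] = field(default_factory=list)


@dataclass
class ReportState:
    """State carried between the planning, search, and writing steps."""
    topic: str
    sections: list[Section] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the report as Markdown with one heading per section."""
        lines = [f"# {self.topic}", ""]
        for section in self.sections:
            lines += [f"## {section.title}", "", section.content, ""]
        return "\n".join(lines).rstrip() + "\n"


state = ReportState(topic="GPT-4.1 overview")
state.sections.append(Section(title="Background", content="Draft text goes here."))
report = state.to_markdown()
```

Keeping the state in plain dataclasses makes each step of the agent easy to test in isolation, since every step simply reads and updates the same object.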

print("Tools & Resources")

TRENDING MODELS

Text Generation
agentica-org/DeepCoder-14B-Preview
⇧ 13K Downloads
A 14B parameter model fine-tuned for code reasoning using distributed reinforcement learning, achieving 60.6% Pass@1 on LiveCodeBench v5. It offers performance comparable to OpenAI's o3-mini with fewer parameters.
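
For readers unfamiliar with the metric, Pass@1 is commonly computed with the unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), where n solutions are sampled per problem and c of them pass the tests. The sketch below is a minimal version of that formula; the sample counts are made up, and the benchmark's exact evaluation setup is not described in this newsletter.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled solutions passes.

    n: solutions sampled for the problem
    c: how many of those passed all tests
    k: attempt budget being scored (k=1 gives Pass@1)
    """
    if n - c < k:
        # Too few failing samples to fill k draws, so a pass is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts only: 10 samples drawn, 3 of them correct.
score = pass_at_k(n=10, c=3, k=1)
```

For k = 1 the estimator reduces to c / n, the fraction of sampled solutions that pass; a benchmark-level score is the average of this value over all problems.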

Text-to-Image
HiDream-ai/HiDream-I1-Full
⇧ 16K Downloads
A 17B parameter open-source model that generates high-quality images from text prompts within seconds. It employs advanced diffusion techniques for state-of-the-art image synthesis.

Image-Text-to-Text
moonshotai/Kimi-VL-A3B-Thinking
⇧ 11K Downloads
An efficient Mixture-of-Experts vision-language model activating only 2.8B parameters for multimodal reasoning. Excels in tasks like image captioning and visual question answering.

Text Generation
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
⇧ 12K Downloads
A 253B parameter model optimized for reasoning and human-like chat, supporting 128K token contexts. Designed for efficient inference with reduced GPU requirements.

Image-Text-to-Text
meta-llama/Llama-4-Scout-17B-16E-Instruct
⇧ 657K Downloads
An instruction-tuned model integrating visual and textual inputs to produce detailed responses. Ideal for complex tasks requiring multimodal comprehension.

TRENDING AI TOOLS

🧠 Claude Research: Lets Claude search the web and your Google Workspace to answer questions.
📬 Notion Mail: An AI-powered inbox that writes and organizes emails for you.
🤖 Appsmith Agents: Build AI agents to automate tasks in your apps.
🏢 AI HQ: Manage and deploy AI agents across your enterprise.

That’s it for today!

Before you go, we'd love to know what you thought of today's newsletter so we can improve the pulse experience for you.

What did you think of today's pulse?

Your feedback helps me create better emails for you!

See you soon,

Andres