Neural Pulse

Amazon's Nova Act, a Leap Forward for Web Agents

Amazon introduces Nova Act, an AI model designed to autonomously perform web-based tasks, from online shopping to scheduling.

Andres Vourakis
April 01, 2025

Hey there 👋

We hope you're excited to discover what's new and trending in AI, ML, and data science.

Here is your 5-minute pulse...

print("News & Trends")

Nova Act: Amazon’s Leap Toward Autonomous Web Agents (9 min. read)

Image source: Amazon

Amazon introduces Nova Act, an AI model designed to autonomously perform web-based tasks, from online shopping to scheduling. It is available as a research preview, and developers can use the Nova Act SDK to build agents that navigate web browsers, execute complex workflows, and integrate with APIs. Already enhancing Alexa's capabilities, Nova Act marks a significant step toward reliable AI agents for complex, multi-step tasks.

Runway’s New Gen-4 Video Model (4 min. read)

Image source: Runway

Runway’s Gen-4 is a next-gen video generation model with major upgrades in fidelity, consistency, motion, and control. It supports character and object persistence and multi-view coherence, and it responds to text, image, or video prompts with no extra fine-tuning. The model is now available via API and Runway’s web app.

ARC Prize 2025: A Million-Dollar Quest to Build Smarter AI (15 min. read)

Image source: ARC

The ARC Prize Foundation has launched ARC-AGI-2, a benchmark designed to challenge AI reasoning systems while remaining straightforward for humans. Concurrently, ARC Prize 2025 is now open on Kaggle, offering over $1 million in prizes for developing efficient, general AI systems capable of surpassing ARC-AGI-2.

Gemini 2.5: Google’s Most Advanced Multimodal Model Yet (3 min. read)

Image source: Google

Google just released Gemini 2.5. The new model adds advanced reasoning, working through tasks step by step to give more accurate responses to complex prompts. It is multimodal, handling text, audio, images, video, and code, and brings significant improvements in coding performance, with a 2-million-token context window still to come.

print("Applications & Insights")

Understanding the Tech Stack Behind Generative AI (22 min. read)
A clear, practical breakdown of the generative AI tech stack—from foundation models and multimodal inputs to vector databases, orchestration tools, and AI agents. Covers infrastructure, ethics, and emerging capabilities in one go. Great primer if you want to build, scale, or just understand how modern AI actually works.

Breaking Data Silos: Building a Unified MCP Text-to-SQL Server with Denodo (12 min. read)
By combining Denodo’s AI SDK with the Model Context Protocol (MCP), this approach creates a unified server where LLMs can query multiple enterprise data sources through natural language. It breaks down data silos, boosts accuracy in text-to-SQL translation, and streamlines enterprise AI integration—no custom APIs or pipelines needed.
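
To make the "one query layer over several silos" idea concrete, here is a toy sketch in Python. It does not use Denodo's AI SDK or MCP: two SQLite databases stand in for separate data sources, and the hypothetical `translate_question` function is a hard-coded placeholder for the LLM text-to-SQL step. All table, function, and question names are assumptions made for illustration.

```python
import sqlite3


def build_unified_connection() -> sqlite3.Connection:
    """Expose two separate in-memory databases through one connection."""
    conn = sqlite3.connect(":memory:")
    # A second in-memory database stands in for a separate data silo.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.commit()
    return conn


def translate_question(question: str) -> str:
    """Hypothetical placeholder for the LLM step: question in, SQL out."""
    canned = {
        "total spend per customer": (
            "SELECT c.name, SUM(o.amount) "
            "FROM crm.customers AS c "
            "JOIN main.orders AS o ON o.customer_id = c.id "
            "GROUP BY c.name ORDER BY c.name"
        ),
    }
    key = question.strip().lower().rstrip("?")
    if key not in canned:
        raise ValueError(f"No translation available for: {question!r}")
    return canned[key]


def answer(conn: sqlite3.Connection, question: str) -> list:
    """Translate a natural-language question and run it across both sources."""
    return conn.execute(translate_question(question)).fetchall()


if __name__ == "__main__":
    print(answer(build_unified_connection(), "Total spend per customer?"))
```

The point of the sketch is the shape of the flow: the caller asks one question, and a single layer resolves it against tables that live in different stores. In the article's setup, a real model and Denodo's virtualization layer would replace the canned lookup and the attached SQLite databases.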

A Fun and Easy Guide to run LLMs via React Native on your Phone!
Hugging Face's guide shows how to build a React Native app that runs large language models (LLMs) directly on mobile devices. Using the llama.rn binding for llama.cpp, the app lets users download models and interact with them locally, which keeps data on the device and works offline.

print("Tools & Resources")

TRENDING MODELS

Text Generation
deepseek-ai/DeepSeek-V3-0324
⇧ 86.6k Downloads
A state-of-the-art language model designed for efficient and coherent text generation. It excels at generating long-form text with strong contextual understanding.

Any-to-Any
Qwen/Qwen2.5-Omni-7B
⇧ 53k Downloads
A versatile multimodal model that accepts text, image, audio, and video input and can respond in text or speech. It performs competitively across a wide range of benchmarks.

Text Generation
manycore-research/SpatialLM-Llama-1B
⇧ 12.6k Downloads
SpatialLM is a 3D large language model that processes point cloud data and produces structured descriptions of a scene, such as walls, doors, windows, and object bounding boxes.

Image-Text-to-Text
ds4sd/SmolDocling-256M-preview
⇧ 58k Downloads
SmolDocling is a lightweight document AI model built for fast, cost-efficient document processing. It supports layout-aware inference on OCR outputs.

Text-to-Image
ByteDance/InfiniteYou
⇧ 505 Downloads
A generative model that creates images from textual prompts. It focuses on producing aesthetically pleasing and coherent visual content.

TRENDING AI TOOLS

⚡ Warp AI: AI-powered terminal with natural language commands and code assistance.
🛠️ Taylor: Build automations and extract insights from unstructured text.
🚀 HeroUI: Generate beautiful apps regardless of your design experience.

That’s it for today!

Before you go, we’d love to know what you thought of today's newsletter so we can improve the pulse experience for you.

Your feedback helps me create better emails for you!

See you soon,

Andres