OpenAI Releases New API-Based Audio Models And More...


Hey there 👋

We hope you're excited to discover what's new and trending in AI, ML, and data science.

Here is your 5-minute pulse...

print("News & Trends")

Image source: OpenAI

OpenAI has introduced new audio models: gpt-4o-transcribe and gpt-4o-mini-transcribe for speech-to-text, and gpt-4o-mini-tts for text-to-speech. The gpt-4o-transcribe model achieves a lower Word Error Rate (WER) than the existing Whisper models across multiple established benchmarks, which means fewer transcription mistakes per word spoken. The models are now available through OpenAI's API, enabling developers to create more accurate and customizable voice applications.
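If WER is new to you, here is a minimal sketch of how the metric is usually defined: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. The function name `word_error_rate` and the simple lowercase-and-split normalization are our own assumptions for illustration; published benchmarks typically apply heavier text normalization, and this is not OpenAI's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split is enough normalization for a demo.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a mat'):.3f}")  # 0.167
```

Lower is better: a WER of 0.0 means a perfect transcript, and the value can exceed 1.0 when the output contains many extra words.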

Image source: Pruna AI

Pruna AI is open-sourcing its powerful model compression framework, making AI optimization more accessible. By combining techniques like pruning, quantization, and distillation, it helps developers shrink models while maintaining performance. With enterprise offerings and a soon-to-launch compression agent, Pruna AI aims to revolutionize efficiency, saving companies time, money, and computing power.
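To make two of those techniques concrete, here is a toy sketch of magnitude pruning and symmetric int8 quantization on a plain list of weights. This is not Pruna AI's API; the function names `magnitude_prune`, `quantize_int8`, and `dequantize` are hypothetical and chosen only for illustration, and real frameworks operate on tensors with far more sophisticated schemes.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


weights = [0.5, -0.01, 0.2, -1.0]
pruned = magnitude_prune(weights, sparsity=0.5)  # drops the two smallest weights
quantized, scale = quantize_int8(pruned)
restored = dequantize(quantized, scale)
print(pruned, quantized, restored)
```

Pruning removes weights that contribute little, and quantization stores the rest in fewer bits; the restored values differ from the originals by at most half the scale factor, which is the accuracy cost being traded for size.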

Image source: Hugging Face

Hugging Face’s updated Inference Endpoints analytics provides real-time insights into latency, error rates, request volume, and token usage. Users can drill into per-replica logs, filter by status codes, and monitor over custom time ranges with auto-refresh. It’s built on OpenTelemetry and Grafana for deeper debugging and observability.

Image source: Gretel

Nvidia has acquired synthetic data startup Gretel to enhance AI training for developers, signaling a major push into AI-generated datasets. While synthetic data promises scalability and privacy benefits, experts warn of risks like model collapse. Despite concerns, Big Tech is embracing synthetic data, blending it with real-world data to fuel AI advancements.

Image source: Google

Google is rolling out powerful new AI features to Gemini Live, enabling real-time screen reading and live video analysis via your phone’s camera. These Astra-powered upgrades put Google ahead of Alexa and Siri in the AI race, offering instant assistance for tasks like color selection and on-screen queries.

print("Applications & Insights")

Build Your Own AI Coding Assistant in JupyterLab with Ollama and Hugging Face (8 min. read)
Turn JupyterLab into an AI coding assistant using Ollama and Hugging Face Transformers—all locally. This guide walks you through setup, integration, and running models like Mistral or Llama 2 for private, fast, and intelligent code help without relying on cloud APIs.
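For a feel of what "all locally" means in practice, here is a minimal sketch of calling a model served by Ollama from a notebook cell using only the Python standard library. It assumes an Ollama server is running on its default local port (11434) and that a model named `mistral` has already been pulled; the helper names `build_payload` and `ask_local_model` are our own, and the linked guide may wire things up differently.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="mistral"):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_local_model(prompt, model="mistral", url=OLLAMA_URL, timeout=120):
    """Send the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the reply arrives as one JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

In a notebook you would call something like `ask_local_model("Explain this function: ...")`; nothing leaves your machine, which is the privacy benefit the guide highlights.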

Build AI agents using MongoDB Atlas
MongoDB’s Generative AI Showcase repo features practical examples (18 Jupyter Notebooks) of RAG, AI agents, and industry use cases. It shows how MongoDB powers GenAI apps as a vector store, memory provider, and operational DB—ideal for both beginners and advanced builders.

Benchmarking Our Path to AGI: Measuring AI Progress in 2025 (13 min. read)
This article explores how AI progress is benchmarked in 2025, highlighting the rise of models like OpenAI’s o3, the limitations of existing evaluations, and the push for more robust, future-ready metrics to meaningfully track advancements toward AGI.

print("Tools & Resources")

TRENDING MODELS

Image-Text-to-Text
ds4sd/SmolDocling-256M-preview
⇧ 28k Downloads
A 256-million-parameter model preview aimed at document understanding tasks, facilitating extraction and summarization of information from textual documents.

Image-Text-to-Text
mistralai/Mistral-Small-3.1-24B-Instruct-2503
⇧ 60k Downloads
A 24-billion-parameter model designed for instruction-following tasks, excelling in generating coherent and contextually relevant text responses.

Text-to-Speech
sesame/csm-1b
⇧ 32k Downloads
A 1-billion-parameter model for converting text to natural-sounding speech, supporting multiple languages with high fidelity.

Text Generation
manycore-research/SpatialLM-Llama-1B
⇧ 3.1k Downloads
A 1-billion-parameter language model optimized for spatial language understanding, useful in applications requiring comprehension of spatial relationships.

Text-to-Speech
canopylabs/orpheus-3b-0.1-ft
⇧ 22k Downloads
A 3-billion-parameter fine-tuned model for high-quality text-to-speech synthesis, emphasizing expressiveness and clarity.

TRENDING AI TOOLS

  • 🖼️ KREA AI: A new level of control for AI video generation

  • 🧠 Mind Maps: Instantly turn sources into visual mind maps in NotebookLM.

  • 🌐 Web Search: Claude can now browse the internet to answer questions about current events and retrieve recent information.

  • 🎧 OpenAI.fm: An interactive demo for developers to try the new text-to-speech model in the OpenAI API

That’s it for today!

Before you go, we’d love to know what you thought of today's newsletter so we can improve the pulse experience for you.

What did you think of today's pulse?

Your feedback helps me create better emails for you!

See you soon,

Andres