GPT-4o is a multilingual, multimodal generative pre-trained transformer developed by OpenAI and announced by its CTO, Mira Murati, on May 13, 2024. Unlike its predecessors, which routed speech and vision through separate models, GPT-4o is trained end-to-end across text, vision, and audio, so a single model handles all three input types.
GPT-4o is available free of charge in ChatGPT, while ChatGPT Plus subscribers receive higher usage limits. Its low audio latency, detailed below, makes it well suited to real-time voice and multimodal applications.
GPT-4o substantially improves response speed, averaging 320 milliseconds for audio inputs, which is comparable to human conversational response times. It matches the performance of GPT-4 Turbo on English text and code while showing marked improvements on non-English languages. Additionally, GPT-4o is 50% cheaper to use via the API and supports rate limits five times higher than GPT-4 Turbo's.
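As a minimal sketch of API access using the OpenAI Python SDK (v1.x), the model is addressed by the identifier `gpt-4o` through the chat completions endpoint; the prompt text below is purely illustrative:

```python
from openai import OpenAI

# Requires the OPENAI_API_KEY environment variable to be set.
client = OpenAI()

# A minimal text request to GPT-4o via the chat completions API.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize GPT-4o in one sentence."},
    ],
)
print(response.choices[0].message.content)
```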
A key feature of GPT-4o is its ability to process audio and visual inputs directly, without the separate transcription or computer vision models that earlier pipelines required. Because audio is no longer reduced to a transcript before reaching the model, information such as tone and background sound is preserved, enabling more natural and responsive human-computer interaction.
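To illustrate the multimodal input path, the same chat completions endpoint accepts image content alongside text in a single message. The sketch below assumes the OpenAI Python SDK (v1.x); the image URL is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

# Send an image and a text question together; GPT-4o interprets the
# visual input directly, with no separate computer-vision model.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL for illustration only.
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```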