
Google’s DeepMind Launches Gemini Pro and Ultra: Will It Outperform ChatGPT?

Google’s DeepMind has recently introduced Gemini in a long video showing the smooth, multimodal capabilities of the version referred to as Gemini Ultra. A less powerful version, named Gemini Pro, is now available in Bard for English-speaking users in 170 countries. As of February 1, 2024, Bard can also generate images in most countries, powered by Google’s Imagen 2 model.

What are we talking about? Gemini is a multimodal AI system developed by Google’s DeepMind that can understand and respond to natural language instructions and visual inputs. So you can use audio, text and video to interact with Gemini, something that Large Language Models (LLMs) like ChatGPT cannot do, unless they are assisted by extensions or plugins.

The Video

The video demonstrates Gemini’s capabilities through a series of experiments, including:

Following instructions: Gemini can follow directions that combine language and visual cues. For example, it can be instructed to select a specific object from a group of objects, or to navigate through a virtual environment using instructions like “turn left” and “go forward.”

Understanding natural language: Gemini can understand a variety of natural language prompts and questions, including those that are open-ended, challenging, or strange.

Generating text: Gemini can produce creative text formats such as poems, code, scripts, musical pieces, emails, and letters. It can also translate languages and answer questions in an informative way.

The Reactions

The most interesting experiment shown in the video is the one where Gemini is asked to write a poem about a picture. Gemini is able to generate a poem that is both relevant to the picture and creative.

Another impressive example is the one where Gemini starts a game after a world map is provided: by using emojis, the user has to guess the country on the map, and Gemini will tell whether the correct country has been chosen or not. In the video the interaction looks smooth and natural, and the potential applications could be interesting.

All of this suggests that Gemini Ultra natively understands both language and visual information, which would make it a true multimodal model. Let’s hope this translates into improvements for Bard, which in its current form cannot generally compete with the capabilities of ChatGPT. Despite Google’s high hopes, Bard didn’t capture the spotlight as expected, and it fell short of the level of success and the quality benchmarks many had hoped for. Now that Bard is powered by Gemini Pro, things might change completely.

The Controversy

The response to the video hasn’t been entirely positive. Released on December 7, the six-minute video quickly amassed 2.1 million views on YouTube. Shortly afterwards, Google DeepMind’s Oriol Vinyals pointed out that the video’s user prompts and outputs were “shortened for brevity.” Moreover, Gemini’s interactions were conducted via text, not voice, and the process took longer than the video suggests. Acknowledging this, Google included a disclaimer with the YouTube upload stating, “For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.”

Overall, the video provides a fascinating introduction to Gemini, but for now it’s just a video. The real test will come when people start using it, which is already possible for many through Bard with the lightweight version, Gemini Pro, which can outperform ChatGPT 3.5 in some tasks. Will Google finally manage to dethrone ChatGPT?

Available to Developers

Google has also made the Gemini APIs available to developers, along with Imagen 2 for image generation. They will be free to use until they move to General Availability status.

The first version of Gemini Pro is now available through the Gemini API, and it’s shaping up to be a direct competitor to OpenAI’s APIs. Here’s what you need to know:

  1. Performance Excellence: Gemini Pro stands out from other AI models of its size, delivering top-notch results on various research benchmarks. This means better accuracy and efficiency for your projects.
  2. Enhanced Context Window: The current version boasts a 32K context window for text processing. This is just the beginning, as future updates promise to expand this capacity even further.
  3. Accessibility and Pricing: Gemini Pro is free to use for now, subject to certain usage limits. And don’t worry about costs down the line – it’s going to be priced competitively.
  4. Feature-Rich Platform: This isn’t just a one-trick pony. Gemini Pro comes loaded with a suite of features including function calling, embeddings, semantic retrieval, custom knowledge grounding, and chat functionality. This versatility makes it a valuable tool for a wide range of applications.
  5. Global Language Support: Catering to a global audience, Gemini Pro supports 38 languages and is available in over 180 countries and territories. This makes it an ideal solution for international projects.
  6. Text and Vision Capabilities: At launch, Gemini Pro handles text inputs and outputs with finesse. Plus, there’s a dedicated Gemini Pro Vision multimodal endpoint that can process both text and imagery.
  7. Development Made Easy: To help you integrate Gemini Pro into your applications seamlessly, SDKs are available for various platforms including Python, Kotlin (for Android), Node.js, Swift, and JavaScript. This ensures that you can use Gemini Pro regardless of your preferred programming environment.
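
To make the text and vision points above a little more concrete, here is a minimal Python sketch of how a request to the two endpoints might be assembled. It is only an illustration under stated assumptions: the base URL, the `generateContent` path, the model names `gemini-pro` and `gemini-pro-vision`, and the JSON field names reflect the public REST quickstart as we understand it and are not taken from this article. The helper names `build_payload` and `build_request` are hypothetical. The sketch only builds the request and does not send it, so no API key is needed to try it.

```python
import base64
import json
import urllib.request

# Assumption: base URL and path as shown in the public REST quickstart;
# check the current documentation before relying on them.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta/models"


def build_payload(prompt, image_bytes=None, mime_type="image/png"):
    """Return the JSON-serialisable body for a generateContent call.

    Hypothetical helper: a text prompt becomes one "part"; an optional
    image is added as a second, base64-encoded "inline_data" part.
    """
    parts = [{"text": prompt}]
    if image_bytes is not None:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        })
    return {"contents": [{"parts": parts}]}


def build_request(api_key, prompt, image_bytes=None, mime_type="image/png"):
    """Build (but do not send) a POST request for the matching endpoint."""
    # Text-only prompts go to the text model, prompts with an image go to
    # the vision model (model names are assumptions, see the lead-in).
    model = "gemini-pro-vision" if image_bytes is not None else "gemini-pro"
    url = f"{API_ROOT}/{model}:generateContent?key={api_key}"
    body = json.dumps(build_payload(prompt, image_bytes, mime_type)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder key: this only prints what would be sent.
    request = build_request("YOUR_API_KEY", "Write a poem about a world map.")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

With a real key, passing the returned object to `urllib.request.urlopen` should perform the call. In practice the official SDKs listed above are likely the more convenient route, since they hide these details.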

Gemini Pro and Gemini Pro Vision are currently available for free to developers through Google AI Studio, until the models become generally available early next year.

