Google DeepMind has unveiled Genie 3, the latest version of its Genie world model, which generates more realistic, dynamic environments and supports real-time interaction lasting several minutes.
Genie models allow the creation of simulated worlds with increasingly advanced capabilities for interaction by humans or trained AI agents.
Genie 2 could generate highly consistent three-dimensional virtual scenes that lasted up to one minute and responded intelligently to user actions, identifying the controllable character and moving it appropriately.
Its successor, Genie 3, goes a step further and introduces real-time interaction into simulations of more realistic worlds, including ecosystems teeming with plant and animal life, allowing users to explore and experience natural phenomena such as water and lighting.
These worlds remain consistent for several minutes at 720p resolution, since, as Google DeepMind explains in a statement, they are created “frame by frame based on the world description and the user’s actions.”
As for controllability, in addition to navigation input via the arrow keys, Genie 3 supports interaction through text prompts that trigger promptable world events; these can be used to introduce changes into a scene and test how AI agents handle unexpected situations.
Although the worlds generated by Genie can be controlled by humans or agents, Google DeepMind sees its main potential in training the latter. The greater consistency offered by Genie 3 opens the door to longer action sequences in pursuit of more complex goals.
“It not only offers ample scope for training agents such as robots and autonomous systems, but also allows for evaluating agent performance and exploring their weaknesses,” they note in the statement.
However, Genie 3 has limitations: agents have a restricted range of actions, interaction between multiple independent agents is not supported, the model cannot accurately simulate real-world locations, and interactions typically last only a few minutes.