Yes, I support (some) AI use in gaming

By the way, on the topic of LeWM, casual models, and this thread:

First attempt using the world model LeWM to play Super Mario Bros. No RL. No hand-written reward. Just pixels from 122 gameplay traces!

It chases a target scene and stomps a Goomba 🍄on the way, choosing actions live.

Some solutions LeWM finds are genuinely surprising💡!… pic.twitter.com/4By1cpH5m6
— L Wang (@0xShug0) March 27, 2026

Note that this model has only about 15M parameters and can be trained on a single GPU.

https://arxiv.org/abs/2603.19312

Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.
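To make the "only two loss terms" part of the abstract a bit more concrete, here is a rough sketch in plain Python. It is not the paper's code: the function names and `reg_weight` are made up for illustration, and the regularizer is a simple moment-matching penalty (push each latent dimension toward mean 0 and variance 1) that I swapped in as a stand-in for whatever Gaussian regularizer LeWM actually uses. The single weight stands in for the one tunable loss hyperparameter the abstract mentions.

```python
def prediction_loss(predicted, target):
    """Mean squared error between predicted and actual next embeddings."""
    total = 0.0
    count = 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def gaussian_moment_penalty(embeddings):
    """Stand-in regularizer (an assumption, not the paper's method):
    penalize each latent dimension for drifting from mean 0, variance 1."""
    n = len(embeddings)
    dim = len(embeddings[0])
    penalty = 0.0
    for d in range(dim):
        column = [vec[d] for vec in embeddings]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        penalty += mean ** 2 + (var - 1.0) ** 2
    return penalty / dim


def total_loss(predicted, target, embeddings, reg_weight=0.1):
    """Two-term objective: prediction error plus a weighted regularizer.
    reg_weight is a hypothetical name for the single loss hyperparameter."""
    return prediction_loss(predicted, target) + reg_weight * gaussian_moment_penalty(embeddings)


# A collapsed encoder maps every frame to the same point. Prediction is then
# trivially perfect, but the regularizer still charges for the missing variance.
collapsed = [[0.0, 0.0] for _ in range(4)]
collapsed_total = total_loss(collapsed, collapsed, collapsed)

# A batch with mean 0 and variance 1 in its single dimension pays no penalty.
spread = [[1.0], [-1.0]]
spread_penalty = gaussian_moment_penalty(spread)
```

The collapsed batch is the interesting case: without the second term its loss would be exactly zero, which is the representation collapse the abstract says other JEPAs need extra machinery to avoid.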

Of course, this is neither a generative model nor a traditional VLM. I just wanted to point out that casual DL models are something actual people are researching and implementing.
