Visual LLM Demos Catch the Fish
February 28, 2025

There's something telling in the fact that after every LLM release (multimodal or not :P), the demos that generate the most excitement are the visual ones. Even with pure text models, what people seem most excited about is how well the model can interpret and represent shapes, albeit through a Python or JavaScript interface. Technically speaking, some of these models can't even iterate on their own output to refine the code. But as viewers, when a model can "create a bouncing ball inside a shape-shifting tetrahedron", the common (dare I say midwit) consensus is that the model has gotten better!