Generative AI is all the rage.
There is good reason for this. When we look at the pace of advancement, it is mind-boggling.
This may come as a surprise to many, but I do not think we have even reached the knee of the exponential curve. As I have stated elsewhere, my view is that we hit it with the next generation of LLMs. That means Grok 3, Llama 4, Claude 4, and ChatGPT 5.0 will far surpass what we have now.
Of course, for the most part, these are text-based systems, although they are also being used for images. While Grok 2.0 did get a bit of attention for its images, it doesn't really compete with the dedicated image models.
In that realm, we have to look at Ideogram and, more recently, Flux. They are the ones leading that race.
Image generated by Ideogram
The Image Race Is Heating Up
Just like with text, image generation is advancing at an amazing pace. Midjourney was one of the early leaders, followed by Ideogram.
Flux, the engine behind Grok's image generation, has received a lot of attention lately. This suggests we may be seeing a changing of the guard.
Or are we?
Ideogram just released version 2.0. Here is what the company had to say:
"Ideogram 2.0 significantly outperforms other text-to-image models across many quality metrics, including image-text alignment, overall subjective preference, and text rendering accuracy," the company said in its official announcement.
Personally, I don't put a lot of stock in metric-based rankings. I understand the need to quantify, and I suppose it is better to be leading than trailing. However, ranking behind doesn't mean a model is worse than the ones ranked ahead of it.
This is why I watch a lot of videos from people who actually test these tools. They show you where each model falls short.
That said, an update from Ideogram was a long time coming.
So far, it is one of the better text-to-image applications. I have only played around a little with version 2.0, so I cannot give an accurate report on it yet.
The key point is that it is moving ahead.
I did notice that generation is a bit faster, based on the couple of images I created.
18 Months From Now
It is easy to fall into the trap of looking only at where things stand at this moment.
With the pace of change in everything related to generative AI, just wait a little while. I keep saying we are very early in this race, and it is far from over.
Things are improving by orders of magnitude rather than by percentages. That is how much progress we are seeing. Between the growth in compute, algorithmic improvements, and what some call unhobbling, the gains are measured as multiples (3X, 5X).
As we stretch the time frame out, those multiples compound into orders of magnitude (OOMs). Since one OOM is a factor of 10, a 3X gain is roughly half an OOM, and a 5X gain is about 0.7 OOM.
Here is a chart of projected growth from a former OpenAI employee.
It suggests that, from compute and algorithmic efficiency alone, we are looking at 3 to 6 OOMs. That equates to a 1,000 to 1 million times improvement over ChatGPT 4o.
And that is before we take the unhobbling into consideration.
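To make the multiplier-to-OOM arithmetic concrete, here is a minimal Python sketch. An order of magnitude is a factor of 10, so converting a multiplier into OOMs is just a base-10 logarithm, and converting back is a power of 10. The helper names `to_oom` and `from_oom` are hypothetical, chosen for illustration; the sketch only does the unit conversion and does not reproduce the chart's projections or attempt to model unhobbling.

```python
import math


def to_oom(multiplier: float) -> float:
    """Convert an improvement multiplier (e.g. 5 for 5X) into orders of magnitude."""
    return math.log10(multiplier)


def from_oom(oom: float) -> float:
    """Convert orders of magnitude back into a plain multiplier."""
    return 10 ** oom


# The multiples mentioned in the post, expressed in OOMs
for x in (3, 5):
    print(f"{x}X = {to_oom(x):.2f} OOM")

# The projected 3-6 OOM range, expressed as plain multipliers
for oom in (3, 6):
    print(f"{oom} OOM = {from_oom(oom):,.0f}X")

# Gains compound: multipliers multiply, so their OOMs add (3X * 5X = 15X)
combined = to_oom(3) + to_oom(5)
print(f"3X then 5X = {combined:.2f} OOM = {from_oom(combined):.0f}X")
```

Working in OOMs is convenient because stacked gains from compute, algorithms, and other sources simply add, which is why long-range projections tend to be stated that way.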
When we put all of this together, it is easy to see how we could be looking at a great deal more advancement over the next few years.
Text-to-image is going to follow a similar path. Even if the numbers are slightly lower, we will be impressed with what is possible.
This is a race that is just getting started.
Posted Using InLeo Alpha