The Value of Synthetic Data And Web 3.0

in #hive-167922 · 2 months ago

We spoke a fair bit about data over the last few months. This is something that is becoming increasingly important. It is a shift that people involved in Web 3.0 need to be aware of.

Within the technology realm, many are wondering about the value of synthetic data. This is data that is machine generated. We touched upon it a few times but it is worthy of some deeper exploration.

The challenge is that, right now, the synthetic data that is generated resides on closed servers. Each time we enter a prompt into ChatGPT or Gemini, that data is generated for that company.

This is simply another twist on the old social media model. Users are generating more data for Big Tech. The front end is different yet it produces the same results.

It gets compounded when people actually have "chats". A conversation provides human data mixed with the synthetic. To me, this is the Holy Grail as the knowledge graphs have to be strengthened.

Which brings us to the value of synthetic data. We have to consider that there might be enormous value there and, if so, what is required of Web 3.0.


Image generated by Ideogram

Web 3.0 And Synthetic Data

The present debate within technology circles is whether synthetic data, that generated by machines, is valuable.

One theory is that synthetic data will degrade over time. For this reason, many conclude that the gains from model training will eventually level off.

Here is a short video with Mark Zuckerberg discussing what their revelations were when training Llama.

https://inleo.io/threads/view/taskmaster4450le/re-leothreads-28v81tnmd?referral=taskmaster4450le

Another important idea coming from Zuckerberg is his view that, in the future, inference generating synthetic data will feed the models for training.

This means that people, through their use of these AI features, are going to provide the data for the development of more advanced models.

It does appear the more things change, the more they stay the same.

We are once again back to users providing Big Tech with the fundamental components to generate billions.

This is where Web 3.0 needs to change things.

One of the core principles of Web 3.0 is decentralization. That is the exact opposite of enormous centralized entities that experience powerful network effects. If left unchallenged, we will end up with a handful of companies that run the digital world.

In other words, a replication of Web 2.0.

If synthetic data is as important to the future as Zuckerberg believes, then it is something we should pay attention to.

Sadly, most simply add to Big Tech's databases without a second thought. The idea of opening up the data generated by a ChatGPT prompt never crosses most people's minds.

AI Services

Web 3.0 is lacking in services overall. This is becoming an even bigger problem in the era of rapidly advancing AI.

Big Tech is all over this. They are progressing at a pace that has never been seen before.

Jensen Huang, the CEO of NVIDIA, was speaking at the recent Salesforce conference. He said that AI advancement was "Moore's Law squared".

His estimation is that Moore's Law produces roughly a 100x over a decade. AI, in his view, is coming in at around a 100,000x.
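To put those two figures side by side, here is a rough sketch that converts each ten-year multiple into a yearly rate and a doubling time. It assumes smooth, constant compounding across the decade, which is a simplification of my own and not something Huang stated. The function names are made up for illustration. Also note that "squared" is a loose label: 100x squared would be 10,000x, not 100,000x.

```python
import math

YEARS = 10  # the decade used in the comparison above


def annual_growth_factor(total_multiple: float, years: float = YEARS) -> float:
    """Constant yearly multiplier that compounds to total_multiple over `years`."""
    return total_multiple ** (1.0 / years)


def doubling_time_months(yearly_factor: float) -> float:
    """Months needed to double at a constant yearly growth factor."""
    return 12.0 * math.log(2) / math.log(yearly_factor)


# Figures quoted in the post: 100x for Moore's Law, 100,000x for AI.
moore_rate = annual_growth_factor(100)
ai_rate = annual_growth_factor(100_000)

print(f"Moore's Law: {moore_rate:.2f}x per year, "
      f"doubling every {doubling_time_months(moore_rate):.1f} months")
print(f"AI estimate: {ai_rate:.2f}x per year, "
      f"doubling every {doubling_time_months(ai_rate):.1f} months")
```

Under that assumption, 100x over a decade works out to roughly 1.58x per year, or a doubling about every 18 months. The 100,000x figure works out to roughly 3.16x per year, or a doubling about every 7 months.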

If he is even close to being accurate, we have to pay attention to what is going on here. This is crucial for the future. Corporations such as Google and OpenAI are all over this. So are the Microsofts and Amazons of the world.

Where is Web 3.0?

If we want decentralization, that means having the AI services built that can generate the data. Everything, it appears, is boiling down to data. If Zuckerberg is correct, the amount available to these technology companies will be enormous.

After all, OpenAI has tens of millions of people feeding it every day.

What does Web 3.0 have?

Outside of chasing the next MemeCoin, what are people doing? While they are distracted by hopes of green candles, even though most are going to end up with nothing, the true Great Data Race is on.

Human generated data is crucial. However, it just might be that synthetic data is also of the utmost importance.



Posted Using InLeo Alpha


This web2 genius shouldn't be underestimated. When we thought there would be difficulty in fetching data, who knew their main idea was to train these AIs with LLMs? AI will in turn be querying humans to fetch more human data. The big question remains, just as you've said: where does web3 go from here?

Now you've hit it! We use it quite a lot in the modeling and simulation domain, i.e., to generate more data with AI than we would otherwise. Of course, real+synthetic data is used for AI learning and, ultimately, for prototyping.
Currently, such a process is mainly used for drone development. I'd rather not say anything more about it...

I think that synthetic data will have a great deal of value. However, the epitome will be human interaction with the data.

This is an advantage that chatbots and social media sites will have.

It's true. But it is already being done. AI can generate thousands of times more data than is actually generated, helping to better understand not only the synthetic world but also the real world.