I didn't know which community to put this post in. I thought about Hive Gaming, but the post is a bit too technical for it; then I decided to go for SteemGeeks, but the post turned out to be not all that technical either. Caught between the two, GeekZone fits perfectly, I suppose. This will be my first post in this community as well.

AI playing Chrome Dino after only 2 hours of training
For the past few days, I have been working on a small AI project. This fits perfectly with my New Year's resolution to use my programming skills and relearn all the things I was passionate about just a few years ago.
I am a Computer Science graduate, and when I was in college, I used to research all the new technologies. That's when I found out about Machine Learning and Artificial Intelligence. These were just the technologies of the 'future' back then, and I had fun playing with some of the Machine Learning algorithms.
Nowadays everyone is using the term AI, even if they don't have the slightest clue what it does. Anyway, I digress. I was keen on learning some reinforcement learning models, and what better way to learn them than while having some fun too?
So, let's try to teach our AI to play a simple web-based game like Chrome's Dino. Just type chrome://dino into your browser and it opens up. I bet you have seen it before.
How does AI learn?
But wait, how does the AI learn to play the game? Well, there's an interesting technique we can use called Reinforcement Learning. Think of it like training a dog: you give the AI a treat when it does the task correctly and punish it when it gets it wrong. By playing the game over and over, it eventually learns the behaviour that gives it the maximum reward. That's what a reinforcement learning model is. You can go deeper into Deep Neural Networks, weights, loss functions, etc., but for now, this is enough.
Now, let's go a bit deeper into how it works.
- You feed observation data to your network (a screenshot of the game, so it receives the data as pixels on the screen).
Actions and Rewards
- You define the actions it can take at each step; for our simple game, that's something like: no action (our agent does nothing), press the down key, or press space.
- You assign rewards: for my model, I gave a reward of +1 for every step it stays alive, a small positive reward (+0.1) for doing nothing, and a huge negative reward (-10) for losing the game. A rough sketch of such an environment follows this list.
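To make that concrete, here's a minimal sketch of what such an environment could look like in Python. This is not my exact code; the library choices (gymnasium for the environment API, mss for screen capture, pyautogui for key presses), the screen coordinates, and the game-over check are all assumptions you'd need to adapt to your own setup.

```python
# A minimal sketch of a Gym-style Dino environment: pixels in, key presses out.
# Assumptions (not from my actual notebook): gymnasium, mss, pyautogui.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
import mss
import pyautogui

class DinoEnv(gym.Env):
    def __init__(self):
        super().__init__()
        # Grayscale 84x84 screenshot of the game area
        self.observation_space = spaces.Box(low=0, high=255,
                                            shape=(84, 84, 1), dtype=np.uint8)
        # 0 = do nothing, 1 = duck (down key), 2 = jump (space)
        self.action_space = spaces.Discrete(3)
        self.sct = mss.mss()
        # Hypothetical screen region; tune this to your own monitor/layout
        self.game_region = {"top": 300, "left": 0, "width": 600, "height": 500}

    def _grab_obs(self):
        frame = np.array(self.sct.grab(self.game_region))[:, :, :3]
        gray = frame.mean(axis=2).astype(np.uint8)   # crude grayscale
        # Naive nearest-neighbour resize to 84x84 to keep dependencies minimal
        h, w = gray.shape
        ys = np.arange(84) * h // 84
        xs = np.arange(84) * w // 84
        return gray[ys][:, xs][:, :, None]

    def _is_game_over(self, obs):
        # Placeholder: detect the "Game Over" screen, e.g. by matching a small
        # patch of pixels. This part is entirely implementation-specific.
        return False

    def step(self, action):
        if action == 1:
            pyautogui.press("down")
        elif action == 2:
            pyautogui.press("space")
        obs = self._grab_obs()
        done = self._is_game_over(obs)
        # Reward scheme from the post: +1 per step alive,
        # +0.1 bonus for doing nothing, -10 on death
        reward = -10.0 if done else 1.0 + (0.1 if action == 0 else 0.0)
        return obs, reward, done, False, {}

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        pyautogui.press("space")   # restart the run
        return self._grab_obs(), {}
```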
Then it is time to set some more parameters, like how long you want your model to train, how frequently the network gets updated, what the learning rate should be, etc.
Once this is done, you let it play games over and over and watch it learn new techniques and strategies. At this stage, you carefully monitor the learning process and adjust some parameters if required.
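As a hedged example of what that setup stage can look like, here's a sketch using Stable-Baselines3's DQN with a CNN policy, a common choice for pixel-based games like this one. I'm not claiming this is the exact algorithm or these the exact numbers from my run; treat the hyperparameters below as illustrative starting points.

```python
# Hedged training-setup sketch, assuming Stable-Baselines3 DQN + CNN policy.
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import CheckpointCallback

env = DinoEnv()  # the environment sketched above

model = DQN(
    "CnnPolicy",
    env,
    learning_rate=1e-4,   # hypothetical values; adjust while monitoring
    buffer_size=50_000,
    learning_starts=1_000,
    train_freq=4,         # how often the network gets updated
    verbose=1,
)

# Save a checkpoint every 5,000 steps so earlier models can be replayed later
checkpoint = CheckpointCallback(save_freq=5_000, save_path="./dino_models/")
model.learn(total_timesteps=200_000, callback=checkpoint)
```

The checkpoint callback is what makes it possible to go back and replay earlier versions of the model, which is exactly what I do below.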
It took me a long time to figure out a lot of things in the code. The worst part for me was optimizing the code as much as I could so the learning wouldn't be too slow. There is a trade-off between model quality and training time.
At the start of the training, my goal was to see the AI reach a score of at least 500 in the game. I wasn't aiming for anything above that, because this is a fast-paced game where you need to make quick decisions, and with my computational limitations, it was a fair goal to aim for.
I made a little video showing how the AI learns to play this game. I saved my models at various points so I could go back, play them, and see how they perform relative to the total training time.
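Loading one of those saved checkpoints back and watching it play only takes a few lines. With the Stable-Baselines3 setup sketched earlier, it could look like this (the filename is hypothetical, following CheckpointCallback's default naming):

```python
from stable_baselines3 import DQN

# Hypothetical checkpoint name from the CheckpointCallback above
model = DQN.load("./dino_models/rl_model_30000_steps")

obs, _ = env.reset()
done = truncated = False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, _ = env.step(int(action))
```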
The results were certainly interesting.

Weakest model, spamming random inputs
The first model, at 5000 steps or after 25 seconds of training, was just as trash as you would expect. The AI just tries a bunch of random inputs to see what rewards they bring. The sets of inputs, or experiences, that earn the best rewards are then used to gradually tweak the model, and that's how it learns.
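That random-input phase is typically driven by something like epsilon-greedy action selection: with some probability the agent picks a random action, and otherwise it picks whatever its network currently rates highest. A tiny illustrative version (not my exact code):

```python
import random

def select_action(q_values, epsilon):
    """Epsilon-greedy: explore with random inputs early, exploit later."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore: random input
    # exploit: the action the network currently rates highest
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Training usually starts with epsilon near 1.0, so almost everything is exploration, and decays it toward a small value as the model improves.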
With only about 25 minutes of training, which took 30,000 individual steps (actions), the agent quickly learned to jump over the cactus. It wasn't consistent, but you could see that it had learned to jump just before a cactus approached. How cool is that? All we gave it was a set of pixels, and the model had to figure out how to extend its life in the game to get the maximum reward. Less than 25 minutes in, it had learned to jump over some of the obstacles.

After 40+ minutes of Training
After a few more minutes, at around 41 minutes, it became more consistent with the above approach. It could easily reach scores of more than 200, but this is when the game speeds up, and the later models took a lot of time to adjust to this speed change. This was also the first time we saw the AI encounter a bird; it didn't know what to do and just slammed right into it.
This increased game speed and the addition of obstacles like birds made it recalculate its approach, and for the next hour or so, it didn't learn much. Mostly it was just exploring, trying to find all the possible ways to handle this new thing in its environment.

Performance decreasing after some point, a case of the exploration-exploitation dilemma?
That's exactly what happened until we got the breakthrough at around the 160,000-step model (trained for more than 2 hours). This was the first time it started to jump over some of the birds. It wasn't consistent enough, and the learning became quite slow at this point. Maybe the significant increase in game speed was throwing it off, and it was learning at a much slower rate than before, which is natural for models like this.
This model broke through the 500-score mark and already plays better than a kid. Just imagine if the model were not limited to running at 20 fps but could learn at 60 fps, and I had plenty of time to train and test it. It could easily beat me if I trained it long enough. I might not do that, because it would be quite slow and time-consuming, but I achieved what I wanted with this project.

My best-performing model, learnt to deal with some birds too (a bit inconsistent)
Why, and what else in the future?
Well, my main goal was to learn more about reinforcement learning and get some hands-on experience. I can confidently say that watching your model learn right in front of your eyes is one of the best feelings I've had in my short coding career. I can't wait to try this approach on more complex games.
I have a really messy Jupyter Notebook filled with a lot of debugging code and other crap that will confuse the living heck out of everyone, not just the AI chatbots. I took the help of some online AI models for debugging, and some of them work better than others. The best free one was DeepSeek's latest model; it solved an issue that even Gemini couldn't. Their servers are mostly busy, though, so that sucks.
I will clean up the code, add some comments, and upload it to my Github once I feel it is somewhat presentable xD. It will also help me keep the code for future reference, because I tend to forget things I've already learnt.