Log of my adventure creating an AI Mortal Kombat player (round 1)

This is the starting point of a series about an adventure I want to share. This article explores an idea that came to me last week and that I’m researching. The idea is to create an expert AI player of Mortal Kombat based on reinforcement learning techniques.

Picture by Marlon Marçal on Behance

Note: In this series of articles I will research something that, in my view, could work. So if you came here looking for answers, this is the wrong place: here we will find questions, many of them.


I came up with this idea after reading this awesome paper by the OpenAI team about agents learning to behave in complex environments. In a nutshell, the paper is about agents learning how to play hide and seek in a small space with walls and objects.

Video by OpenAI channel on YouTube

What amazed me the most about this research was how the agents developed complex emergent strategies and even found some “hacks” in the environment in order to achieve their goals.

So I started thinking about a project that puts this kind of AI agent, and its emergent behaviors, into something fun and interactive.

After searching for inspiration for a while, I found this cool JavaScript Mortal Kombat game.

Demo screenshot by the author

So… my idea is to create an agent that can play this Mortal Kombat game “decently” using reinforcement learning techniques.

Key concepts about this project

Reinforcement learning

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment.

In reinforcement learning, an artificial intelligence faces a game-like situation. The computer employs trial and error to come up with a solution to the problem.

To get the machine to do what the programmer wants, the artificial intelligence gets either rewards or penalties for the actions it performs. Its goal is to maximize the total reward.
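To make “total reward” concrete, here is a minimal sketch that adds up the rewards of one episode. It assumes a discount factor gamma (the discounted sum reappears in the Q-learning definition below), so rewards received later count slightly less than immediate ones; with gamma = 1.0 it is the plain total.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma**t * rewards[t] for one episode."""
    total = 0.0
    # Walk backwards so each step computes total = r_t + gamma * total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


print(discounted_return([1.0, 0.0, 2.0], gamma=1.0))  # 3.0
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```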

Image from GitHub by Nervana


Q-learning is based on the notion of a Q function. The Q function of a policy π (also known as the state-action value function), Q(s, a), measures the expected discounted sum of rewards obtained by starting in state s, taking action a first, and following π from then on.

We define the optimal Q function Q*(s, a) as the maximum expected return that can be obtained from state s by taking action a and following the optimal policy thereafter.
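For reference, these are the standard definitions behind the two sentences above, with γ ∈ [0, 1) the discount factor and r_t the reward received at step t. The second line is the Bellman optimality equation, which Q-learning tries to satisfy:

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; s_0 = s,\ a_0 = a\right]

Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \right]
```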

Formula explanation from RandomAnt by Jake Bennett

Deep Q-Learning

For most problems, it is not practical to represent the Q function as a table containing values for each combination of s and a. Instead, we can train a function approximator, such as a neural network with parameters θ, to estimate the Q-values, so that Q(s, a; θ) ≈ Q*(s, a). This is done by minimizing a loss function L_i(θ_i) at each training iteration i.
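In the standard Deep Q-Network formulation, this loss measures how far the network’s estimate is from a target y_i, which is computed with the parameters from the previous iteration, θ_{i−1}:

```latex
L_i(\theta_i) = \mathbb{E}_{s, a, r, s'}\left[ \left( y_i - Q(s, a; \theta_i) \right)^{2} \right],
\qquad
y_i = r + \gamma \max_{a'} Q(s', a'; \theta_{i-1})
```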

Deep Q-Learning representation from this paper

Note: During this series I will probably add more concepts to this section, because sometimes you don’t know what you don’t know.

My notion of Deep Q-Learning

In simple terms, the agent plays many, many games and repeatedly updates its estimates of the Q-values as more information about rewards and penalties becomes available. Provided the training is numerically stable, this eventually leads to good estimates of the Q-values. By doing this, we create a connection between rewards and the actions that preceded them.
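The update idea can be sketched in its simplest, tabular form. This is plain Q-learning, with a dictionary standing in for the neural network; the three-state chain, alpha and gamma are made-up values for illustration, not the game’s setup. Only the last step is rewarded, yet after many repetitions the reward propagates back to the earlier actions.

```python
from collections import defaultdict

ACTIONS = ("left", "right")


def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
# Toy chain 0 -> 1 -> 2 -> end: only the step taken from state 2 is rewarded.
episode = [
    (0, "right", 0.0, 1, False),
    (1, "right", 0.0, 2, False),
    (2, "right", 1.0, 3, True),
]
for _ in range(200):  # "play" the same game many times
    for transition in episode:
        q_update(q, *transition)

print(round(q[(2, "right")], 3), round(q[(1, "right")], 3), round(q[(0, "right")], 3))
# 1.0 0.9 0.81
```

Each step back from the reward is discounted once more by gamma, which is why the values come out as 1, 0.9 and 0.81.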

Understanding the problem


The first thing to figure out is which part of the game environment we control. In this case, it is the fighter’s actions.

In this game we have two kinds of actions, movements and hits.

Movements:

  • move_right
  • move_left
  • move_down
  • jump
  • jump_left
  • jump_right

Hits:

  • high_punch
  • punch
  • kick
  • high_kick
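For a Deep Q-Network, this action space is simply a fixed list, and the network outputs one Q-value per entry. The minimal sketch below assumes epsilon-greedy exploration, a common choice in DQN that this article has not committed to yet; the index order of the actions is arbitrary.

```python
import random

# The ten fighter actions listed above, indexed 0..9 (order chosen arbitrarily).
ACTIONS = (
    "move_right", "move_left", "move_down", "jump", "jump_left", "jump_right",
    "high_punch", "punch", "kick", "high_kick",
)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the highest-scored one."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda i: q_values[i])


q_values = [0.0] * len(ACTIONS)
q_values[ACTIONS.index("kick")] = 1.0  # pretend the model currently rates "kick" highest
print(ACTIONS[choose_action(q_values, epsilon=0.0)])  # kick
```

Early in training epsilon is usually kept high so the agent tries many actions, and it is lowered as the Q-value estimates improve.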


The next thing is to figure out the goal of the game. In this case, it is to harm your opponent until they lose all their health. So the reward comes from harming your opponent as much as possible.

To avoid

Finally, you need to learn what to avoid. In this case, you must avoid dying, which means not letting the enemy harm you.
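Putting the goal and the thing to avoid together, one simple reward is damage dealt minus damage taken between two consecutive states, using the health variables from the state described below. This is a sketch under my own assumptions: the function name is hypothetical, and the bonus or penalty for ending the round is an extra that is not part of the game code.

```python
def compute_reward(prev_state, state, win_bonus=1.0):
    """Reward = damage dealt minus damage taken between two consecutive states.

    Health values are in [0, 1]. win_bonus is a hypothetical extra reward
    (or penalty) for ending the round.
    """
    damage_dealt = prev_state["enemy_health"] - state["enemy_health"]
    damage_taken = prev_state["self_health"] - state["self_health"]
    reward = damage_dealt - damage_taken
    if state["enemy_health"] <= 0.0:
        reward += win_bonus  # knocked the opponent out
    if state["self_health"] <= 0.0:
        reward -= win_bonus  # got knocked out ("dying")
    return reward


before = {"self_health": 1.0, "enemy_health": 0.5}
print(compute_reward(before, {"self_health": 1.0, "enemy_health": 0.25}))  # 0.25
print(compute_reward(before, {"self_health": 0.5, "enemy_health": 0.5}))   # -0.5
```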

Motion trace

If we only use a single image from the game-environment then we cannot tell which direction the players are moving. The typical solution is to use multiple consecutive images to represent the state of the game-environment.

In this case, I want to try using game metadata (the position and health of each fighter) instead of images. So the solution will be to use multiple consecutive “states” of the game.
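Stacking consecutive states works just like stacking consecutive images. The model sees the last k states at once and can infer direction and speed from how the values change. This minimal sketch assumes each state has already been flattened into a list of numbers. The class name is hypothetical, and k is an arbitrary choice.

```python
from collections import deque


class StateStack:
    """Keep the last k game states and expose them as one flat observation."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)  # the oldest state drops out automatically

    def reset(self, first_state):
        # At the start of a round, repeat the first state so the stack is full.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(list(first_state))
        return self.observation()

    def push(self, state):
        self.frames.append(list(state))
        return self.observation()

    def observation(self):
        # Concatenate oldest -> newest; differences between frames encode motion.
        return [value for frame in self.frames for value in frame]


stack = StateStack(k=2)
stack.reset([100.0, 0.0])        # e.g. [self_x, self_y]
obs = stack.push([110.0, 0.0])
print(obs)  # [100.0, 0.0, 110.0, 0.0] -> the fighter moved right
```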


The state will be a data structure with the following variables:

  • self_position: {x: number; y: number}
  • enemy_position: {x: number; y: number}
  • self_health: number [0,1]
  • enemy_health: number [0,1]
Image by the author
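In Python, the state above maps naturally onto a small dataclass that can be flattened into the numeric vector a neural network expects. The class and method names are hypothetical, and so is the arena size used to scale positions to roughly [0, 1]. It is a placeholder, not a value taken from mk.js.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


@dataclass
class FighterState:
    """One snapshot of the game, mirroring the variables listed above."""
    self_position: Position
    enemy_position: Position
    self_health: float   # in [0, 1]
    enemy_health: float  # in [0, 1]

    def to_vector(self, arena_width=600.0, arena_height=400.0):
        """Flatten into 6 numbers; arena_width/arena_height are placeholder sizes."""
        return [
            self.self_position.x / arena_width,
            self.self_position.y / arena_height,
            self.enemy_position.x / arena_width,
            self.enemy_position.y / arena_height,
            self.self_health,
            self.enemy_health,
        ]


state = FighterState(Position(150.0, 0.0), Position(450.0, 0.0), 1.0, 0.5)
print(state.to_vector())  # [0.25, 0.0, 0.75, 0.0, 1.0, 0.5]
```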

Definition of the project


The goal of this project is to build a machine learning model that plays the Mortal Kombat game “decently”, and then put it on a web page based on this one, where anyone can play against the machine.


  • The first limitation is processing power: if I train the model on my vanilla computer, it would take days or weeks to obtain results. To train properly, it is necessary to use GPUs or TPUs.

Possible solution: Use Google Colab (they provide free GPUs and TPUs).

  • The second problem is that the “game environment” is written in JavaScript, while most of the frameworks and libraries for creating and training machine learning models target Python and R. Therefore, the environment would need to be ported (transpiled) to Python in some way.

Possible solution: Write a minimalistic Python environment.
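To give an idea of what a “minimalistic Python environment” could look like, here is a sketch that loosely follows the reset()/step() convention common in RL libraries such as Gym. Everything in it is hypothetical and is not a port of mk.js. That includes the class name, the 1-D arena, the reduced action set, the punch range and damage, and the random enemy.

```python
import random


class ToyFightEnv:
    """Hypothetical, heavily simplified 1-D stand-in for the mk.js game.

    Only three actions are modelled; arena size, punch range and damage
    are invented values chosen for illustration.
    """

    ACTIONS = ("move_left", "move_right", "punch")
    ARENA_WIDTH = 10
    PUNCH_RANGE = 1
    PUNCH_DAMAGE = 0.125  # a power of two keeps the health arithmetic exact

    def __init__(self, seed=None):
        self.rng = random.Random(seed)  # drives the random enemy
        self.reset()

    def reset(self):
        """Start a new round and return the initial state."""
        self.self_x, self.enemy_x = 2, 7
        self.self_health, self.enemy_health = 1.0, 1.0
        return self._state()

    def _state(self):
        # [self_x, enemy_x, self_health, enemy_health]: a 1-D version of the state above.
        return [self.self_x, self.enemy_x, self.self_health, self.enemy_health]

    def _act(self, action, is_agent):
        """Apply one action for the agent (is_agent=True) or for the enemy."""
        if action == "punch":
            # A punch only lands if the fighters are close enough.
            if abs(self.self_x - self.enemy_x) <= self.PUNCH_RANGE:
                if is_agent:
                    self.enemy_health = max(0.0, self.enemy_health - self.PUNCH_DAMAGE)
                else:
                    self.self_health = max(0.0, self.self_health - self.PUNCH_DAMAGE)
            return
        delta = -1 if action == "move_left" else 1
        if is_agent:
            self.self_x = min(max(self.self_x + delta, 0), self.ARENA_WIDTH)
        else:
            self.enemy_x = min(max(self.enemy_x + delta, 0), self.ARENA_WIDTH)

    def step(self, action_index):
        """Agent acts, then a random enemy acts; returns (state, reward, done)."""
        health_before = (self.self_health, self.enemy_health)
        self._act(self.ACTIONS[action_index], is_agent=True)
        self._act(self.rng.choice(self.ACTIONS), is_agent=False)
        # Reward: damage dealt minus damage taken during this step.
        reward = (health_before[1] - self.enemy_health) - (health_before[0] - self.self_health)
        done = self.self_health <= 0.0 or self.enemy_health <= 0.0
        return self._state(), reward, done


# Usage: a random agent plays one episode, capped at 200 steps.
env = ToyFightEnv(seed=0)
state, done, steps, total_reward = env.reset(), False, 0, 0.0
while not done and steps < 200:
    state, reward, done = env.step(random.randrange(len(env.ACTIONS)))
    total_reward += reward
    steps += 1
print(steps, total_reward, state)
```

Keeping the same reset()/step() shape as common RL libraries would make it easier to plug an off-the-shelf DQN implementation into the environment later.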

  • Even once the model is trained, it would still need to be imported into my JavaScript application to run in the browser 😕.

Possible solution: Convert the model to TensorFlow.js and then import it into my JS app.

Putting it all together, the plan is:
  1. Fork the mk.js project into my GitHub.
  2. Adapt the project for my purpose.
  3. Migrate the game environment from JS to Python.
  4. Develop an agent using Deep Q-learning techniques in Google Colab.
  5. Export the agent and convert it into a TensorFlow.js model.
  6. Import the model into the JS app and set up the controls (to allow the agent to control one fighter).
  7. Deploy the app.

Step 1 and 2: Fork and adapt the mk.js project

These are probably the simplest steps of the project: I just have to understand how the game code works and change the controls to fit my model (controlling the fighter programmatically instead of with the keyboard).

After cloning the project, reading the code and doing some experimentation, I understood how it works and deleted some unnecessary code. You can check the repo here.


I’m not sure if what I’m doing is completely correct, and there may be simpler solutions, but this is my first step in reinforcement learning, and there is no better way to learn than making mistakes. My goal here is to learn by doing (or trying to do) something cool.

In the next article, I will talk about step 3 and the results. I hope you find this project interesting!

Written by

Machine Learning and Frontend padawan, on the open source side of the force.
