matthewearl / q1physrl
[Gif of RL agent running 100m]

This is the source code for my Quake reinforcement learning project, which I described in my video "Teaching a computer to strafe jump in Quake with reinforcement learning".

To summarize, I developed a facsimile of Quake player movement physics in Python, wrapped in an OpenAI gym.Env. I then trained an agent on this environment using RLLib's implementation of PPO. In the end the agent was able to beat the current human speedrun record on a speedrunning practice map.

Setup

Below are instructions for using the gym environment, training your own model, and producing a demo file.

Installing the gym env

See q1physrl_env.env.PhysEnv for a detailed description of the environment, and q1physrl_env.env.Config for a description of the configuration options.
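If you just want to poke at the environment, below is a minimal sketch of the classic four-tuple gym.Env loop with a random policy. The PhysEnv construction is an assumption on my part; check q1physrl_env.env.PhysEnv for the real signature.

    # Minimal sketch of the classic gym.Env loop with a random policy.
    # The PhysEnv constructor call is an assumption; see
    # q1physrl_env.env.PhysEnv and q1physrl_env.env.Config for the real API.
    from q1physrl_env import env

    phys_env = env.PhysEnv(env.Config())  # assumed constructor
    obs = phys_env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = phys_env.action_space.sample()  # random actions, just to test
        obs, reward, done, info = phys_env.step(action)
        total_reward += reward
    print("episode reward:", total_reward)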

Training

If instead you want to train the model used in the video, including trying out new hyperparameters and environment settings, then follow the steps below:

    Set up a clean virtualenv using Python 3.7.x (tested with Python 3.7.2).

Install this repo and its requirements:

    git clone https://github.com/matthewearl/q1physrl.git
    cd q1physrl
    pip install -r requirements_q1physrl.txt
    pip install -e q1physrl_env
    pip install -e .

Run:

    q1physrl_train data/params.yml

… and wait for convergence. Modify the file referenced in the argument to change hyperparameters and environment settings. My current PB (the one in the video) peaks on the zero_start_total_reward_mean metric after about a day of training on my i7. Some notes:

Ray (the framework that RLLib is part of) will launch a web server on localhost showing a dashboard of useful information. This includes an embedded TensorBoard instance on which you can track the zero_start_total_reward_mean metric, among others.

I'm using RLLib's implementation of PPO. See the RLLib PPO docs for how to tweak the algorithm parameters (a sketch of doing this through the Python API follows these notes), and q1physrl_env.env.Config for environment parameters.

RLLib will write checkpoint files and run parameters into ~/ray_results; these are needed for analysis and for producing a demo file (see below). The checkpoint files can also be used to resume a previous run by setting checkpoint_fname in params.yml. To use a different results directory, set the TUNE_RESULT_DIR environment variable.

Use this notebook to analyze the checkpoints produced from training. Click through to see an example of what it produces.
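For the curious, here is a hypothetical sketch of driving RLLib's PPO directly from Python rather than through params.yml. The config keys are standard RLLib PPO options for Ray versions of this era, but the env registration and PhysEnv construction are my assumptions; q1physrl_train does the real wiring.

    # Hypothetical sketch: running RLLib PPO by hand (Ray ~0.8 era APIs).
    # The env registration and PhysEnv construction are assumptions here;
    # q1physrl_train does the real setup from params.yml.
    import ray
    from ray import tune
    from ray.tune.registry import register_env
    from q1physrl_env import env

    register_env("q1phys", lambda env_config: env.PhysEnv(env.Config()))

    ray.init()
    tune.run(
        "PPO",
        config={
            "env": "q1phys",
            "num_workers": 4,   # standard RLLib PPO keys from here down
            "gamma": 0.99,
            "lr": 5e-5,
            "clip_param": 0.2,
        },
        checkpoint_freq=10,     # checkpoints land under ~/ray_results
    )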

Producing a Quake demo file

Demo files (extension .dem) are how games are recorded and shared in Quake.

To produce and play back demos you'll need a registered version of Quake. The reason for this is that the shareware version's license does not permit playing user-developed levels with the game, of which the 100m level is one (see licinfo.txt in the shareware distribution). If you don't already have it, you can buy it on Steam for not much money at all.

Once you have a registered copy of the game, the easiest way to produce a demo is with the following script, which uses a Docker image containing the rest of the required software. It is invoked like this:

    docker/docker-make-demo.sh \
        ~/.steam/steam/steamapps/common/Quake/Id1/ \
        data/checkpoints/wr/checkpoint \
        data/checkpoints/wr/params.json \
        /tmp/wr.dem

The first argument is the id1 directory from your Quake installation, which must contain pak0.pak and pak1.pak. The second and third arguments are the checkpoint file and run parameters. The values shown above reproduce the demo shown in the video; substitute files from ~/ray_results/ if you want to produce a demo using an agent that you have trained yourself. The final argument is the demo file to be written.

The script works by first launching a modified Quakespasm server, to which a custom client connects. The client repeatedly reads state from the server, pushes it through the model, and sends movement commands into the game. The network traffic is recorded to produce the demo file. The Quakespasm server is modified to pause at the end of each frame until a new movement command is received, which avoids synchronization issues. A sketch of this loop is given below.
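Roughly, the client's cycle looks like the following; every name in this snippet is hypothetical (the real client ships inside the Docker image), and it only illustrates the read-state / run-model / send-command pattern against the paused server.

    # Entirely hypothetical sketch of the demo-recording loop described above;
    # none of these names come from the real client.
    def record_demo(client, model, demo_writer):
        state = client.read_server_state()           # server pauses until we reply
        while not state.level_finished:
            obs = state.to_observation()             # encode as the gym env does
            action = model.compute_action(obs)       # forward pass through policy
            client.send_move_command(action)         # server advances one frame
            demo_writer.write(client.last_packets)   # recorded traffic -> .dem
            state = client.read_server_state()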

Playing the demo file

The demo file can be played by following these steps:

1. Install a registered version of Quake. (See the section above for why the registered version is necessary.)
2. Download the 100m map from here and extract it into your id1/maps directory.
3. Move the .dem file that you produced into the id1 directory.
4. Start Quake, bring down the console, and then enter the command playdemo followed by the demo name.

At the start of the 100m map the player actually spawns slightly above the floor, so there is a small drop. The player has no control over their height (z position) during this fall, and so all being fair the two trajectories should coincide. By adding the difference between the first time seen in each demo file to the times in the RL agent's demo we can make this happen:

[Uncorrected initial drop plot]

This is the correction applied in the comparison plot in the video.

[Corrected initial drop plot]
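The correction itself is just a constant time offset, as sketched below (the function and variable names are mine, for illustration only):

    # Sketch of the timing correction described above: shift the agent's demo
    # times so that both runs begin the initial drop at the same moment.
    def correct_times(agent_times, human_times):
        offset = human_times[0] - agent_times[0]   # difference of first times
        return [t + offset for t in agent_times]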
