My name is Roy Ward. I specialize in writing highly optimized code.
I have expertise in optimization, algorithms, data structures, compilers, microprocessor architecture, assemblers, operating systems, number representation, concurrency and compression. I have over 20 years of full-time experience building high performance and low memory solutions, resulting in 12 US and 7 international software patents.
I have a deep understanding of what is going on at all levels, from digital circuitry, CPU design, assembly, higher level languages, data structures, all the way to compilers and operating systems, and I apply this multi-level understanding to everything I do. I also bring to bear my strong mathematics skills which help direct me towards a highly efficient solution to whatever problem I am working on.
I am currently seeking interesting and challenging work using these skills.
This is a portfolio website for some of the projects that I have been involved with. Like everything in life, it is a work in progress.
Work and views presented here are my own.
Contact Linkedin Resume Curriculum Vitae Useful/Interesting External Links
Galaxy Generator Project - generate and navigate around a 400 billion star galaxy, on Android, Linux and Windows. Pre-alpha downloads now available for Android, Linux and Windows. Version 0.1.2 has just been released.
Open source projects#
Pseudo-Double Random Variate Poisson
Blog#
Note About blog postings: I plan to release a new post approximately every 2-3 weeks while I am between jobs, although I expect the frequency to reduce once I am employed. Here are some notes about how I select topics. Next up: ‘Lists of Lists of Tiny Things’.
Introduction A little over three years ago, I had an idea - wouldn’t it be fun to have a whole procedurally generated universe on my phone, and have an open game where I can explore parts of it with no combat? This post describes some of the design decisions that I went through to start turning this idea into reality, and some of the consequences of those decisions.
I’m writing this post three years after the design process, and while it is easy to find out what the available options would be now, it is sometimes a little difficult to determine what the state of a particular software package or license was at the time, particularly as most web content is undated....
I’ve taken a break from blogging on optimization topics for a bit to get the Galaxy Generator into a releasable state - it is nowhere near feature complete, but the parts that I have done work fairly well. I’ve now got the rendering part somewhere close to my vision, and version 0.1.2 is available for download here for Android, Linux and Windows (Android will give some warnings about side-loading and self-signing, and Windows Defender will warn you about potential malware - there isn’t any)....
Introduction I had a problem. The Galaxy Generator worked fine when I was moving around inside the Galaxy, but it seems important that I should be able to move outside it, turn the brightness up, and see the whole thing at once. When I tried that, the frame rate would drop to nearly zero and the machine would then run out of memory. Here was the sort of image that I was looking for:...
Introduction While building my Galaxy Generator, one of the performance bottlenecks was in the generation of Poisson random variates to determine how many stars should be placed inside a voxel. This post looks at how to write a fast cross-platform Poisson random variate generator.
First I motivate this by writing about why use Poisson random variates at all, then I explore the floating point case (using double) which gives some good results, but may not reliably generate the same results cross-platform....
Introduction I have been working for some time on a Galaxy Generator project that procedurally generates and allows navigation in a 400 billion star galaxy. Stars are generated/removed as they enter/leave visual range. One of the performance bottlenecks is generating that data (which can be millions of stars per second) and getting it to the GPU for rendering. This post investigates doing that efficiently.
First I look at how a star is modeled and what information needs to be sent to the GPU....
Introduction This post continues documenting the process of optimizing a small problem - generating images of the Mandelbrot Set. This builds on what I wrote in part 1 and gets to speeds of up to 130x the naive case. Again, this is mostly meant as an exercise to demonstrate how I go about using my knowledge of software design and computer architecture to optimize software.
This post will be divided into four sections: a brief recap and some notes on the feedback I got from the last post, a section on micro-optimization where I get a small (~1....
Introduction This post documents the process of optimizing a small problem - generating images of the Mandelbrot Set. Although I will show a speedup of ~8x over the naive implementation (there is a part 2 that will bring this to ~100x in some cases), this is mostly meant as an example of how I go about using my knowledge of software design and computer architecture to optimize software. This post is rather long, but many of the sections are independent of each other, so feel free to use the Contents to go to the sections of interest....