Consultant and co-owner at Binomial LLC, game and open source developer, graphics programmer, lossless data and texture compression specialist, and recovering OpenGL addict. Worked previously at Unity Technologies, Valve, Microsoft Ensemble Studios, and Digital Illusions.
Monday, May 12, 2014
The Truth on OpenGL Driver Quality
The driver landscape is something that any practicing GL dev must face unless you like having only a fraction of potential customers able to enjoy your product. (These are the drivers you'll have to work with in order to actually ship a product today or within the next year or so. If you're just a dev playing at home with one driver you'll probably not have to deal with any of this gritty real-world stuff.)
If all you've ever done is use D3D then you better strap yourself in because the available GL drivers for Windows/Linux are all over the map. Here's my current opinion on driver quality:
Vendor A
What most devs use because this vendor has the most capable GL devs in the industry and the best testing process. It's the "standard" driver, it's pretty fast, and when given the choice this vendor's driver devs choose sanity (to make things work) vs. absolute GL spec purity. Devs playing at home use this driver because it has the sexiest, most fun to play with extensions and GL support. Most of what you hear about the amazing things GL will be able to do in order to compete against D3D12/Mantle comes from devs playing with this driver. Unfortunately, we can't just target this driver or we miss out on large amounts of market share.
Even so, until Source1 was ported to Linux and Valve devs totally held the hands of this driver's devs they couldn't even update a buffer (via a Map or BufferSubData) the D3D9/11-style way without it constantly stalling the pipeline. We're talking "driver perf 101" stuff here, so it's not without its historical faults. Also, when you hit a bug in this driver it tends to just fall flat on its face and either crash the GPU or (on Windows) TDR your system. Still, it's a very reliable/solid driver.
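For anyone who hasn't lived in D3D land, here's a rough sketch of what the "D3D9/11-style" buffer update pattern is assumed to mean here: keep appending into a dynamic buffer with no-overwrite semantics, and discard (orphan) the whole buffer once it fills up. DynamicBufferCursor, MapRequest, and MapMode are made-up names for illustration, not part of GL or any library. The sketch only does the CPU-side bookkeeping so it runs without a GL context; the comments name the glMapBufferRange access bits a caller would typically pass for each mode.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (illustration only): decides WHERE and HOW the next
// chunk of a dynamic buffer would be mapped. The actual glMapBufferRange
// call is left to the caller.
enum class MapMode {
    NoOverwrite, // caller would use GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT
    Discard      // caller would use GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT
};

struct MapRequest {
    std::size_t offset; // byte offset to map at
    std::size_t size;   // number of bytes to map
    MapMode mode;       // which access pattern the caller should request
};

class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity)
        : capacity_(capacity), cursor_(0) {}

    // Reserve `size` bytes starting at a multiple of `alignment`.
    MapRequest reserve(std::size_t size, std::size_t alignment = 16) {
        assert(size <= capacity_ && alignment > 0);
        // Round the write cursor up to the requested alignment.
        std::size_t aligned = (cursor_ + alignment - 1) / alignment * alignment;
        if (aligned + size > capacity_) {
            // Out of room: restart at offset 0 and ask for fresh storage, so
            // the GPU can keep reading the old contents without a stall.
            cursor_ = size;
            return MapRequest{0, size, MapMode::Discard};
        }
        // Still room: append past data the GPU may still be reading. The
        // app promises not to touch earlier bytes, so no sync is needed.
        cursor_ = aligned + size;
        return MapRequest{aligned, size, MapMode::NoOverwrite};
    }

private:
    std::size_t capacity_;
    std::size_t cursor_;
};
```

The point of the pattern is that neither path should ever have to wait on the GPU: appends never overlap in-flight data, and a discard hands back new storage instead of blocking. A driver that stalls on either one defeats the whole scheme.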
Vendor A supports a zillion extensions (some of them quite state of the art) that more or less work, but as soon as you start to use some of the most important ones you're off the driver's safe path and in a no man's land of crashing systems or TDR'ing at the slightest hiccup.
This vendor's tools historically completely suck, or only work for some period of time and then stop working, or only work if you beg the tools team for direct assistance. They have enormous, perhaps Dilbert-esque tools teams that do who knows what. Of course, these tools only work (when they do work) on their driver.
This vendor is extremely savvy and strategic about embedding its devs directly into key game teams to make things happen. This is a double-edged sword, because these devs will refuse to debug issues on other vendors' drivers, and they view GL only through the lens of how it's implemented by their driver. These embedded devs will purposely do things that they know are performant on their driver, with no idea how these things impact other drivers.
Historically, this vendor will do things like internally replace entire shaders for key titles to make them perform better (sometimes much better). Most drivers probably do stuff like this occasionally, but this vendor will stop at nothing for performance. What does this mean to the PC game industry or graphics devs? It means you, as "Joe Graphics Developer", have little chance of achieving the same technical feats in your title (even if you use the exact same algorithms!) because you don't have an embedded vendor driver engineer working specifically on your title making sure the driver does exactly the right thing (using low-level optimized shaders) when your specific game or engine is running. It also means that, historically, some of the PC graphics legends you know about aren't quite as smart or capable as history paints them to be, because they had a lot of help.
Vendor A is also jokingly known as the "Graphics Mafia". Be very careful if a dev from Vendor A gets embedded into your team. These guys are serious business.
Vendor B
A complete hodgepodge, inconsistent performance, very buggy, inconsistent regression testing, dysfunctional driver threading that is completely outside of the dev's official control. Unfortunately this vendor's GPU is pretty much standard and is quite capable hardware-wise, so you can't ignore these guys even though as an organization they are idiots with software. Basic stuff like glTexStorage() crashes (on a shipped title) for months on end with this driver. Vendor B's driver devs try to follow the spec more closely than Vendor A, but in the end this tends to do them no good because most devs just use Vendor A's driver for development and when things don't work on Vendor B they blame the vendor, not the state of GL itself.
Vendor B driver's key extensions just don't work. They are play or paper extensions, put in there to pad resumes and show progress to managers. Major GL developers never use these extensions because they don't work. But they sound good on paper and show progress. Vendor B's extensions are a perfect demonstration of why GL extensions suck in practice.
This vendor can't get key stuff like queries or syncs to work reliably. So any extension that relies on syncs for CPU/GPU synchronization isn't workable. The driver devs remaining at this vendor pine to work at Vendor A.
Vendor B can't update its driver without breaking something. They will send you updates or hotfixes that fix one thing but break two other things. If you single step into one of this driver's entrypoints you'll notice layers upon layers of cruft tacked on over the years by devs who are no longer at the company. Nobody remaining at Vendor B understands these barnacle-like software layers enough to safely change them.
I've occasionally seen bizarre things happen on Vendor B's driver when replaying GL call streams of shipped titles into this driver using voglreplay. The game itself will work fine, but when the GL call stream is replayed we'll see massive framebuffer corruption (that goes away if we flush the GL pipeline after every draw). My guess: this driver is probably using app profiles to just turn off entire features that are just too buggy.
Interestingly, Vendor B has a tiny tools team that actually makes some pretty useful debugging tools that actually work much of the time - as long as you are using Vendor B's GPU. Without Vendor B's tools, togl and Source1 Linux would have taken much longer to ship.
This could be a temporary development, but Vendor B's driver seems to be on a downward trend on the reliability axis. (Yes, it can get worse!)
On the bright side, and believe it or not, Vendor B knows the OpenGL spec inside and out - to the syllable. If you can get them to assist you, their advice is more or less reasonable about plain GL matters (not extensions).
Vendor C - Driver #1
It's hard to ever genuinely get angry at Vendor C. They don't really want to do graphics, it's really just a distraction from their historically core business, but the trend is to integrate everything onto one die and they have plenty of die space to spare. They are masters at hardware, but at software they aren't all that interested really. They are the leaders in the open source graphics driver space, and their hardware specs are almost completely public. These folks actually have so much money and their org charts are so deep and wide they can afford two entirely different driver teams! (That's right - for this vendor, on one platform you get GL driver #1, and another you get GL driver #2, and they are completely different codebases and teams.)
Anyhow, this vendor's HR team is smart: it directly hires open source whiz kids to keep driver #1 plodding forward. This driver is the least advanced of the major drivers, but it more or less works as long as you don't understand or care what "FPS" means. If it doesn't work and you're really motivated you can git your hands dirty and try to fix it and submit a patch. If you're really good at fixing this driver and submitting patches then you may get a job offer from this vendor.
Anyhow, driver #1 is unfortunately pretty far behind on the GL standard, but maybe in 1-2 years they'll catch up and implement the spec as of last year. But you can't ignore this driver because they have a significant and strategically growing market share. So as a developer who wants to reach this market, you can't afford to use those fancy extensions or the latest trendy "modern" GL supported by vendors A and B. You must do a min() operation across all the drivers and in many cases this driver gates what you can do.
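To make the min() idea concrete, here's a minimal sketch, assuming you've already collected each target driver's reported GL version and extension list (say, from GL_VERSION and GL_EXTENSIONS on a test machine). DriverCaps, Baseline, and commonBaseline are hypothetical names invented for this example, not from any real library. The baseline is the lowest GL version any driver reports, plus only those extensions that every driver exposes.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of what one driver reports (illustration only).
struct DriverCaps {
    std::string name;                  // label for the driver
    std::pair<int, int> glVersion;     // (major, minor)
    std::set<std::string> extensions;  // extension strings the driver exposes
};

// What the engine can safely rely on across every target driver.
struct Baseline {
    std::pair<int, int> glVersion;
    std::set<std::string> extensions;
};

Baseline commonBaseline(const std::vector<DriverCaps>& drivers) {
    assert(!drivers.empty()); // a baseline over zero drivers is meaningless
    Baseline base{drivers.front().glVersion, drivers.front().extensions};
    for (const DriverCaps& d : drivers) {
        // std::pair compares lexicographically: major first, then minor.
        base.glVersion = std::min(base.glVersion, d.glVersion);
        // Keep only the extensions this driver also exposes.
        std::set<std::string> common;
        std::set_intersection(base.extensions.begin(), base.extensions.end(),
                              d.extensions.begin(), d.extensions.end(),
                              std::inserter(common, common.begin()));
        base.extensions = std::move(common);
    }
    return base;
}
```

Note that an extension being advertised by every driver only tells you it's exposed, not that it works, so in practice the usable set is smaller still.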
Vendor C has no GL tools at all for either platform. Sorry - want to debug that graphics problem you're having? Welcome to 1999.
Vendor C - Driver #2
A complete disaster. This team's driver is barely used by any titles because GL on this platform is totally a second class citizen, so many codepaths in there just don't work. They can't update a buffer without massive, random corruption. This team will do stuff like give you a different, unique, buggy driver drop for every title in your back catalog for perf analysis or testing. This team will honestly ask you if "perf" or "correctness" is more important.
I've seen one well-known engine team spend over a year attempting to get their latest GL 4.x+trendy extensions backend working at all on this team's driver. Hey guys - this driver just doesn't work, just move on already and implement a plain GL 3.x backend with workarounds (just like togl and other shipping titles do today).
On the bright side, Vendor C feeds this driver team more internal information about their hardware than the other team. So it tends to be a few percent faster than driver #1 on the same title/hardware - when it works at all.
In addition to the above major drivers, there are several open source drivers, mostly developed by the community, for hardware from vendors A and B. They tend to be behind the times from a GL perspective, but I hear they mostly work. I don't have any real experience or hard data with these drivers, because I've been fearful that working with these open source/reverse engineered drivers would have pissed off each vendor's closed source teams so much that they wouldn't help.
Vendor A hates these drivers because its driver devs are deeply entrenched in the current way things are done. These devs have things like mortgages and college funds (or whatever) to keep funding, so there's a massive amount of inertia from this camp. There's no way they are going to release their Top Secret GPU Specs to the public, or (gasp!) open source their driver. Vendor A will have to jump on the open source driver bandwagon soon in order to better compete against Vendor C's open model, whether they like it or not.
Vendor B halfheartedly helps their open source driver by funding a tiny team to keep the thing working. At some point, the open source driver for Vendor B's GPU may be a more viable path forward than their half-functional closed source driver.
To ship a major GL title you'll need to test your code on each driver and work around all the problems. May the "GL Gods" help you if you experience random GPU corruption, heap corruption, lockups, or TDRs. Be very nice to the driver teams and their managers/execs, because without them your chances aren't nearly as good.