Could you explain "deduplicating data" and why more AAA games don't do this? Helldivers 2 was reduced from 150 GB to 23 GB. With how large AAA games are nowadays, it seems odd that this isn't more of a priority.
In the world of computers, it takes varying amounts of time to read data from a particular source. Certain data sources are much faster to read from than others. Here's the general hierarchy for read times:
Cache hit (1-20 nanoseconds)
Not in cache, read from memory (roughly 50-100 nanoseconds)
Not in memory, read from SSD (25-100 microseconds, i.e. 25,000-100,000 nanoseconds)
Not in memory, read from hard disk (5-10 milliseconds, i.e. 5,000,000-10,000,000 nanoseconds)
Not in memory, read from blu-ray disc (100+ milliseconds, i.e. >100,000,000 nanoseconds)
As you can see, cache and memory reads are really fast, but reading from either an SSD or an HDD is extremely slow in comparison. HDD reads are orders of magnitude slower than SSD reads, which are in turn orders of magnitude slower than memory reads. Reading from physical media like a blu-ray disc is slower still. This time cost is incurred whenever the game needs to read data that isn't already in memory, e.g. when loading a new area, a new enemy, a new character, a new effect, etc. from the disk.
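To put those storage numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the per-read ranges from the list above; the figure of 1,000 scattered reads per load and the function name `total_wait_seconds` are assumptions made up for illustration, not measurements from any real game.

```python
# Per-read latency ranges from the list above, in nanoseconds (low, high).
LATENCY_NS = {
    "ssd": (25_000, 100_000),
    "hdd": (5_000_000, 10_000_000),
    "blu-ray": (100_000_000, 100_000_000),  # listed as "100+ ms", so this is a floor
}

def total_wait_seconds(device, reads):
    """Time spent just waiting on latency for `reads` separate, scattered reads."""
    low_ns, high_ns = LATENCY_NS[device]
    return reads * low_ns / 1e9, reads * high_ns / 1e9

# Hypothetical load that needs 1,000 scattered reads.
for device in LATENCY_NS:
    low, high = total_wait_seconds(device, 1_000)
    print(f"{device}: {low:.3f} to {high:.3f} seconds")
```

Under these assumptions the hard disk spends whole seconds doing nothing but getting into position, while the SSD spends a fraction of a second. That gap is why seek time matters so much on the slower devices.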
One of the tricks for optimizing load times is to minimize the "seek time" that slower devices like HDDs and blu-ray discs spend getting to the data, by keeping redundant copies of the same data on the disk. This is like having multiple kitchens set up around the office instead of just one - workers don't have to walk as far to get a snack. It isn't important if the office is small, but having only one kitchen when your office houses a thousand people across ten floors means a pretty massive bottleneck at mealtimes. Multiple redundant kitchens speed things up because workers just need a kitchen, and they will generally pick the closest available one. Multiple redundant chunks of data make seeks faster for the same reason: the copies are all identical, so the reader doesn't have to travel as far to get what it's looking for.
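Here is a toy model of that idea in Python. It is a sketch under heavy assumptions: the disk is treated as a row of numbered slots, each asset fills exactly one slot, and reading a run of neighboring slots costs one seek. The level names, asset names, and function names are all hypothetical.

```python
def count_seeks(positions):
    """Number of separate contiguous runs the read head has to visit."""
    ordered = sorted(positions)
    jumps = sum(1 for prev, cur in zip(ordered, ordered[1:]) if cur != prev + 1)
    return 1 + jumps

def deduplicated_positions(levels):
    """Store each asset once; return {level: slots that level must read}."""
    slot = {}
    for assets in levels.values():
        for asset in assets:
            slot.setdefault(asset, len(slot))  # first appearance claims the next slot
    return {level: [slot[a] for a in assets] for level, assets in levels.items()}

def duplicated_positions(levels):
    """Store a private copy of every asset right next to each level that uses it."""
    result, next_slot = {}, 0
    for level, assets in levels.items():
        result[level] = list(range(next_slot, next_slot + len(assets)))
        next_slot += len(assets)
    return result

# Hypothetical levels that share a couple of assets.
LEVELS = {
    "forest": ["rifle", "bug", "trees"],
    "desert": ["sand", "rifle", "cactus", "bug"],
    "ice": ["snow", "bug", "rifle", "igloo"],
}

for name, layout in [("deduplicated", deduplicated_positions(LEVELS)),
                     ("duplicated", duplicated_positions(LEVELS))]:
    slots_used = len({p for positions in layout.values() for p in positions})
    seeks = {level: count_seeks(positions) for level, positions in layout.items()}
    print(name, "slots used:", slots_used, "seeks per level:", seeks)
```

In this toy setup the duplicated layout loads every level with a single seek, but it uses 11 slots instead of 7. Real disc layouts are far more involved, but the shape of the tradeoff is the same.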
Having redundant options comes at the cost of disk space. Having multiple drink stations or kitchens means that space can't be used for a lounge or game stations or storage or more offices or anything else. Not every game duplicates its data, because not every game has room to - if the game features a lot of super high res textures and high quality audio, those assets will eat up a ton of disk space on their own even without duplication for access speed. Duplication was a choice the Helldivers dev team made to help speed up access times on lower spec machines. It seems they have since decided that the speed gains weren't worth the disk space cost.
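For a sense of what "deduplicating" means mechanically, here is a minimal Python sketch that groups files by a hash of their contents and reports how much space a single copy of each would take. This is one common, generic approach, not a description of what the Helldivers team actually did; the file names, sizes, and the `dedup_report` function are made up for illustration.

```python
import hashlib

def dedup_report(files):
    """Return (total_bytes, unique_bytes, groups) for a {path: bytes} mapping."""
    groups = {}
    for path, data in files.items():
        digest = hashlib.sha256(data).hexdigest()  # identical contents give identical digests
        groups.setdefault(digest, []).append(path)
    total = sum(len(data) for data in files.values())
    # Count each distinct piece of content once, however many copies exist.
    unique = sum(len(files[paths[0]]) for paths in groups.values())
    return total, unique, groups

# Hypothetical install where the same rifle texture ships with three levels.
PACKS = {
    "forest/rifle.tex": b"R" * 400,
    "desert/rifle.tex": b"R" * 400,
    "ice/rifle.tex": b"R" * 400,
    "forest/trees.tex": b"T" * 300,
    "desert/sand.tex": b"S" * 200,
}

total, unique, groups = dedup_report(PACKS)
print(f"with duplicates: {total} bytes, deduplicated: {unique} bytes")
for paths in groups.values():
    if len(paths) > 1:
        print("same content:", ", ".join(paths))
```

Finding the duplicates is the easy part. The harder part is deciding whether the game can afford the extra seeks once every level has to fetch the shared copy from one place.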