Bottlenecks - disk, cpu, synchronization, etc

Bottlenecks are the parts of the system that limit performance. The system is made up of multiple parts that can run in parallel; for example, the disk can be doing disk IO at the same time as the CPU is running some code. But when the system is at its maximum throughput, one of those parts is running at 100% of its maximum capacity, and that part is the bottleneck. Everything else is running at less than 100% because it's spending some time waiting for the bottleneck. Here's a list of possible bottlenecks, ordered roughly by how likely they are to be the bottleneck in a typical business application:

  1. disk IO
  2. CPU
  3. synchronization of a shared software resource
  4. an external system your software interacts with
  5. network IO
  6. virtual memory
  7. main memory / RAM
  8. some of the users of the system

Usually the bottleneck dominates the use of time in the system, and hence usually performance can only be significantly improved by addressing the bottleneck (this is the down to earth meaning of Amdahl's Law). So it's key to find the bottleneck if you want to improve performance. This is the reason for performance rule 1: measure first optimize second. By measuring where the time is being spent you can find the bottleneck and optimize that, rather than waste time optimizing something that isn't the bottleneck.
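Here is a minimal sketch of what "measure first" can look like in practice. The phase names and the fake workload in handle_request are made up for illustration; the only real API used is the standard library's time.perf_counter. A proper profiler will tell you more, but even this much shows where the time goes.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named phase (phase names are illustrative).
timings = defaultdict(float)

@contextmanager
def timed(phase):
    """Add the elapsed wall-clock time of the enclosed block to timings[phase]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] += time.perf_counter() - start

def handle_request():
    # Stand-ins for real work; replace with the actual steps of your system.
    with timed("parse"):
        sum(range(1_000))
    with timed("database"):
        time.sleep(0.02)  # pretend this is a slow query
    with timed("render"):
        sum(range(10_000))

def slowest_phase():
    """Return the name of the phase that has used the most time so far."""
    return max(timings, key=timings.get)

for _ in range(5):
    handle_request()

total = sum(timings.values())
for phase, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{phase:10s} {seconds:.4f}s  {100 * seconds / total:5.1f}%")
```

Whatever comes out on top of that report is the thing to optimize; the other phases can wait.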

To improve performance you can either make the bottlenecked part itself go faster, or you can use the bottlenecked part more efficiently by getting the system to do more useful work for the same load on the bottleneck. For example if disk IO is the bottleneck, you can either get faster disks or you can change your code to use disk IO more efficiently. If hardware is the bottleneck then rarely will changing the hardware lead to very big performance increases, and often times changing the hardware simply isn't possible. But the good news is that the very large performance increases usually come from changing code to use the hardware more efficiently, because the performance problems in most systems are due to violations of the performance rules, not hardware that is too slow.

MarkJ's Rules for Good Software Performance

To get good performance from your software applications your team needs to follow MarkJ's Rules of Good Software Performance:

1) Measure first and optimize second - find bottlenecks. Don't guess where a performance problem might be before you have one; find the slow part of the system through testing. In performance engineering it's time you have to follow: find out where the time goes and make that part faster. The slowest part of the system is its bottleneck, the part that maxes out first and limits performance of the entire system. Making another part of the system faster but leaving the bottleneck alone isn't going to make much difference. Steve Souders gives an excellent example of this in 'High Performance Web Sites: Essential Knowledge for Front-End Engineers', explaining how Yahoo optimized page response time. They found that 10% of the page load time came from Yahoo's back end servers, and 90% came from page download and rendering time on the browser. To improve page load time there was little point trying to optimize the back end server code any more because it only accounted for 10% of the time. Instead they spent their time finding ways to optimize page download and rendering time, because that's where 90% of the time was going.
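The arithmetic behind the Yahoo example is just Amdahl's Law. The sketch below plugs in the 10% / 90% split from the text; the "twice as fast" factor is my own assumption, picked only to make the comparison concrete.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's Law: overall speedup when `fraction` of the total time
    is made `local_speedup` times faster and the rest is left alone."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Assumed improvement factor of 2x, applied to each part in turn.
backend = overall_speedup(0.10, 2.0)   # optimize the 10% spent on the back end
frontend = overall_speedup(0.90, 2.0)  # optimize the 90% spent in the browser

print(f"back end twice as fast:  {backend:.2f}x overall")
print(f"front end twice as fast: {frontend:.2f}x overall")
```

Doubling the speed of the back end makes pages load about 5% faster, while doubling the speed of the front end makes them load about 82% faster. Even an infinitely fast back end could never give more than 1 / 0.9, or about 1.11x.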

2) Use I/O efficiently - it's much slower than code. Inefficient use of I/O is the underlying cause for 50-75% of all performance problems in typical business software, web apps, and most software I've come across. The data the application uses is in some place that has to be accessed via disk I/O or network I/O. I/O to disk is really really slow. I/O to the network is quite slow, and even accessing data in RAM is slower than using data in the CPU and cache. Obviously you can't just skip the I/O altogether, so you have to access the data in an efficient way. This comes down to designing the software so that it gets a large amount of data with a small number of I/O calls, rather than making a large number of I/O calls that each get a small amount of data. The classic example, which is seen in software over and over again, is SQL select calls to the database inside a loop, ie code inside a loop keeps doing another I/O to get another row from a table. This is slow. The correct way to code this is to select all the data needed in one go, then step through a single large result set inside the loop.
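To make the select-in-a-loop problem concrete, here is a small sketch using Python's built-in sqlite3 module. The customers and orders tables and both function names are invented for the example. Because the database here is in memory there is no network round trip, so the timing gap is smaller than you would see against a real database server, but the shape of the two patterns is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(1 + i % 100, float(i)) for i in range(1000)])

def totals_query_in_loop(conn):
    """Slow pattern: one extra SELECT per customer (101 queries here)."""
    result = {}
    customers = conn.execute("SELECT id, name FROM customers").fetchall()
    for cust_id, name in customers:
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (cust_id,)).fetchone()
        result[name] = row[0]
    return result

def totals_single_query(conn):
    """Preferred pattern: one query, then loop over the single result set."""
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.total), 0)
        FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
    """)
    return {name: total for name, total in rows}

# Both approaches produce the same answer; only the number of I/O calls differs.
print(totals_query_in_loop(conn) == totals_single_query(conn))
```

With 100 customers the first version issues 101 queries and the second issues one. Against a database on another machine, every one of those extra queries pays for a network round trip.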

3) Use shared resources sparingly - or don't share them. Shared resources usually have to be protected against concurrent use by multiple threads by synchronization locks, but the synchronization makes threads wait to use the resource. Synchronization is the underlying cause for the other 25-50% of performance problems, and concurrency contention is at the heart of scalability. If software has high concurrency contention, aka poor concurrency, then many user requests, many threads, many CPUs, etc will all be waiting to get a lock on a shared resource before they can execute. All that waiting adds up to poor performance. Software that has low concurrency contention, aka good concurrency, allows lots of different things to be going on at once in parallel without all the tasks waiting for each other. Synchronization brings thread safety to shared data, but it also increases concurrency contention.
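As a rough illustration of using a shared resource sparingly, the sketch below counts even numbers across several threads in two ways. The function names and the workload are invented for the example. Note that in standard CPython the global interpreter lock limits how much CPU-bound threads really run in parallel, so treat this as a picture of the pattern rather than a benchmark.

```python
import threading

def count_with_shared_lock(chunks):
    """Every increment takes the same lock, so threads queue up behind each other."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for value in chunk:
            if value % 2 == 0:
                with lock:  # contended on every matching item
                    total += 1

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def count_with_local_results(chunks):
    """Each thread works on private data; the lock is taken once per thread."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        local = sum(1 for value in chunk if value % 2 == 0)  # nothing shared here
        with lock:  # one short critical section per thread
            total += local

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

chunks = [range(i * 10_000, (i + 1) * 10_000) for i in range(4)]
print(count_with_shared_lock(chunks), count_with_local_results(chunks))
```

Both versions are thread safe and give the same answer. The first acquires the lock 20,000 times, the second only 4 times, so there is far less opportunity for threads to wait on each other.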

4) Learn how you are supposed to use APIs and third party components. It seems obvious, but this is the cause of many performance problems. When you use something in a way that its original designers did not intend, you often end up tripping on rules 2, 3, or both. (This something might be a third party API, a piece of code written by your colleague, or your database.) Programmers are very creative people. A programmer's natural tendency when using an API is first to get a basic understanding of what it does, and then to dream up no end of 'clever' new ways to use it. Unfortunately if you don't understand exactly how the insides of this component work, then once you stray away from the intended uses you end up causing the component to use inefficient patterns of access to data and/or have poor concurrency. A classic example in business software that uses Oracle is using dynamic SQL instead of prepared statements. Oracle is designed so that frequently used queries can execute with good concurrency if they are issued using prepared statements. The part of Oracle that gives it great performance with prepared statements means that if you instead perform every query as a brand new piece of SQL, then Oracle has high concurrency contention in the statement cache (rule 3) and performance is terrible.
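The difference between dynamic SQL and a prepared statement looks roughly like this. I'm using Python's built-in sqlite3 module as a stand-in because it runs anywhere; it is not Oracle, and Oracle's drivers have their own APIs that aren't shown here. The accounts table and the function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(i, f"owner-{i}") for i in range(1, 1001)])

def lookup_dynamic(conn, account_id):
    """Anti-pattern: a new SQL string for every id, so the database is
    handed a different statement text on every call."""
    sql = f"SELECT owner FROM accounts WHERE id = {int(account_id)}"
    return conn.execute(sql).fetchone()[0]

# Constant SQL text with a placeholder; only the bound value changes per call.
LOOKUP_SQL = "SELECT owner FROM accounts WHERE id = ?"

def lookup_bound(conn, account_id):
    """Intended use: the same statement text every time, with a bound
    parameter, so the database can recognize and reuse it."""
    return conn.execute(LOOKUP_SQL, (account_id,)).fetchone()[0]

print(lookup_dynamic(conn, 42), lookup_bound(conn, 42))
```

Both functions return the same row. The point is that the second one presents the database with one statement used 1,000 times rather than 1,000 different statements used once each, which is the usage pattern the designers optimized for.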

5) Use a realistic amount of test data and a realistic workload - don't test against an empty database. It's a classic problem that occurs again and again. The code works fine on the developer's PC where the database contains just a few hundred rows of test data, but when the system goes into production with millions of rows in the database, performance is a disaster. Performance engineering is just too hard to get right without good testing. There will always be something you overlooked or underestimated, or some place where you broke rules 2, 3, or 4 in your code that you need to uncover. But if you test with a practically empty database then you'll fail to uncover these surprises because guess what - your application will go really fast when the database is empty. Further, what you really have to do is some basic capacity planning (rule 6) to forecast what the workload will be, how much data there will be in the database, and how many concurrent users you'll have, and design a test that simulates all of that.

6) Do performance engineering work throughout your project. Doing this gives you visibility of potential performance problems as you go, while you still have a chance to redesign the problem away. If you leave performance testing to the end of the project you'll discover multiple cases of breaking rules 2, 3, and 4 that will require redesign, recoding, and retesting - which will take you over budget and make you miss your delivery date. If you are introducing a new technology, design, or algorithm it's a good idea to do a technical prototype early on and performance test it. You also need to start with some simple capacity planning: How many users? How much data? What performance characteristics do you need? What kind of hardware can you run it on? Without a rough answer to these questions you won't know if your performance is good enough or not. If you miss, you'll either have a system that is too slow or you'll have wasted money building something that is faster than needed.

7) Don't optimize code while coding. Doing so is a direct violation of rule 1, but I'm restating it as the last rule because programmers often forget this and waste their time trying to 'optimize' the code they are writing as they go (me included). It's a waste because there's no point trying to optimize code until you know where the bottlenecks are, and when programmers 'optimize' code, we usually make that code harder to understand, harder to maintain, and more likely to contain bugs. This doesn't mean you shouldn't make good choices concerning performance as you go - you should pay attention to patterns of access to data and concurrency contention. However it's almost never necessary to 'optimize code'; code is nearly always fast enough.

Scalability and Other Meanings of Performance

In Throughput and Response Time I described the essence of what we mean by software performance, but there are some additional meanings that are important to consider in many cases. So here is a bigger list of what we mean when we talk about performance:

  • Response Time
  • Throughput
  • Scalability
  • Stability
  • Resource Utilization, eg memory footprint
  • Overload or failure characteristics
  • Capacity planning

So let's take these one by one, starting with Scalability. Scalability is how well performance of the system can be improved by adding more hardware. Every software system has limitations to performance, and the underlying cause is often a limit to how fast part of the hardware can go. If the software has good scalability then you can add more of whatever is maxed out and get better performance. Often times we think about scalability simply as the ability to add more servers. For example, Google's system is highly scalable: they have thousands of servers running all their apps, and as usage grows they just keep adding more servers to keep up with demand. They 'scale out'. Scalability doesn't have to be only about scaling across separate servers; we may want to run some software on one individual server, but care whether it is scalable as we add more CPUs or more disks. Relational databases are the classic example of this: it's hard to run a cluster of database servers operating as one, so when we need better performance from a database it's cheaper to upgrade to a more powerful server with more CPUs, RAM, and disks. Good scalability often comes with a small compromise in response time or throughput for one individual server. Ie, designing your software without caring about scalability might mean its performance is better than the same application designed with high scalability when both versions are running on just one server. But the scalable design can be scaled up to support more users, do more work, or respond more quickly by adding more hardware, where the unscalable design will be stuck with the performance of a single server.

Stability is concerned with how the software stands up to heavy use by lots of users doing lots of work for a long period of time. Stable software crashes rarely and has few errors, unstable software crashes a lot and has problems completing work correctly. Stability is tied up with other performance topics because many stability problems only show up when performance testing with lots of users, and a high load on the system, so whoever is doing performance testing work usually has to also work on the stability of the system.

When looking at resource utilization, we care about how the software uses CPU, RAM, disk IO, disk space, and network capacity. All of these hardware resources are limited, and it's desirable for our software to use these resources carefully. Under high load, resource utilization will be higher than when testing just one user with one use case, so just like stability, studying resource utilization is in the domain of performance testing.

Every software application has a limit to how fast it can go, how many users it can support, etc. When the system is overloaded by going beyond these limits then the software may fail by crashing, causing data loss, incorrectly completing work, dropping users, etc. These are the failure modes or overload characteristics of the application. The most desirable failure mode is that work backs up and response time gets longer, while throughput stays close to 100% of its max. The least desirable failure mode is when the system totally locks up during an overload, and throughput drops to 0. Understanding the overload characteristics is important for capacity planning.
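The desirable failure mode can be pictured with a toy model. The sketch below is my own simplification, not a measurement of any real system: a single server, evenly spaced arrivals, a fixed service time, and a first-come-first-served queue. The simulate function and its parameters are names I made up for the example.

```python
def simulate(arrival_interval, service_time, n_requests):
    """Single-server FIFO queue with evenly spaced arrivals and a fixed service time.

    Returns (response_times, throughput), where throughput is completed
    requests per unit of time over the whole run.
    """
    server_free_at = 0.0
    response_times = []
    for i in range(n_requests):
        arrived = i * arrival_interval
        started = max(arrived, server_free_at)  # wait if the server is busy
        server_free_at = started + service_time
        response_times.append(server_free_at - arrived)
    throughput = n_requests / server_free_at
    return response_times, throughput

# Normal load: a request arrives every 2 time units, each takes 1 unit to serve.
normal_rt, normal_tp = simulate(arrival_interval=2.0, service_time=1.0, n_requests=1000)

# Overload: requests arrive twice as fast as the server can handle them.
overload_rt, overload_tp = simulate(arrival_interval=0.5, service_time=1.0, n_requests=1000)

print(f"normal:   last response time {normal_rt[-1]:.1f}, throughput {normal_tp:.2f}")
print(f"overload: last response time {overload_rt[-1]:.1f}, throughput {overload_tp:.2f}")
```

Under normal load every request is answered in one service time. Under overload the backlog grows, so each request waits longer than the one before it, but the server keeps completing work at its maximum rate of one request per time unit. A system that locks up instead would show throughput falling toward 0.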

Capacity planning is preparing for the demands on the software system in the future by understanding the performance characteristics of the application, predicting how many users and what kind of workload that application will have, and then setting up the right hardware to be able to run the software with the needed performance, without spending more money on hardware than you need to.
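A back-of-the-envelope version of that calculation might look like the sketch below. Every number in it is hypothetical, and the 70% target utilization is simply an assumed safety margin; real capacity planning would take the per-server capacity from performance tests.

```python
import math

def servers_needed(concurrent_users, requests_per_user_per_minute,
                   server_capacity_rps, target_utilization=0.7):
    """Rough sizing: peak request rate divided by the usable capacity of one server.

    Returns (peak requests per second, number of servers).
    """
    peak_rps = concurrent_users * requests_per_user_per_minute / 60.0
    usable_rps = server_capacity_rps * target_utilization  # leave some headroom
    return peak_rps, math.ceil(peak_rps / usable_rps)

# Hypothetical forecast: 6,000 concurrent users, 4 requests per user per minute,
# and a measured capacity of 150 requests per second per server.
peak_rps, servers = servers_needed(
    concurrent_users=6_000,
    requests_per_user_per_minute=4,
    server_capacity_rps=150)

print(f"peak load {peak_rps:.0f} requests/s -> {servers} servers")
```

With these made-up inputs the peak load is 400 requests per second, and at 70% utilization each server can take 105, so you would plan for 4 servers.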

Throughput and response time

What does 'performance' mean? Software performance is mostly about how fast the software goes and how much work it can handle. To discuss performance we need more accurate terms than ‘speed’. The two most important concepts are:

  • Response time - how quickly the system responds to a request
  • Throughput - how much work can it do in some period of time

Response time is a measure of how quickly the system responds to a request for it to do something. Put another way, response time is how long it takes before it finishes what you told it to do. Response time is really important for any software that has real people sitting there interacting with it, eg a web application like eBay. Users want the next page to come back quickly when they click on stuff. Response time for interactive software is often measured in seconds.

Throughput is a measure of how much work the system can do in a given period of time - eg if in one minute my mp3 software can encode 6 songs from a CD, then its throughput is 6 songs per minute. Throughput is really important to software that has to process a lot of data in some way, eg the electric company generating all the electricity bills for its millions of customers. Throughput is also critical for interactive software that has many users, eg how many web pages eBay can serve up every second.
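Here is a small sketch of measuring both numbers for the same piece of work. The measure function and fake_request are invented for the example, and the 10 millisecond sleep is just a stand-in for real work.

```python
import time

def measure(operation, n_requests):
    """Run `operation` n_requests times, one after another.

    Returns (mean response time in seconds, throughput in requests per second).
    """
    response_times = []
    run_start = time.perf_counter()
    for _ in range(n_requests):
        start = time.perf_counter()
        operation()
        response_times.append(time.perf_counter() - start)
    elapsed = time.perf_counter() - run_start
    return sum(response_times) / n_requests, n_requests / elapsed

def fake_request():
    time.sleep(0.01)  # stand-in for real work

mean_rt, throughput = measure(fake_request, 20)
print(f"mean response time: {mean_rt * 1000:.1f} ms")
print(f"throughput:         {throughput:.1f} requests/s")
```

With a single caller sending one request at a time, throughput is roughly 1 divided by the response time. The two numbers only start to tell different stories once the system handles many requests at the same time.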

Response time and throughput are usually not independent of each other. If you are developing software you often have to trade one for the other in your architecture choices. Often times you can make a choice that will increase throughput by a large %, but it makes your response times increase a little (typically when you use request queues in your system). In that case, should you say that the software is faster because it does more work, or slower because it doesn't react as fast? Both are true, so when you talk about performance it is confusing to talk about 'speed', you have to talk about response time and/or throughput.

Another side of throughput vs response time is who cares. If you are using eBay to buy a stuffed cat, you don't care how many millions of pages eBay served up while you were on the site (throughput); you only care about how fast your pages came back (response time). On the other hand, eBay management really care whether their system can generate 1,000 or 10,000 page views a second (throughput), because they have tons of stuff they want to keep everyone bidding on and they want to use the fewest servers possible to keep costs under control.