Algorithm Scalability

There’s a useful concept in computers called Linear Time, which helps us understand why some programs slow down under heavy loads.

If you hand someone a hand of cards and ask them to sort them, they’ll usually go about it this way:

This is the method often use by people to sort bridge hands: consider the elements one at a time, inserting each in its proper place among those already considered (keeping them sorted). The element being considered is inserted merely by moving larger elements one position to the right, then inserting the element into the vacated position.
 — Robert Sedgewick, Algorithms

In other words, you hold your cards in your right hand, then start taking them, one by one, over to your left hand. As you move them, you place them in ascending order.

This is similar to Insertion Sort and also to Selection Sort. These two are the natural, intuitive sorting algorithms to humans.

Insertion Sort is what we call an algorithm. It’s a general plan we can use to solve a problem: we’ve got some stuff in a jumbled up order, and we need to get them into ascending order. Algorithms are really useful. Sorting algorithms in particular are a good starting point for learning about algorithms.

Insertion Sort & Selection Sort

Imagine you have a deck of cards, each card simply has a number on it, numbered from 1 to 10.

To understand sorting we need to play a game with these cards. You take turns in this game, and each turn you can do one of three things:

  • Pop: Draw a card from the top of the deck

  • Place a popped card onto your pile of sorted cards or back onto the deck

  • Search: Or draw a number of cards from either the deck or sorted pile to find a particular card you’re looking for. (These must be put back in the same order). While searching you can pop any one card you find.

Once you’ve popped a card you can’t pop another card until you place that card somewhere.

The goal of the game is to get the cards into ascending order on the sorted pile.

To play this game we have to remember we’re pretending to be a computer, so we need to limit ourselves to acting like they do.

Insertion Sort is a simple plan for how to do that. To start with just pop a card and place that onto the sorted pile. From there you do the same thing each turn:

Pop a card from the deck, search the sorted pile to find the right spot for it and put it there.

Selection Sort is basically the opposite plan. Pop one card onto the sorted pile to start, then each turn you’ll search the deck to find the lowest number and pop that then place that on top of the sorted pile.

So Insertion Sort searches the sorted pile each turn, and selection sort searches the deck each turn. You can see that they are basically two sides of the same coin.

Unfortunately, there is another rule: each time you touch any card you incur 1 Cost Point. The goal of the game is actually to incur as little cost as possible. You can probably tell that both of these algorithms are very expensive: with Selection Sort you’re literally touching every card in the deck every turn. That’s something like 50 Cost Points right there for just 10 cards!

Scalability

Now imagine if you had 100 cards numbered from 1 to 100.

That would take forever to sort through. Whereas 10 cards got about 50 cost points for selection sort, with 100 cards you’re over 5,000 cost points!

As the amount of stuff you’re sorting grows it takes longer and longer to sort it.

Some algorithms become really slow, and others barely slow down at all, or even remain at the same speed regardless of how much stuff they’re doing.

We usually refer to how scalable something is by using something called Big O Notation, but that has a few terms, so rather than jumping into the deep end and learning small nuanced differences between terms, I’d just like you to learn the idea of Linear Time, and then understand that other times are either faster or slower than linear time.

When we say something has Linear Time we mean that as the amount of stuff it’s dealing with grows the amount of time the algorithm takes grows proportionally. So if it takes 10 seconds to sort 10 things then it will take 100 seconds to sort 100 things. That’s Linear Time: a 10x increase in the deck size would be a 10x increase in the cost points.

If we look at Selection Sort, it was at ~50 cost points with 10 cards, then at ~5,000 at 100 cards. So a 10x in the deck size actually multiplied the cost by 100x! That’s much slower than Linear Time.

Most of the time you want something to have Linear Time or faster-than-Linear Time.

But, depending on the problem, that might not be possible. Sorting is usually a little bit slower than Linear Time, for example.

A fast sorting algorithm might instead increase by something like 20x if you increase the size of the deck by 10x for example. That’s not linear — linear would be 10x for 10x — but it’s not so much slower, not nearly the 100x we see with Selection Sort.

It’s important to focus on connecting these ideas to the reasons why they matter. Generally speaking, if you’re talking about a web startup the kindof scalability they’re looking for is to grow their userbase, and so naturally as more people use their website it’ll put more strain on the site. But how much more strain?

If the algorithm grows at much more than a linear rate the site might well crash, or be so slow that people no longer want to use the site.