I was working on a project recently and ran into an issue with memory that took me a few days to figure out. This is the sort of code that I'd written:

```cpp
#include <iostream>
#include <vector>

struct SceneObject {
    std::vector<int> entities;
} aScene;

void createEntity() {
    aScene.entities.push_back(5);
}

int main() {
    createEntity();
    int* heldPointer = &aScene.entities[0];
    std::cout << "held pointer: " << heldPointer << "\t m_id " << *heldPointer << "\n";
    std::cout << "good pointer: " << &aScene.entities[0] << "\n";
    createEntity();
    // heldPointer may now dangle if the vector reallocated.
    std::cout << "held pointer: " << heldPointer << "\t m_id " << *heldPointer << "\n";
    std::cout << "good pointer: " << &aScene.entities[0] << "\n";
    return 0;
}
```

This is a simplified example of the actual code. The actual code had a vector of entities and a `std::map` of entity ids to entity pointers. I hadn't used `std::map` before, so for a long time I assumed the bug was in how I was using `std::map`. I went through a long series of different possible explanations for why things were happening.

The behavior was pretty odd: the entities were correct when inserted, and the `.back()` of the `std::map` was always correctly pointing to the newest entity. In fact, when I had 100 elements in the `std::map`, the last 6 or so were actually correct. This was pretty baffling: if something was causing the data to be erased, why would it only erase 94% of the data?

I thought the issue was that I was misunderstanding the object lifetime. Well, I was instantiating the object inside a function, so maybe it is deleted when I leave the function. Or maybe it's deleted the next time the function is called (that explanation stood for a while because that's exactly the behavior that was showing up: calling `createEntity()` was what was deleting the data, so it explained what was happening pretty well).

The actual issue was that a `std::vector` grows as you add elements to it, and each time it grows it allocates new memory and moves all the old data into the new memory. You can see this in the example code above, because `&aScene.entities[0]` changes as elements are added (depending on your compiler you might have to call `createEntity()` a few times). This means that pointers into the old data are invalidated when the vector grows. So the solution for my simple project would be to use a non-dynamic array: just allocate 100 elements in the first place, and then pointers to the data won't have to be updated. Alternatively, have a system for updating the pointers each time the array grows.

If you hand someone a hand of cards and ask them to sort them, they'll usually go about it this way:

> This is the method often used by people to sort bridge hands: consider the elements one at a time, inserting each in its proper place among those already considered (keeping them sorted). The element being considered is inserted merely by moving larger elements one position to the right, then inserting the element into the vacated position.

Robert Sedgewick, *Algorithms*

In other words, you hold your cards in your right hand, then start taking them, one by one, over to your left hand. As you move them, you place them in ascending order.

This is similar to **Insertion Sort** and also to **Selection Sort**. These two are the natural, intuitive sorting algorithms for humans.

Insertion Sort is what we call an **algorithm**. It's a general plan we can use to solve a problem: we've got some stuff in a jumbled-up order, and we need to get it into ascending order. Algorithms are really useful, and sorting algorithms in particular are a good starting point for learning about them.

Imagine you have a deck of cards, each card simply has a number on it, numbered from 1 to 10.

To understand sorting we need to play a game with these cards. You take turns in this game, and each turn you can do one of three things:

**Pop:** Draw a card from the top of the deck.

**Place:** Put a popped card onto your pile of sorted cards, or back onto the deck.

**Search:** Draw a number of cards from either the deck or the sorted pile to find a particular card you're looking for. (These must be put back in the same order.) While searching you can pop any one card you find.

Once you've popped a card, you can't pop another card until you place that card somewhere.

The goal of the game is to get the cards into ascending order on the sorted pile.

To play this game we have to remember we're pretending to be a computer, so we need to limit ourselves to acting like one does.

**Insertion Sort** is a simple plan for how to do that. To start, just pop a card and place it onto the sorted pile. From there you do the same thing each turn: pop a card from the deck, search the sorted pile to find the right spot for it, and put it there.

**Selection Sort** is basically the opposite plan. Each turn, you search the deck to find the lowest remaining number, pop it, and place it on top of the sorted pile.

So Insertion Sort searches the sorted pile each turn, and Selection Sort searches the deck each turn. You can see that they're basically two sides of the same coin.

Unfortunately, there is another rule: each time you touch any card you incur 1 Cost Point. The goal of the game is actually to incur as little cost as possible. You can probably tell that both of these algorithms are very expensive: with Selection Sort you're literally touching every card in the deck every turn. That's something like 50 Cost Points right there for just 10 cards!

Now imagine if you had 100 cards numbered from 1 to 100.

*That* would take forever to sort through. Whereas 10 cards cost about 50 Cost Points with Selection Sort, with 100 cards you're over 5,000 Cost Points!

As the amount of stuff youre sorting grows it takes longer and longer to sort it.

Some algorithms become really slow, and others barely slow down at all, or even remain at the same speed regardless of how much stuff they're doing.

We usually refer to how scalable something is by using something called **Big O Notation**, but that has a few terms, so rather than jumping into the deep end and learning small nuanced differences between terms, I'd just like you to learn the idea of **Linear Time**, and then understand that other times are either faster or slower than linear time.

When we say something has Linear Time, we mean that as the amount of stuff it's dealing with grows, the amount of time the algorithm takes grows proportionally. So if it takes 10 seconds to sort 10 things, then it will take 100 seconds to sort 100 things. That's Linear Time: a 10x increase in the deck size would be a 10x increase in the Cost Points.

If we look at Selection Sort, it was at ~50 Cost Points with 10 cards, then at ~5,000 with 100 cards. So a 10x increase in the deck size actually multiplied the cost by 100x! That's much slower than Linear Time.

Most of the time you want something to have Linear Time or faster-than-Linear Time.

But, depending on the problem, that might not be possible. Sorting is usually a little bit slower than Linear Time, for example.

A fast sorting algorithm might instead increase by something like 20x if you increase the size of the deck by 10x. That's not linear (linear would be 10x for 10x), but it's not much slower, and nothing like the 100x we see with Selection Sort.

It's important to focus on connecting these ideas to the reasons why they matter. Generally speaking, if you're talking about a web startup, the kind of scalability they're looking for is to grow their userbase, and naturally, as more people use their website it'll put more strain on the site. But how much more strain?

If the algorithm grows at much more than a linear rate, the site might well crash, or be so slow that people no longer want to use it.
