Quote:
|
So, in school, you're generally taught about time and space complexity of algorithms, but those are not the only aspects that determine a data structure's or algorithm's runtime speed. Another huge part of those determinations is Locality of Reference. That is to say, contiguous data is best data. So the "classic" definition of a binary tree involves something like this: struct Node { MyData data; Node * left; Node * right; }; And whenever you need a new child, you do something like: node->left = new Node; But, barring overloading operator new(), these allocations are fragmenting your data. If you're traversing through a tree, you may repeatedly cache miss as you go from child to child, heavily impacting your runtime. One alternative is to use an array to hold your binary tree. The parent is index 0, and the children of any index follow this pattern: left = 2n + 1 right = 2n + 2 As a natural consequence of this, a breadth-first search is just an iteration from beginning to end of the entire array. That's pretty neat. But do you want to do that? Well, what are we using this tree for? Is it used a lot, or a little? Is it a heap? Is it being used for breadth-first or depth-first searches? Are things being removed from it constantly? The algorithms are extremely important, but so are the underlying data structures. In some situations, you may want to resize this "array-based binary tree" just like a vector would when it hits its cap. But if "MyData" is large, maybe that's to large a cost. Maybe instead we implement blocks of memory like a deque, so that we never have to move things and we still get lots of locality of reference. But if we're doing depth-first searches, we could be jumping blocks all the time, which is bad... My lesson to you is this: programming is about using the right tool for the job, but the reason it's so hard is that there's a lot of stuff to consider. :P (Also you can pretty much always use vector because locality of reference is that powerful.) |
So what I presume you are saying is that fragmentation like this causes only part of the data structure to be cached since calling new doesn't care about locality essentially...which is why a vector is so powerful I take it. because when it is created it allocates memory in a way that it selects a RAM slice for the whole data structure like an array, but allows you to dynamically alter the size unlike an array...good information to know, and something I had never considered before.
So that being said I feel it safe to assume that a BST would only really be useful in a data structure so large that locality of reference isn't possible? Say a data structure that holds 50,000 references to a class that stores an int and a char array of size 20? or would 1,200,000 bytes of ram (around 1171 KB) still be small enough to share the same slice of RAM efficiently enough to justify using vector



