B+ tree에 대해 몰랐던 진실

Tags:

Suppose that we have a sorted file and want to construct a dense primary B+ tree index on this file. One way to accomplish this task is to scan the file, record by record, inserting each one using B+ tree insertion procedure.

I. What performance and storage utilization problems are there with this approach?

II. Devise a scheme to overcome those problems.

답)

I. What performance and storage utilization problems are there with this approach?
Inserting an entry for each data record, one at a time, using standard insertion algorithm is likely to be quite expensive, since each entry requires us to start from the root and go down to the appropriate leaf page. Even though the index level pages are likely to stay in the buffer pool between successive requests, the overhead is still considerable. Also, according to the insertion algorithm, each time a node splits, the data entries are redistributed evenly to both nodes. This leads to a fixed page utilization of 50%.

II. Devise a scheme to overcome those problems.
The bulk loading algorithm has good performance and space utilization compared with the repeated inserts approach. Since the B+ tree is grown from the bottom up, the bulk loading algorithm allows the administrator to pre-set the amount each index and data page should be filled. This allows good performance for future inserts, and supports some desired space utilization.

—————————————————————

bulk loading이라는건, 데이터를 미리 소트하고 bottom-up으로 트리를
빌드하겠다는 strategy..

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *