Saving a Binary Search Tree to a File

September 28, 2010 in binary tree

Describe an algorithm to save a Binary Search Tree (BST) to a file in terms of run-time and disk space complexity. You must be able to restore to the exact original BST using the saved format.


Hint:
Remember the big three tree traversal algorithm? Yes, it’s pre-order, in-order, and post-order traversal. Only one of them is useful for storing a BST.

Assume we have the following BST:

    _30_ 
   /    \    
  20    40
 /     /  \
10    35  50

In-order traversal
If we do an in-order traversal, we will output 10 20 30 35 40 50. There is no way of telling how the original BST structure looks like, as the following unbalanced BST is one of the possible BST sets from the output 10 20 30 35 40 50.

          _50 
         /       
        40 
       /    
      35
     /
    30
   /
  20
 /
10

Post-order traversal:
If we do a post-order traversal, we will output the leaf nodes before its parent node. Specifically, we will output 10 20 35 50 40 30 using post-order traversal. Imagine we’re reading the nodes to construct the tree back. See the problem that we are reading the leaf nodes before its parent? There’s too much trouble to create the child nodes before its parent. Remember that the BST insertion algorithm works by traversing the tree from the root to the correct leaf to insert.

Pre-order traversal:
Pre-order traversal is the perfect algorithm for making a copy of a BST. The output of a pre-order traversal of the BST above is 30 20 10 40 35 50. Please note the following important observation:

A node’s parent is always output before itself.

Therefore, when we read the BST back from the file, we are always able to create the parent node before creating its child nodes. The code for writing a BST to a file is exactly the same as pre-order traversal.

A Binary Search Tree (BST) is useful for storing phone book records in a memory limited device, such as a cell phone. The records are always maintained in sorted order, inserting and deleting a record takes O(lg n) time (slower than linked list, but much better than array).

Reading a BST from a file:
The question now is, how would you reconstruct a BST back from the file? Assume that we have the following pre-order traversal output of the BST earlier: 30 20 10 40 35 50.

Hint:
When we encounter a new node to insert, we should always place it into the first empty space which it will fit using a pre-order traversal. How do we determine the first empty space which a node will fit in? We can use the ordinary BST insertion algorithm, but the run-time complexity will be O(n lg n), since inserting a node takes O(lg n) time. Not efficient enough!

Solution:
Remember my post: Determine if a Binary Tree is a Binary Search Tree (BST)?

We use the same idea here. We pass the valid range of the values from the parent node to its child nodes. When we are about to insert a node, we will check if the insert value is in the valid range. If it is, this is the right space to insert. If it is not, we will try the next empty space. Reconstructing the whole BST from a file will take only O(n) time.

Further thoughts:
How about storing a binary tree (not BST) to a file? Does the algorithm above works? Why and why not?

VN:F [1.9.22_1171]
Rating: 4.9/5 (37 votes cast)
Saving a Binary Search Tree to a File, 4.9 out of 5 based on 37 ratings

27 responses to Saving a Binary Search Tree to a File

  1. I don't understand how this is O(n). You are anyhow doing log(n) comparisons for each element.
    Work-flow of above code:
    You have reached the recursive stack where you have inserted 30, 20, 10 and you are in 10s stack and you read 40.
    Then you will compare and unwind 10 and 20 and then insert 40 right of 30 after check at 30.
    So to insert 40 you took 3 comparisons.

    I must have missed something out about the O(n)

    VA:F [1.9.22_1171]
    0
    • if there are only 30, 20, 10, 40. it seems 40 is inserted as 10′s right child, and 40 is also inserted as 20 and 30′s right child. maybe I am wrong

      VA:F [1.9.22_1171]
      -2
  2. I know your concert that it takes 3 comparisons to insert 40 right of 30, which coincidentally is the same as the height of the tree. However, this is a special case and cannot happen to every node.

    To convince you that it is O(n), compare to that of an pre-order traversal. (You can prove that pre-order traversal is in O(n) by the way.) Our algorithm above is exactly the same as pre-order traversal, except we do a comparison at each node, which is O(1).

    Therefore, the complexity must be the same as pre-order traversal.

    VA:F [1.9.22_1171]
    +1
  3. I find it a bit tricky to transfer your O(n) BST rebuild code into Java. So here it is in case someone interested in.

    public static BST readBST(String fileName) throws Exception {
    BufferedReader fin = new BufferedReader(new FileReader(fileName));

    data = Integer.parseInt(fin.readline());
    bst = new BST();
    bst.root = new Node(-1, null, null);

    buildBST(bst.root, new int[] {data,}, Integer.MIN_VALUE, Integer.MAX_VALUE, fin);
    }

    private void buildBST(Node node, int[] data, int low, int high, BufferedReader fin) {
    if (data[0] > low && data[0] < high) {
    node.data = data[0];

    try {
    data[0] = Integer.parseInt(fin.readline());
    if (data[0] > low && data[0] < node.data) {
    node.left = new Node(-1, null, null);
    buildBST(node.left, data, low, node.data, fin);
    }
    if (data[0] > node.data && data[0] < high) {
    node.right = new Node(-1, null, null);
    buildBST(node.right, data, node.data, high, fin);
    }
    } catch (Exception e) {}
    }
    }

    VA:F [1.9.22_1171]
    +2
    • Nice use of int[] to replace int&. I think your code actually works.

      VA:F [1.9.22_1171]
      0
      • I think its almost like using a global variable , we can use any static variable instead of using array[0] .
        Will return code though.

        VA:F [1.9.22_1171]
        +1
      • slight modification is needed in case of BufferedReader while reading the next token
        String line = this.in.readLine();
        if (line == null) return;

        VN:F [1.9.22_1171]
        0
  4. Hi,

    I see that you have posted separate methods for serializing, deserializing BST and BT. But I think if you use both Inorder and preorder traversals on BST or BT you will be able to reconstruct the trees. This method is not different for BT and BST. All you have to do is take in inorder and preorder arrays and construct the BST or BT back. Please let me know if you think I am wrong.

    Thanks
    Shirpaa

    VA:F [1.9.22_1171]
    0
  5. @Shipra: BST has an edge over BT
    -> BST has an order property along with structure property(this is only thing BT has).
    so we don’t need two traversals for BST, unlike BT(where it can’t be uniquely determined otherwise)

    VA:F [1.9.22_1171]
    0
  6. This may be better:

    void readBSTHelper(int insertVal, BinaryTree *&p, ifstream &fin) {
    int val = insertVal;
    p = new BinaryTree(val);
    if (fin >> insertVal){
    if (insertVal left, fin);
    else readBSTHelper(insertVal, p->right, fin);
    }

    void readBST(BinaryTree *&root, ifstream &fin) {
    int val;
    if (fin >> val)
    readBSTHelper(val, root, fin);
    }

    VA:F [1.9.22_1171]
    0
  7. What if the tree contains INT_MIN or INT_MAX? A slight bug in the original post?
    I haven’t checked but this could be a problem that needs special handling in all posts that pass down min/max.

    VA:F [1.9.22_1171]
    0
  8. HX said on June 10, 2011

    A not smart way to solve this problem is as following. In this program, we do not transfer a Node by reference, so it is longer but it is a map of my mind into a program.

    Idea: For current node, we look at the next one. If the next one is within (min, current_value), we parse it as left subtree. If the next one is within (current_value, max), we parse it as right subtree. The parameter “len” is used to transfer the length that a subtree back to the current node.

    Copy it to the “Show i has 1337 code coding panel” at the bottom of you browser window, and it runs. Hope it helps some of you.

    // Type your C++ code and click the “Run Code” button!
    // Your code output will be shown on the left.
    // Click on the “Show input” button to start entering input data.
    #include
    using namespace std;

    class Node {
    public:
    Node(int a) { val = a; left = NULL; right = NULL; }

    Node * left;
    Node * right;
    int val;
    };

    Node * func_recon(int a[], int start, int & len, int min, int max)
    {
    int curr = a[start];

    if (start + 1 == 8) // last one
    {
    return new Node(curr);
    }

    Node * left = NULL;
    Node * right = NULL;
    int left_len = 0;
    int right_len = 0;

    int next_val = a[start + 1];
    if (next_val > min && next_val curr && next_val left = left;
    n->right = right;

    return n;
    }

    void preorder(Node * n)
    {
    if (!n) return;

    std::cout <val <left);
    preorder(n->right);
    }

    void inorder(Node * n)
    {
    if (!n) return;
    inorder(n->left);
    std::cout <val <right);
    }

    int main() {
    // Start typing your code here…

    int a [] = { 5, 2, 1, 3, 4, 8, 6, 7 };
    int len = 0;
    Node * root = func_recon(a, 0, len, -100, 100);

    preorder(root);
    std::cout << std::endl;
    inorder(root);

    return 0;
    }

    VA:F [1.9.22_1171]
    0
  9. hi ,

    i think there should be one return statement once this statement is false
    if (insertVal > min && insertVal < max) .

    VA:F [1.9.22_1171]
    0
  10. I feel this function is not correct. If the input is 30,10,50, it will build root with 30 and left child with 10, but 50 will not be added as the right child. Because when adding the children of 30, 50 will be read in.

    VA:F [1.9.22_1171]
    +2
    • I got the same question at first glance. But after a second thought, it turns out to be correct because “interval” is passed as a reference instead of a value in function readBSTHelper().

      VA:F [1.9.22_1171]
      0
  11. By using master theorem we can say that complexity is O(n)

    T(n) = 2*T(n/2) + O(1)

    T(n) will be O(n)

    VA:F [1.9.22_1171]
    0
  12. for example the pre order is 30 40 20 and a tree will be fomred according to the code given but for that tree if we do a pre order traversal then it is 30 20 40..can any one explain me

    VA:F [1.9.22_1171]
    0
  13. It seems there is a bug in your solution.

    if (insertVal > min && insertVal min && insertVal <= max) {

    Consider the following case:

    _20_
    / \
    20

    VA:F [1.9.22_1171]
    +1
  14. It seems there is a bug in your solution.

    if (insertVal > min && insertVal min && insertVal <= max) {

    Consider the following case:

    _20_
    / \
    20

    VA:F [1.9.22_1171]
    0
  15. Well, some of comments are truncated. Let me try it one more time.

    We should use “insertVal <= max".

    Consider the following case:

    _20_
    / \
    20

    VA:F [1.9.22_1171]
    +1
  16. For java:

    VA:F [1.9.22_1171]
    +2
  17. Let’s make it in C code.

    int a[size];
    int last = 0;

    node *new_node(int val);

    node *build(int min, int max)
    {
    node *p = NULL;
    int val = a[last];
    if (min < val && val left = build(min, val);
    p->right = build(val, max);
    }
    return p;
    }

    node *makebst()
    {
    return build(INT_MIN, INT_MAX);
    }

    VA:F [1.9.22_1171]
    0
  18. Is it possible to use only level order for constructing the bst.

    VA:F [1.9.22_1171]
    0
  19. I tried to modify it to problem as “convert pre-order list to BST” in java and it just doesn’t work..
    here’s the code. I think it’s because I changed the single input stream value to an array and the recursion gets mixed up with the index part.

    public static TreeNode readBSTHelper(int min, int max, int[] fin, int index){
    System.out.println(index + ” ” + min+ ” ” + max);
    if(index == fin.length)
    return null;
    int val = fin[index];
    System.out.println(“Current value is ” + val);
    if(val min){
    TreeNode node = new TreeNode(val);
    node.left = readBSTHelper(min, val, fin, index + 1);
    //System.out.println(min + ” ” + )
    node.right = readBSTHelper(val, max, fin, fin.length – index);
    return node;
    }
    else
    return null;
    }

    VA:F [1.9.22_1171]
    0
  20. What about duplicate values in a BST?

    VN:F [1.9.22_1171]
    -1
  21. how about level order?

    VA:F [1.9.22_1171]
    0

Leave a reply

Your email address will not be published. Required fields are marked *

You may use the <code> tag to embed your code.