## Saving a Binary Search Tree to a File

September 28, 2010 in binary tree

Describe an algorithm to save a Binary Search Tree (BST) to a file in terms of run-time and disk space complexity. You must be able to restore to the exact original BST using the saved format.

**Hint:**

Remember the **big three** tree traversal algorithm? Yes, it’s **pre-order**, **in-order**, and **post-order** traversal. Only one of them is useful for storing a BST.

Assume we have the following BST:

_30_ / \ 20 40 / / \ 10 35 50

**In-order traversal**

If we do an in-order traversal, we will output 10 20 30 35 40 50. There is no way of telling how the original BST structure looks like, as the following unbalanced BST is one of the possible BST sets from the output 10 20 30 35 40 50.

_50 / 40 / 35 / 30 / 20 / 10

**Post-order traversal:**

If we do a post-order traversal, we will output the leaf nodes before its parent node. Specifically, we will output 10 20 35 50 40 30 using post-order traversal. Imagine we’re reading the nodes to construct the tree back. See the problem that we are reading the leaf nodes before its parent? There’s too much trouble to create the child nodes before its parent. Remember that the BST insertion algorithm works by traversing the tree from the root to the correct leaf to insert.

**Pre-order traversal:**

Pre-order traversal is the *perfect *algorithm for making a copy of a BST. The output of a pre-order traversal of the BST above is 30 20 10 40 35 50. Please note the following important observation:

Therefore, when we read the BST back from the file, we are always able to create the parent node before creating its child nodes. The code for writing a BST to a file is exactly the same as pre-order traversal.

*O*(lg

*n*) time (slower than linked list, but much better than array).

**Reading a BST from a file:**

The question now is, how would you reconstruct a BST back from the file? Assume that we have the following pre-order traversal output of the BST earlier: 30 20 10 40 35 50.

**Hint:**

When we encounter a new node to insert, we should always place it into the first empty space which it will fit using a pre-order traversal. How do we determine the first empty space which a node will fit in? We can use the ordinary BST insertion algorithm, but the run-time complexity will be *O*(*n* lg *n*), since inserting a node takes *O*(lg *n*) time. Not efficient enough!

**Solution:**

Remember my post: Determine if a Binary Tree is a Binary Search Tree (BST)?

We use the same idea here. We pass the valid range of the values from the parent node to its child nodes. When we are about to insert a node, we will check if the insert value is in the valid range. If it is, this is the right space to insert. If it is not, we will try the next empty space. Reconstructing the whole BST from a file will take only *O*(*n*) time.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | void readBSTHelper(int min, int max, int &insertVal, BinaryTree *&p, ifstream &fin) { if (insertVal > min && insertVal < max) { int val = insertVal; p = new BinaryTree(val); if (fin >> insertVal) { readBSTHelper(min, val, insertVal, p->left, fin); readBSTHelper(val, max, insertVal, p->right, fin); } } } void readBST(BinaryTree *&root, ifstream &fin) { int val; fin >> val; readBSTHelper(INT_MIN, INT_MAX, val, root, fin); } |

**Further thoughts:**

How about storing a binary tree (not BST) to a file? Does the algorithm above works? Why and why not?

Anonymous said on November 7, 2010

I don't understand how this is O(n). You are anyhow doing log(n) comparisons for each element.

Work-flow of above code:

You have reached the recursive stack where you have inserted 30, 20, 10 and you are in 10s stack and you read 40.

Then you will compare and unwind 10 and 20 and then insert 40 right of 30 after check at 30.

So to insert 40 you took 3 comparisons.

I must have missed something out about the O(n)

0lee said on March 2, 2012

if there are only 30, 20, 10, 40. it seems 40 is inserted as 10′s right child, and 40 is also inserted as 20 and 30′s right child. maybe I am wrong

-2polosong1983 said on March 2, 2012

sorry, got a naive mistake, now clear….

01337c0d3r said on November 7, 2010

I know your concert that it takes 3 comparisons to insert 40 right of 30, which coincidentally is the same as the height of the tree. However, this is a special case and cannot happen to every node.

To convince you that it is O(n), compare to that of an pre-order traversal. (You can prove that pre-order traversal is in O(n) by the way.) Our algorithm above is exactly the same as pre-order traversal, except we do a comparison at each node, which is O(1).

Therefore, the complexity must be the same as pre-order traversal.

+1Anonymous said on December 2, 2010

I find it a bit tricky to transfer your O(n) BST rebuild code into Java. So here it is in case someone interested in.

public static BST readBST(String fileName) throws Exception {

BufferedReader fin = new BufferedReader(new FileReader(fileName));

data = Integer.parseInt(fin.readline());

bst = new BST();

bst.root = new Node(-1, null, null);

buildBST(bst.root, new int[] {data,}, Integer.MIN_VALUE, Integer.MAX_VALUE, fin);

}

private void buildBST(Node node, int[] data, int low, int high, BufferedReader fin) {

if (data[0] > low && data[0] < high) {

node.data = data[0];

try {

data[0] = Integer.parseInt(fin.readline());

if (data[0] > low && data[0] < node.data) {

node.left = new Node(-1, null, null);

buildBST(node.left, data, low, node.data, fin);

}

if (data[0] > node.data && data[0] < high) {

node.right = new Node(-1, null, null);

buildBST(node.right, data, node.data, high, fin);

}

} catch (Exception e) {}

}

}

+2Sean said on June 1, 2011

Nice use of int[] to replace int&. I think your code actually works.

0krishna said on July 12, 2011

I think its almost like using a global variable , we can use any static variable instead of using array[0] .

Will return code though.

+1shoubhik said on December 14, 2012

slight modification is needed in case of BufferedReader while reading the next token

String line = this.in.readLine();

if (line == null) return;

0Rya said on March 3, 2011

Hi,

I see that you have posted separate methods for serializing, deserializing BST and BT. But I think if you use both Inorder and preorder traversals on BST or BT you will be able to reconstruct the trees. This method is not different for BT and BST. All you have to do is take in inorder and preorder arrays and construct the BST or BT back. Please let me know if you think I am wrong.

Thanks

Shirpaa

0aimless said on May 6, 2011

@Shipra: BST has an edge over BT

-> BST has an order property along with structure property(this is only thing BT has).

so we don’t need two traversals for BST, unlike BT(where it can’t be uniquely determined otherwise)

0johnny said on May 17, 2011

This may be better:

void readBSTHelper(int insertVal, BinaryTree *&p, ifstream &fin) {

int val = insertVal;

p = new BinaryTree(val);

if (fin >> insertVal){

if (insertVal left, fin);

else readBSTHelper(insertVal, p->right, fin);

}

void readBST(BinaryTree *&root, ifstream &fin) {

int val;

if (fin >> val)

readBSTHelper(val, root, fin);

}

0Anonymous said on May 19, 2011

What if the tree contains INT_MIN or INT_MAX? A slight bug in the original post?

I haven’t checked but this could be a problem that needs special handling in all posts that pass down min/max.

0HX said on June 10, 2011

A not smart way to solve this problem is as following. In this program, we do not transfer a Node by reference, so it is longer but it is a map of my mind into a program.

Idea: For current node, we look at the next one. If the next one is within (min, current_value), we parse it as left subtree. If the next one is within (current_value, max), we parse it as right subtree. The parameter “len” is used to transfer the length that a subtree back to the current node.

Copy it to the “Show i has 1337 code coding panel” at the bottom of you browser window, and it runs. Hope it helps some of you.

// Type your C++ code and click the “Run Code” button!

// Your code output will be shown on the left.

// Click on the “Show input” button to start entering input data.

#include

using namespace std;

class Node {

public:

Node(int a) { val = a; left = NULL; right = NULL; }

Node * left;

Node * right;

int val;

};

Node * func_recon(int a[], int start, int & len, int min, int max)

{

int curr = a[start];

if (start + 1 == 8) // last one

{

return new Node(curr);

}

Node * left = NULL;

Node * right = NULL;

int left_len = 0;

int right_len = 0;

int next_val = a[start + 1];

if (next_val > min && next_val curr && next_val left = left;

n->right = right;

return n;

}

void preorder(Node * n)

{

if (!n) return;

std::cout <val <left);

preorder(n->right);

}

void inorder(Node * n)

{

if (!n) return;

inorder(n->left);

std::cout <val <right);

}

int main() {

// Start typing your code here…

int a [] = { 5, 2, 1, 3, 4, 8, 6, 7 };

int len = 0;

Node * root = func_recon(a, 0, len, -100, 100);

preorder(root);

std::cout << std::endl;

inorder(root);

return 0;

}

0Nishant said on October 8, 2011

hi ,

i think there should be one return statement once this statement is false

if (insertVal > min && insertVal < max) .

0Y. Zhang said on October 7, 2012

I feel this function is not correct. If the input is 30,10,50, it will build root with 30 and left child with 10, but 50 will not be added as the right child. Because when adding the children of 30, 50 will be read in.

+2Sarah said on December 23, 2013

I got the same question at first glance. But after a second thought, it turns out to be correct because “interval” is passed as a reference instead of a value in function readBSTHelper().

0Alpha said on January 17, 2013

By using master theorem we can say that complexity is O(n)

T(n) = 2*T(n/2) + O(1)

T(n) will be O(n)

0sri said on January 28, 2013

for example the pre order is 30 40 20 and a tree will be fomred according to the code given but for that tree if we do a pre order traversal then it is 30 20 40..can any one explain me

0Bing said on March 31, 2013

It seems there is a bug in your solution.

if (insertVal > min && insertVal min && insertVal <= max) {

Consider the following case:

_20_

/ \

20

+1Bing said on March 31, 2013

It seems there is a bug in your solution.

if (insertVal > min && insertVal min && insertVal <= max) {

Consider the following case:

_20_

/ \

20

0Bing said on March 31, 2013

Well, some of comments are truncated. Let me try it one more time.

We should use “insertVal <= max".

Consider the following case:

_20_

/ \

20

+1Alexander said on August 19, 2013

For java:

+2Neko said on September 14, 2013

Let’s make it in C code.

int a[size];

int last = 0;

node *new_node(int val);

node *build(int min, int max)

{

node *p = NULL;

int val = a[last];

if (min < val && val left = build(min, val);

p->right = build(val, max);

}

return p;

}

node *makebst()

{

return build(INT_MIN, INT_MAX);

}

0ankit said on October 2, 2013

Is it possible to use only level order for constructing the bst.

0Judy said on October 11, 2013

I tried to modify it to problem as “convert pre-order list to BST” in java and it just doesn’t work..

here’s the code. I think it’s because I changed the single input stream value to an array and the recursion gets mixed up with the index part.

public static TreeNode readBSTHelper(int min, int max, int[] fin, int index){

System.out.println(index + ” ” + min+ ” ” + max);

if(index == fin.length)

return null;

int val = fin[index];

System.out.println(“Current value is ” + val);

if(val min){

TreeNode node = new TreeNode(val);

node.left = readBSTHelper(min, val, fin, index + 1);

//System.out.println(min + ” ” + )

node.right = readBSTHelper(val, max, fin, fin.length – index);

return node;

}

else

return null;

}

0Serenitypo said on October 29, 2013

What about duplicate values in a BST?

-1niuniu said on March 19, 2014

how about level order?

0