TinyURL

A tinyURL is an URL that is shorter than normal URL.
Say we have an url saying

If we map it through the tinyURL.com, it becomes

How should we design a system like this? We start from simple case.
If we have a very powerful machine, and unlimited amount of memory, the first thing we can come up with is using a hashMap, but that will require an O(n) space, where n is the number of URLs which can be a huge number.
Can we do better? Yes. Notice that in database, every record (long URL) has an id associated with it. If we can simply convert the id into the short url, we are done. But hash function is not necessarily Bijective function ( a function makes a one on one match between elements from set X to set Y).
Then the question comes to how can we match an integer (the id) to a k bit number and letters. The trick here is to consider the k bit number and letters as a base 62 number (0-0, 1-1, … 9-9 and 10-a, 11- b… 36 – A, 37 – B, … 62 – Z) Then the question is simplified to convert base 10 number to a base 62 number.

public class solution {
	public String convertToBase62(int input) {
		StringBuilder result = new StringBuilder();
		HashMap<Integer, Character> map = new HashMap<Integer, Character();
		for (int i = 0; i <= 62; i++) {
			//put matching here
		}
		while(input != 0) {
			int tmp = input % 62;
			result.append(map.get(tmp));
			input /= 62;
		}
		result.reverse();
		return new String(result);
	}
}

The id is generated when the url in O(1) time by a global counter, while the generation of the tiny url will only take O(k) time.

What if we the data we have is in multiple machines?
We can maintain a distributed database, which will take care of all the insert, delete and quire. Also it will come with all the good features that come with the database like automatic backup and redundancy. The draw back is the cost to maintain the database could be large.

Since we are only storing key-value pairs, we use the http://en.wikipedia.org/wiki/Distributed_data_store
Some of them use Consistent Hashing to implement. So we can simply calculate the hash value of the long URL to an integer, and use that integer to locate the server. Then we retrieve the id from the server and use the base conversion method above to calculate the short URL.

Insertion:
1. Calculate the hash value of the id of the long URL
2. Locate the server by the hash value and store the id and longURL there.
3. Calculate the short URL based on base conversion and return.

Query (redirect):
1. Calculate the id of the short URL by base conversion algorithm.
2. Locate the server by the hashing the id.
3. find the long URL and redirect.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s