Given an input file with 4 billion integers, generate an integer that is not contained in the file

Assume you have 1 GB of memory.

For 4 billion integers, that is 2^32 integers. We have 1GB memory, that is 2^30 Byte, 2^33 bits, so we can map every integer to every bit, then scan all the integers, if the integer exist in the file, then we set that bit to 1, Then we just need to find the bit that is 0, and map the bit back to the integer, that is the integer we are looking for.

What if you have only 10MB of memory, you dont have enough bits so the above method will not work. However, we can try to divide the input into some blocks, and count the number of integer that falls into the blocks in the first pass. Then we find the block that has not been filled, and we run a second pass to see which element is missing. In this pass, we set a bit vector that is the length of the block size, and when we have a element that should fall into this block, we set the corresponding bit to 1, just like the first example. Then we will be able to know exactly which integer is missing.

Lets say we set the block size to b values total number of values are N, each counter contains one integer which takes 32 bits. so there will be N/b counters which occupied N/b*32 bits. Assume S is the number of bits of the we need S >= N/b*32 on our first run On the second run, we need to have S number of bits mapping to b values, so S >= b, we can choose a b = 2^20 which can fit very easily on 10MB memory. Actually if we have N/b*32 == b, we can know that the smallest footprint we can have is S == b == sqrt(N*32);

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s