generate group by email address

第二题是有这么一个class Contact,里面有一个String的name,和一个List
装着email address,是这个Contact有的address,用一个list装着是因为一个人有可
能有多个email,现在给你an array of Contact,比如
#1 Johny [john@gmail.com]
#2 Mary [mary@gmail.com]
#3 Johnnathen [john@yahoo.com]
#4 John [john@gmail.com, john@yahoo.com, john@hotmail.com]
#5 Bob [bob@gmail.com]
现在我们知道如果email address相同的话,那么就说明是同一个人,现在要做的是根
据这些email address,把同一个contact给group起来,生成一个List<List>
,比如我们就可以知道#1,#3,#4是同一个人。注意不能根据名字来group,因为有可
能有相同名字的人,或者同一个人有可能有不同的名字来注册之类的。

This is a classic disjoint set problem. Every user has a (unique) number, and a linked list of emails that associated with the number. We want to group all the contacts that shares same email addresses. (1, 3, and 4 is a group for example).

We can first create a map that maps each email address to a set of users. Then we iterate through the original array, and join those sets which shares same email address.
given an example, if we have
1 -> e1, e2.
2 -> e2, e3,
The hash map will be
e1 -> 1,
e2 -> 1, 2
e3 -> 2,

Then we go through the original list and join the sets under e1 and e2, then e2 and e3, so we will have set 1 and 2.

This can also be solved using a graph like algorithm. we look at the all the name and email addresses as nodes, and the relation ship between them as edges. We can simply form a graph and then run a breath first search and all the names in a connected components will be grouped in one group.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s