Recently in my internship at Microsoft I signed up for a task of creating a global leaderboard for our product users. In this article I will discuss a few implementations of a leaderboard, with a focus on backend logic.
The scenario of this discussion is this: you need to create a global leaderboard of scores for a simple Scrabble game online. The only information this leaderboard keeps is the player’s name and score. You only care about ranking and percentile of a player.
The brute force
Perhaps the simplest way to implement a leaderboard, the brute force approach will most likely only take a List<> object, and Sort() it whenever an entry is entered or removed. Sample code in C# will probably look like this:
public class Player : IComparable<Player> { string Name { get; private set; } double Score { get; private set; } public int CompareTo(Player other) { if (this.Score > other.Score) return -1; if (this.Score == other.Score) return 0; return 1; } } public class Leaderboard { List<Player> leaderboard; public void insert(Player p) { leaderboard.Add(p); leaderboard.Sort(); } //Remove a Player public void remove(Player p) { int indexOfPlayer = leaderboard.BinarySearch(p); if(indexOfPlayer >= 0) //Found Player to delete leaderboard.RemoveAt(indexOfPlayer); } }
With this approach, insert() method takes O(n * logn) time, because according to MSDN, List<T>.Sort() takes O(n * logn) time. Remove() method takes O(logn), for binary search on sorted list.
The binary search tree
With a self-balancing binary search tree (BST), insertion and removal could both be reduced to O(logn) time. In C# library there are some build-in options for implementing a self-balancing BST, such as SortedList<TKey, TValue> and SortedDictionary<TKey, TValue> classes. According to MSDN, they both have O(log n) retrieval time, but SortedList has O(n) insertion and removal, and less memory usage, as opposed to SortedDictionary with O(log n), and more memory usage.
Implementations with the two classes are rather similar. Taking SortedDictionary as a example, I may write code looking like this:
public class Player : IComparable<Player> { string Name { get; private set; } double Score { get; private set; } public int CompareTo(Player other) { if (this.Score > other.Score) return -1; if (this.Score == other.Score) return 0; return 1; } } public class Leaderboard { //This structure assumes no duplicated key (Score) exists. //If two Players might have same scores, this needs to be SortedDictionary<double, List<string>>. //TODO: extend definition to allow Players with same Score. SortedDictionary<double, string> leaderboard; public void insert(Player p) { try { leaderboard.Add(p.Score, p.Name); } catch(ArgumentException) { //TODO: use List<string> to allow Players with same Score. } } //Remove a Player. Returns false if Player doesn't exist. public bool remove(Player p) { leaderboard.Remove(p.Score); //TODO: if duplicated Score allowed, use List<string>.Remove() instead. } }
With this approach, insertion and removal both take O(log n) time.
The histogram
This approach is slightly different than the previous ones in that it is best at computing a user’s percentile rank. A histogram is the grouping of data into bins (spaced apart by the so-called class interval) plotting the number of members in each bin versus the bin number (Wolfram math world). The idea behind this approach is to simulate a histogram with data buckets, and estimate a given Player’s percentile ranking.
Assuming the range of possible score values is bound by two double variables, min and max, sample code may look like this:
public class Player { string Name { get; private set; } double Score { get; private set; } } public class Histogram { private double min; //Lower bound of range private double max; //Upper bound of range private double interval; //Length of each bucket private int playerCount; //Total number of players in histogram private int bucketCount; //Total number of buckets private List<string>[] buckets; public Histogram(double rangeMin, double rangeMax, double interval) { min = rangeMin; max = rangeMax; interval = interval; bucketCount = (int)((max - min)/interval); buckets = new List<string>[bucketCount]; } private void Add(Player p) { if(p.Score > max || p.Score < min) { //Player's score doesn't fit in current Histogram throw new Exception(); } int index = Math.Ceiling(p.Score/interval); buckets[index].Add(p.Name); playerCount++; } private void Remove(Player p) { if(p.Score > max || p.Score < min) { //Player's score doesn't fit in current Histogram throw new Exception(); } int index = Math.Ceiling(p.Score/interval); if(buckets[index].Remove(p.Name)) playerCount--; } private double GetPercentileRank(Player p) { if(p.Score > max || p.Score < min) { //Player's score doesn't fit in current Histogram throw new Exception(); } int index = Math.Ceiling(p.Score/interval); int BetterThanMe = buckets[index].Count; for(int i = index + 1; i <= bucketCount; i++) { BetterThanMe += buckets[i].Count; } double PercentileBetterThanMe = (double)BetterThanMe / (double) playerCount; return PercentileBetterThanMe * 100; } } public class Leaderboard { private Histogram histogram = new Histogram(0, 100, 100); public void Insert(Player p) { histogram.Add(p); } public void Remove(Player p) { histogram.Remove(p); } public double GetPercentileForPlayer(Player p) { //Example: if a 10 out of 100 players have better scores than p, return 10(%). return histogram.GetPercentileRank(p); } }
Insertion and removal both take O(1) time. However, retrieval of percentile ranking information is more expensive, as the for-loop can take O(n) time in worse case.