Algorithm design
From Driscollwiki
Kleinberg, J. & Tardos, É. (2005) Algorithm design. Pearson Education.
Chapter 1: Introduction
Stable matching
History
- 1962
- David Gale
- Lloyd Shapley
- Could one design a preferential pairing process that was self-enforcing?
Statement
- Two groups to be paired
- Each member has an ordered preference of the other group
Gale-Shapley algorithm
(6)
Initially all m in M and w in W are free While there is a man m who is free and hasn't proposed to every woman Choose such a man m Let w be the highest-ranked woman in m's preference list to whom m has not yet proposed If w is free then (m, w) become engaged Else w is currently engaged to m' If w prefers m' to m then m remains free Else w prefers m to m' (m, w) become engaged m' becomes free Endif Endif EndWhile Return the set S of engaged pairs
Analysis
- w remains engaged from first engagement and her situation continually improves
- sequence of women to whom m proposes gets worse
Measuring progress
G-S algorithm terminates after at most n^2 iterations of the loop
- Use progress to measure
- Each iteration involves one pair of (m, w)
- Let P(t) = set of pairs (m, w) at the end of iteration t
- P(t+1) will always be greater than the size of P(t)
- There are only n^2 possible pairs (m, w) so the value of P() can increase at most n^2 times
Perfect, stable matching?
- Always perfect because M and W are of equal size and the loop continues while there are free m in M
- Stable because it isn't possible to end up in a situation where there are pairs with better options available
Using graphs
- A way of encoding pairwise relationships among a set of objects
- G consists of a pair of sets (V, E)
- V, nodes
- E, edges
Five representative problems
- Note: these are all discussed later in the text...
Interval scheduling
- Limited resource (e.g. a classroom) can be used by 0 or 1 people at time
- Scheduler goals
- To accept a subset of requests, reject the rest, so that there are no overlaps
- To maximize the number of requests accepted
- There will be n requests to use the resource
- A request takes the form of a start time, s, and a finish time, f.
- Compatible requests do not overlap
- A subset is said to be compatible if none of its members overlap
- Our goal is to find the compatible subset of greatest size
Weighted Interval Scheduling
Bipartite Matching
Independent Set
Competitive Facility Location
Chapter 2: Basics of Algorithm Analysis
- Algorithms for solving discrete problems should satisfy certain clearly delineated goals
- They should also be efficient in terms of time (cycles) and space (memory)
- Focus on worst-case running time for input N and see how it scales with N
- Compare this with brute-force search over the space of possible solutions (32)
Brute-force search in practice with Stable matching
- Search space is n! possible perfect matches
- Brute-force search would plow throw n! one at a time and test for stability
- "an intellectual cop-out" (32)
- The Gale-Shapley algorithm spends time proportional only to N
- "A compact representation, implicitly specifying a giant search space" (32)
Polynomial time
- Search spaces for natural combinatorial problems tend to grown exponentially in the size N of the input (33)
- The algorithm should have a better scaling property
- e.g. better might be constant linear growth instead of exponentially greater growth
- For example, there might be an algorithm such that it's running time is based on the relationship of two constants (c > 0, d > 0); running time = cNd
- If the input size doubles, then the running time increases to c(2N)d = c(2d)(Nd)
Progressively better definitions for efficiency
An algorithm is efficient if ...
- it runs quickly on real input instances when implemented
- it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search
- it has a polynomial running time
Asymptotic order of growth
Asymptotic upper bounds, O(f(n))
- O(f(n)) expresses an upper-bound for the function T(n) that expresses the worst-case running time of an algorithm over input of size n
-
where c is some constant greater than 0
- It doesn't express the exact growth. Just the upper bound for all n greater than 0.
- For example, if the worst-case running time is T(n) = pn2 + qn + r, then
- If c = (p + q + r) then
for all
- If c = (p + q + r) then
Asmyptotic lower bounds, Ω(f(n))
- Complementary notion to O( * )
- T(n) = Ω(f(n)) if there exists a constant ε > 0 so that for all
, we have
- For the example above, T(n) = pn2 + qn + r is Ω(n) since
Asmyptotic near bounds, Θ(f(n))
- If a function is exactly like f(n) within a constant factor,
- e.g. T(n) is both O(n2) and Ω(n2)
- We can say that it is Θ(n2)
- Somtimes you can obtain an asymptotically tight bound by computing a limit as n goes to infinity
- If the ratio of functions f(n) and g(n) converges to a positive constnat, then f(n) = Θ(g(n))
Properties of asymptotic growth rates
- Transitivity, if f is asymptotically upper bounded by g and g is upper-bounded by h, then f is upper bounded by h
- Sums of functions, if f = O(h) and g = O(h), then f+g = O(h)
Asymptotic bounds for some common functions
In general,
Polynomials
- Polynomials, let f be a polynomial of degree d in which the coefficient ad is positive. Then f = O(nd)
- The Polynomial time algorithm is one whose running time T(n) is O(n^d) for some constnat d where d is independent of the input size
Logarithms
- Logarithms, very slowly growing functions (41)
- logbn = O(nx) for every b>1 and every x>0
- Also, people can write O(logn) without specifying the base because of the identity property of logarithms makes it possible to translate between bases by multiplying a constant (see pp41)
- Note, for logbn = x; bx = n, if we round x down to the nearest integer, it is one less than the number of digits in the base-b representation of the number n
- e.g. 1 + log2n rounded down is the number of bits needed to represent n.
Exponentials
- Exponentials, very fast growing functions (42)
- nd = O(rn) for every r>1 and d>0
- Sometimes people write "running time is exponentially" without specifying the exact exponential
- Although sloppy, this is done because we can usually dismiss any exponentially growing function without closer analysis
Characteristics of lists and arrays (42)
Array
- Accessing the ith element takes O(1) time with A[i] notation
- Searching for an element in the list takes O(n) time (assuming unordered list)
- If the elements are ordered, search will take O(logn) time using binary search
- Frequent insertion and deletion is cumbersome (44)
Linked list
- Each element contains a value and a pointer to the next element
- Final element points to null
- List begins with a First pointer to element 0
Doubly-linked list
- Same as previous except...
- Each element also contains a previous pointer
- And there is a Last pointer to element n
- Deletion involves "splicing out" an element by reseting the pointers before and after it
- Insertion involves "splicing in" an element with the same two operations
- Searching for the ith element requires O(i) time
Implementing the stable matching algorithm (45)
- Array of men's preferences, MenPref[m, i] denoting the ith woman on the preference list of man m
- Array of men's preferences, WomenPref[m, i] denoting the ith man on the preference list of woman w
- Space required for these two lists is O(n2)
What will we need to do?
- To be able to identify a free man
- To be able to identify the highest-ranking woman to whom a given man m has not yet proposed
- To decide if a given woman w is currently engaged and, if so, to which man m
- For a woman w and two men m and m` to decide in constant time, which is preferred
Choosing data structures
The structures below ensure that each step in our algorithm is executed in O(1) or constant time:
- List of free men as a linked list, easy to add and remove
- Step 1 can be done in O(1)
- Array of next woman to propose to for each man, Next[m]
- Length = n
- To address the next woman, w = ManPref[m, Next[m]]
- Once he proposes, Next[m] is incremented by 1
- Step 2 can be done in O(1)
- Array Current[w] of length w
- Current[w] = woman's current partner m`
- Special null symbol used for single women not engaged
- Step 3 can be done in O(1)
- Array Ranking[w, m] with each woman's preferences ordered according to man's ID
- Thus Ranking[w, m] can be compared with Ranking[w, m'] to determine which is preferred
- This costs O(n) time to construct
- But only costs O(1) each time it is used in our loop
- Now Step 4 can be executed in O(1)
We already know that the Gale-Shapley algorithm was O(n2) time
- Because of these wise data structure selections, we ensure that the implemented version is also O(n2)
Survey of common running times
Linear time
- O(n)
- Running time is at most a constant factor times the size of the input (48)

