Algorithm design

From Driscollwiki

Jump to: navigation, search


Kleinberg, J. & Tardos, É. (2005) Algorithm design. Pearson Education.

Contents

Chapter 1: Introduction

Stable matching

History

  • 1962
  • David Gale
  • Lloyd Shapley
  • Could one design a preferential pairing process that was self-enforcing?

Statement

  • Two groups to be paired
  • Each member has an ordered preference of the other group

Gale-Shapley algorithm

(6)

Initially all m in M and w in W are free While there is a man m who is free and hasn't proposed to every woman Choose such a man m Let w be the highest-ranked woman in m's preference list to whom m has not yet proposed If w is free then (m, w) become engaged Else w is currently engaged to m' If w prefers m' to m then m remains free Else w prefers m to m' (m, w) become engaged m' becomes free Endif Endif EndWhile Return the set S of engaged pairs

Analysis

  • w remains engaged from first engagement and her situation continually improves
  • sequence of women to whom m proposes gets worse

Measuring progress

G-S algorithm terminates after at most n^2 iterations of the loop

  • Use progress to measure
  • Each iteration involves one pair of (m, w)
  • Let P(t) = set of pairs (m, w) at the end of iteration t
    • P(t+1) will always be greater than the size of P(t)
  • There are only n^2 possible pairs (m, w) so the value of P() can increase at most n^2 times

Perfect, stable matching?

  • Always perfect because M and W are of equal size and the loop continues while there are free m in M
  • Stable because it isn't possible to end up in a situation where there are pairs with better options available

Using graphs

  • A way of encoding pairwise relationships among a set of objects
  • G consists of a pair of sets (V, E)
    • V, nodes
    • E, edges

Five representative problems

  • Note: these are all discussed later in the text...

Interval scheduling

  • Limited resource (e.g. a classroom) can be used by 0 or 1 people at time
  • Scheduler goals
    • To accept a subset of requests, reject the rest, so that there are no overlaps
    • To maximize the number of requests accepted
  • There will be n requests to use the resource
  • A request takes the form of a start time, s, and a finish time, f.
    • Compatible requests do not overlap
    • A subset is said to be compatible if none of its members overlap
    • Our goal is to find the compatible subset of greatest size

Weighted Interval Scheduling

Bipartite Matching

Independent Set

Competitive Facility Location

Chapter 2: Basics of Algorithm Analysis

  • Algorithms for solving discrete problems should satisfy certain clearly delineated goals
  • They should also be efficient in terms of time (cycles) and space (memory)
  • Focus on worst-case running time for input N and see how it scales with N
  • Compare this with brute-force search over the space of possible solutions (32)

Brute-force search in practice with Stable matching

  • Search space is n! possible perfect matches
  • Brute-force search would plow throw n! one at a time and test for stability
    • "an intellectual cop-out" (32)
  • The Gale-Shapley algorithm spends time proportional only to N
    • "A compact representation, implicitly specifying a giant search space" (32)

Polynomial time

  • Search spaces for natural combinatorial problems tend to grown exponentially in the size N of the input (33)
    • The algorithm should have a better scaling property
    • e.g. better might be constant linear growth instead of exponentially greater growth
  • For example, there might be an algorithm such that it's running time is based on the relationship of two constants (c > 0, d > 0); running time = cNd
    • If the input size doubles, then the running time increases to c(2N)d = c(2d)(Nd)

Progressively better definitions for efficiency

An algorithm is efficient if ...

  1. it runs quickly on real input instances when implemented
  2. it achieves qualitatively better worst-case performance, at an analytical level, than brute-force search
  3. it has a polynomial running time

Asymptotic order of growth

Asymptotic upper bounds, O(f(n))

  • O(f(n)) expresses an upper-bound for the function T(n) that expresses the worst-case running time of an algorithm over input of size n
  • T(n) \leq c*f(n) where c is some constant greater than 0
  • It doesn't express the exact growth. Just the upper bound for all n greater than 0.
  • For example, if the worst-case running time is T(n) = pn2 + qn + r, then T(n) \leq (p+q+r)*n^2
    • If c = (p + q + r) then T(n) \leq O(n^2) for all n \geq 0

Asmyptotic lower bounds, Ω(f(n))

  • Complementary notion to O( * )
  • T(n) = Ω(f(n)) if there exists a constant ε > 0 so that for all n \geq 0, we have T(n) \geq \epsilon*f(n)
  • For the example above, T(n) = pn2 + qn + r is Ω(n) since T(n) \geq pn^2 \geq pn

Asmyptotic near bounds, Θ(f(n))

  • If a function is exactly like f(n) within a constant factor,
    • e.g. T(n) is both O(n2) and Ω(n2)
    • We can say that it is Θ(n2)
  • Somtimes you can obtain an asymptotically tight bound by computing a limit as n goes to infinity
    • If the ratio of functions f(n) and g(n) converges to a positive constnat, then f(n) = Θ(g(n))

Properties of asymptotic growth rates

  • Transitivity, if f is asymptotically upper bounded by g and g is upper-bounded by h, then f is upper bounded by h
  • Sums of functions, if f = O(h) and g = O(h), then f+g = O(h)

Asymptotic bounds for some common functions

In general,

  • O(log n) \leq O(n^x) \leq O(r^n)

Polynomials

  • Polynomials, let f be a polynomial of degree d in which the coefficient ad is positive. Then f = O(nd)
    • The Polynomial time algorithm is one whose running time T(n) is O(n^d) for some constnat d where d is independent of the input size

Logarithms

  • Logarithms, very slowly growing functions (41)
    • logbn = O(nx) for every b>1 and every x>0
    • Also, people can write O(logn) without specifying the base because of the identity property of logarithms makes it possible to translate between bases by multiplying a constant (see pp41)
  • Note, for logbn = x; bx = n, if we round x down to the nearest integer, it is one less than the number of digits in the base-b representation of the number n
    • e.g. 1 + log2n rounded down is the number of bits needed to represent n.

Exponentials

  • Exponentials, very fast growing functions (42)
    • nd = O(rn) for every r>1 and d>0
  • Sometimes people write "running time is exponentially" without specifying the exact exponential
    • Although sloppy, this is done because we can usually dismiss any exponentially growing function without closer analysis

Characteristics of lists and arrays (42)

Array

  • Accessing the ith element takes O(1) time with A[i] notation
  • Searching for an element in the list takes O(n) time (assuming unordered list)
  • If the elements are ordered, search will take O(logn) time using binary search
  • Frequent insertion and deletion is cumbersome (44)

Linked list

  • Each element contains a value and a pointer to the next element
  • Final element points to null
  • List begins with a First pointer to element 0

Doubly-linked list

  • Same as previous except...
  • Each element also contains a previous pointer
  • And there is a Last pointer to element n
  • Deletion involves "splicing out" an element by reseting the pointers before and after it
  • Insertion involves "splicing in" an element with the same two operations
  • Searching for the ith element requires O(i) time

Implementing the stable matching algorithm (45)

  • Array of men's preferences, MenPref[m, i] denoting the ith woman on the preference list of man m
  • Array of men's preferences, WomenPref[m, i] denoting the ith man on the preference list of woman w
  • Space required for these two lists is O(n2)

What will we need to do?

  1. To be able to identify a free man
  2. To be able to identify the highest-ranking woman to whom a given man m has not yet proposed
  3. To decide if a given woman w is currently engaged and, if so, to which man m
  4. For a woman w and two men m and m` to decide in constant time, which is preferred

Choosing data structures

The structures below ensure that each step in our algorithm is executed in O(1) or constant time:

  • List of free men as a linked list, easy to add and remove
    • Step 1 can be done in O(1)
  • Array of next woman to propose to for each man, Next[m]
    • Length = n
    • To address the next woman, w = ManPref[m, Next[m]]
    • Once he proposes, Next[m] is incremented by 1
    • Step 2 can be done in O(1)
  • Array Current[w] of length w
    • Current[w] = woman's current partner m`
    • Special null symbol used for single women not engaged
    • Step 3 can be done in O(1)
  • Array Ranking[w, m] with each woman's preferences ordered according to man's ID
    • Thus Ranking[w, m] can be compared with Ranking[w, m'] to determine which is preferred
    • This costs O(n) time to construct
    • But only costs O(1) each time it is used in our loop
    • Now Step 4 can be executed in O(1)

We already know that the Gale-Shapley algorithm was O(n2) time

  • Because of these wise data structure selections, we ensure that the implemented version is also O(n2)

Survey of common running times

Linear time

  • O(n)
  • Running time is at most a constant factor times the size of the input (48)
Personal tools