Algorithm 1: DG ( Duality Gap )
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Current β
 λ ⊳  Grid element

SVG-Viewer needed.


1:  ϵ Y
2:  fβ ←||ϵ||22 + λ||β||1 ⊳  Primal Objective function
3:  α ---λ---- ||2XT ϵ||∞
4:  α0 Y T ϵ
5:  S min{max{α,α0},α} ⊳  Dual Point
6:  ^ν −-2S   λϵ + -2 λY
7:  dν 1 4λ2||^ν ||2 2 −||Y ||2 2 ⊳  Dual Objective function

return fβ + dν



Algorithm 2: DGT ( Duality Gap Target )
 
Input:  

SVG-Viewer needed.


 γ ⊳
 C ⊳
 rStatsIt ⊳  Index of the current grid element ( outer loop iteration number )
 n ⊳  Number of rows in the design matrix X n×p

SVG-Viewer needed.


1:  dgt γC2r2StatsIt --n---

return dgt



Algorithm 3: fβ
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Current β

SVG-Viewer needed.


1:  f Y .

return ||f||22



Algorithm 4: f^β
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  The k’th β vector
 β′∈ n ⊳  The k 1’th β vector
 L ⊳  The current Lipschitz constant, as computed by backtracking line search

SVG-Viewer needed.


1:  f Y .
2:  t0 ←||f||22
3:  f 2XT f
4:  Δβ β β
5:  t1 ←∇fT Δβ
6:  t2 L- 2||Δβ||2 2

return t0 + t1 + t2



Algorithm 5: τ ( Matrix-wise Soft-Thresholding / Proximal Operator )
 
Input:  

SVG-Viewer needed.


 X n×m ⊳  An arbitrary matrix
 λ ⊳  The thresholding parameter

SVG-Viewer needed.


1:  ^X X ⊳  Make a copy of X.
2:  for  ^x i,j X^  do
3:   ^x i,j sign(^x i,j)(|^xi,j|− λ )+
4:  end for

return ^X



Algorithm 6: ISTA with backtracking line search and duality gap convergence criteria
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Starting vector
 L0 ⊳  Initial Lipschitz constant, used by backtracking line search
 λ ⊳  Grid element
 η ⊳  Step size when updating Lipschitz constant
 D ⊳  Duality gap target

SVG-Viewer needed.


1:  ^β β ⊳  Make a copy of β that will be updated during back tracking.
2:  do
3:   ^ β τ(β -1 Lf(X,Y,^ β ,L))
4:   while  fβ(X,Y,β^ ) > fβ^(X,Y,β^ ,L)  do
5:   L ηL
6:   ^β τ(β 1- Lf(X,Y,β))
7:   end while
8:   β τ(β L1f(X,Y,β,L)) ⊳  Update β once L is sufficiently large.
9:  while  DG (X,Y,β,λ) > D
10:  

return β



Algorithm 7: Coordinate Descent with duality gap convergence criteria
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Starting vector
 λ ⊳  Grid element
 D ⊳  Duality gap target

SVG-Viewer needed.


1:  ^β β ⊳  Make a copy of β
2:  do
3:   for  i 1,2,,p  do
4:   t 2|λ|X-||2    i 2 ⊳  Scale grid element by norm of the i’th column of design matrix
5:   Xi Xji ⊳  Take all columns of design matrix not equal to i
6:   ^β i β^ ji ⊳  Take all elements of predictors vectors not equal to i
7:   r XTi (Y−X-−iβ^−-i)    ||Xi||22 ⊳  Compute the scaled residual
8:   ^β i τ(r,t) ⊳  Update the i’th element of Beta
9:   end for
10:  while  DG (X,Y,^β ) > D
11:  

return ^
β



Algorithm 8: Coordinate Descent with Lazy Evaluation
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Starting vector
 λ ⊳  Grid element
 D ⊳  Duality gap target

SVG-Viewer needed.


1:  ^β β ⊳  Make a copy of β
2:  R Y X^ β ⊳  Initialize Intermediary Residual
3:  do
4:   for  i 1,2,,p  do
5:   t --λ--- 2||Xi||22 ⊳  Scale grid element by norm of the i’th column of design matrix
6:   if  β^ i0  then R R + XiB^i
7:   end if
8:   ^ β i τ(-XTi R  )  ||Xi||22,t ⊳  Update the i’th element of Beta
9:   if  β^ i0  then R R XiB^i
10:   end if
11:   end for
12:  while  DG (X,Y,^β ) > D
13:  

return ^β



Algorithm 9: Coordinate Descent with standardized data
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The standardized Xi = 0Xi = 1 design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Starting vector
 λ ⊳  Grid element
 D ⊳  Duality gap target

SVG-Viewer needed.


1:  ^β β ⊳  Make a copy of β
2:  do
3:   for  i 1,2,,p  do
4:   t 2λn ⊳  Scale grid element by norm of the i’th column of design matrix
5:   Xi Xji ⊳  Take all columns of design matrix not equal to i
6:   ^β i β^ ji ⊳  Take all elements of predictors vectors not equal to i
7:   r XTi (Y−X-−iβ^−-i)      n ⊳  Compute the scaled residual
8:   ^β i τ(r,t) ⊳  Update the i’th element of Beta
9:   end for
10:  while  DG (X,Y,^β ) > D
11:  

return ^
β


The next algorthim is a modified version of Coordinate Descent that seeks to reduce redunant computations as much as possible. This algorthim relies on the fact that many computation required by Coordinate Descent can be broken up into constant and non-constant parts. The constant parts of the computation can be performed ahead of time and stored for later use.

Of particular note is the scaled residual computation, which when written down naively reads:

r XTi (Y−-X−i^β−i)
    ||Xi||22.

Which we can re-write as,

r XTi-Y-
||Xi||22 XTi X−i
 ||Xi||22^β i.

Note that since the design matrix X and the vector of predictors Y are fixed, the terms   T
-Xi Y2
||Xi||2 and XT X
-||iXi−||i22 do not change as the values of ^β are updated. Our strategy will be to compute these values for each column of X and store them in an array of size p, from which they will be access as ^B is updated.

Note for this algorthim we establish the convention that array of a given data type will be declared as follows:

( type )[ # elements in array ]

As an example an array of real numbers numbers of size n would be written as:

( )[ n ]


Algorithm 10: Coordinate Descent with Minimal Data Copying
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Starting vector
 λ ⊳  Grid element
 D ⊳  Duality gap target

SVG-Viewer needed.


1:  ()[p] ⊳  Initialize array of size p to hold threshold parameters
2:  p1 ()[p] ⊳  Blank array for part of residual computation
3:  p2 (1×p)[p] ⊳  Blank array to row vectors of size p which will be used for part of residual computation
4:  for  i 1,2,,p  do
5:   --1-- ||Xi||22
6:   [i] 12
7:   p1[i] XiT Y
8:   p2[i] XiT X
9:  end for
10:  ^β β ⊳  Make a copy of β
11:  do
12:   for  i 1,2,,p  do
13:   ^γ β^ j if ji else 0 ⊳  Copy all of beta expect i’th element which is assigned to 0
14:   r p1[i] p2[i]^γ ⊳  Compute the scaled residual
15:   t λ[i] ⊳  Compute threshold parameter
16:   ^ β i τ(r,t) ⊳  Update the i’th element of Beta
17:   end for
18:  while  DG (X,Y,^β ) > D
19:  

return ^β



Algorithm 11: FISTA with backtracking line search and duality gap convergence criteria
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Starting vector
 L0 ⊳  Initial Lipschitz constant, used by backtracking line search
 λ ⊳  Grid element
 η ⊳  Step size when updating Lipschitz constant
 D ⊳  Duality gap target

SVG-Viewer needed.


 yk1 b ⊳  Beta vector from previous iteration of FISTA
 xk1 b ⊳  Intermediary vector from previous iteration of FISTA
1:  yk β
2:  do
3:   yk1 xk
4:   tk 0
5:   ^yk τ(β 1 L-f(X,Y,yk))
6:   while  fβ(X,Y,y^k) > f^β(X,Y,^yk,yk,L)  do
7:   L ηL
8:   ^yk τ(β 1L-f(X,Y,β))
9:   end while
10:   xk1 xk
11:   xk τ(β L1f(X,Y,y^k))
12:   tk+1 = (  √ ----) -1+--1+4t2k--     2
13:   yk xk + (tkt−k+11)(xk − xk−1)
14:  while  DG (X,Y,β,λ) > D
15:  

return yk,yk1,xk1,tk+1



Algorithm 12: λGRID
 
Input:  

SVG-Viewer needed.


 X n×m ⊳  The design matrix
 Y n ⊳  The vector of predictors
 M ⊳  The number of grid elements required

SVG-Viewer needed.


1:  rmax 2||XT Y ||
2:  rmin --1- 1000rmax
3:  Δr (rmax − rmin)
4:  Let Λ M ⊳  Initialize empty array of size M
5:  for  i [1,2,,M]  do
6:   δl Δr i M−-1 + rmin ⊳  Compute linear step
7:   Λ[i] 10δl ⊳  Convert to logarithmic step
8:  end for

return Λ



Algorithm 13: SCC: Stats Continutation Condition
 
Input:  

SVG-Viewer needed.


 C ⊳  The design matrix
 statsIt ⊳  The vector of predictors
 λ ⊳  Current grid element
 Λ M ⊳  Vector of grid elements
 X n×p ⊳  Vector of grid elements
 βs n×M ⊳  Betas matrix

SVG-Viewer needed.


1:  condition false
2:  for  i [1,2,,statsIt]  do
3:   rk Λk
4:   Δβ βstatsIt βi
5:   check -n||Δβ||∞-- rstatsIt+rk
6:   condition condition & ( check C )
7:  end for

return condition



Algorithm 14: FOS
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β n ⊳  Starting vector
 L0 ⊳  Initial Lipschitz constant, used by backtracking line search
 M M ⊳  Number of grid elements
 η ⊳  Step size when updating Lipschitz constant
 C
 γ

SVG-Viewer needed.


1:  ^X σ1-  X(X − μX ). ⊳  Normalize X to mean 0 and standard deviation 1.
2:  ^Y -1- σY(Y  − μ )       Y. ⊳  Normalize Y.
3:  Λ λGRID(X,Y,M) ⊳  Initialize grid elements
4:  βs n×m = 0n,m. ⊳  Initialize matrix of Betas to zero matrix
5:  while  statsCont & ( statsCont < M )  do
6:   statsIt statsIt + 1
7:   ^β βk1 ⊳  Initialize old beta vector with the k - 1’th Column of the Betas matrix.
8:   rstatsIt Λk ⊳  Extract the k’th grid element.
9:   if  DG(X,Y,βk,rstatsIt) DGT(γ,C,rstatsIt,nthen
10:   βk βk1
11:   else
12:   βk ISTA(X,Y,βk1,L0,rstatsIt,η,gap)
13:   end if
14:   statsCont SCC (C,statsIt,rstatsIt,Λ,X,βs)
15:  end while

return βstatsIt1,ΛstatsIt,statsIt



Algorithm 15: DP ( Dual Point )
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β p ⊳  Current β
 λ ⊳  Grid element

SVG-Viewer needed.


1:  R Y
2:  α ||XT1R||∞--
3:  s min{max{Y-TR2 λ||R||2,α}}

return sR



Algorithm 16: DG2 ( Duality Gap for Problem 1 )
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β p ⊳  Current primal point
 ν n ⊳  Current dual point
 λ ⊳  Grid element

SVG-Viewer needed.


1:  fβ 12||Y ||2 2 + λ||β||1 ⊳  Primal Objective function
2:  dν 1 2||Y ||2 2 λ2  2||ν Y- λ||2 2 ⊳  Dual Objective function

return fβ dν



Algorithm 17: SAS ( Safe Active Set )
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 c n ⊳  Center of the ball
 r 0 ⊳  Radius of the ball

SVG-Viewer needed.


1:  A←∅ ⊳  Initialize Active Set With Empty Set
2:  for  j ∈{1,,p} do
3:   if  |XjT c| + rXj2 then
4:   AA∪{j}
5:   end if
6:  end for

return A



Algorithm 18: CDSR (Coordinate Descent With Lazy Evaluation and Screening Rule)
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 β p ⊳  Starting vector
 λ ⊳  Grid element
 D ⊳  Duality gap target

SVG-Viewer needed.


1:  ^β β ⊳  Make a copy of β
2:  R Y X^ β ⊳  Initialize Intermediary Residual
3:  A←{1,,p} ⊳  Initialize Active Set
4:  optimCont true
5:  while  optimCont  do
6:   ν DP(XA,Y,^β A) ⊳  Dual point
7:   G DG2(XA,Y,^β A,ν,λ) ⊳  Duality gap
8:   A SAS(X,ν,∘ -2--    λ2G) ⊳  Safe Active Set
9:   if  GD then
10:   optimCont false
11:   else
12:   for  i A do
13:   t --λ-- ||Xi||22 ⊳  Scale grid element by norm of the i’th column of design matrix
14:   if  ^β i0  then R R + Xiβ^ i
15:   end if
16:   ^β i τ( XTR   )  ||Xii||22,t ⊳  Update the i’th element of Beta
17:   if  ^β i0  then R R Xiβ^ i
18:   end if
19:   end for
20:   end if
21:   ^β Ac = 0 ⊳  Set to 0 coefficients not in A
22:  end while

return ^β


Note that Algorithm 18 solves the problem

arg min 1-||Y − X β||22 + λ||β ||1
   β∈ℝp2
(1)


Algorithm 19: FOS With Screening Rule
 
Input:  

SVG-Viewer needed.


 X n×p ⊳  The design matrix
 Y n ⊳  The vector of predictors
 M ⊳  Number of grid elements
 C > 0
 γ > 0

SVG-Viewer needed.


1:  ^X σ1X-(X − μX ) ⊳  Normalize X to mean 0 and standard deviation 1.
2:  ^Y -1- σY(Y  − μY) ⊳  Normalize Y.
3:  Λ λGRID( ^ X,^ Y,M) ⊳  Initialize grid elements
4:  βs p×M 0p,M ⊳  Initialize matrix of Betas to zero matrix
5:  statsCont true
6:  statsIt 1
7:  while  statsCont & ( statsIt < M )  do
8:   statsIt statsIt + 1
9:   DDGT(γ,C,ΛstatsIt,n) ⊳  Duality gap target
10:   βstatsIt CDSR(X^,^YstatsIt1,ΛstatsIt2,D2)
11:   statsCont SCC (C,statsIt,ΛstatsIt,Λ,X,βs)
12:  end while

return βstatsIt1,ΛstatsIt1,statsIt