Entries in data mining (2)

Wednesday, Feb 17, 2010

Yew Jin Lim (disambiguation)

I have been asked at various times about mentions of "Yew Jin" on the web, so I decided to disambiguate them here so that interested parties (all two of you) can use your favourite search engine to find the answer. These are the articles that I do not mind confirming refer to me (heuristic: if the article is damning, it's not me!):

Laundry and bikes at work, anyone? by Grace Chng in Straits Times Blogs (2008/12/3)

Singaporeans in Google by Bhagyashree Garekar in Straits Times (2008/4/20)

SoC Team Outmanoeuvres Rivals in Competition on Artificial Intelligence in Games in SoC News

N-VCPE-R International Contest Top 50 scorers

Aren't you that millionaire from Google?

No, that's Meng - that rich devilishly handsome bastard. Also, I am neither a lawyer nor involved in the Malaysia Amateur Basketball Association.

Monday, Jul 9, 2007

Partially-Observed "Singular Value Decomposition"

Singular Value Decomposition is a fantastic linear algebra operation which factorizes a matrix - that is, for a matrix M, it returns U, S, V such that M = U * S * V'. This can be viewed as a basis transformation from row space to column space. The main problem is that standard SVD requires a completely-observed matrix M, that is, all values of M must be known.
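To make the M = U * S * V' identity concrete, here is a tiny sketch in plain Python. The factors are hand-picked for illustration (they are not computed by an SVD routine), and the helper names matmul and transpose are my own; it only checks that multiplying the three pieces back together yields M.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

# Hand-picked factors (illustrative only, not the output of a library).
U = [[0.0, 1.0],
     [1.0, 0.0]]   # orthogonal: swaps the two rows
S = [[3.0, 0.0],
     [0.0, 2.0]]   # singular values on the diagonal, largest first
V = [[1.0, 0.0],
     [0.0, 1.0]]   # orthogonal: the identity

M = matmul(matmul(U, S), transpose(V))
print(M)  # [[0.0, 2.0], [3.0, 0.0]]
```

Note that every entry of M is needed to recover U, S and V in the first place, which is exactly the limitation described above.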

SVD has been shown to work surprisingly well for collaborative filtering applications such as the Netflix Prize. However, in such problems, the matrix is usually partially-observed, or in other words, some entries of M are not known. The "simple" workaround has been to perform gradient descent in U and V to minimize the objective function:

f = \sum_{i=1}^N \sum_{j \in N(i)} (u_i * v_j' - M_{ij})^2 + \lambda \sum_{ij} ( ||u_i||^2 + ||v_j||^2 )

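where N(i) is the set of observed entries in row i and \lambda is the regularization weight. Taking a gradient step with learning rate \alpha on each observed entry (i, j) breaks down to the following updates for each factor k: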

\delta_{ij} = M_{ij} - u_i * v_j'
u_{ik} = u_{ik} + \alpha( \delta_{ij} v_{jk} - \lambda u_{ik} )
v_{jk} = v_{jk} + \alpha( \delta_{ij} u_{ik} - \lambda v_{jk} )

Since we only work with known values of M, this simply avoids the requirement of M being completely-observed.
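A minimal sketch of this loop in plain Python, under some assumptions of mine: the observed entries arrive as (i, j, value) triples, the error term is taken as the residual M_{ij} - u_i * v_j' so that adding \alpha times it is a descent step, and the function names and default hyperparameters (k, alpha, lam, epochs) are illustrative rather than tuned.

```python
import random

def factorize(observed, n_rows, n_cols, k=2, alpha=0.05, lam=0.01,
              epochs=500, seed=0):
    """Fit U (n_rows x k) and V (n_cols x k) to (i, j, value) triples."""
    rng = random.Random(seed)
    # Small random initial factors; all-zero factors would never move.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i, j, m_ij in observed:      # only known entries are visited
            pred = sum(U[i][f] * V[j][f] for f in range(k))
            delta = m_ij - pred          # residual for this entry
            for f in range(k):
                u, v = U[i][f], V[j][f]  # keep old values for both updates
                U[i][f] = u + alpha * (delta * v - lam * u)
                V[j][f] = v + alpha * (delta * u - lam * v)
    return U, V

def predict(U, V, i, j):
    """Estimate entry (i, j) as the dot product of row i of U and row j of V."""
    return sum(u * v for u, v in zip(U[i], V[j]))

# Toy 3x2 matrix with entry (2, 1) left unobserved.
observed = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 4.0), (2, 0, 3.0)]
U, V = factorize(observed, n_rows=3, n_cols=2)
print(predict(U, V, 2, 1))  # estimate for the missing entry
```

The missing entry never appears in the loop, so nothing has to be imputed before training; how good its estimate is depends on the data, k and \lambda.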

I am hoping to use this simple technique to perform data mining in other similar problems where "collaborative filtering" is assumed but tedious to compute - e.g., medical data where you have tons of clinical measurements and information about patients, together with a set of diagnoses. So given a set of patient info and clinical measurements, can you predict the diagnosis? Casting this information into a partially-observed matrix M is fairly simple, and I am having moderate success. Nevertheless, I would prefer to have Bayesian belief networks to incorporate some prior knowledge into a structured model.
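As a rough illustration of the casting step, the sketch below turns patient records with missing fields into the observed (row, column, value) triples of a partially-observed matrix. The column names, the records and the function name to_observed_triples are all made up for this example, and in practice the columns would probably need rescaling to comparable ranges first.

```python
def to_observed_triples(records, columns):
    """Turn records with missing fields into (row, col, value) triples."""
    col_index = {name: j for j, name in enumerate(columns)}
    triples = []
    for i, record in enumerate(records):
        for name, value in record.items():
            if value is not None:        # None marks an unknown entry
                triples.append((i, col_index[name], float(value)))
    return triples

# Hypothetical columns: two measurements and one diagnosis flag.
columns = ["temperature", "heart_rate", "has_flu"]
records = [
    {"temperature": 39.1, "heart_rate": 95, "has_flu": 1},
    {"temperature": 36.8, "has_flu": 0},                       # heart_rate missing
    {"temperature": 38.7, "heart_rate": 90, "has_flu": None},  # diagnosis unknown
]

triples = to_observed_triples(records, columns)
print(len(triples))  # 7
```

Entry (2, 2), the unknown diagnosis of the third patient, is simply absent from the triples, so it is one of the entries the factorization would be asked to predict.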