This web page is where I collect my experiments on data visualization, machine learning, and other technical miscellany.
2011 June 1
There was a huge response on “The Twitter” to the Kindle typography article I posted in April. Amazon has an SDK for their lil' e-reader, so I started a company to explore the possibilities. Drop us a line if you want a Kindle app or have other digital typography needs.
2011 April 18
Amazon’s Kindle ebook reader is an impressive piece of hardware, but its typographic rendering software leaves a bit to be desired. Let’s add hyphenation and use the Knuth and Plass line-breaking algorithm via the built-in WebKit browser. Advanced typographic features such as hanging punctuation and non-rectangular paragraph shapes are also possible.
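To give a feel for what "total fit" line breaking means, here is a minimal sketch in Python. It is not the article's code and not the full Knuth and Plass algorithm: it assumes fixed-width characters, single spaces, and no hyphenation, and it simply picks the breaks that minimize the sum of squared trailing space over all lines but the last. The function name `break_lines` is made up for this illustration.

```python
def break_lines(words, width):
    """Choose line breaks that minimize the sum of squared trailing space.

    Simplified stand-in for Knuth-Plass (an assumption of this sketch):
    fixed-width characters, single spaces, no hyphenation, free last line.
    """
    n = len(words)
    inf = float("inf")
    # best[i] is the minimal cost of setting words[i:];
    # nxt[i] is the index of the word that starts the following line.
    best = [inf] * n + [0.0]
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width:
                break
            # The last line is allowed to be short at no cost.
            cost = 0.0 if j == n else float((width - length) ** 2)
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    if best[0] == inf:
        raise ValueError("a word is longer than the line width")
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


if __name__ == "__main__":
    for line in break_lines("aaa bb cc ddddd".split(), 6):
        print(line)
```

A greedy breaker would set "aaa bb" on the first line and leave "cc" stranded on the second; looking at the whole paragraph at once trades a slightly loose first line for a much tighter second one.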
2011 February 23
I am giving a talk on data-driven JavaScript applications at 6:30 on Wednesday, February 23rd in downtown Portland.
Webtrends
851 SW 6th Avenue
Floor 16
Portland OR 97204
2010 December 4
Prote.cs is an algorithm that assigns an uncharacterized protein structure to a fold family by answering the question:
“Which fold family has members that can best sum to approximate this protein?”
More specifically, proteins are mapped into a vector space by their alpha carbon distance matrices and then assigned to a fold family according to an $l^1$-norm minimized linear regression on a basis derived from protein structures with known fold.
An accuracy of 95% was achieved on a set of 466 CATH fold families using just six-by-six pixel distance matrix thumbnails (21 dimensions).
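The "21 dimensions" figure follows from symmetry: a six-by-six distance matrix has 6·7/2 = 21 distinct entries once the diagonal is included. The sketch below shows one way such a feature vector could be built. It is an illustration only, not the Prote.cs code: the block-averaging downsampler and the function names are assumptions, and the real implementation may shrink the matrix differently.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between alpha-carbon coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def thumbnail(matrix, size=6):
    """Shrink a square matrix to size-by-size by averaging blocks.

    Block averaging is an assumption of this sketch.
    """
    n = len(matrix)
    if n < size:
        raise ValueError("need at least `size` residues")
    # Block k covers indices edges[k] up to (not including) edges[k + 1].
    edges = [k * n // size for k in range(size + 1)]
    thumb = []
    for a in range(size):
        row = []
        for b in range(size):
            cells = [
                matrix[i][j]
                for i in range(edges[a], edges[a + 1])
                for j in range(edges[b], edges[b + 1])
            ]
            row.append(sum(cells) / len(cells))
        thumb.append(row)
    return thumb


def upper_triangle(thumb):
    """Flatten the upper triangle, diagonal included, into a feature vector."""
    size = len(thumb)
    return [thumb[a][b] for a in range(size) for b in range(a, size)]


if __name__ == "__main__":
    # Toy "protein": twelve residues on a straight line, one unit apart.
    coords = [(float(i), 0.0, 0.0) for i in range(12)]
    features = upper_triangle(thumbnail(distance_matrix(coords)))
    print(len(features))
```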
Details and examples here, code on GitHub
2010 November 14
I’ve rewritten YALL1 (“Your ALgorithms for L1”) for Octave compatibility. YALL1 is a Matlab program by Yin Zhang, Junfeng Yang, and Wotao Yin (see original) that solves several $l^1$ minimization problems that have become notable in connection with compressed sensing. These problems are variations on the basis pursuit theme:
\[ \min \norm{\vec x}_1 \quad \mbox{such that} \quad \mat A \vec x = \vec b, \]
where the measurement matrix $\mat A \in \real^{m \times n}$ has $m \ll n$, and the solution $\vec x$ is known to be (approximately) sparse. As it turns out, sparse reconstructions have a wide variety of applications in image compression, signal acquisition, and even classification/machine learning problems.
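One way to see why basis pursuit is tractable is the textbook rewriting of it as a linear program. This is a standard reformulation, not a description of how YALL1 works internally. Split the unknown into nonnegative parts, $\vec x = \vec u - \vec v$ (the symbols $\vec u$ and $\vec v$ are introduced here for illustration), and solve
\[ \min_{\vec u, \vec v} \; \sum_{i=1}^{n} (u_i + v_i) \quad \mbox{such that} \quad \mat A (\vec u - \vec v) = \vec b, \quad \vec u \ge 0, \quad \vec v \ge 0. \]
At an optimum, at most one of $u_i$ and $v_i$ is nonzero for each $i$, so $u_i + v_i = |x_i|$ and the objective equals $\norm{\vec x}_1$.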
Compressed sensing details and examples here, code on GitHub
2010 November 5
I’ve forked an R interface to the LIBLINEAR C++ library, which solves classification and regression problems having millions of instances and features. These problems are variations on the theme:
\[ \min_{\mathbf w} \quad \tfrac{1}{2}\mathbf w^{\mathrm{T}} \mathbf w + C \sum_{i=1}^{l} \max\left(0, 1 - y_{i} \mathbf{w}^{\mathrm T}\mathbf x_{i} \right)^{2}\kern-6pt, \]
with \( \mathbf x_i \) a sample vector, \( y_i \in \{-1, +1\} \) its class, and \( \mathbf w \) the vector that defines a decision hyperplane (problems with \(k\) classes are handled automatically by finding \(k\) one-vs-all hyperplanes).
Essentially, this is a very fast, large-scale support vector machine library that has no desire to play in fancy-pants kernel spaces. The R interface automatically expands categorical features (i.e. factors) with $c$ levels into $c$ binary dimensions.
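As a rough illustration of both ideas, the Python sketch below expands a categorical feature into binary dimensions and evaluates the squared-hinge objective for a given weight vector. It does not call LIBLINEAR or the R interface; the function names are hypothetical, and it assumes labels coded as -1 and +1, the coding under which the hinge term penalizes mistakes on both classes.

```python
def expand_factor(values):
    """Turn a categorical feature with c levels into c binary columns."""
    levels = sorted(set(values))
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return levels, rows


def l2_loss_svm_objective(w, samples, labels, C=1.0):
    """Evaluate 0.5 * w.w + C * sum(max(0, 1 - y * w.x)^2).

    Labels are assumed to be -1 or +1.
    """
    regularizer = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for x, y in zip(samples, labels):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        loss += max(0.0, 1.0 - margin) ** 2
    return regularizer + C * loss


if __name__ == "__main__":
    colors = ["red", "green", "red", "blue"]
    labels = [1, -1, 1, -1]  # "red" versus everything else
    levels, samples = expand_factor(colors)
    print(levels)
    print(l2_loss_svm_objective([-1.0, -1.0, 1.0], samples, labels))
```

With the weights above, every sample sits exactly on the margin, so only the regularization term contributes to the objective.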
Details and examples here, code on GitHub
2010 October 2
I’ve updated my protein residue & Voronoi cell viewer to use the o3d-WebGL renderer. The WebGL canvas is supported by Firefox 4 and the current development builds of Chrome and Safari.
2010 September 1
A paper describing my work on information visualization for medical bill analytics has been accepted by IEEE VisWeek 2010.