Similarity and Clustering Calculation

Similarity Calculation by Multi-Resolution Dual Contour Tree

  • In a dual contour tree, every node represents a connected sub-volume within one sub-range of the function values, and there exists an arc between two nodes whose corresponding sub-volumes are adjacent.
  • The multi-resolution dual contour tree Tm is constructed from G, the dual contour tree at the finest level. The size of the finest level tree is controlled by the number of sub-ranges, which is chosen to be for convenience. The coarser levels of Tm are constructed by merging the adjacent ranges and corresponding nodes.
  • Each node m has a vector of attributes capturing its topological and geometrical properties: {V(m), R(m), B1(m), B2(m)}, where V(m) is its normalized volume, R(m) is its normalized functional range, and B1(m) and B2(m) are the Betti numbers of its lower and upper bounding surfaces.
  • The nodes of the multi-resolution dual contour trees are matched to each other. The similarity metric between two nodes m and n is defined from their attributes as follows: ε(m, n) = w1·δ(V(m), V(n)) + w2·δ(R(m), R(n)) + w3·(δ(B1(m), B1(n)) + δ(B2(m), B2(n)))/2, where δ(·, ·) compares the two values of a single attribute and the weights w1, w2, w3, with w1 + w2 + w3 = 1, control the relative contribution of the different attributes.
  • The similarity between two dual contour trees G1 and G2 is the sum of the similarities of their matched node pairs.
  • The similarity between two molecules is the average of the similarities of their dual contour trees from level 1 to n.
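
The scoring steps in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-attribute comparison attr_similarity (one minus the relative difference, for non-negative values) is a hypothetical choice, the default weights are arbitrary, and the node matching is assumed to have been computed already and passed in as a list of (m, n) pairs for each level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    volume: float     # V(m): normalized volume
    frange: float     # R(m): normalized functional range
    betti_lower: int  # B1(m): Betti number of the lower bounding surface
    betti_upper: int  # B2(m): Betti number of the upper bounding surface


def attr_similarity(a, b):
    """Hypothetical per-attribute similarity in [0, 1] for non-negative values.

    Assumption: 1 minus the relative difference; the source does not specify it.
    """
    hi = max(abs(a), abs(b))
    return 1.0 if hi == 0 else 1.0 - abs(a - b) / hi


def node_similarity(m, n, w1=0.4, w2=0.4, w3=0.2):
    """Weighted sum over volume, range, and averaged Betti-number similarity."""
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    betti = (attr_similarity(m.betti_lower, n.betti_lower)
             + attr_similarity(m.betti_upper, n.betti_upper)) / 2
    return (w1 * attr_similarity(m.volume, n.volume)
            + w2 * attr_similarity(m.frange, n.frange)
            + w3 * betti)


def tree_similarity(matched_pairs, **weights):
    """Sum of node similarities over the matched node pairs of two trees."""
    return sum(node_similarity(m, n, **weights) for m, n in matched_pairs)


def molecule_similarity(levels, **weights):
    """Average of the tree similarities over resolution levels 1..n.

    `levels` holds one list of matched (m, n) pairs per resolution level.
    """
    return sum(tree_similarity(pairs, **weights) for pairs in levels) / len(levels)
```

Because the tree similarity is a sum over matched pairs, its magnitude grows with the number of matched nodes at a level; the sketch keeps that behavior rather than normalizing it.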

Electrostatics-Based Alignment

These potentials were then analyzed by structurally aligning the molecules with CE and comparing the aligned potentials under a variety of norms, including the Carbo and Hodgkin similarity indices. The resulting pairwise measures were then used to cluster the electrostatic data into similar subsets using UPGMA, a simple agglomerative hierarchical clustering method.
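
As a rough illustration of the comparison and clustering steps, the sketch below computes the Carbo and Hodgkin indices for two potentials sampled at the same, already aligned, grid points, and runs a minimal UPGMA on a pairwise distance matrix. The function names are hypothetical. Turning a similarity index into a distance (for example, 1 minus the index) is an assumption of this sketch, not something stated above.

```python
import math


def carbo_index(p, q):
    """Carbo index: sum(p*q) / sqrt(sum(p^2) * sum(q^2)).

    Sensitive to the shape of the potentials but not to their magnitude.
    """
    num = sum(a * b for a, b in zip(p, q))
    return num / math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))


def hodgkin_index(p, q):
    """Hodgkin index: 2*sum(p*q) / (sum(p^2) + sum(q^2)).

    Sensitive to both the shape and the magnitude of the potentials.
    """
    num = 2.0 * sum(a * b for a, b in zip(p, q))
    return num / (sum(a * a for a in p) + sum(b * b for b in q))


def upgma(dist):
    """Minimal UPGMA on a symmetric n x n distance matrix (list of lists).

    Returns the merges in order as (members_a, members_b, distance) tuples.
    """
    clusters = {i: (i,) for i in range(len(dist))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        ci, cj = clusters.pop(i), clusters.pop(j)
        merges.append((ci, cj, dij))
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean, i.e. the average over all original member pairs.
        new = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new[(k, next_id)] = (len(ci) * dik + len(cj) * djk) / (len(ci) + len(cj))
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        d.update(new)
        clusters[next_id] = ci + cj
        next_id += 1
    return merges
```

For instance, the potentials [1, 2, 3] and [2, 4, 6] have the same shape but different magnitudes, so their Carbo index is 1 while their Hodgkin index is lower; which of the two is more appropriate depends on whether magnitude differences should count as dissimilarity.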