
Research

My research focuses on inferring information from experimental data in the presence of error and quantization. While these ideas have a number of potential applications, I focus mostly on single-molecule spectroscopy, using ideas from information theory to infer properties of the dynamics observed in single-molecule experiments.

Papers

Compression algorithms reveal memory effects and static disorder in single-molecule trajectories

DOI: 10.1103/PhysRevResearch.5.L012026

In single-molecule spectroscopy, we often want to distinguish between classes of processes. For example, we may want to know whether the position of a molecule undergoes simple diffusive dynamics or something more complicated. In this paper, we show that compression tools like LZMA and GZIP can reliably distinguish between Markov and non-Markov processes, as well as between processes that exhibit static disorder and those that exhibit dynamic disorder.

The intuition is that as the Markov order of a process increases, we expect it to become more compressible, because more of its past is involved in determining its present state. We can exploit this by constructing a Markov model of the process and compressing a trajectory generated from that model: if the model trajectory compresses significantly worse than the original data, we can conclude that the process is not Markovian.
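As a rough illustration of this comparison, here is a minimal Python sketch. It is not the code from the paper: the function names are hypothetical, the Markov model is restricted to first order, the symbols are assumed to be small integers (0 to 255), and `zlib` (the DEFLATE algorithm that underlies GZIP) stands in for the compressors mentioned above.

```python
import random
import zlib
from collections import Counter, defaultdict


def compressed_size(symbols):
    """Size in bytes of the zlib-compressed sequence (symbols must be ints in 0..255)."""
    return len(zlib.compress(bytes(symbols), 9))


def markov_surrogate(symbols, rng):
    """Sample a sequence of the same length from a first-order Markov model
    whose transition counts are estimated from `symbols`."""
    transitions = defaultdict(Counter)
    for a, b in zip(symbols, symbols[1:]):
        transitions[a][b] += 1
    state = symbols[0]
    out = [state]
    for _ in range(len(symbols) - 1):
        counts = transitions[state]
        if counts:
            state = rng.choices(list(counts), weights=list(counts.values()))[0]
        else:
            # Dead end: this state was only seen at the very end of the data.
            state = symbols[0]
        out.append(state)
    return out


def surrogate_to_original_ratio(symbols, n_surrogates=20, seed=0):
    """Mean compressed size of the Markov surrogates divided by the compressed
    size of the original data. Values well above 1 hint at memory that a
    first-order Markov model does not capture."""
    rng = random.Random(seed)
    original = compressed_size(symbols)
    sizes = [compressed_size(markov_surrogate(symbols, rng)) for _ in range(n_surrogates)]
    return sum(sizes) / n_surrogates / original


if __name__ == "__main__":
    with_memory = [0, 0, 1, 1] * 2000  # next symbol depends on the previous two
    coin = random.Random(1)
    memoryless = [coin.randrange(2) for _ in range(8000)]
    print("with memory:", surrogate_to_original_ratio(with_memory))
    print("memoryless: ", surrogate_to_original_ratio(memoryless))
```

In the toy example, the repeating pattern has two-step memory, so its first-order surrogate looks like a sequence of coin flips and compresses far worse than the original. For the memoryless sequence the two sizes should be close.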

The effect of time resolution on the observed first passage times in diffusive dynamics

DOI: 10.1063/5.0142166

Suppose we are observing a particle undergoing 1D diffusive dynamics, and we would like to estimate how long it takes for the particle to exit a certain region. In principle, this is simple: we can just observe the particle until it exits the region, and then record how long it took.

However, in practice, we cannot observe the particle continuously: simulations have limited timesteps, and experimental devices like CCDs and cameras have limited temporal resolution. How much error can these limited timesteps introduce into our estimate?

Figure: Demonstrations of missed crossings in simulated trajectories plotted against time. The black curve is the trajectory observed at large dt, and the gray curve is the same trajectory observed at small dt. On the left, the particle crosses the boundary approximately halfway through the simulation, but the coarse timestep does not observe this, leading to an overestimate of the exit time. On the right, the crossing is once again missed, but the particle diffuses all the way to the left boundary, making it appear as if the particle never exited the region at all.

Somewhat surprisingly, if we observe the particle at a timestep of \( \delta t\), the error in the estimated exit time can be as high as \(60 \delta t\). This is because the particle can exit the region and re-enter it in less than \(\delta t\), and then continue to diffuse in the interior. These events heavily skew the estimate of the exit time upwards.
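The missed-crossing effect can be reproduced with a small simulation. The sketch below is an illustration under stated assumptions, not the paper's code: the function name and all parameter values are hypothetical, the region is the symmetric interval \((-L, L)\), and a finely sampled path is used as a stand-in for continuous observation (it also misses some crossings, just far fewer).

```python
import math
import random


def exit_times(half_width, diffusion, dt_fine, stride, rng, max_steps=10_000_000):
    """Simulate one 1D Brownian path starting at x = 0 and return a pair
    (fine_exit, coarse_exit): the first time the path is seen outside
    (-half_width, half_width) when observed every dt_fine, and when observed
    only every stride * dt_fine."""
    sigma = math.sqrt(2.0 * diffusion * dt_fine)  # std. dev. of one fine step
    x = 0.0
    fine_exit = None
    for step in range(1, max_steps + 1):
        x += rng.gauss(0.0, sigma)
        outside = abs(x) >= half_width
        if outside and fine_exit is None:
            fine_exit = step * dt_fine
        # The coarse observer only looks at every `stride`-th frame, so it
        # misses excursions that leave and re-enter between two looks.
        if outside and step % stride == 0:
            return fine_exit, step * dt_fine
    raise RuntimeError("path did not exit within max_steps")


if __name__ == "__main__":
    rng = random.Random(0)
    dt_fine, stride = 1e-3, 50
    pairs = [exit_times(1.0, 1.0, dt_fine, stride, rng) for _ in range(200)]
    bias = sum(coarse - fine for fine, coarse in pairs) / len(pairs)
    print("mean overestimate in units of the coarse timestep:", bias / (stride * dt_fine))
```

Because the coarse frames are a subset of the fine frames, the coarse exit time can never be earlier than the fine one, so the printed bias is always non-negative. Its size depends on the chosen parameters.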

In this paper, we show the origin of these events using simulations and propose a technique to probabilistically correct for them, reducing the error back to the order of \(\delta t\).

Non-Markov models of single-molecule dynamics from information-theoretical analysis of trajectories

DOI: 10.1063/5.0158930

Our earlier PRR paper on compression algorithms and Markovianity was limited to discrete-time processes. This is because lossless compression algorithms tend to operate on discrete data, so a continuous-time trajectory must be discretized before they can be used. However, the naive approach of dividing space into bins and assigning the particle to whichever bin it occupies at every time step (what we refer to as “lumping”) can transform even diffusive trajectories into ones with highly non-Markovian signatures.

In this paper, we adapt milestoning, a technique originally developed by Ron Elber for molecular dynamics simulations, to single-molecule trajectories. The resulting discretization respects the (spatial) Markovianity of the original continuous-time process, which allows us to distinguish between Markovian and generalized Langevin processes. We also analyze a Gly-Ser peptide, and the results suggest that peptide folding is a non-Markov process.
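To make the difference between lumping and milestoning concrete, here is a minimal 1D sketch. It assumes a simple reading of milestoning in which each frame is labeled by the most recently crossed milestone. The function names are hypothetical, and this is not the implementation used in the paper.

```python
import math


def lump_labels(trajectory, bin_width):
    """Lumping: label each frame with the index of the bin the particle is in."""
    return [math.floor(x / bin_width) for x in trajectory]


def milestone_labels(trajectory, milestones):
    """Milestoning: label each frame with the index of the most recently
    crossed milestone (None before the first crossing). `milestones` must be
    sorted in ascending order."""
    labels = [None]
    current = None
    for prev, x in zip(trajectory, trajectory[1:]):
        crossed = [
            i for i, m in enumerate(milestones)
            if (prev - m) * (x - m) < 0 or x == m
        ]
        if crossed:
            # If several milestones are crossed in one step, keep the last one
            # reached in the direction of motion.
            current = crossed[-1] if x > prev else crossed[0]
        labels.append(current)
    return labels


if __name__ == "__main__":
    # A path that jitters around the boundary at x = 1 before moving on.
    path = [0.5, 0.9, 1.1, 0.95, 1.05, 0.9, 0.2, -0.1, 1.5, 2.5]
    print(lump_labels(path, 1.0))
    print(milestone_labels(path, [0.0, 1.0, 2.0]))
```

With lumping, the jitter around x = 1 shows up as rapid flipping between bins 0 and 1. With milestoning, recrossing the same milestone does not change the label, so the state only changes once a different milestone is reached.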