## Miracles when you use the right metric

I recommend reading, carefully and thoughtfully, the preprint “The Metric Space of Collider Events” by Patrick Komiske, Eric Metodiev, and Jesse Thaler (arXiv:1902.02346). There is a lot here, perhaps somewhat cryptically presented, but much of it is exciting.

First, you have to understand what the Earth Mover’s Distance (EMD) is. This is easier to understand than the Wasserstein Metric of which it is a special case. The EMD is a measure of how different two pdfs (probability density functions) are, and it is rather different from the usual chi-squared or mean integrated squared error because it emphasizes separation rather than overlap. The idea is to look at how much work you have to do to reconstruct one pdf from another, where “reconstruct” means transporting a portion of the first pdf a given distance. You keep track of the “work” you do, which means the amount of area (i.e., “energy” or “mass”) you transport and how far you transport it. The Wikipedia article aptly makes an analogy with suppliers delivering piles of stones to customers. The EMD is the smallest effort required.
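To make the contrast between separation and overlap concrete, here is a minimal sketch for the simplest possible case: two histograms on the same equally spaced bins in one dimension. In that special case the EMD reduces to the summed absolute difference of the two cumulative distributions. The function names and the toy histograms are my own, not anything from the paper.

```python
from itertools import accumulate


def emd_1d(p, q, bin_width=1.0):
    """EMD between two 1D histograms on the same equally spaced bins.

    Assumes both histograms carry the same total weight. In 1D the minimal
    transport cost is the summed absolute difference of the running totals.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total weight")
    return bin_width * sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def squared_error(p, q):
    """Bin-by-bin squared error, which only sees overlap."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# A spike in the first bin, compared with a spike one bin away and three bins away.
spike = [1.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 1.0]

print(emd_1d(spike, near), emd_1d(spike, far))                # 1.0 3.0
print(squared_error(spike, near), squared_error(spike, far))  # 2.0 2.0
```

The squared error cannot tell the two cases apart because neither shifted spike overlaps the original, while the EMD grows with how far the spike has moved.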

The EMD is a rich concept because it allows you to carefully define what “distance” means. In the context of delivering stones, transporting them across a plain and up a mountain are not the same. In the same spirit, rotating a collision event about the beam axis should “cost” nothing (i.e., be irrelevant), while increasing the energy or transverse momentum should cost something, because it is phenomenologically interesting.

The authors want to define a metric for LHC collision events with the notion that events that come from different processes would be well separated. This requires a definition of “distance”, hence the word “metric” in the title. You have to imagine taking one collision event consisting of individual particles, or perhaps a set of hadronic jets, and transporting pieces of it in order to match some other event. If you have to transport the pieces a great distance, then the events are very different. The authors’ ansatz is a straightforward one, depending essentially on the angular distance θ_ij/R plus a term that takes into account the difference in total energies of the two events. Note: the subscripts i and j refer to two elements from the two different events. The paper gives a very nice illustration for two top quark events (red and blue):

Transformation of one top quark event into another
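Written out, the ansatz looks roughly as follows. This is my own transcription of the definition in the paper, so check the preprint for the exact form. Here E_i and E'_j are the energies of the particles in the two events, f_ij is the amount of energy moved from particle i in one event to particle j in the other, and R is a parameter that sets the relative weight of the two terms.

```latex
\mathrm{EMD}(\mathcal{E}, \mathcal{E}')
  = \min_{\{f_{ij}\}} \sum_{i} \sum_{j} f_{ij}\, \frac{\theta_{ij}}{R}
    + \Bigl|\, \sum_{i} E_i - \sum_{j} E'_j \Bigr|,
\qquad
f_{ij} \ge 0, \quad
\sum_{j} f_{ij} \le E_i, \quad
\sum_{i} f_{ij} \le E'_j, \quad
\sum_{i} \sum_{j} f_{ij} = \min\Bigl( \sum_{i} E_i,\; \sum_{j} E'_j \Bigr).
```

The first term is the “work” of moving energy through an angular distance, and the second charges for any energy that has to be created or destroyed because the two events do not have the same total.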

The first thing that came to mind when I had grasped, with some effort, the suggested metric, was that this could be a great classification tool. And indeed it is. The authors show that a k-nearest neighbors algorithm (KNN), straight out of the box, equipped with their notion of distance, works nearly as well as very fancy machine learning techniques! It is crucial to note that there is no training here, no search for a global minimum of some very complicated objective function. You only have to evaluate the EMD, and in their case, this is not so hard. (Sometimes it is.) Here are the ROC curves:

ROC curves. The red curve is the KNN with this metric, and the other curves close by are fancy ML algorithms. The light blue curve is a simple cut on N-subjettiness observables, itself an important theoretical tool.
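To show how little machinery a KNN needs once a distance is available, here is a minimal sketch. It is not the authors' code: the `toy_distance` below is a stand-in (a one-dimensional histogram EMD computed from cumulative sums), and the “one-prong” and “two-prong” labels and histograms are invented for illustration. In a real application you would pass in the event-to-event EMD instead.

```python
from collections import Counter
from itertools import accumulate


def knn_predict(query, labeled_events, distance, k=3):
    """Majority vote among the k labeled events closest to `query`.

    `labeled_events` is a list of (event, label) pairs and `distance` is any
    callable taking two events. Nothing is fitted, so there is no training step.
    """
    ranked = sorted(labeled_events, key=lambda pair: distance(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def toy_distance(p, q):
    """Stand-in for the EMD: 1D histogram EMD on shared unit-width bins."""
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Hypothetical labeled sample: energy fractions in four angular bins.
training = [
    ([0.9, 0.1, 0.0, 0.0], "one-prong"),
    ([0.8, 0.2, 0.0, 0.0], "one-prong"),
    ([1.0, 0.0, 0.0, 0.0], "one-prong"),
    ([0.5, 0.0, 0.0, 0.5], "two-prong"),
    ([0.4, 0.1, 0.0, 0.5], "two-prong"),
    ([0.5, 0.0, 0.1, 0.4], "two-prong"),
]

prediction = knn_predict([0.45, 0.05, 0.0, 0.5], training, toy_distance, k=3)
print(prediction)  # two-prong
```

All of the physics lives in the `distance` argument, which is why the choice of metric matters so much here.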

I imagine that some optimization could be done to close the small gap with respect to the best performing algorithms, for example by improving on the KNN.

The next intriguing idea presented in this paper is the fractal dimension, or correlation dimension, dim(Q), associated with their metric. The interesting bit is how dim(Q) depends on the mass/energy scale Q, which can plausibly vary from a few GeV (the regime of hadronization) up to the mass of the top quark (173 GeV). The authors compare three different sets of jets: from ordinary QCD production, from W bosons decaying hadronically, and from top quarks. One expects the detailed structure to be distinctly different, at least if viewed with the right metric. And indeed, the variation of dim(Q) with Q is quite different:

dim(Q) as a function of Q for three sources of jets
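As I read the paper, dim(Q) is the logarithmic derivative, with respect to Q, of the number of pairs of events whose EMD is less than Q. Here is a minimal sketch of that idea using a finite difference between two scales. The function names are mine, I exclude self-pairs as a matter of convenience, and the toy data are evenly spaced points on a line with an absolute-difference distance standing in for the EMD.

```python
import math


def pair_count(events, distance, scale):
    """Number of ordered pairs of distinct events closer together than `scale`."""
    return sum(
        1
        for i, a in enumerate(events)
        for j, b in enumerate(events)
        if i != j and distance(a, b) < scale
    )


def correlation_dimension(events, distance, q_low, q_high):
    """Finite-difference estimate of d ln N(Q) / d ln Q between two scales."""
    n_low = pair_count(events, distance, q_low)
    n_high = pair_count(events, distance, q_high)
    if n_low == 0:
        raise ValueError("no pairs below q_low; choose a larger scale")
    return math.log(n_high / n_low) / math.log(q_high / q_low)


# Toy check: evenly spaced points on a line should look one-dimensional.
points = list(range(200))
dim = correlation_dimension(points, lambda a, b: abs(a - b), 10.5, 20.5)
print(round(dim, 2))  # 1.0
```

For a set of real events one would evaluate this over a sliding pair of scales to trace out dim(Q), with the pairwise EMDs computed once and cached, since the pair loop is quadratic in the number of events.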

(Note these jets all have essentially the same energy.) There are at least three take-away points. First, dim(Q) is much higher for top jets than for W and QCD jets, and W is higher than QCD. This hierarchy reflects the relative complexity of the events, and hints at new discriminating possibilities. Second, the curves are more similar at low scales, where the structure involves hadronization, and more different at high scales, which should be dominated by the decay structure. This is borne out by the decay-products-only curves. Finally, there is little difference in the curves based on particles or on partons, meaning that the result is somehow fundamental and not an artifact of hadronization itself. I find this very exciting.

The authors develop the correlation dimension dim(Q) further. It is a fact that a pair of jets from W decays boosted to the same degree can be described by a single variable: the ratio of their energies. This can be mapped onto an annulus in an abstract dimensional space (see the paper for slightly more detail). The interesting step is to look at how the complexity of individual events, reflected in dim(Q), varies around the annulus:

Embedding of W jets and how dim(Q) varies around the annulus and inside it

The blue events to the lower left are simple, with just a single round dot (jet) in the center, while the red events in the upper right have two dots of nearly equal size. The events in the center are very messy, with many dots of several sizes. So morphology maps onto location in this kinematic plane.

A second illustration is provided, this time based on QCD jets of essentially the same energy. The jet masses will span a range determined by gluon radiation and the hadronization process. Jets at lower mass should be clean and simple while jets at high mass should show signs of structure. This is indeed the case, as nicely illustrated in this picture:

How complex jet substructure correlates with jet mass

This picture is so clear it is almost like a textbook illustration.

That’s it. (There is one additional topic involving infrared divergence, but since I do not understand it I won’t try to describe it here.) The paper is short with some startling results. I look forward to the authors developing these studies further, and to other researchers thinking about them and applying them to real examples.