Title: Autoencoding Galaxy Spectra I: Architecture
Authors: P Melchior, Y Liang, C Hahn, et al.
Institution of first author: Department of Astrophysical Sciences, Princeton University, NJ, USA and Center for Statistics & Machine Learning, Princeton University, NJ, USA
Status: Submitted to AJ
Imagine for a moment that you’ve never seen a car before, but you’re dying to build a toy car as a gift for a car-loving friend. Understandably, the outlook might feel bleak. Now suppose another friend comes by with a bright idea: they send you to the nearest busy street corner and explain that whatever drives by can be considered a car. Then they say goodbye and leave. Armed with this new information, you feel reinvigorated. Watching the passing vehicles for a few minutes quickly gives you an idea of what a car might be. After about an hour you head back inside, ready to face your challenge: building your own (toy) car based only on what you saw.
This example is obviously contrived, but it provides a useful touchpoint for heuristically understanding how an autoencoder, or more generally any unsupervised machine learning algorithm, works (see these Astrobites for examples of machine learning used in astronomy). If you think about how you would approach the challenge above, the rationale becomes clear: with just your observations of the passing cars, you would stick to the patterns you noticed and use them to “reconstruct” the definition of a car in your head. Perhaps you saw that most of them had four wheels and headlights, and were all generally the same shape. You could maybe build a decent-looking toy car out of this, and your friend would be proud of it. This is the basic idea behind an unsupervised learning task: the algorithm is presented with data and tries to identify relevant features of that data in order to achieve a specific goal. In the specific case of an autoencoder, that goal is to learn how to reconstruct the original data from a compressed representation, just as you tried to do by building a toy car from memory. In particular, you (or the computer) reduce the observations of the cars (the data) to common features (the so-called “latent features”) and reconstruct a car from those features (the reconstruction of the data). This process is shown schematically in Figure 1.
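To make the compress-then-reconstruct idea concrete, here is a minimal, hypothetical sketch in numpy (not anything from the paper): a linear autoencoder, which is mathematically equivalent to PCA, compresses toy 10-dimensional “observations” that secretly depend on only two latent parameters, then reconstructs them from those two numbers alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observations": 100 samples in 10 dimensions that secretly
# depend on only 2 latent parameters (like spectra driven by a few
# physical quantities).
latent_true = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 10))
data = latent_true @ mixing

# A linear autoencoder is equivalent to PCA: the top singular vectors
# give the best low-dimensional latent representation.
mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)

def encode(x):
    return (x - mean) @ vt[:2].T   # 10-D data -> 2-D latent features

def decode(z):
    return z @ vt[:2] + mean       # 2-D latent features -> 10-D reconstruction

reconstruction = decode(encode(data))
print(np.allclose(reconstruction, data))  # True: two latents suffice here
```

Real autoencoders replace these linear maps with deep neural networks, which lets them capture non-linear structure the same way.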
Application to astronomy
To understand how this method of machine learning is applied in today’s work, let’s go one step further with the car-building example. Rather than letting you spot random cars driving by, let’s assume your friend was less inspired this time and simply pulled up pictures of a few different cars on their phone. Based on those, you might do a decent job producing something that looks like a car, but your car probably wouldn’t work too well (e.g. maybe the wheels would be fixed to the chassis and your car wouldn’t move). Alternatively, your friend may have wanted to challenge you further and simply described how a car works. In that case you could maybe build a decently functional toy car, but it probably wouldn’t look too accurate. These scenarios are roughly analogous to some of the challenges faced by current approaches to modeling galaxy spectra, the topic of today’s article.
Approaches to modeling galaxy spectra can be divided into empirical, data-driven models and theoretical models. The former corresponds to the pictures of cars your friend showed you – astronomers use “template” spectra built from observations of local galaxies to construct model spectra that can be fitted to observations of higher-redshift systems. While these are useful, they are based on observations of local galaxies and can therefore be confined to a limited range of wavelengths once the cosmological redshift correction is taken into account. The theoretical models, on the other hand, reflect your friend’s latter suggestion: they generate model spectra based on a physical understanding of emission and absorption in the interstellar medium, in stars, and in nebulae. These are interpretable and physically motivated, so they can be applied to higher redshifts, for example, but they usually rely on approximations and are therefore unable to accurately capture the complexity of real spectra.
Despite these challenges, today’s authors note that the historical usefulness of template spectra for describing new observations implies that these data are not in themselves as complicated as they seem – perhaps the variations between spectra can be reduced to a few relevant parameters. This draws on the autoencoder discussion and inspires the approach of today’s work: perhaps one can find a low-dimensional embedding (read: simpler representation) of the spectra, making reconstruction an easy task.
How to create a galaxy spectrum
Most traditional galaxy spectrum analysis pipelines work by transforming the observed (redshifted) spectrum to the emitted spectrum in the galaxy’s rest frame, and then fitting a model to the observation. This typically limits the usable wavelength range to one shared by all the different spectra in a sample. In today’s architecture, the authors choose to keep the observed spectra – they do no redshift processing of any kind prior to the analysis – allowing them to present the algorithm with the full wavelength range of each observation and thereby preserve more of the data. Today’s article introduces this algorithm, called SPENDER, which is shown schematically in Figure 2.
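For reference, the redshift processing that traditional pipelines apply up front is simply a stretch of the wavelength axis by a factor of (1 + z). A toy numpy illustration (the redshift and wavelengths here are made up for the example):

```python
import numpy as np

# Cosmological redshift stretches every wavelength by (1 + z), so the
# observed and rest-frame spectra differ only by a rescaled wavelength
# axis. All numbers below are made up for illustration.
z = 0.1                                       # hypothetical galaxy redshift
wave_rest = np.linspace(3700.0, 7000.0, 5)    # Angstroms, rest frame
wave_obs = wave_rest * (1.0 + z)              # what the telescope records

# e.g. the H-alpha line at 6563 A would land near 7219 A when observed
print(round(6563.0 * (1.0 + z), 1))  # 7219.3
```

De-redshifting means undoing this stretch, which pushes parts of each spectrum outside any wavelength grid common to the whole sample – exactly the data loss the authors avoid.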
The algorithm takes an input spectrum and first passes it through three convolutional layers to reduce its dimensionality. Since the spectra have not been shifted to the rest frame, the processed data is then passed through an attention layer. This is very similar to what you did while watching the cars go by on the street – although many cars passed by, all in different places and moving at different speeds, you focused your attention on specific cars and specific characteristics of those cars to train your neural network (read: brain) to learn what a car is. This layer does the same thing by identifying which parts of the spectrum to focus on; i.e., where the relevant emission and absorption lines may be. Then, to finish encoding the data, it is passed through a multilayer perceptron (MLP) that shifts the data into the galaxy’s rest frame and compresses it to s dimensions (the desired dimensionality of the latent space).
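As a rough, hypothetical sketch of how data flows through such an encoder (random weights and toy sizes – not the actual SPENDER layers), one could write:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv_pool(x, kernel, pool=2):
    # one "convolution layer": valid 1-D convolution, then average
    # pooling to halve the length (dimensionality reduction)
    y = np.convolve(x, kernel, mode="valid")
    n = len(y) // pool
    return y[: n * pool].reshape(n, pool).mean(axis=1)

spectrum = rng.normal(size=4000)          # toy observed-frame spectrum

# three convolutional layers shrink the spectrum step by step
h = spectrum
for _ in range(3):
    h = conv_pool(h, rng.normal(size=5))

# attention: a softmax over positions decides which parts of the
# (still observed-frame) spectrum to focus on
scores = rng.normal(size=h.size)
attn = np.exp(scores) / np.exp(scores).sum()

# MLP: compress the attended features down to s latent dimensions
s = 6
w_mlp = rng.normal(size=(h.size, s))
latent = np.tanh((attn * h) @ w_mlp)
print(latent.shape)  # (6,)
```

The point is the shape of the pipeline: thousands of observed pixels go in, a handful of latent numbers come out, and the attention weights (which sum to 1) decide where the network looks along the way.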
Now the model has to decode the embedded data and try to reproduce the original spectrum. It does this by passing the data through three “activation layers” that process the data with a preset function. These layers transform the simple, low-dimensional (latent) representation of the data back into a spectrum in the galaxy’s rest frame. Finally, this spectrum is redshifted back to the observed frame, and the reconstruction is complete.
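A similarly hypothetical numpy sketch of the decoding step – widen the latent vector through a few nonlinear layers, then resample the resulting rest-frame spectrum onto an observed-frame wavelength grid (toy sizes and random weights, not the paper’s implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

s = 6
latent = rng.normal(size=s)               # latent features from the encoder

# three "activation layers": linear maps followed by a preset function
# (tanh here), widening the latent vector back into a spectrum
h, n_in = latent, s
for n_out in (64, 256, 1000):
    h = np.tanh(h @ (rng.normal(size=(n_in, n_out)) * 0.1))
    n_in = n_out
restframe = h                             # toy rest-frame spectrum, 1000 pixels

# redshift the model back to the observed frame: stretch the wavelength
# grid by (1 + z) and resample onto the instrument's pixels
# (pixels outside the shifted range are padded with the edge value)
z = 0.1
wave_rest = np.linspace(3000.0, 9000.0, restframe.size)
wave_instrument = np.linspace(3000.0, 9000.0, restframe.size)
observed = np.interp(wave_instrument, wave_rest * (1.0 + z), restframe)
print(observed.shape)  # (1000,)
```

The output has the same shape as an observed spectrum, so it can be compared pixel by pixel with the input during training.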
In practice, the contributions of different parts of the data to the final result depend on initially unknown weights. To learn these weights, the model is trained: the reconstruction and the original data are compared, and the weights are optimized (roughly by trial and error) until an optimal set of weights is reached.
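This weight optimization can be sketched with a toy gradient-descent loop on a one-layer linear autoencoder (purely illustrative – this is not SPENDER’s actual training setup): the loss compares reconstruction to original, and each step nudges the weights to shrink it.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data that secretly lives on 3 latent dimensions inside 10
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10)) * 0.3

W = rng.normal(size=(10, 3)) * 0.1   # encoder weights (initially random)
V = rng.normal(size=(3, 10)) * 0.1   # decoder weights (initially random)

def loss(W, V):
    residual = X - X @ W @ V         # original minus reconstruction
    return (residual ** 2).mean()

lr = 0.1
history = [loss(W, V)]
for _ in range(500):
    R = X - X @ W @ V
    # analytic gradients of the mean-squared reconstruction error
    grad_W = -2.0 * X.T @ R @ V.T / X.size
    grad_V = -2.0 * W.T @ X.T @ R / X.size
    W -= lr * grad_W
    V -= lr * grad_V
    history.append(loss(W, V))

# the reconstruction error shrinks as the weights are learned
print(history[0], history[-1])
```

Deep-learning frameworks automate exactly this loop: they compute the gradients for all the layers automatically and take many such steps over the training set.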
So how does it work?
The results of running the SPENDER model on a sample spectrum of a galaxy in the Sloan Digital Sky Survey are shown in Figure 3.
Visually, it seems that the model reproduces the given spectrum quite well. Figure 3 also shows one of the advantages of such a model. Not only is the model able to reproduce the various complexities of the spectrum, but by varying the resolution of the reconstructed spectrum, it can also distinguish features that overlap (or blend) in the input data (see the two nearby [O II] lines in Figure 3, for example). Ultimately, the nature of the SPENDER architecture means that data can be passed to the model as it comes off the instrument – since the model is trained without redshifting or cleaning, it learns to incorporate this processing into its analysis. Such an architecture can also be used to generate synthetic spectra, and it offers a new approach to detailed modeling of galaxy spectra that mitigates some of the problems in current empirical galaxy modeling approaches.
Astrobite edited by Katy Proctor
Featured image credit: adapted from the paper and bizior (via FreeImages)
About Sahil Hegde
I’m a first-year astrophysics graduate student at UCLA. I am currently using semi-analytical models to study the formation of the first stars and galaxies in the universe. I received my BA from Columbia University and am originally from the San Francisco Bay Area. Outside of astronomy, I play tennis, surf (read: wipe out), and play board games/TTRPGs!