Comet hunting with Python

Hunt comets with Python and open data

Search for celestial bodies from the comfort of your own home with the SOHO satellite and the power of Python.


ould you like to discover a comet? Of course you would. But perhaps the thought of staring into the void with giant binoculars or a telescope, night after freezing night, for years on end, to find just one, tiny smudge might be less appealing. How about discovering a comet while sitting in a warm room wearing only your underwear, or better still, getting your computer to do it?

It may surprise you, but we cannot predict when comets will appear in our skies. Halley’s Comet, and a few others, are exceptions to the rule. Most comets are spotted by chance as faint specks moving through the stars, and that’s what we’ll be looking for using a proven source of images: the LASCO instrument on the SOHO satellite (SOlar and Heliospheric Observatory). Its image data is released under public domain, as with almost all NASA data, and although it’s only looking at a few degrees of the sky around the Sun, this is a good place to find comets, as explained in the Sungrazers boxout, right. LASCO actually has several cameras, but we’ll be using its C3 camera, as its smaller field of view makes it easier to work with.

In a typical LASCO image, there’s a circle in the centre representing the disk of the Sun (called the photosphere in astronomers’ lingo) but that’s deliberately blotted out by a larger disk so we can see fainter objects around the Sun. The fuzzy stuff is the corona, the outer atmosphere of the Sun and the start of the solar wind – LASCO’s main purpose is to study that. The SOHO spacecraft is in orbit around the Sun, and LASCO keeps it in the centre of its view, which means that stars, planets and comets will all be seen moving across the image.

My God… it’s full of stars


A view from SOHO’s LASCO C3 camera that shows many stars, including the Pleiades star cluster (1) along with four planets, which are overexposed with horizontal lines running through them. From left to right: Mercury (2), Saturn (3), Jupiter (4) and Venus (5). Also, the Sun is blowing off a Coronal Mass Ejection (CME) to the top left (6). Most of the blobs are not stars or planets or comets, but are in fact cosmic rays striking the detector.

Spot the difference

Finding a comet does not involve frightening physics – it’s more like a game of spot the difference using many images. It’s tricky because there are lots of objects that can be confused with a comet.

The easiest objects to rule out are planets. Mercury, Venus, Mars, Jupiter and Saturn are all very bright and so easy to spot, as shown in the blue LASCO C3 image (left). Uranus and Neptune and a host of other objects such as Pluto and asteroids are much fainter, but they too can be ruled out because we know where they are going to be at any time. The Earth doesn’t make an appearance in SOHO images because it is always behind the satellite.

Stars can be easily identified because their movement over time is predictable: they march across the image in formation from left to right, at about three pixels per hour in LASCO C3. Comets usually move diagonally, and at a different rates.

So once stars and planets are ruled out, anything that’s left must be a comet, yes? Unfortunately not. There are many comet-like smudges on all SOHO images that are caused by cosmic rays. These are high energy particles from anywhere in the cosmos that strike the detector and fool it into thinking that light has been detected. Fortunately, these are easy to rule out because they only affect one image. If the smudge is present in one image, but completely gone in the next, then it’s a cosmic ray.

Before automating any task, it’s informative to try it manually. Thankfully that’s easy to do here because some test examples are available at the sungrazer comet page at the US Naval Research Lab (yes, the US military let their staff research comets… but why is a long story!). If you go to and scroll down you’ll find a section called Strategy And Tips and in that is a list of Zip files that you can download so you can hone your comet-hunting skills. Inside each Zip file you will find a series of LASCO images, and a cheat-sheet telling you where the comet is (you’re not going to peek first, are you?) Download the Zip file and open up the first image in the series using your image viewer. Click on the Next button (the default image viewers in Ubuntu/Unity and Slackware/KDE both have one) and look at the sequence of images. Unless you have the visual acuity of Robocop, you will not see a comet, but instead gain an appreciation for how difficult it can be to find one, even when you know it’s there!

Comets and sungrazers

Comets are often described as dirty snowballs. They are lumps of loosely bound ice, rock and dust, left over from the formation of the Solar System. Most of them hang around in what’s called the Oort cloud, which is well beyond the orbit of all the Sun’s planets. Once in a while, something disturbs the cloud and a comet is sent into the inner Solar System, and then we might see it.

Some comets, called sungrazers, pass very close to the Sun, which has a surface temperature of about 5500°C and is chucking out energy in the form of electromagnetic radiation (ie light) with a power of about 3.8×1026W (yes, that W means watts, the same unit used for lightbulbs!) Each square metre of the solar surface emits energy at a rate equal to 62,000,000 W – think 62,000 bars of an electric fire. Even if these numbers boggle your mind, I’m sure it’s clear that this is going to cause a problem for an icy object like a comet. In fact, many comets don’t survive a close encounter with the Sun. In December 2013, Comet ISON looked promising, but it perished in the intense solar radiation. Other sungrazers fare better, but are much disrupted, such as comet Lovejoy in 2011, pictured. Luckier ones will be fragmented into many small pieces, and each one will become a comet in its own right. It’s thought that a big comet broke up back in the year 1106 AD and fragments of that have provided us with many great sungrazing comets over the centuries. This group is called the Kreutz sungrazers, and 85% of comets found by SOHO are in this group.


Kreutz sungrazer comet Lovejoy only just survived its close encounter with the Sun in late 2011.

Manual experience

Take a deep breath. Pour yourself a relevant beverage (I like coffee or Raspberry Pi-brewed beer) and read the instructions on the sungrazer page more carefully. There’s one important clue that will narrow down your search: most comets approach the Sun from a particular direction that depends on time of year. Have a look at this page to get an idea of where to look and when: Even with this information, you might still find yourself tearing your hair out, because some comets are very faint. Try the example named soho1264, because that comet is relatively bright. If you flick back and forth between the images taken at 1718 and 1742, you should be able to see the comet in the bottom-left corner moving towards the centre of image. (Did you have to peek in the cheat-sheet? It’s OK, I did too first time round.)

You should now be able to appreciate our plan: 1) load a pair of images; 2) difference them; 3) clean the differenced image; 4) identify objects; 5) repeat and track objects in subsequent images. We’ll concentrate on 1–4, because if these are done right, step 5 is relatively easy.

This image shows the difference between images taken at 17.18 and 17.42 on 5th Feb 2007 by SOHO LASCO/C3. Red blobs show features present at 17.42 but not at 17.18 and vice versa for blue blobs. The broad line in the top-right of the image is the pylon holding the central coronagraph disk in place. The inset shows the area around the comet.

This image shows the difference between images taken at 17.18 and 17.42 on 5th Feb 2007 by SOHO LASCO/C3. Red blobs show features present at 17.42 but not at 17.18 and vice versa for blue blobs. The broad line in the top-right of the image is the pylon holding the central coronagraph disk in place. The inset shows the area around the comet.

Automating with numpy, scipy and matplotlib

First, install the new Python modules we’ll need. On Debian-based distros:

sudo apt-get install python-numpy python-scipy python-matplotlib

Numpy is a numeric library for Python that provides lots of new ways to work with arrays. Scipy is a library that performs all kinds of science-related data processing, and Matplotlib will make short work of displaying the images. We’re going to use numpy and Scipy to load up an image file and turn it into a 2D array of numbers. You can put the following commands in a file called, save it and enter python on the command line, or you can just enter python on the command line and type them in line by line. First, we’ll load up the first image of the soho1264 that shows the comet:

import scipy
image1=scipy.misc.imread(‘full_soho1264_070205_1718.gif’, flatten=1)
import matplotlib.pyplot as plt


You should now see a LASCO C3 image in a Matplotlib window. We’ve loaded the image using imread and flattened it, which means each pixel becomes a brightness value with no colour information. Each value will be a float between 0.0 and 255.0 inclusive and is stored in the Numpy array called image,1 which has dimensions 1024 by 1024. We then display the image with the ‘gray’ colour map.

Comet Lovejoy (officially C 2011 W3) nearing the Sun, as seen by SOHO LASCO’s C2 camera.

Comet Lovejoy (officially C 2011 W3) nearing the Sun, as seen by SOHO LASCO’s C2 camera.

Next, we’ll take a difference of two images. Close the first image window and enter the following in the same interactive Python session (or into your .py file):

image2=scipy.misc.imread(‘full_soho1264_070205_1742.gif’, flatten=1)
import numpy as np

We’ve loaded the image taken 24 minutes later at 17.42, then used Numpy’s subtract function, which takes each pixel in the second image and subtracts the value of the pixel at the same co-ordinates in the first image and returns the result to a new array we call diff. We then display diff using the colour map bwr, which stands for blue-white-red. This means that features that only appear in the second image show as red; features that only in the first image show as blue; and areas of no difference are white.

If you look closely at the difference image, you’ll see that there are many isolated blue or red blobs that correspond to cosmic ray artefacts only present in one or other image. In a few places there is a red spot immediately to the right of a blue spot – these are stars. If you look very carefully at the bottom-left of the image, and if your monitor is very clean, you might just see the comet: a faint red smudge above and to the right of the a similar blue smudge. The fact that this smudge is moving diagonally across the image towards the Sun is strongly suggestive of a comet, but based on two images alone we can’t be sure that it’s not just a happy coincidence of cosmic rays.

Scipy’s label function


Left: A 4 by 4 image, in which three pixels (shown in cyan) have the same value, is given to label to be scanned with a 3 by 3 grid. Right: No 3 by 3 grid can be drawn containing the top left two pixels and the one at bottom right, so label will return a 4 by 4 array labelling them as two separate blobs, here labelled as 5 and 6.

Clean and identify

Starting with the diff image we obtained above, we’ll now produce a cleaned image containing only objects that showed up blue:

xt=np.where(x<-50, x,0)
d1=np.where(xt==0, xt,-1)

First we convert the diff array to type int and select a 100 by 100 square in the lower-left corner. This may seem like a cheat, but the sungrazer site tells us that’s where a Kreutz sungrazer would enter the image in February. On the next line we use Numpy’s where command to set all pixels that are greater than -50 in value to zero. It works by testing each pixel for the condition specified in the first argument, x<-50: if true, the second argument is used to fill the value in new pixel array, and if not, the third argument is used. The resulting array will only contain strong blue blobs, that is, features prominent in image1 but not image2. We then use the where command again to set all remaining non-zero pixels to -1, which will make identifying the blobs much easier. We are being rather brutal here and throwing away a lot of data, eg assuming pixels between -50 and 0 are uninteresting noise, but we can fine-tune parameters later if we suspect we’re missing comets.

We now have an image d1 in which each pixel is either 0 or -1. Next, we use Scipy’s cunning label function to identify all blue blobs, which are just groups of pixels with value -1:

from scipy.ndimage import label
l1, n1 = label(d1, scipy.ones((3,3)))

There’s a lot going on in that second line. We give the label function the cleaned differenced image d1 and also scipy.ones((3,3)), which is a 3 by 3 array in which all elements are 1. This is asking label to look at all possible 3 by 3 grids within the image, and if it finds two pixels with the same non-zero value inside a 3 by 3 grid, it assigns them to the same blob.

Next, we repeat all of the above to label red blobs, except with a threshold of +50:

xt=np.where(x>50, x,0)
d2=np.where(xt==0, xt,1)
l2, n2 = label(d2, scipy.ones((3,3)))

The end result is that n1=11 (11 blue blobs) and n2=15 (15 red blobs). The l1 array is a 100 by 100 array in which each element is zero (nothing there), or is a number between 1 and 11 indicating which blue blob that pixel belongs to, with the l2 array being similar except that it contains 15 blobs.

Great success

We’ve now narrowed down our search from many thousands of blobs to about a dozen. That’s pretty good going!

It’s worth visualising our cleaned difference images to appreciate how good (or brutal) our clean-up has been. To do this, add together the cleaned red and blue images with imshow(d1+d2) and use the bwr colour map, as described above. You should be able to see a few pairs of red and blue blobs that are stars, and another pair moving diagonally – our comet!

We now need to pair red and blue blobs that are within a certain radius of each other. The sungrazer website says that Kreutz group comets typically move less than 10 pixels per hour in C3 images, and we know that stars move even more slowly than that, so let’s set our search radius to a little more than that, at 15 pixels per hour. There’s a 24-minute time difference between our two images, so our search radius will be (24/60)*15=6. Next we’re going to look at all pairs of blobs and see which red and blue blobs are within our search radius:

import scipy.ndimage.measurements
for c1 in centres1:
   for c2 in centres2:
      if (c1[0]-c2[0])**2 + (c1[1]-c2[1])**2 < 6*6:
print len(pairs)

This code uses Scipy’s center_of_mass function to calculate the centres of all the blobs. Then it loops through all possible pairs and if two blobs are within a circle of radius 6 pixels they’re appended to the pairs list. The result is that there are 10 pairs.

To investigate further we’d need to repeat the above procedure for the next two images in the sequence, generating a new list of pairs. Since our new image1 is just our old image2, we can expect the new blue blobs to have the same centres as our old red blobs. In this way, we can match up new and old pairs and track objects as they move from image to image. After we’ve tracked them over several images, all cosmic ray coincidences should be ruled out and we’ll be left with tracks of stars and, hopefully, comets.

With just a few more lines of code it’s possible to produce the tracks shown based on seven images from 17.18 to 20.42. The comet is now pretty obvious because of its diagonal motion. The code we’ve outlined above could do with a lot of refining because it’s probably doing too good a job of rejecting false positives, to the point where it might be missing real comets. The best way to improve it is to try it out on other image sequences with known comets in them and experiment with some choices we’ve made, such as the noise threshold of 50, the 3 by 3 label search grid and the 100 by 100 sub-image.

Tracks of objects for LASCO C3 images on 5 Feb 2007. Dots shown show positions starting at 17.18 (light red) and ending at 20.42 (white). The time intervals vary, eg there’s an hour between the fourth and fifth image. The comet is moving diagonally, and stars horizontally.

Tracks of objects for LASCO C3 images on 5 Feb 2007. Dots shown show positions starting at 17.18 (light red) and ending at 20.42 (white). The time intervals vary, eg there’s an hour between the fourth and fifth image. The comet is moving diagonally, and stars horizontally.

Go discover comets, and more…

You can download SOHO data from here for any time period, including near real-time images. Images are now provided as JPEG files rather than GIFs, but all the code above will still work. If you do think you’ve spotted a comet, read the instructions on the sungrazer comets page on how to report it. In the same way that a well-constructed bug report is more likely to get attention from a developer, professional scientists are more likely to accept your discovery if it’s presented to them in a way that shows you know your stuff.

Don’t stop at comets; you can apply the principles introduced here to look in other data sets, to hunt for asteroids or sunspots, for example. You could also analyse satellite images of the Earth’s surface, or even turn your attention to medical images. The human race is drowning in data, especially image data, and so there’s every chance that, with a bit of hard work, you could make a real contribution to research by honing and applying basic image processing skills.


Add a Comment

Your email address will not be published. Required fields are marked *