Enhanced XMM-Newton Spectral-fit Database

The project


The discovery of the X-ray background in 1962 by Giacconi and collaborators launched an immense program of exploration of the X-ray sky. X-ray astronomy has been tightly linked to the physics of compact objects such as AGN and, as a result, to observational cosmology. Serendipitous X-ray surveys, which exploit data from individual pointed observations, have played a key role in establishing X-ray surveys as major astrophysical tools. The Chandra (NASA) and XMM-Newton (ESA) missions, both launched in 1999, have revolutionised our knowledge of the X-ray sky. In particular, the XMM-Newton observatory provides unparalleled capabilities for serendipitous X-ray surveys because of a) its large field of view, b) the high throughput of its telescope, c) its good Point-Spread Function (6 arcsec FWHM on-axis), and d) its broad bandpass (0.3-12 keV), which guarantees the detection of many hard and obscured sources. The XMM-Newton mission has been extremely successful so far, with a huge impact on the wider astronomical community: about 300 refereed publications appear every year in international journals. The XMM/SSC serendipitous survey project has played a key role in this success. This large-scale international program has allowed the dissemination of the XMM legacy to the wider astronomical community.

The XMM-Newton Serendipitous Source Catalogue

The XMM-Newton serendipitous source catalogues are produced by the XMM-Newton Survey Science Center (SSC), an international consortium of ten European institutions led by the University of Leicester. The SSC activities are performed on behalf of ESA. The catalogues are built from the source lists of the EPIC detectors (no RGS data are included). These source lists are produced by the scientific pipeline that the SSC uses to process all XMM-Newton data. The current release is 3XMM-DR6, which was created from 9160 XMM-Newton pointed observations over a 14-year interval since launch in 1999. This version of the catalogue, the largest ever produced at X-ray wavelengths, contains ∼680,000 detections covering an area of 982 sq. degrees. Owing to the very large area covered, the 3XMM serendipitous catalogue is complementary to dedicated XMM-Newton surveys, which cover much smaller areas, albeit to a deeper flux limit. It is important to note that the catalogue reaches the flux level at which the dominant contribution to the X-ray background is produced. In particular, the minimum detectable flux in the 0.5-2 keV band is 2×10^-15 erg cm^-2 s^-1 at 10% sky coverage.

The catalogue provides a wealth of information, derived mainly from the imaging analysis of the PN and MOS images. For example, the catalogue contains:

  1. Photometry in various energy bands.
  2. Information on the source extent.
  3. X-ray hardness (colour) distributions.

The catalogue also handles spectral information. The pipeline has been configured to automatically extract spectra for the brighter detections. The spectral extraction has been optimised in the latest version of the XMM-Newton catalogue (for a more detailed description see the 3XMM-DR6 User Guide):

  1. The extraction of data for the source takes place from an aperture whose radius is chosen to maximise the signal-to-noise (S/N) of the source data.
  2. Background spectral extraction has been modified by allowing the process to search the images for a background region with at least 70% of usable area.
  3. Source spectra are extracted only for detections with more than 100 total-band EPIC counts.

There are about 150,000 detections satisfying the above criteria. For each of them, the pipeline created the following spectrum-related products: a) a source+background spectral file (grouped to a minimum of 20 counts per spectral bin) and a corresponding XSPEC-generated spectral plot; b) a background spectrum; and c) an auxiliary response file. The publicly available response matrices (RMF files) are referenced in a header keyword. However, no spectral fits have yet been implemented. This implies that the current XMM database cannot be queried using spectral parameters.
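
The two numerical thresholds quoted above (more than 100 total-band counts for extraction, at least 20 counts per spectral bin after grouping) can be illustrated with a minimal Python sketch. The function names and the treatment of leftover channels at the end of the spectrum are assumptions made for illustration only; the actual SSC pipeline may group channels differently.

```python
MIN_TOTAL_COUNTS = 100    # extraction threshold quoted in the text
MIN_COUNTS_PER_BIN = 20   # grouping threshold quoted in the text


def qualifies_for_extraction(total_band_counts):
    """Return True if a detection has more than 100 total-band counts."""
    return total_band_counts > MIN_TOTAL_COUNTS


def group_min_counts(channel_counts, min_counts=MIN_COUNTS_PER_BIN):
    """Group adjacent channels until each bin holds at least min_counts.

    Returns a list of (first_channel, last_channel, counts) tuples.
    Leftover channels at the end are merged into the last complete bin
    (an assumed convention, not necessarily the pipeline's).
    """
    bins = []
    start = 0
    running = 0
    for i, counts in enumerate(channel_counts):
        running += counts
        if running >= min_counts:
            bins.append((start, i, running))
            start = i + 1
            running = 0
    last = len(channel_counts) - 1
    if start <= last:
        if bins:
            # Merge the incomplete tail into the previous bin.
            first, _, counts = bins.pop()
            bins.append((first, last, counts + running))
        else:
            # The whole spectrum never reached min_counts: keep one bin.
            bins.append((start, last, running))
    return bins


if __name__ == "__main__":
    spectrum = [5, 10, 7, 30, 2, 3, 20, 1]   # made-up counts per channel
    print(group_min_counts(spectrum))
```

Grouping in this way trades spectral resolution for a guaranteed minimum number of counts in every bin.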


The goals of our previous PRODEX project (XMMFITCAT) were to develop the necessary software tools and to provide spectral fits for all sources in the XMM-Newton catalogue that have spectrum-related products (see above). Several basic models were fitted to the individual spectra using Cash statistics, and the goodness of each fit was estimated. The derived spectral parameters, fluxes and goodness of fit are listed in the final catalogue. In addition, the spectral-fitting results are incorporated in the LEDAS database, providing query capabilities on the basis of spectral properties.
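
As an illustration of the fit statistic mentioned above, the following Python sketch evaluates the Cash statistic for Poisson data in the form commonly used in X-ray spectral fitting, C = 2 Σ [m_i - d_i + d_i ln(d_i / m_i)], where d_i are the observed counts and m_i the model-predicted counts in each bin. This is a sketch only: it ignores the background and is not the XMMFITCAT implementation, whose details are not described here.

```python
import math


def cash_statistic(observed, model):
    """Cash statistic for Poisson-distributed counts.

    observed: observed counts per bin (non-negative numbers)
    model:    model-predicted counts per bin (strictly positive)
    Bins with zero observed counts contribute only the model term,
    because d * ln(d / m) tends to 0 as d tends to 0.
    """
    if len(observed) != len(model):
        raise ValueError("observed and model must have the same length")
    total = 0.0
    for d, m in zip(observed, model):
        if m <= 0:
            raise ValueError("model counts must be positive")
        total += m - d
        if d > 0:
            total += d * math.log(d / m)
    return 2.0 * total


if __name__ == "__main__":
    data = [2, 0, 5]                      # made-up counts
    print(cash_statistic(data, [2.0, 1.0, 5.0]))
    print(cash_statistic(data, [4.0, 1.0, 5.0]))
```

Lower values indicate a better match between model and data. Unlike chi-squared, the statistic remains valid for bins with few or no counts, which is why it suits faint sources.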


The huge scientific potential of XMMFITCAT remains untapped because most of its sources lack distance information (redshifts). In the enhanced XMMFITCAT, besides expanding the database to the new versions of 3XMM, we are deriving photometric redshifts for the sources of the 3XMM catalogue (XMMPZCAT) and constructing a new database of spectral-fitting results that uses these photometric redshifts (XMMFITCAT-Z). With this information we can derive important properties of the X-ray sources, such as the X-ray luminosity, the temperature of the hot emitting plasma, rest-frame obscuring column densities, and the rest-frame energies of emission and absorption features.
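
To show why a redshift is needed for these quantities, the sketch below converts an observed flux into a luminosity through L = 4π d_L² F and shifts an observed line energy to the rest frame through E_rest = E_obs (1 + z). The flat ΛCDM cosmology (H0 = 70 km/s/Mpc, Ω_m = 0.3), the function names and the absence of any k-correction are assumptions made for this illustration; they are not taken from the XMMFITCAT-Z pipeline.

```python
import math

C_KM_S = 299792.458      # speed of light in km/s
MPC_IN_CM = 3.0857e24    # one megaparsec in centimetres


def luminosity_distance_cm(z, h0=70.0, omega_m=0.3, steps=1000):
    """Luminosity distance in cm for an assumed flat LambdaCDM cosmology."""
    omega_l = 1.0 - omega_m

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    # Trapezoidal integration of dz / E(z) from 0 to z.
    dz = z / steps
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    for i in range(1, steps):
        integral += inv_e(i * dz)
    integral *= dz

    comoving_mpc = (C_KM_S / h0) * integral
    return (1.0 + z) * comoving_mpc * MPC_IN_CM


def xray_luminosity(flux_cgs, z):
    """Luminosity in erg/s from a flux in erg/cm^2/s (no k-correction)."""
    d_l = luminosity_distance_cm(z)
    return 4.0 * math.pi * d_l ** 2 * flux_cgs


def rest_frame_energy(observed_kev, z):
    """Rest-frame energy in keV of a feature observed at observed_kev."""
    return observed_kev * (1.0 + z)


if __name__ == "__main__":
    z_phot = 1.0          # hypothetical photometric redshift
    flux = 2e-15          # erg/cm^2/s, the flux limit quoted earlier
    print(xray_luminosity(flux, z_phot))
    print(rest_frame_energy(3.2, z_phot))
```

Without a k-correction, the result is the luminosity in the rest-frame band that corresponds to the observed band shifted by a factor (1 + z).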

To this end we have combined the output of the XMMFITCAT and ARCHES projects. ARCHES is an EU FP7-funded program whose goal is to enhance the 3XMM catalogue by providing cross-matches with several multi-wavelength catalogues, and to construct Spectral Energy Distributions (SEDs) based on reliable photometry. Using the cross-matched catalogues, tools and techniques developed in ARCHES, and combining data from surveys such as SDSS and Pan-STARRS, we can find reliable optical counterparts for ∼100,000 X-ray sources in 3XMM.

We are using this set of multi-wavelength data to derive photometric redshifts with machine learning techniques. In particular, we are employing MLZ-TPZ, a supervised algorithm based on prediction trees and random forests.