Enhanced XMM-Newton Spectral-fit Database

3XMM photo-z catalogue (XMMPZCAT)


  1. Identification of counterparts
  2. Photo-zs with machine learning algorithms
    1. Training sample
    2. Quality of photo-zs
    3. Separating stars
  3. Description of the columns

1. Identification of counterparts

Since optical photometry is needed to derive photometric redshifts, we have to identify the optical counterparts of the X-ray sources in the 3XMM catalogue. The addition of near- and mid-infrared data increases the accuracy of the photometric redshifts, so we also look for counterparts at these wavelengths. To this end we use the multi-wavelength cross-matched catalogues of the ARCHES project.

The base X-ray catalogue used by ARCHES was 3XMMe (the enhanced 3XMM catalogue), a reduced version of the 3XMM catalogue composed only of X-ray sources from the highest-quality XMM-Newton observations. This catalogue was cross-matched with several catalogues covering different wavelength ranges: GALEX-DR5, UCAC4, SDSS-DR10, 2MASS, UKIDSS-LAS, AllWISE, SUMSS, NVSS and AKARI-FIS. The result consists of two catalogues: one with all-sky coverage that includes 2MASS, and another that includes UKIDSS and is restricted to the sky area of that survey. The cross-matching tool used in ARCHES (xmatch) computes probabilities of association for all possible sets of candidates in the cross-matched catalogues.

Since we are only interested in finding counterparts in the optical (SDSS), near-infrared (2MASS or UKIDSS), and mid-infrared (WISE), we needed to derive the probabilities of these associations in each case from the ones provided in the ARCHES catalogues. We then imposed a lower limit of 68% on the probability of association and selected, among all the available counterparts, the maximum number of them with probability of association >68%. When the same X-ray source had multiple counterparts in the same catalogue with probabilities >68%, we selected the ones with the larger number of possible counterparts.
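The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Candidate structure and the select_counterpart function are hypothetical names, "larger number of possible counterparts" is read here as "matched in more catalogues", and ties are broken by the higher probability, which the text does not specify.

```python
from dataclasses import dataclass

MIN_PROB = 0.68  # lower limit on the probability of association quoted in the text


@dataclass(frozen=True)
class Candidate:
    """One candidate association for an X-ray source (hypothetical structure)."""
    catalogues: frozenset  # catalogues with a counterpart, e.g. {"SDSS", "WISE"}
    probability: float     # probability of association, between 0 and 1


def select_counterpart(candidates):
    """Return the candidate above MIN_PROB that is matched in the most catalogues.

    Ties in the number of catalogues are broken by the higher probability
    (an assumption). Returns None when no candidate passes the threshold.
    """
    accepted = [c for c in candidates if c.probability > MIN_PROB]
    if not accepted:
        return None
    return max(accepted, key=lambda c: (len(c.catalogues), c.probability))
```

With this rule, an SDSS+WISE association with probability 0.72 is preferred over an SDSS-only association with probability 0.95, while an SDSS+WISE+2MASS association with probability 0.40 is rejected by the threshold.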

Applying this method, we finally obtained a multi-wavelength catalogue composed of 42,705 X-ray sources with SDSS counterparts, 14,805 of them with near-infrared counterparts (UKIDSS or 2MASS), and 26,926 of them with WISE counterparts.

2. Photo-zs with machine learning algorithms

The photometric redshifts of XMMPZCAT were estimated using MLZ-TPZ (Carrasco Kind & Brunner, 2013), a supervised machine learning algorithm based on prediction trees and random forests. It is a parallelizable Python package that calculates fast and robust photometric redshifts and their corresponding probability density functions (PDFs).

2.1. Training sample

One of the key aspects of estimating photometric redshifts using supervised machine learning methods is the selection of an adequate training sample. This sample should be representative of the global sample for which photo-zs will be calculated. In our case we assembled a sample of X-ray detected sources with spectroscopic redshifts and SDSS photometric data (plus 2MASS, UKIDSS and/or AllWISE data for some of them). We selected sources from XXL, XWAS, COSMOS, XMS and XBS, all of them X-ray surveys with a high level of spectroscopic identification. In addition, we included 1500 sources spectroscopically identified as QSOs in the SDSS-DR13 with X-ray counterparts. Our final training sample is composed of 5157 objects with SDSS photometric data, 3129 of them also with near-infrared data (UKIDSS or 2MASS) and 4718 also with mid-infrared data (AllWISE).

2.2. Quality of photo-zs

One of the main problems of photometric redshifts is estimating their accuracy and reliability. In the case of machine learning techniques, we can obtain a fine estimate of the method's performance, in the statistical sense, through tests using the corresponding training sample.

In order to evaluate the accuracy of our photo-z derivation, we make use of the most widely used statistical indicators, which are the following:

where σNMAD is the normalized median absolute deviation and eta is the percentage of catastrophic outliers. A source is considered a catastrophic outlier if |Δ(znorm)| > 0.15.
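As an illustration, both indicators can be computed with the Python standard library. The definitions used below are assumptions based on a common convention in the photo-z literature, and may differ in detail from the ones adopted for the catalogue: Δ(znorm) = (zphot - zspec)/(1 + zspec), and σNMAD = 1.4826 times the median of |Δ(znorm) - median(Δ(znorm))|. The function names are ours; only the 0.15 outlier limit comes from the text.

```python
from statistics import median

OUTLIER_LIMIT = 0.15  # limit on |Δ(znorm)| quoted in the text


def delta_z_norm(z_phot, z_spec):
    """Normalized redshift residual (assumed definition)."""
    return (z_phot - z_spec) / (1.0 + z_spec)


def sigma_nmad(z_phot, z_spec):
    """Normalized median absolute deviation of the residuals (assumed convention)."""
    dz = [delta_z_norm(p, s) for p, s in zip(z_phot, z_spec)]
    centre = median(dz)
    return 1.4826 * median([abs(d - centre) for d in dz])


def outlier_percentage(z_phot, z_spec):
    """Percentage of sources with |Δ(znorm)| above OUTLIER_LIMIT."""
    dz = [delta_z_norm(p, s) for p, s in zip(z_phot, z_spec)]
    n_outliers = sum(abs(d) > OUTLIER_LIMIT for d in dz)
    return 100.0 * n_outliers / len(dz)
```

Both functions take two sequences of equal length, one with the photometric and one with the spectroscopic redshifts of a test sample.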

We built 8 different training+testing sets by dividing our training sample according to the extension of the sources in the optical (extended or point-like) and the amount of photometric data available: optical only (5 filters: ugriz), optical+NIR (8 filters: ugrizJHK), optical+MIR (7 filters: ugrizw1w2), and optical+NIR+MIR (10 filters: ugrizJHKw1w2). The results are presented in the following table:

             extended             point-like
# filters    σNMAD    eta (%)     σNMAD    eta (%)
5            0.071    18          0.076    29
7            0.057    14          0.064    19
8            0.054    12          0.057    20
10           0.046    9           0.049    14
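The assignment of a source to one of the eight subsets can be sketched as follows. The band lists are taken from the text; the function name and its boolean arguments are hypothetical and only illustrate the bookkeeping.

```python
OPTICAL = ("u", "g", "r", "i", "z")
NIR = ("J", "H", "K")
MIR = ("w1", "w2")


def training_subset(extended, has_nir, has_mir):
    """Return (morphology, number of filters, bands) for one source."""
    bands = OPTICAL + (NIR if has_nir else ()) + (MIR if has_mir else ())
    morphology = "extended" if extended else "point-like"
    return morphology, len(bands), bands
```

For example, an extended source with optical and WISE photometry but no near-infrared data falls in the 7-filter extended subset.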

Since TPZ gives the full PDF for the photometric redshifts, we can obtain more information on the reliability of the derived redshift for each particular source. A unimodal PDF narrowly concentrated around its maximum is a sign of a reliable redshift estimate, while a multimodal PDF with several local maxima of similar height is a clear sign that the redshift is badly determined.

We calculated several parameters derived from the PDFs and we tested how a selection of sources based on these parameters affects the performance of our photo-z derivation.

We are still carrying out extensive quantitative tests to find the values of these parameters that give the best compromise between the accuracy of the photometric redshifts and the number of sources lost. Preliminary results show that selecting objects with zConf>0.7 (Peak strength>0.9) can reduce the number of catastrophic outliers by almost a factor of 10 (5), at the cost of losing two thirds (half) of the sample. Figure 1 shows the spectroscopic redshifts from SDSS-DR13 versus the ones obtained using TPZ, before and after applying a cut in zConf. The plots clearly show that the accuracy improves and the fraction of outliers is reduced significantly after the cut.
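The trade-off between outliers and sample size can be checked on any test sample with a short script. The sketch below assumes each source is given as a (zspec, zphot, zConf) tuple and uses the normalized residual |zphot - zspec|/(1 + zspec) with the 0.15 outlier limit; the function name and the input layout are hypothetical.

```python
def apply_confidence_cut(sources, min_zconf=0.7, outlier_limit=0.15):
    """Compare the outlier fraction before and after a cut in zConf.

    sources: list of (z_spec, z_phot, zconf) tuples (hypothetical layout).
    """
    def outlier_fraction(rows):
        if not rows:
            return float("nan")
        n_out = sum(abs(zp - zs) / (1.0 + zs) > outlier_limit for zs, zp, _ in rows)
        return n_out / len(rows)

    kept = [row for row in sources if row[2] > min_zconf]
    return {
        "kept_fraction": len(kept) / len(sources),
        "outliers_before": outlier_fraction(sources),
        "outliers_after": outlier_fraction(kept),
    }
```

The same function can be reused for a cut in Peak strength by passing that parameter in place of zConf and setting min_zconf to 0.9.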

Figure 1. Spectroscopic redshifts versus photometric redshifts for the 3XMM photo-z catalogue (no stars included). Blue dots are extended sources and red crosses are point-like sources. Grey dashed lines show the outlier limits. Left: all sources. Right: sources with zConf>0.7.

2.3. Separating stars

At the average X-ray flux levels of the 3XMM catalogue and high galactic latitudes, the expected percentage of X-ray emitting stars is small, below 10%, but not negligible. TPZ can also be used to classify sources, and it has in fact been used before to separate optical stars from quasars with an extremely high efficiency by using SDSS and WISE photometry.

Stars are easily separated from galaxies and QSOs using a combination of optical and IR colours. We can use objects in our catalogue having IR photometry to identify X-ray emitting stars. With these objects we can build a training sample for TPZ and identify stars among sources with only optical colours available. There are 13,781 point-like sources in our sample with the needed optical and IR colours. Figure 2 shows two colour-colour plots. We classified the sources below the red dashed lines as stars (green triangles) and the remaining sources as non-stars (grey circles). Applying these criteria we built a training sample of 2651 stars and 11,130 non-stars.

Figure 2. Colour-colour plots of point-like Pan-STARRS sources showing our criteria for identification of stars. Left: g-z versus z-W1. Right: g-z versus J-K. Green triangles are sources identified as stars according to their MIR or NIR colour, i.e. objects below the red dashed line in the left or the right plot. Grey circles are sources classified as non-stars.

Using this training sample, TPZ identified 4112 objects as stars, 1572 of them previously unidentified through the IR-colour selection. Figure 3 shows an optical colour-colour diagram for point-like sources in our sample. The plot clearly shows that our method is able to identify the star tail typical of this kind of diagram (magenta open circles).

Figure 3. r-i versus g-z colour-colour plot of point-like sources. Magenta open circles are sources classified as stars using TPZ. Remaining symbols as in Fig. XX.

3. Description of the columns

The photometric redshift catalogue consists of a FITS table with one row for each unique X-ray source, and 17 columns containing the estimated redshift plus additional information about the X-ray source, the optical counterpart and several parameters that can help assess the reliability of the derived photometric redshift. Unavailable values are represented by a "null" value.

xmm_SRCID: Source identification label from 3XMM-DR6 catalog.

XMM RA, XMM DEC: X-ray source coordinates as in 3XMM-DR6.

opt_SRCID: Source identification number in SDSS-DR10.

XMMFITCAT: Source included in the XMM-Newton spectral-fit database.

Nfilters: number of magnitudes used in the photometric redshift estimation.

extended: True if the source is classified as extended in SDSS-DR10.

ph_flag: quality of the photometric data. XYZ, where X is the flag for optical data, Y for WISE data and Z for NIR data (2MASS or UKIDSS). The possible values for X/Y/Z are:

inTCS (in Training Colour Space): all colours used to calculate the photometric redshift are inside the colour space covered by the training sample.

STARS: True if the source was identified as a star.

SPEC Z: Spectroscopic redshift in SDSS-DR13.

PHOT_Z: photometric redshift estimated by TPZ.

PHOT_ZERR: one-sigma error of the photometric redshift.

PHOT_ZCONF: confidence of the photometric redshift.

Npeaks: number of local maxima in the PDF.

PS (Peak Strength): 1-P2/P1, where P1 is the probability density of the highest local maximum of the PDF, and P2 is that of the second-highest local maximum.

PHOT_Z2: redshift position of P2.
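As an illustration of how Npeaks, PS and PHOT_Z2 relate to a PDF, the following sketch derives them from a PDF sampled on a redshift grid. It is not the code used to build the catalogue: the function names are hypothetical, peaks are taken to be strict interior local maxima of the sampled PDF, and a single-peak PDF is assigned PS = 1 (i.e. P2 = 0) with no PHOT_Z2, which is an assumption.

```python
def pdf_peaks(z_grid, pdf):
    """Return (z, density) for each strict interior local maximum, highest first."""
    peaks = [
        (z_grid[i], pdf[i])
        for i in range(1, len(pdf) - 1)
        if pdf[i] > pdf[i - 1] and pdf[i] > pdf[i + 1]
    ]
    return sorted(peaks, key=lambda peak: peak[1], reverse=True)


def peak_summary(z_grid, pdf):
    """Compute Npeaks, PS = 1 - P2/P1 and the redshift of the second peak."""
    peaks = pdf_peaks(z_grid, pdf)
    if not peaks:
        raise ValueError("the PDF has no interior local maximum")
    _, p1 = peaks[0]
    if len(peaks) == 1:
        # Assumption: with a single peak, P2 = 0 and PHOT_Z2 is undefined.
        return {"Npeaks": 1, "PS": 1.0, "PHOT_Z2": None}
    z2, p2 = peaks[1]
    return {"Npeaks": len(peaks), "PS": 1.0 - p2 / p1, "PHOT_Z2": z2}
```

A PDF with two maxima of similar height gives a PS close to 0, while a PDF dominated by a single narrow peak gives a PS close to 1.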