Creating Custom Data Classes¶
SNData uses an object oriented design to automate basic data management tasks and ensure a consistent user interface. Prebuilt template classes are provided to represent spectroscopic / photometric data, allowing users to create customized data access objects. In general, the steps for creating a new data access class include:
- Inherit from one of the prebuilt
SpectroscopicRelease
orPhotometricRelease
template classes - Define some meta data describing the data release
- Define a few private functions for parsing your data set
Below we work through examples for photometric and spectroscopic data. If after working through these examples you still have remaining questions, please feel free to raise an issue on GitHub.
Spectroscopic Data¶
To create a data access class for spectroscopic data, we start by inheriting
from the SpectroscopicRelease
. We then add some meta data about the
survey and data release that is being represented.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | from sndata.base_classes import SpectroscopicRelease
from sndata import utils
# Class should be named to reflect the official acronym of the data release
# or, if an acronym is not available, the publication for the data
class Smith20(SpectroscopicRelease):
"""Describe the contents of the data release here
(Source: Smith, Jhonson et al. 2020)
Deviations from the standard UI:
- None
Cuts on returned data:
- None
"""
# General metadata
survey_name = 'Demo Supernova Survey' # Full survey name
survey_abbrev = 'DSS' # Official survey abbreviation
release = 'Smith20' # Name of the data release
survey_url = 'www.url.com' # URL of the official data release
publications = ('Smith et al. 2020',) # Any relevant publications
# Link to ADS abstract of primary paper
ads_url = 'https://ui.adsabs.harvard.edu/'
|
Next we use the __init__
method to define the local paths of the data
being represented and, if desired, the remote URL’s the data should be
downloaded from. All data should be stored in a subdirectory of the
self._data_dir
, which is determined automatically by SNData.
1 2 3 4 5 6 7 8 9 10 11 12 | def __init__(self):
"""Define local and remote paths of data"""
# Call to parent class defines the self._data_dir attribute
# All data should be downloaded to / read from that directory
super().__init__()
# Local data paths. You will likely have multiple directories here
self._spectra_dir = self._data_dir / 'spectra_dir_name'
# Define urls for remote data.
self._spectra_url = 'www.my_supernova_spectra.com'
|
The logic for parsing individual data files is then added as private methods:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | def _get_available_tables(self) -> List[int]:
"""Get Ids for available vizier tables published by this data release"""
# Find available tables - for example:
# Find available tables - assume standard Vizier naming scheme
table_nums = []
for f in self._table_dir.rglob('table*.dat'):
table_number = f.stem.lstrip('table')
table_nums.append(int(table_number))
return sorted(table_nums)
def _load_table(self, table_id: int) -> Table:
"""Return a Vizier table published by this data release
Args:
table_id: The published table number or table name
"""
readme_path = self._table_dir / 'ReadMe'
table_path = self._table_dir / f'table{table_id}.dat'
# Read data from file and add meta data from the readme
data = ascii.read(str(table_path), format='cds', readme=str(readme_path))
description = utils.read_vizier_table_descriptions(readme_path)[table_id]
data.meta['description'] = description
return data
def _get_available_ids(self) -> List[str]:
"""Return a list of target object IDs for the current survey"""
# Returned object Ids should be sorted and unique.
# For example:
files = self._spectra_dir.glob('*.dat')
return sorted(set(Path(f).name for f in files))
def _get_data_for_id(self, obj_id: str, format_table: bool = True) -> Table:
"""Returns data for a given object ID
Args:
obj_id: The ID of the desired object
format_table: Format for use with ``sncosmo`` (Default: True)
Returns:
An astropy table of data for the given ID
"""
# Read in data for the object ID and return an astropy table
pass
def _download_module_data(self, force: bool = False, timeout: float = 15):
"""Download data for the current survey / data release
Args:
force: Re-Download locally available data
timeout: Seconds before timeout for individual files/archives
"""
# If you do not wish to include download functionality,
# do not include this method
# The ``utils`` module includes functions for downloading files
# See the ``utils.download_tar`` and ``download_tar.download_file``
# functions. Here is an example:
utils.download_tar(
url=self._spectra_url,
out_dir=self._data_dir,
skip_exists=self._spectra_dir,
mode='r:gz',
force=force,
timeout=timeout
)
|
Notice that there is no need to explicitly raise errors for invalid object
Id’s (InvalidObjId
) or handle errors where there is no downloaded
data (NoDownloadedData
). This is handled automatically by the
SpectroscopicRelease
class we inherited from.
Note
The formatting of Vizier tables is fairly standard, and there is a
chance your _get_available_tables
and _load_table
methods be exactly
the same as the above example. Instead of copy and pasting these two
methods from above, you can alternatively inherit the
DefaultDataParser
class.
Photometric Data¶
Photometric data is represented the same way as spectroscopic data, but with
a few differences. The first is to inherit from the PhotometricRelease
class and to include a bit of extra meta data.
1 2 3 4 5 6 7 8 | from sndata.base_classes import PhotometricRelease
class Smith20(PhotometricRelease):
# Include all of the meta data for a spectroscopic data release and also
# Specify the name and zero point of each photometric filter
band_names = tuple('u', 'g', 'r', 'i', 'z')
zero_point = tuple(25, 25, 25, 25, 25)
|
If not already using transmission filters built into sncosmo
, you will
also need to define the directory where the filter transmission curves are
stored and add some small logic for registering those filters with sncosmo.
For example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | def __init__(self):
"""Define local and remote paths of data"""
# Call to parent class defines the self._data_dir attribute
# All data should be downloaded to / read from that directory
super().__init__()
# Local paths of filter transmission curves
self._filter_dir = self.data_dir / 'filters'
def _register_filters(self, force: bool = False):
"""Register filters for this survey / data release with SNCosmo
Args:
force: Re-register a band if already registered
"""
bandpass_data = zip(self._filter_file_names, self.band_names)
for _file_name, _band_name in bandpass_data:
filter_path = self._filter_dir / _file_name
wave, transmission = np.genfromtxt(filter_path).T
band = sncosmo.Bandpass(wave, transmission)
band.name = filter_name
sncosmo.register(band, force=force)
|
Note
The _register_filters
method shown above can also be inherited
from the DefaultDataParser
class.