Combining Data Sets¶
SNData allows users to combine individual data releases into a single
CombinedDataset
object. The resulting object provides the same general user
interface as a single data access module but provides access to data from
multiple surveys / data releases.
Creating a Combined Data Set¶
To create a combined data set, import the data access classes for each of the
data releases you want to join and pass them to the CombinedDataset
object. For demonstration purposes we combine data from the third data
release of the Carnegie Supernova Project and the three year cosmology release
of the Dark Energy Survey:
1 2 3 | from sndata import CombinedDataset, csp, des
combined_data = CombinedDataset(csp.DR3(), des.SN3YR())
|
The resulting object provides the same user interface as the rest of the SNData package, including having the same method names:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | # Download all data for the combined data releases
combined_data.download_module_data()
# Get a list of available supplementary tables
list_of_table_ids = combined_data.get_available_tables()
# Load a supplementary tables
demo_table_id = list_of_table_ids[0]
demo_sup_table = combined_data.load_table(demo_table_id)
# Get a list of available objects
list_of_ids = combined_data.get_available_ids()
# Get data for a single object
demo_id = list_of_ids[0]
data_table = combined_data.get_data_for_id(demo_id)
print(data_table)
# Iterate over data for all objects in the combined data set
for data in combined_data.iter_data():
print(data)
break
|
Important
The format of object and table Id’s for CombinedDataset
objects is slightly different than for a single data release. Please
keep reading.
Unlike the object and table Id’s for a single data release, the default Id’s
for a CombinedDataset
are tuples instead of strings. Each tuple contains
three elements including (in order) the individual object identifier, data
release name, and survey name. For example, the ID value for supernova ‘2007S’
from CSP Data Release 3 (DR3) would be ('2007S', 'DR3', 'CSP')
.
By specifying object Id’s in this way, it is ensured that objects in combined
data releases always have unique identifiers. However, in the case where
the object Id’s from two data releases are already unique (as is the case when
combining csp.DR3` and ``des.SN3YR
), CombinedDataset
objects are smart
enough to mimic the behavior of a normal / single data release and can
take object Id’s as strings. For example:
1 2 3 4 5 | # You can specify object ID's as tuples
combined_data.get_data_for_id(('2007S', 'DR3', 'CSP'))
# or if the object names across the joined surveys are unique, as a string
combined_data.get_data_for_id('2007S')
|
Joining Object Id’s¶
It is possible for two different photometric surveys to observe the same astronomical object. In this case, object Id’s from different surveys can be “joined” together so that when requesting data for a given object Id, data is returned for all Id’s that have been joined together. Accomplishing this is as simple as:
1 2 3 4 5 6 7 8 | # Note that you can join an arbitrary number of object Id's
combined_data.join_ids(obj_id_1, obj_id_2, obj_id_3, ...)
# You can also retrieve a list of joined ID values
print(combined_data.get_joined_ids())
# To undo the above joining action
combined_data.separate_ids(obj_id_1, obj_id_2, obj_id_3, ...)
|
When retrieving data for a joined ID, the returned data table is simply the collective data tables for each joined ID stacked vertically.
1 2 | data = combined_data.get_data_for_id(obj_id_1)
print(data)
|
It is worth noting that CombinedDataset
objects are aware of successive
join actions. This means that the following two examples are functionally
equivalent.
1 2 3 4 5 6 | # You can join multiple Id's at once ...
combined_data.join_ids(obj_id_1, obj_id_2, obj_id_3)
# Or join them successively
combined_data.join_ids(obj_id_1, obj_id_2)
combined_data.join_ids(obj_id_2, obj_id_3)
|