This document describes the proposed policies for groups that are independent from the LSST Project and Operations (i.e. LSST Data Facility) and would like to stand up an independent Data Access Center (iDAC; existing data centers that could serve LSST data products are considered iDACs for purposes of this document). Some iDACs may want to serve only a subset of the LSST data products: this document proposes three portion sizes, from full releases to a “light” catalog without posteriors. Guidelines and requirements for iDACs in terms of data storage, computational resources, dedicated personnel, and user authentication are described, as well as a preliminary assessment of the cost impacts. Some institutions, even those inside the US and Chile, may serve LSST data products locally to their research community. Requirements and responsibilities for such institutional bulk data transfers are also described here. THE PURPOSE OF THIS DRAFT DOCUMENT IS TO SERVE AS A PRELIMINARY RESOURCE FOR PARTNER INSTITUTIONS WISHING TO ASSESS THE FEASIBILITY OF HOSTING AN IDAC.
LSST must supply trusted petascale data products. The mechanisms by which the LSST Project achieves this unprecedented level of data quality will have spinoff benefits for data-enabled science generally. This document specifies high-level requirements for an LSST Data Quality Assessment Framework and defines the four levels of quality assessment (QA) tools. Because this process involves system-wide hardware and software, data QA must be defined at the System level. The scope of this document is limited to a description of the overall framework and the general requirements. It derives from the LSST Science Requirements Document [LPM-17]. A flow-down document will describe the detailed implementation of the QA, including the algorithms. In most cases the monitoring strategy, the development path for these tools, and the algorithms are known. Related documents are: LSST System Requirements [LSE-29], Optimal Deployment Parameters [Document-11624], Observatory System Specifications [LSE-30], Configuration Management Plan [LPM-19], Project Quality Assurance Plan [LPM-55], Software Development Plan [LSE-16], Camera Quality Implementation Plan [LCA-227], System Engineering Management Plan [LSE-17], and the Operations Plan [LPM-73].
There are various reasons for wanting to produce a representation of the camera focal plane in a physical coordinate system with the sensors oriented as they reside in the assembled instrument. The first is for engineering diagrams used in the fabrication and assembly of the camera. The second is for visualizations used in quality assessment, data analysis, or science verification. For reasons motivated herein, these two coordinate systems cannot be the same for LSST. This document presents the coordinate transform to be applied in translating between the two coordinate systems: 1) the camera engineering diagram coordinate system and 2) the camera data visualization coordinate system.
This document defines and describes the “LSST Science Platform,” a set of integrated web applications and services deployed at the LSST Data Access Centers (DACs) through which the scientific community will access, visualize, subset, and perform next-to-the-data analysis of the data collected by the Large Synoptic Survey Telescope (LSST).
These services can be broken down into three different “Aspects”: a web PORTAL, designed to provide essential data access and visualization services through a simple-to-use website; a NOTEBOOK environment, providing a Jupyter Notebook-like interface based on JupyterLab that enables next-to-the-data analysis; and an extensive set of WEB APIs that users will be able to employ to remotely examine the LSST data set with tools they are already familiar with.
This document lays out the high-level vision for the aforementioned Aspects and some associated backend services. It is intentionally brief, and meant to generally guide the flow-down of requirements and development of product specifications, prioritization, and plans for the Agile development of the relevant elements of the DM system.
This document describes the data products and processing services to be delivered by the Large Synoptic Survey Telescope (LSST).
LSST will deliver three levels of data products and services. PROMPT (Level 1) data products will include images, difference images, catalogs of sources and objects detected in difference images, and catalogs of Solar System objects. Their primary purpose is to enable rapid follow-up of time-domain events. DATA RELEASE (Level 2) data products will include well calibrated single-epoch images, deep coadds, and catalogs of objects, sources, and forced sources, enabling static sky and precision time-domain science. The SCIENCE PLATFORM will allow for the creation of USER GENERATED (Level 3) data products and will enable science cases that greatly benefit from co-location of user processing and/or data within the LSST Archive Center. LSST will also devote 10% of observing time to programs with special cadence. Their data products will be created using the same software and hardware as Prompt (Level 1) and Data Release (Level 2) products. All data products will be made available using user-friendly databases and web services.
Note that prior to 2018, data products were referred to as Level 1, Level 2, and Level 3; this nomenclature was updated in 2018 to Prompt Products, Data Release Products, and User Generated Products, respectively [LPM-231]. In the abstract of this document both nomenclatures are used, but throughout the remainder of this document only the new terminology is used. In other project and requirements documentation, the old terminology will likely persist.
The AMCL asked about the distribution of various catalog products and suggested potential usage levels for them. This raises questions such as: Qserv is built to allow SQL queries on sources, but will most queries be on objects? This note discusses data access, interaction, and distribution.
A linear measure of flux is preferred for LSST catalogs. This document provides technical details about this preference in support of the LSST Project Science Team’s decision to adopt the nanojansky (1 nJy = 10⁻³⁵ W m⁻² Hz⁻¹) as the standard LSST flux unit. Difficulties associated with homogenizing broad-band flux measurements to a uniform system are also discussed in some detail.
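For reference, conversion between AB magnitudes and fluxes in nanojanskys follows from the standard AB zero point of 3631 Jy; the following is a minimal sketch (the function names are illustrative, not part of any LSST API):

```python
import math

F_AB_JY = 3631.0   # AB zero-point flux density in janskys
JY_TO_NJY = 1e9    # 1 Jy = 10^9 nJy

def ab_mag_to_njy(mag):
    """Convert an AB magnitude to a flux in nanojanskys."""
    return F_AB_JY * JY_TO_NJY * 10 ** (-0.4 * mag)

def njy_to_ab_mag(flux_njy):
    """Convert a flux in nanojanskys back to an AB magnitude."""
    return -2.5 * math.log10(flux_njy / (F_AB_JY * JY_TO_NJY))

# A 1 nJy source corresponds to AB magnitude ~31.4
print(round(njy_to_ab_mag(1.0), 1))  # 31.4
```

One convenience of the nJy choice is visible here: fluxes of all conceivable LSST sources are positive numbers of order unity or larger, avoiding the logarithmic distortions of magnitudes for low signal-to-noise measurements.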
A major product of the nightly processing of LSST images is a world-public stream of alerts from transient, variable, and moving sources. Science users may access these alerts through third-party community brokers, which will receive the LSST alerts, add scientific value, and redistribute them to the scientific community.
This document is a call for Letters of Intent (LOIs) to propose a community broker, as described in “Plans and Policies for LSST Alert Distribution” [LDM-612].
This document provides the charge to the review committee for the LSST Science Platform (LSP) Final Design Review (FDR). The review will be a formal and internal review of the planned LSP capabilities in the LSST operations era as per the guidelines outlined in LDM-294 and LSE-159.
A major product of the nightly processing of LSST images is a world-public stream of alerts from transient, variable, and moving sources. Science users will access these alerts through community brokers or through a simple filtering service provided by LSST. This document provides a guide to the plans and policies for the alert distribution system to aid science users, broker developers, funding agencies, and LSST Project personnel. It describes the components of the alert distribution system and the data rights required to access specific scientific products. It provides guidelines for organizations developing community brokers and describes the planned capabilities of the LSST simple alert filtering service and Science Platform to aid science users in planning for LSST science.
This document describes the detailed test specification for the LSST Science Platform. It is a work in progress; the current version provides Test Cases covering all requirements on the LSST Science Platform, but only ≈ 10% of them are currently fully specified. This document will be updated as work continues on completing the Test Cases.
This document describes the detailed test specification for the LSST DM Raw Image Archiving Service. This is a specific DM test specification, and it will grow as more tests are needed for the entire environment. It currently includes two individual tests covering overall raw image creation and ingest into the permanent record of the survey.
This management plan covers the organization and management of the Data Management (DM) subsystem during the development, construction, and commissioning of LSST. It sets out DM goals and lays out the management organization roles and responsibilities to achieve them. It provides a high level overview of DM architecture, products and processes. It provides a structured starting point for understanding DM and pointers to further documentation.
This document describes the operational concepts for the emerging LSST Data Facility, which will operate the system that will be delivered by the LSST construction project. The services will be incrementally deployed and operated by the construction project as part of verification and validation activities within the construction project.
The LSST middleware is designed to isolate scientific application pipelines and payloads, including the Alert Production, Data Release Production, Calibration Products Productions, and science user pipelines executed within the LSST Science Platform, from details of the underlying hardware and system software. It enables flexible reuse of the same code in multiple environments ranging from offline laptops to shared-memory multiprocessors to grid-accessed clusters, with a common I/O and logging model. It ensures that key scientific and deployment parameters controlling execution can be easily modified without changing code but also with full provenance to understand what environment and parameters were used to produce any dataset. It provides flexible, high-performance, low-overhead persistence and retrieval of datasets with data repositories and formats selected by external parameters rather than hard-coding.
The LSST Science Requirements Document (the LSST SRD) specifies a set of data product guidelines, designed to support science goals envisioned to be enabled by the LSST observing program. Following these guidelines, the details of these data products have been described in the LSST Data Products Definition Document (DPDD), and captured in a formal flow-down from the SRD via the LSST System Requirements (LSR) and Observatory System Specifications (OSS) to the Data Management System Requirements (DMSR). The LSST Data Management subsystem’s responsibilities include the design, implementation, deployment and execution of software pipelines necessary to generate these data products. This document describes the design of the scientific aspects of those pipelines.
The LSST Data Management System (DMS) is a set of services employing a variety of software components running on computational and networking infrastructure that combine to deliver science data products to the observatory’s users and support observatory operations. This document describes the components, their service instances, and their deployment environments as well as the interfaces among them, the rest of the LSST system, and the outside world.
This is the test report for LDM-503-1 (WISE Data Loaded in PDAC), an LSST DM level 2 milestone pertaining to the LSST Science Platform, with tests performed according to LSP-00, Portal and API Aspect Deployment of a Wide-Area Dataset.
From September 2018 to March 2019, LSST Data Management (DM) and Google Cloud conducted a proof of concept engagement to investigate the utility, applicability, and performance of Google Cloud services with regard to various DM needs. This document describes what was accomplished in each area, including measurements obtained, and presents some conclusions about how the engagement went overall.
The LSST DM Batch Production Services are designed to allow large-scale workflows, including the Data Release and Calibration Product Productions, to execute in well-managed fashion, potentially in multiple environments. They ensure that provenance is captured and recorded to understand what environment and parameters were used to produce any dataset.
The Data Backbone (DBB) is a key component that provides for data storage, transport, and replication, allowing data products to move between enclaves. This service provides policy-based replication of files (including images and flat files to be loaded into databases as well as other raw and intermediate files) and databases (including metadata about files as well as other miscellaneous databases but not including the large Data Release catalogs) across multiple physical locations, including the Base, Commissioning Cluster, NCSA, and Data Access Centers. It manages caches of files at each endpoint as well as persistence to long-term archival storage (e.g. tape). It provides a registration mechanism for new datasets and database entries and a retrieval mechanism compatible with the Data Butler.
A summary of the tests performed in the Spring of 2019 to quantify the impact of variable seeing on the quality of the Differential Chromatic Refraction (DCR) correction algorithm. The current state of the DCR algorithm naively assumes that all of the input exposures used to construct the template have identical PSFs, and there has been some concern that the quality of the model will degrade when those input exposures have variable seeing. In this note I compare the performance of templates created with DcrAssembleCoaddTask with our current standard, CompareWarpAssembleCoaddTask, and estimate the range of observing conditions where the current algorithm is sufficient.
A design study for improving the persistence of Stack classes, particularly lsst.afw.image.Exposure and its components. This note describes the design decisions behind the lsst.afw.typehandling.GenericMap class template and explores options for an off-the-shelf persistence framework to replace lsst.afw.table.io.
The DPDD allocates space for pre-computed timeseries features, and a sample set is baselined in LDM-151. However, other features have been developed. This technote reviews the relevant literature, grouping related features where possible, and discusses potential concerns.
This document is a proposal for an engagement to verify that a cloud deployment of the LSST Data Management (DM) Data Release Production (DRP) is feasible, measure its performance, determine its final discounted cost, and investigate more-native cloud options for handling system components that may be costly to develop or maintain.
This document describes how DM services and software are used to support Observatory Operations, including critical Commandable SAL Components (CSCs) at the Summit and the Commissioning Team at the Summit and Base.
This document proposes an improved way to deal with Conda environments when building the Science Pipelines software. The benefits of implementing it can be seen in both the development and release processes. The effort required for its implementation is limited, since it reuses all the tooling already in place.
This document reviews five options for alert production in LSST Operations Year 1 (LOY1), taking into account any implications for LSST formal requirements, including up-scopes, down-scopes, or explicit violations. The Data Management System Science Team’s preferred option for maximizing LSST science is to generate template images from as much of the commissioning data as possible (∼2000 deg²) and use them to run Difference Image Analysis and alert production during LOY1. A proposal to increase the sky area covered by commissioning-data templates in at least a single filter to ≳10,000 deg² via a “filler” scheduler program is also presented. As a potential moderate up-scope, this study presents an option to build interim templates on a ∼monthly basis during LOY1, which could increase the accessible sky area by ∼1000-2000 deg² per month; this option can be reconsidered closer to the start of Operations.
Narrative summary of the LSST Science Platform capabilities that will be available during the remainder of Construction (FY19+) and, specifically, for initial AuxTel commissioning, main telescope commissioning with ComCam and with the full camera, and for any early data releases prior to the start of formal operations.
A high-level presentation of proposed changes to Solar System processing and data products baseline, to bring LSST Solar System processing closer to common asteroid-survey workflows, and help the on-schedule delivery of a scientifically useful set of Solar System data products.
The EFD is a powerful tool for correlating pipeline behavior with the state of the observatory. Since it contains the logging information published to the service abstraction layer (SAL) for all sensors and devices on the summit, it is a one-stop shop for looking up the state of the observatory at a given moment. The expectation is that the science pipelines validation, science verification, and commissioning teams will all need, at one time or another, to look up such information in order to measure the sensitivity of various parts of the system to observatory state: e.g. temperature, dome orientation, wind speed and direction, gravity vector. This leads to the further expectation that the various DM and commissioning teams will want to work with a version of the EFD that is accessed like a traditional relational database, possibly with linkages between individual exposures and specific pieces of observatory state. This implies the need for another version of the EFD (called the DM-EFD here) that has had transforms applied to the raw EFD to make it more immediately applicable to the questions the DM team will want to ask.
LSST images will be contaminated with transient artifacts, such as optical ghosts, satellite trails, and cosmic rays, and with transient astronomical sources, such as asteroids. We developed and tested an algorithm to find and reject these artifacts during coaddition, in order to produce clean coadds to be used for deep detection and preliminary object characterization. This algorithm, CompareWarpAssembleCoadd, uses the time series of PSF-matched warped images to identify transient artifacts. It detects artifact candidates on the image differences between each PSF-matched warp and a static sky model. These artifact candidates include both true transient artifacts and difference-image false positives, such as difficult-to-subtract sources and variable sources such as stars and quasars. We exploit the fact that true transients appear at a given position in the difference images in only a small (configurable) fraction of visits, whereas variable sources and difficult-to-subtract sources appear in most difference images. In this report, we present a description of the method and an evaluation using Hyper Suprime-Cam PDR1 data.
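The selection logic described above, keeping only candidates that appear in a small fraction of visits, can be sketched in a few lines of NumPy. This is an illustrative toy, not the pipeline's actual implementation: the function name, thresholds, and the simple per-visit sigma estimate are all invented for demonstration.

```python
import numpy as np

def flag_transient_artifacts(diff_warps, detect_sigma=5.0, max_fraction=0.3):
    """Toy sketch of the CompareWarp idea (names/thresholds are assumptions).

    diff_warps: stack of difference images (n_visits, ny, nx), each the
    residual of a PSF-matched warp minus a static-sky model.
    Returns a boolean mask per visit marking pixels treated as transient
    artifacts: detected in that visit, but detected in only a small
    fraction of all visits overall.
    """
    diff_warps = np.asarray(diff_warps, dtype=float)
    # Crude per-visit noise estimate; the real pipeline uses variance planes.
    sigma = diff_warps.std(axis=(1, 2), keepdims=True)
    detected = np.abs(diff_warps) > detect_sigma * sigma
    # Fraction of visits in which each pixel position is detected.
    fraction = detected.mean(axis=0)
    # True transients appear in few visits; variable and difficult-to-subtract
    # sources appear in most, so they survive into the coadd.
    return detected & (fraction <= max_fraction)

# Demo: 10 visits; a one-visit transient at (5,5), a persistent variable at (2,2).
rng = np.random.default_rng(0)
stack = rng.normal(0.0, 1.0, size=(10, 16, 16))
stack[0, 5, 5] += 100.0   # transient artifact in a single visit
stack[:, 2, 2] += 100.0   # variable source present in every visit
mask = flag_transient_artifacts(stack)
```

In the demo, only the single-visit spike at (5,5) is flagged for rejection; the persistent source at (2,2) is detected in every difference image and is therefore kept.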
We quantify the performance of the LSST pipeline for processing crowded fields, using images obtained from DECam and comparing to a specialized crowded field analysis performed as part of the DECAPS survey.
Considering single-visit LSST depth, in an example field of roughly the highest density seen in the LSST Wide-Fast-Deep area, DECAPS detects 200 000 sources per sq. deg. to a limiting depth of 23rd magnitude. At this source density the mean LSST-DECAPS completeness between 18th and 20th mag is 80%, and it drops to 50% at 21.5 mag.
For fields inside the Galactic plane cadence zone, source density rapidly increases. For instance, in a field in which DECAPS detects 500 000 sources per sq. deg. (5σ depth of 23.2), the mean completeness between 18th and 20th mag is 78%, and it drops to 50% at 20.2 mag.
In terms of photometric repeatability, above 19th mag LSST and DECAPS are in a systematics-dominated regime, and there is only a slow dependence on source density. At fainter magnitudes, the scatter between LSST and DECAPS is less than the uncertainty from photon noise for source densities up to 100 000 per sq. deg, but the scatter grows to twice the photon noise at densities of 300 000 per sq. deg. and above.
For repeat measurements of the same field with LSST, the astrometric scatter per source is at the level of 10-30 milliarcseconds for bright stars (g < 19), and is not strongly dependent on stellar crowdedness.
We report on an investigation into the use of lossy compression algorithms on LSST images that otherwise could not be stored for general retrieval and use by scientists. We find that modest quantization of images, coupled with lossless compression algorithms, can provide a factor of ∼6 savings in storage space while still providing images useful for follow-up scientific investigations. Given that this means some products could be made quickly available to users, and would free resources for community use that would otherwise be needed to re-compute these products, we recommend that LSST consider using lossy compression to archive and serve image products where appropriate.
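The quantize-then-compress approach can be illustrated as follows. This is a toy sketch, not the codec studied in the report: the `quantize_level` parameter, the noise-relative scaling, and the zlib back end are assumptions chosen for demonstration.

```python
import zlib
import numpy as np

def lossy_compress(image, quantize_level=16.0):
    """Scale the image so the background noise spans `quantize_level`
    integer steps, round to integers (the lossy step), then apply a
    lossless compressor to the result."""
    sigma = float(np.std(image))
    scale = quantize_level / sigma if sigma > 0 else 1.0
    quantized = np.round(np.asarray(image, dtype=np.float64) * scale).astype(np.int32)
    payload = zlib.compress(quantized.tobytes(), level=9)
    return payload, scale

def decompress(payload, scale, shape):
    """Invert the lossless step and undo the scaling; the rounding error
    (at most half a quantization step) is irrecoverable by design."""
    data = np.frombuffer(zlib.decompress(payload), dtype=np.int32)
    return data.reshape(shape) / scale

# Demo on a synthetic sky: flat background of 1000 counts with 5-count noise.
rng = np.random.default_rng(42)
image = rng.normal(1000.0, 5.0, size=(256, 256)).astype(np.float32)
payload, scale = lossy_compress(image)
restored = decompress(payload, scale, image.shape)
print(image.nbytes / len(payload))  # achieved compression factor
```

The key trade-off is visible in `quantize_level`: coarser quantization (smaller values) discards more low-order noise bits and compresses better, at the cost of a larger per-pixel rounding error relative to the noise.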
Pipeline driver tasks such as singleFrameDriver, coaddDriver, and multiBandDriver have been tested to characterize their memory usage. This technote details how these memory tests were run and what results were found.
This document provides an in-depth description of the role of the LSST Project in preparing software and providing computational resources to process the data from Special Programs (deep drilling fields and/or mini-surveys). The plans and description in this document flow down from the requirements in LSE-61 regarding processing for Special Programs. The main target audience is the LSST Data Management (DM) team, but members of the community who are preparing white papers on science-driven observing strategies may also find this document useful. The potential diversity of data from Special Programs is summarized, including boundaries imposed by technical limitations of the LSST hardware. The capability of the planned Data Management system to process this diversity of Special Programs data is the main focus of this document. Case studies are provided as examples of how the LSST software and/or user-generated pipelines may be combined to process the data from Special Programs.
We use version 13.0.9 of the LSST DM stack to reduce optical R-band data taken with the KPNO 4-meter telescope for the Deep Lens Survey (DLS). Because this data set achieves an LSST-like depth and has been studied and characterized exhaustively over the past decade, it provides an ideal setting in which to evaluate the performance of the LSST DM stack. In this report we examine registration, WCS fitting, and image co-addition of DLS data with the LSST DM stack. Aside from using a customized Instrument Signature Removal package, we are successful in using the DM stack to process imaging data of a 40 x 40 square arcminute subset of the DLS data, ultimately creating a coadd image. We find the astrometric solutions on individual chips have typical errors <15 milliarcseconds, demonstrating the effectiveness of the Jointcal package. Indeed, our findings in this regard on the DLS data are consistent with similar investigations on HSC and CFHT data.
A closer look at the astrometry data set shows it contains larger errors in Right Ascension than in Declination. Further examination indicates these errors are likely due to a guider problem with the telescope, and not the result of proper motions of stars or a problem with the DM stack itself.
Finally, we produce a coadd using the reduced data. Our coadd is approximately 40 square arcminutes, much larger than the coadds typically created with the stack. Creating such a large image stretched our machines to their limits, and we believe a dearth of system resources led to the coadd-creation Task not finishing. In spite of this, the coadd produced by the stack is of comparable quality to its counterpart produced by the DLS team in previous analyses, in terms of depth and the ability to remove artifacts that do not correspond to true astrophysical objects. However, issues were encountered with SafeClip.
In this note we present some aspects of the observed I/O behavior of the command line tasks ingestImages.py and processCcd.py when used for processing HSC data and the issues the current implementation may raise for processing data at the scale needed for LSST.
Report on the delivery, installation, and initial use of an LSST Camera data acquisition (DAQ) test stand at NCSA, July 18-20, 2017. Includes notes from a discussion of future plans for DAQ work that was held following the installation.
This is a descriptive and explanatory document, not a normative one. It explains the proposed baseline as presented in the DM replan in July 2017, referred to simply as the “baseline” in the prose that follows.
The purpose of this document is to begin to assemble the diversity of motivations driving the inclusion of photometric redshifts in the LSST Data Release Object Catalog, and prepare to make a decision on what kind of photo-z products will be used. The roadmap for this process is described in Section [sec:intro]. We consider the photo-z use-cases in order to validate that the type of photo-z incorporated into the DRP catalog, and the format in which it is stored, meets the needs of both DM and the community. We also compile potential evaluation methods for photo-z algorithms, and demonstrate these options by applying them to the photo-z results of two off-the-shelf photo-z estimators. The long-term plan is for this document to develop over time and eventually describe the decision-making process and the details of the selected algorithm(s) and products. PRELIMINARY RECOMMENDATIONS CAN BE FOUND IN SECTION [SEC:INTRO].
This document attempts to summarise the debate around LSST software releases and to suggest a path forward. Although some recommendations are made, they are intended to serve as the basis for discussion, rather than as a complete solution.
This material is based on discussions with several team members over a considerable period. Errors are to be expected; apologies are extended; corrections are welcome.
Most LSST objects will overlap one or more of its neighbors enough to affect naive measurements of their properties. One of the major challenges in the deep processing pipeline will be measuring these sources in a way that corrects for and/or characterizes the effect of these blends.
The current reference catalog matcher used by LSST for astrometry has been found to be insufficiently robust, failing to find matches on several current datasets. This document describes a potential replacement algorithm and compares its performance with the current implementation.
I use the StarFast simulator to generate many simulated observations of a field at a range of airmasses from 1.0 to 2.0, and at several LSST bands. After differencing each image from the observation in each band closest to zenith, I generate a metric to characterize the number and size of dipoles in the residual.
Writeup of work done in Summer 2015 and Winter 2016 to test for shear bias in measurements made using CModel and ShapeletPsfApprox from the DM stack. Tests were done on galaxies of known shape, in the style of great3sims, using constant shear. The PSFs applied were produced by PhoSim.
Recently, in DMTN-085, the QA Strategy Working Group (QAWG) made specific recommendations to improve the SQuaSH metrics dashboard. This technote presents a technical overview of the current implementation, and a plan to implement what is missing.
Examples and tutorials are important elements of successful documentation. They show the reader how something can be accomplished, which is often more powerful and effective than a mere description. Such examples are only useful if they are correct, though. Automated testing infrastructure is the best way to ensure that examples are correct when they are committed, and remain correct as a project evolves. Recently, in DMTN-085, the QA Strategy Working Group (QAWG) issued specific recommendations to improve how examples are managed and tested in LSST's documentation. This technote analyzes these recommendations and translates them into technical requirements. Subsequently, this technote also provides an overview of how example code management and testing has been implemented.
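The principle that documentation examples should be executed automatically can be illustrated with Python's standard doctest module (a minimal stand-in for discussion purposes, not the example-testing tooling the technote describes; the function and its idealized depth formula are invented here):

```python
import doctest
import math

def coadd_depth(n_visits, single_visit_depth=24.0):
    """Idealized 5-sigma point-source depth of a coadd of n identical
    visits, assuming depth improves as 2.5*log10(sqrt(n)).

    The interactive examples below are both documentation and tests:

    >>> round(coadd_depth(1), 2)
    24.0
    >>> round(coadd_depth(4), 2)
    24.75
    """
    return single_visit_depth + 2.5 * math.log10(math.sqrt(n_visits))

# Run every embedded example; a non-zero failure count means the
# documentation has drifted from the code's actual behavior.
assert doctest.testmod().failed == 0
```

Running such checks in continuous integration is what keeps examples trustworthy as the project evolves: any change that invalidates an example fails the build instead of silently corrupting the documentation.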
The notebook-based test report system provides a way for LSST to generate and publish data-driven reports with an automated system. This technote describes the technical design behind the notebook-based report system.
The JupyterLab environment is becoming a powerful tool for all sorts of tasks that LSST team members commonly undertake. Data exploration and analysis are obvious cases where the distributed notebook environment is useful. This notebook will show how to use the notebook environment in conjunction with other aspects of JupyterLab: shell access and git authentication, to produce a meaningful development workflow.
This technote explores how JSON-LD (Linked Data) can be used to describe a variety of LSST project artifacts, including source code and documents. We provide specific examples using standard vocabularies (http://schema.org and CodeMeta) and explore whether custom terms are needed to support LSST use cases.
This technote describes, in a tutorial style, the lsst.verify API. This Verification Framework enables the LSST organization to define performance metrics, measure those metrics in Pipeline code, export metrics to a monitoring dashboard, and test performance against specifications.
LSST the Docs is a platform for publishing documentation websites. Through a microservice architecture, LSST the Docs is capable of building even complex multi-repository documentation projects and publishing many versions concurrently on the web.