Course Descriptions

This page gives a brief description of the classes, discussions and exercises that will be part of the NVO Summer School. The summer school comprises a 2-day preschool, which discusses many of the underlying software technologies used within the virtual observatory, and 5 1/2 days devoted to how we can use the VO in doing science. The descriptions below are divided into a number of categories.

Preschool

SQL (Nieto-Santisteban, 2.5 hours, 9/6)

In the era of terabyte and petabyte astronomical datasets, Database Management Systems (DBMS) are becoming essential tools to access and mine the data. This class will cover basic concepts related to database systems and the Structured Query Language (SQL). The student will learn how to formulate a variety of queries of varying complexity and will experiment with current on-line astronomical databases. Performance and query optimization will be discussed as well.

Java (Plante, 1.5 hours, 9/7)

This session will be oriented toward programmers who don't happen to program in Java. The aim of this session is to provide you with just enough familiarity with Java to allow you to hack and use the Java Summer School tools. We'll walk through the anatomy of a Java program and the building block of object-oriented programming, the Java class. We'll highlight some of the important standard Java libraries and commonly used utility classes. We will show you how to tweak, compile, package, and run Java programs.

XML Technologies: Schema, XPath, XQuery, and XSL (Plante, 45 minutes, 9/6)

In this session, we will give an overview of four important XML technologies used in the VO: XML Schema, XPath, XQuery, and the XML Stylesheet Language. We will show examples of each technology in action, describe how it is used in the VO, highlight the major components, and provide you with tips on how to interpret samples using these technologies. The aim is not to make you proficient enough to compose your own uses from scratch, but rather to recognize what existing samples are doing. In some cases, you may be able to tweak an existing use to your own purposes; in others, it may help you debug an application that uses the XML technology.

Using the VO Libraries (Graham and Green, 1 hour, 9/7)

This session is an introduction to the NVO Summer School (NVOSS) software provided for the course. Basic instructions for software component installation on Mac, Windows, or Linux OS will be presented and simple tests will be performed. The NVOSS software examples were developed in Java for platform independence. We will review the general organization of the NVOSS software libraries, data files, applications and tools in preparation for the remainder of the course.

Grid Computing (Williams, 45 minutes, 9/7)

The Grid is not a computer, nor is it a set of computers, but rather a conceptual model of interconnection, together with the software to realize that concept. The variety of high-performance computers, storage, and networks can be made available to the variety of scientists and other users. The NSF TeraGrid consortium makes vast resources available in the form of space-shared processor clusters with both fast and archival storage. Authentication is uniform across the clusters and graduated for different users; there is a uniform software stack, and application-specific portals are emerging.

SOAP and web services (Graham, 30 minutes, 9/6)

This session introduces the concept of web services and how these can be implemented via SOAP. The advantages of using SOAP over other implementations are discussed, as well as when it is not appropriate to use SOAP. The anatomy of a SOAP message is described, along with how it can be defined through the use of WSDL. The different flavours of WSDL are covered, as is how to ensure basic interoperability between different client and server implementations.

Security and certificates (Graham, 30 minutes, 9/7)

This session presents an overview of web security. The concepts of authentication, authorization and encryption are discussed. The different levels of security from unfettered access to SAML assertion tokens are described. Particular attention is paid to the most prevalent form of security: digital certificates and digital signatures. The process of obtaining a certificate and using it is covered. Finally the rudiments of secure web services via the WS-Security specification are presented.

Advanced protocols (Graham, 30 minutes, 9/7)

This session presents an introduction to emerging standards in the web services arena. Topics covered include: SOAP attachments (DIME and MTOM), security (WS-Security), stateful web services (WSRF) and asynchronous web services (WSRF). The current state of different implementations of these standards is also covered.

Preschool mini-language tutorials (9/7)

PHP (Kwok, 30 minutes)

PHP is a scripting language mostly used together with a web server to implement dynamic HTML pages. This mini tutorial gives a general overview of the applications of PHP, and when it is most advantageous to use PHP. In a few simple examples, students will learn about the basic syntax of the language, its built-in data types, and some object-oriented aspects. The basic concept of a web service will be briefly presented and implemented in a more advanced exercise. As a mature scripting language, PHP has a large collection of extension modules. In particular, mysql is a very popular module that allows PHP scripts to interface with MySQL servers. Modules also exist to interface with other commercial and open source database servers. If time permits, examples of web services with database access can be presented.

Python (Kwok, 30 minutes)

Python is a very high-level scripting language that is becoming increasingly popular in the astronomical community. Similarities and differences to other languages like C or PHP, as well as complex built-in data types, will be presented. Simple examples will show students various elements of Python programs and the ease of use of Python for building graphical user interfaces. If time permits, more advanced examples will show students Python libraries in different application areas, such as graphics, networking, and HTTP access.

Note: XML and SOAP libraries will not be covered in these mini tutorials, as they will be introduced later in the summer school.

IRAF (Fitzpatrick, 30 minutes)

IRAF is the "Image Reduction and Analysis Facility", a general-purpose software system for the reduction and analysis of O/IR astronomical data. This discussion will acquaint new users with the many tools available in the system and with basic concepts such as task invocation, the parameter mechanism, graphics/image display, interactive and batch execution, and basic CL (Command Language) scripting. This will be a very high-level overview of the system with the goal of introducing the capabilities of IRAF that may later be used in the student projects.

This session is aimed at users not already familiar with IRAF; however, depending on the level of interest and the students' background, more advanced topics can be covered instead.

C# (Graham, 30 minutes)

This session presents the C# programming language. It describes the two main implementations of C#: Microsoft .Net and the open source Mono version. Similarities to other object-oriented languages, e.g. Java, are illustrated and examples given showing how to handle SQL queries, XML documents and simple web service calls.

IDL (Miller, 30 minutes)

The Interactive Data Language (IDL), a pre-compiled language published by RSI, provides a seamless environment for astronomers to reduce, analyze and/or visualize their astronomical data. In addition, individual astronomers have created dozens of publicly available software libraries in common use amongst the research community. IDL also provides external language bindings as well as XML support, which are immediately applicable to VO webservice client-side applications.

In this class, we will provide a brief overview of IDL and its most commonly used astronomical libraries. We will present the different structures in which IDL utilizes the standards adopted for the VO. We will show how new VO functionality can be achieved from within IDL. Finally, we will work through an end-to-end science project, from finding the VO data to creating publication quality results.

VO Protocols

Overview of VO Protocols (McGlynn, 45 minutes, 9/8)

This discussion sets the context for later discussions of specific VO protocols. The roles and interactions of each of the major VO protocols will be described. Topics include VO data access, the simple image (SIA) and spectral access (SSA) protocols; catalog access using cone search, SkyNodes and registries; metadata standards including the resource metadata, unified content descriptors (UCDs), and space-time coordinate (STC) descriptors; and data encodings including the Astronomical Data Query Language (ADQL) and VOTable.

None of these protocols will be explored in detail, but where each is used will be illustrated in a set of typical use-case scenarios.

Skynodes (Greene, 1.5 hours, 9/9)

The OpenSkyQuery (OSQ) portal allows cross matching between major catalogs in small areas of sky. It provides selection of parameters from catalogs using criteria based on comparisons between values contained in one or more catalogs. Catalogs are registered in the VO Registry as Skynodes. The OSQ portal queries the registry to find which Skynodes are available. The OSQ portal is built upon web services. This session will demonstrate construction of a simple web service. The requirements and specifications of the Skynodes will be looked at. The specification includes a WSDL description of the Skynode service from which the server-side stubs may be generated. This has already been performed and the entire server has been filled out in Java. We will look at configuring this Java Skynode server package for a particular catalog.

ADQL (Plante, 45 minutes, 9/9)

In this session, we will focus on what ADQL is and how it is used in the VO. We will first look at ADQL from the user's perspective in the form of ADQL/s. We will compose a few new queries using the SkyPortal interface. We will also look at it from the developer's perspective in the form of ADQL/x. We will show examples of existing software that can be used to process ADQL, and we will highlight the various ways ADQL is used in the VO.

Unified Content Descriptors or UCDs (Williams, 30 minutes, 9/11)

The Unified Content Descriptor (UCD) is a formal vocabulary for astronomical data that is controlled by the International Virtual Observatory Alliance (IVOA). The vocabulary is restricted in order to avoid proliferation of terms and synonyms, and controlled in order to avoid ambiguities as far as possible. It is intended to be flexible, so that it is understandable to both humans and computers. UCDs describe astronomical quantities, and they are built by combining words from the controlled vocabulary. A UCD does not define the units or name of a quantity, but rather answers the question "what sort of quantity is this?"; for example, "phys.temperature" is the UCD for temperature, and "phot.flux;em.radio;arith.ratio" is a ratio of radio fluxes. This lecture will cover the syntax, meaning, uses, interpretation, and future plans for the UCD vocabulary.
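As a rough illustration of the syntax shown in the examples above, the following Python sketch splits a UCD into its semicolon-separated words and each word into its dot-separated atoms. The function name `parse_ucd` is hypothetical, and the sketch only handles the syntax; it does not check the words against the IVOA controlled vocabulary.

```python
def parse_ucd(ucd):
    """Split a UCD string into words, and each word into its atoms.

    Hypothetical helper for illustration only: it handles the syntax
    (';' between words, '.' between atoms) but does not validate the
    words against the IVOA controlled vocabulary.
    """
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    return [w.split(".") for w in words]


# The two UCDs quoted in the description above.
for example in ("phys.temperature", "phot.flux;em.radio;arith.ratio"):
    print(example, "->", parse_ucd(example))
```

The first word of a UCD names the quantity itself, and any further words qualify it, which is why the radio flux ratio example carries three words.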

VOTable (Fitzpatrick, 30 minutes, 9/8)

VOTable was formally adopted by the VO community in 2002 as an XML standard for the interchange of data represented as a set of tables. One primary goal for the standard was to encourage interoperability by providing a flexible storage and exchange format for tabular data, by separating metadata from data to facilitate big-data and Grid computing, and by utilizing XML standards to allow applications to validate and/or transform an input document.

This session will provide an overview of the VOTable document structure, the relationship between VOTable and other VO standards such as UCD and community standards such as FITS, use of VOTable in data access protocols (e.g. SIA, Cone, SSA), software libraries for reading/writing/transforming files and proposed extensions to the protocol. Students will learn how VOTables may be generated by data providers or consumed by client applications and services, and what tools currently exist to enable either.

Cone Search and SIA Protocols (Tody, 1 hour, 9/8)

This session will provide an overview of the VO Data Access Layer and introduce the concepts and overall approach to data access in the VO. The cone search and Simple Image Access (SIA) protocols will be reviewed. The use of the protocols will then be demonstrated using simple tools such as a Web browser and the Unix command line.

Simple Spectral Access Protocol (Tody, 45 minutes, 9/8)

The Simple Spectral Access protocol provides access to 1D spectra, time series, and Spectral Energy Distributions (SEDs). The concepts behind the Simple Spectral Access protocol will be explored, then the interface will be reviewed. Some simple examples of spectral services will be examined.

Using VO Protocols

Magical Mystery Tour: What can you do with the VO today? (Hanisch, 1 hour, 9/8)

Publication and Resource Discovery (Plante and Greene, 45 minutes, 9/11)

This session will be presented in two parts. In the first part, we will present an overview of the VO Publishing process, focusing especially on how to register new resources, particularly new services. In the second part, we'll look at how to use existing registry portals and services to discover data and services you can use in a VO application. In particular, we will look at registry web services that provide a programmatic capability for applications to search and retrieve astronomical resources which fit a specific search criterion. A simple registry web service client will be built and demonstrated. Using IDL, another registry client example will be presented as an analytical tool for exploring Resource Metadata.

Correlation Services in the VO (Krughoff, 9/9)

One of the main advantages afforded by publishing data in the VO framework is the power to cross match large datasets in real time. This session will examine correlation services available in the VO (specifically OpenSkyQuery). We will cover access to correlation services through web services as well as how to add catalogs on a temporary basis for matching with other larger datasets. This will require cursory knowledge of VOTable and SQL.

Data and Service Discovery (Kwok, 45 minutes, 9/11)

This session will bring together all the basic VO concepts, such as VOTable, VO Registry, SkyNode, SIAP and SSAP, and will discuss the process and data flow of retrieving information using VO technologies. The registry returns results in VOResource format. One important field in VOResource is the access URL, which is the entry point to the web service that provides the information contained in the resource. Students will learn how to extract that URL and use it to build queries via Cone-Search, SIAP, SSAP or SkyNode. SIAP and SSAP return VOTables that contain pointers to the desired information rather than the information itself, i.e. the image or spectrum. Different techniques to inspect and retrieve that information will be presented.
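The extraction step can be sketched in Python roughly as follows. The record below is a made-up fragment, not real registry output: real VOResource documents are namespaced and carry many more elements, so the sketch matches on the element's local name only. The names `SAMPLE_RECORD`, `local_name` and `find_access_urls` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up record for illustration; real VOResource documents carry
# XML namespaces and many more elements than shown here.
SAMPLE_RECORD = """
<resource>
  <title>Example catalog service</title>
  <interface>
    <accessURL>http://example.org/cone?</accessURL>
  </interface>
</resource>
"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_access_urls(xml_text):
    """Return the text of every accessURL element found in the record."""
    root = ET.fromstring(xml_text)
    return [
        elem.text.strip()
        for elem in root.iter()
        if local_name(elem.tag) == "accessURL" and elem.text
    ]


print(find_access_urls(SAMPLE_RECORD))
```

Matching on the local name keeps the sketch independent of whichever schema version a registry returns, at the cost of ignoring where in the record the URL appears.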

Dealing with invalid data (Kwok, 30 minutes, 9/11)

When dealing with VO services and web services in general, many exceptions can occur. Some example situations that students should be aware of are: bad URL, empty result, HTML error message, SOAP fault exception, SOAP fault exception from an HTTP GET, VOTable containing an error message, deviating default response, etc. Students will learn how to recognize these exceptions and to avoid them.
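Several of these situations can be recognized from the response body alone. The following Python sketch is one possible way to sort a response into rough categories; the function name and labels are hypothetical. It assumes that errors inside a VOTable are flagged by an INFO or PARAM element named "Error", or by an INFO named "QUERY_STATUS" with value "ERROR", and that rows use the TABLEDATA serialization. Individual services may deviate from these conventions.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def classify_response(body):
    """Return a rough label for a service response body (a str).

    Labels: 'empty', 'html', 'not-xml', 'soap-fault', 'other-xml',
    'votable-error', 'votable-no-rows', 'votable'.
    """
    text = body.strip()
    if not text:
        return "empty"
    # An HTML page usually means a web server error, not a service reply.
    if text[:200].lower().startswith(("<!doctype html", "<html")):
        return "html"
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "not-xml"
    names = {local_name(elem.tag) for elem in root.iter()}
    if "Fault" in names:
        return "soap-fault"
    if local_name(root.tag) != "VOTABLE":
        return "other-xml"
    # Assumed error conventions; see the lead-in paragraph.
    for elem in root.iter():
        if local_name(elem.tag) in ("INFO", "PARAM"):
            name = elem.get("name", "")
            value = elem.get("value", "")
            if name == "Error":
                return "votable-error"
            if name == "QUERY_STATUS" and value.upper() == "ERROR":
                return "votable-error"
    # Assumes TABLEDATA rows; BINARY or FITS data would need another check.
    if "TR" not in names:
        return "votable-no-rows"
    return "votable"


print(classify_response("<html><body>404 Not Found</body></html>"))
```

Checking the body rather than trusting the HTTP status matters because a service can return a successful status with an error message inside an otherwise valid VOTable.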

Integrating Existing Tools (Fitzpatrick and Miller, 1 hour, 9/9)

Incorporating VO capabilities into legacy software systems, and similarly making those systems available as VO services, is a high-priority science capability in the current stage of NVO development. This class will touch on both the client-side integration issues (e.g. data access, use of the Registry, and VOTable handling) as well as the server-side deployment of legacy software in the VO (e.g. as Web and Grid services). A brief overview of the many libraries and toolkits available for a variety of programming environments which can be used as part of an integration plan will also be presented.

Server-side integration will discuss various strategies and requirements for wrapping legacy data analysis code to produce web-services, what makes a good candidate for a service and what does not, and how to create new services using existing legacy systems and deploy them easily.

The client-side discussion will cover the inclusion of Registry queries for data discovery, data access and retrieval, and typical application changes needed to incorporate VO capabilities. This session will try to be language and environment neutral; however, specific examples will be used to demonstrate the concepts.

Lessons Learned In VO Integration (Fitzpatrick and Miller, 30 minutes, 9/9)

This class will review some of the problems and pitfalls encountered during the (ongoing) integration of VO capabilities in a large legacy system such as IRAF. Lessons learned during prototype development changed our approach to the problem several times and, we feel, are applicable to small application integration efforts as well. We hope students can apply these lessons to their own code, or will feel motivated to apply them to the rich supply of existing astronomical software with no active VO development, and create many interesting new VO services and applications.

Managing VO data and process flows (Graham, 45 minutes, 9/11)

This class describes how data can be stored and transferred within the VO (via VOStore and VOSpace) and how this fits into the wider context of process flows. The use of proxies, portals and workflow engines will be covered as well as VO-specific developments such as CEA.

What's on tap in the VO? (Hanisch, 45 minutes, 9/12)

Student Exercises

A VO Client (Tody, 1.25 hours, 9/8)

In this session we will walk through how to build some simple client applications in various languages to query and retrieve data using the simple cone service and the simple image access and simple spectral access services.

A VO Service (Greene, 1.25 hours, 9/9)

In this session we will look at setting up the Tomcat web server, writing Java Server Pages (JSP) and building web application archives (WAR) using Ant. This technology will be used to build a simple cone search service and then a Simple Image Access shell service.

Combining VO Elements (Kwok, 1.25 hours, 9/11)

In this session students will follow a prepared exercise step by step and learn how to query the registry and to parse the returning VOResource. Then information for a cone-search or another type of service will be extracted. A query with RA, DEC and search radius will be built and sent to selected service providers. After waiting for completion, the returning VOTables will be merged and displayed (or printed). This exercise will use Python. Questions regarding Java and PHP can be answered as well.
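The query-building step might look roughly like the Python sketch below. RA, DEC and SR are the parameter names used by the Cone Search protocol, all in decimal degrees; the function name `cone_search_url` and the example.org base URL are hypothetical, and in the exercise the base URL would come from the registry record instead.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Cone Search query URL from a service base URL.

    RA, DEC and SR (search radius) are all given in decimal degrees.
    """
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be between 0 and 360 degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be between -90 and +90 degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must not be negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Base URLs often already end in '?' or '&'; add a separator only
    # when one is missing.
    if base_url.endswith(("?", "&")):
        return base_url + query
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


# Hypothetical service address, used only to show the resulting URL.
print(cone_search_url("http://example.org/cone?", 180.0, -30.5, 0.1))
```

The resulting URL can be fetched with any HTTP client, and the same function can be applied to each selected service provider in turn before the returned VOTables are merged.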

Summer School Projects (9/12-9/14)

A substantial block of time is reserved in the Summer School for students to design, build and demonstrate new VO tools and capabilities. Last year's projects included tools for the correction of spectra for interstellar absorption; Voogle, a simple but powerful VO interface; and the initiation of an ongoing effort to provide a standardized interface for users and software interested in alerts for time-critical astronomical events. Faculty will assist student groups in achieving their science goals. Students are encouraged to form groups but may pursue individual projects.

Science Themes

Data Quality in the VO: Discussion (De Young, 1.5 hours, 9/8)

Using the VO for Cross-correlations (Krughoff, 1.5 hours, 9/9)

Using the web service WESIX as a model, we will discuss integration of the SkyPortal web service into a working data pipeline. The example science case will involve using WESIX to extract source catalogs from stacked SDSS images and match them with the SDSSDR2 SkyNode in order to obtain calibrated extended object catalogs.

Object classification with the Virtual Observatory (McGlynn, 1.5 hours, 9/11)

This class describes approaches for sorting objects into sets of physically distinct classes using VO datasets and protocols. Students will learn how to use the VO to assist in object classification and the scientific concerns that they must address for a robust result.

The student is introduced to classifiers in general, including the distinctions between classification, cross-identification, and looking for outliers. There will be an overview of the differences between supervised and unsupervised classification and some of the commonly used classification algorithms.

The use of correlative information in classification is a natural use of the VO and this class will discuss how a researcher can find useful VO resources using VO registries, and query them using VO query protocols. The practical limitations on classification set by current network bandwidth will be discussed.

How the VO can be used to set up a classification pipeline, and lessons learned in building operational pipelines are the core of the class. This includes how to find the best set of classification criteria, how to evaluate the accuracy of the classifications, and how atomic the output classification may be made.

The Grid for Analysis of large VO datasets (Williams, 1.5 hours, 9/12)

This lecture covers some pragmatic techniques, scripts, and experiences that have been useful in integrating VO services with high-performance computing. Topics include script-based calls to VO services, virtual data for fault-tolerance, achieving parallelism in computing and networking, putting data in containers, and how to select storage resources for different purposes.

Looking for Outliers using the VO (Miller, 1.5 hours, 9/13)

Outliers are named as such due to their low probability of coming from some known statistical distribution (often called a "null" hypothesis or "prior"). Outliers can be classified in Frequentist terms (based on standard probabilities) or in Bayesian terms (based on degrees of belief and prior knowledge). Both methods have their advantages and disadvantages and both should agree in the limit of large N.
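A minimal Frequentist sketch of this idea in Python, assuming the null distribution is a Gaussian with known mean and standard deviation: a value is flagged when the probability of a deviation at least that large falls below a chosen threshold. The function names and the threshold `alpha` are hypothetical choices for illustration, and the threshold is not corrected for the number of values tested.

```python
import math


def two_sided_p_value(x, mean, sigma):
    """Probability of a deviation at least as large as |x - mean|
    under a Gaussian null with the given mean and sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = abs(x - mean) / sigma
    # For a standard normal, P(|Z| >= z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def flag_outliers(values, mean, sigma, alpha=0.001):
    """Return the values whose two-sided p-value is below alpha.

    alpha is an illustrative threshold, applied per value with no
    correction for multiple tests.
    """
    return [v for v in values if two_sided_p_value(v, mean, sigma) < alpha]


print(flag_outliers([0.1, -0.5, 5.0, 1.2], mean=0.0, sigma=1.0))
```

With a catalog of many rows, some values will fall below any fixed threshold by chance alone, so the flagged list is a set of candidates rather than confirmed outliers.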

With its large amount of data, multiple-wavelength coverage and high dimensionality (i.e. lots of columns in catalog data), the Virtual Observatory provides a nearly ideal data resource for discovering statistical outliers. But in this new era of VO astronomy, there are many statistical pitfalls created by such heterogeneous data (in the forms of censoring, correlations, false discoveries, etc.).

In this course, we will present an overview of Frequentist and Bayesian statistical methods. We will show examples of each method with respect to classifying outliers. We will discuss the advantages and disadvantages of each method as well as pitfalls common to both. In all cases we will utilize VO resources for discovery, retrieval and analysis.