Overview

Programmers can prepare a program for deep profiling by compiling it in a deep profiling grade such as asm_fast.gc.profdeep. When a program compiled in a deep profiling grade is executed, it builds a data structure containing profiling information, and writes this out to a file called Deep.data at the end of execution. Programmers can then browse the contents of these profiling data files using the Mercury deep profiling tool, mdprof.


The structure of the profiling data file

The data structure written out to the Deep.data file, the profiling tree, resembles a call graph. It has four kinds of nodes: CallSiteStatic, ProcStatic, CallSiteDynamic and ProcDynamic structures. These four node types are consistently abbreviated as css, ps, csd and pd throughout the deep profiling system, including the names of structure fields.

The Deep.data file consists of a header and a sequence of nodes. The header contains

The following sequence contains nodes of all four types. In the running program, these nodes refer to each other by pointers, but in the process of writing them out we convert these pointers to node ids, which are dense small integers starting at one.
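
The pointer-to-id conversion described above can be illustrated with a small sketch. It is not the profiler's own writer (that code lives in the Mercury runtime), and the `Node` class, its fields, and the record layout are hypothetical. It shows only the general technique: number each node the first time it is reached, starting at one, and write references as those numbers.

```python
from typing import Dict, List, Tuple


class Node:
    """Hypothetical in-memory node that refers to other nodes directly."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []


def number_nodes(root: Node) -> Tuple[Dict[int, int], List[Node]]:
    """Give every reachable node a dense integer id, starting at one.

    id(node) stands in for the node's address in the running program.
    """
    ids: Dict[int, int] = {}
    order: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in ids:
            continue  # already numbered; the graph may contain cycles
        ids[id(node)] = len(ids) + 1
        order.append(node)
        stack.extend(reversed(node.children))
    return ids, order


def serialize(root: Node) -> List[Tuple[int, str, List[int]]]:
    """Emit (node id, name, child ids) records with references replaced by ids."""
    ids, order = number_nodes(root)
    return [(ids[id(n)], n.name, [ids[id(c)] for c in n.children]) for n in order]
```

Because the ids are dense, a reader can rebuild the graph by indexing into an array of nodes rather than searching a table.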
CallSiteStatic
CallSiteStatic structures are created by the compiler. There is one CallSiteStatic structure for each call site in the source code. CallSiteStatic structures contain the following fields:
ProcStatic
ProcStatic structures are created by the compiler. There is one ProcStatic structure for each procedure in the source code. ProcStatic structures contain the following fields:
CallSiteDynamic
CallSiteDynamic structures are created by the instrumented program during a profiling run. There will be one or more CallSiteDynamic structures for each call site through which the program actually performs a call during the profiling run. For a given call site, there will be distinct CallSiteDynamic structures for each distinct context in which those invocations take place.
ProcDynamic
ProcDynamic structures are created by the instrumented program during a profiling run. There will be one or more ProcDynamic structures for each procedure which is called during the profiling run. For a given procedure, there will be distinct ProcDynamic structures for each distinct context in which those calls take place.

The Mercury deep profiling tool mdprof

The Mercury deep profiler consists of three programs. One is the web browser of the user's choice: this implements the user interface. The other two are mdprof and mdprof_cgi.
mdprof
This is a simple shell script. It is invoked by the web server in response to queries of the right form. It does nothing more than set up the PATH environment variable to contain the directory in which mdprof_cgi was installed, and then invoke mdprof_cgi.
mdprof_cgi
This is a Mercury program. It is invoked once for every page displayed by the deep profiling system. It is passed, in the environment variable QUERY_STRING which is set by the web server, a URL component containing the name of a profiling data file and a query specifying which part of that data file is to be displayed. mdprof_cgi checks whether a server process already exists for the given profiling data file. If the answer is yes, it passes the query to the server, gets back the reply, gives it to the web server, and exits. If the answer is no, it reads in the named profiling data file, processes it to materialize information that is required by queries but is stored in the profiling data file only implicitly, and answers the query directly. It then forks itself. The parent exits to let the web server finish rendering the generated page. The child process becomes a server process, which goes into a loop awaiting queries. When it gets a query from mdprof_cgi, it answers the query and goes back to sleep. It exits when it has not received a query for a set timeout period, which by default is thirty minutes, or when it receives a "query" telling it to shut down. (Due to the timeout mechanism, shutting down the server explicitly is not useful unless the profiling data file has changed, the server has been recompiled, or one wants to recover the space occupied by its virtual memory.)
The reason why we create the server process via a fork instead of simply making the initial mdprof_cgi process the server process is that the web server requires the program it invokes to exit before it displays the page the program generates. Doing without a server doesn't work because we don't want to have to read and process the deep profiling data file for every page to be displayed, since that takes a significant fraction of a minute. The reason for the split between mdprof and mdprof_cgi is to make it possible to specify some parameters of the deep profiler without needing to recompile a Mercury program or even needing to know Mercury.
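
The server's query loop can be sketched as follows. This is an illustration only: the real server is written in Mercury, and the function name, the newline-terminated query format, and the literal `shutdown` command are all assumptions made for the example. It also uses a `select` timeout as a stand-in for whatever timeout mechanism the server actually uses.

```python
import os
import select
from typing import Callable, List, Tuple


def serve_until_idle(
    read_fd: int,
    answer: Callable[[str], str],
    timeout_seconds: float,
) -> Tuple[List[str], str]:
    """Answer newline-terminated queries until idle or told to shut down.

    Returns the replies produced and the reason the loop ended.
    """
    replies: List[str] = []
    buffer = b""
    while True:
        # Sleep until a query arrives or the idle timeout expires.
        ready, _, _ = select.select([read_fd], [], [], timeout_seconds)
        if not ready:
            return replies, "timeout"
        chunk = os.read(read_fd, 4096)
        if not chunk:
            return replies, "eof"  # every writer has closed the pipe
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            query = line.decode()
            if query == "shutdown":  # hypothetical shutdown "query"
                return replies, "shutdown"
            replies.append(answer(query))
```

With the default described above, `timeout_seconds` would be `30 * 60`. The timeout restarts after every query, so only a server that has been idle for the whole period exits.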

The elements of the interface between the client and server roles of mdprof_cgi are documented in interface.m. The client and server communicate via a pair of named pipes whose names include a mangled form of the data file name. (The mangling is required to replace any slashes in the name of the data file.) The existence of these files serves as an approximation of a lock; the idea is that they exist if and only if a server process for that data file is alive and serving queries via those pipes. mdprof_cgi creates a server process for the data file if and only if these named pipes do not exist. They are created only when mdprof_cgi transforms itself into a server, and destroyed only when this server exits. The two files are always created and destroyed together.
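
To make the naming scheme concrete, here is a minimal sketch of mangling and of the existence test. The escape scheme, the pipe name prefixes, and the function names are hypothetical; the actual ones are defined in interface.m and may differ.

```python
import os
from typing import Tuple


def mangle(data_file_name: str) -> str:
    """Replace slashes so the result can be used as a single file name.

    Hypothetical scheme: '_' becomes '__' and '/' becomes '_S', so that
    two different data file names never mangle to the same string.
    """
    return data_file_name.replace("_", "__").replace("/", "_S")


def pipe_names(data_file_name: str, pipe_dir: str) -> Tuple[str, str]:
    """Return the (to-server, from-server) pipe paths for a data file."""
    mangled = mangle(data_file_name)
    return (
        os.path.join(pipe_dir, "mdprof_to_server_" + mangled),
        os.path.join(pipe_dir, "mdprof_from_server_" + mangled),
    )


def server_seems_alive(data_file_name: str, pipe_dir: str) -> bool:
    """Approximate 'a server exists' by 'both named pipes exist'."""
    to_server, from_server = pipe_names(data_file_name, pipe_dir)
    return os.path.exists(to_server) and os.path.exists(from_server)
```

The test is only an approximation of a lock: a server that crashes without removing its pipes would leave them behind.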

There are potential race conditions both when the pipes are created and when they are destroyed. It is possible for the web server to receive two requests for a given data file in quick succession, and it is possible that when the second invocation of mdprof_cgi checks whether the pipes exist, the first invocation of mdprof_cgi has not yet forked itself off as a server process. We avoid this by putting all code that creates, destroys, or tests the existence of the named pipes inside a critical region protected by a lock on a mutex file. Whichever invocation of mdprof_cgi gets the lock first will become the server; any others will not be able to perform the test for the existence of a server until after there is a server.
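
The critical region can be sketched as below. This assumes a Unix system and uses `fcntl.flock` as the locking primitive, which is a choice made for the example and not necessarily what mdprof_cgi uses. The function names and the `pipes_exist` and `create_pipes` callbacks are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def mutex_file_lock(path: str) -> Iterator[None]:
    """Hold an exclusive lock on the mutex file for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another process holds it
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def become_server_or_client(
    mutex_path: str,
    pipes_exist: Callable[[], bool],
    create_pipes: Callable[[], None],
) -> str:
    """Test for a server and, if there is none, claim the role atomically."""
    with mutex_file_lock(mutex_path):
        if pipes_exist():
            return "client"
        create_pipes()
        return "server"
```

Because the test and the creation happen under one lock, a second invocation cannot see "no pipes" after the first has already decided to become the server.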

The other race condition involves a client arriving between the time that the server gets the timeout signal and the time that the server actually deletes the named pipes and exits. To fix this, we make clients create a file indicating that they want a server before they get the lock on the mutex file. If the shutting-down server gets the mutex first, it will abort the shutdown if it finds any want files around. If it does not find any want files, it shuts down, but because it holds the mutex lock throughout the process of shutdown, no client can observe its decision process.
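
A minimal sketch of the want-file protocol follows. The file naming (`want_<client id>`), the directory layout, and the function names are assumptions made for illustration. The caller of `server_try_shutdown` is assumed to hold the mutex lock already.

```python
import glob
import os
from typing import Iterable


def client_announce(want_dir: str, client_id: str) -> str:
    """Create a want file BEFORE trying to take the mutex lock."""
    path = os.path.join(want_dir, "want_" + client_id)
    with open(path, "w"):
        pass  # the file's existence is the whole message
    return path


def server_try_shutdown(want_dir: str, pipe_paths: Iterable[str]) -> bool:
    """Decide whether a timed-out server may exit.

    Must be called with the mutex lock held. Returns True if the pipes
    were removed and the server should exit, or False if a client is
    waiting and the shutdown is aborted.
    """
    if glob.glob(os.path.join(want_dir, "want_*")):
        return False  # a client has announced itself: keep serving
    for path in pipe_paths:
        os.remove(path)
    return True
```

The ordering is what matters: a client that has created its want file is visible to the server even while the client is still blocked waiting for the mutex.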

In the absence of the want files, a server that got a timeout signal would have no decision to make. It would therefore be possible for a client to arrive and find that the named pipes exist, without knowing that the server process is already committed to shutting down, which can leave the client sending its query to a now nonexistent server.


Pipeline processing of deep profiler queries

As described above, when the deep profiler starts, it reads the deep profiling data. It then processes this data to make it easy to retrieve information from it; the result is a structure called 'deep'. When a query arrives, further processing is performed in several steps before HTML is produced for the user.

First, create_report generates a report structure from the cmd and deep structures. This report reflects all the information that may be shown to the user. The report structure can also be used by other tools, such as mdprof_feedback, to gather information to drive compiler optimisations. Next, the report_to_display predicate uses the report structure to produce a display structure based on the user's display preferences. The display structure is a format-neutral representation of the final output. Finally, htmlize_display produces HTML output from the display structure.
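
The three stages can be pictured with the following sketch. Only the stage names come from the text above. The Python types, their fields, the `top_procs` command, and the `max_rows` preference are hypothetical stand-ins for the much richer Mercury structures.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List, Tuple


@dataclass
class Report:
    """All the information that may be shown, in analysable form."""
    title: str
    rows: List[Tuple[str, int]]  # (procedure name, call count)


@dataclass
class Display:
    """Format-neutral description of what will be shown."""
    title: str
    header: List[str]
    rows: List[List[str]]


def create_report(cmd: str, deep: Dict[str, int]) -> Report:
    if cmd != "top_procs":  # hypothetical command name
        raise ValueError("unsupported command: " + cmd)
    rows = sorted(deep.items(), key=lambda kv: kv[1], reverse=True)
    return Report("Top procedures by calls", rows)


def report_to_display(report: Report, prefs: Dict[str, int]) -> Display:
    # User preferences are applied here, not when the report is created.
    limit = prefs.get("max_rows", len(report.rows))
    rows = [[name, str(calls)] for name, calls in report.rows[:limit]]
    return Display(report.title, ["Procedure", "Calls"], rows)


def htmlize_display(display: Display) -> str:
    def row(tag: str, cells: List[str]) -> str:
        return "<tr>" + "".join(f"<{tag}>{escape(c)}</{tag}>" for c in cells) + "</tr>"

    body = "".join(row("td", r) for r in display.rows)
    return f"<h1>{escape(display.title)}</h1><table>{row('th', display.header)}{body}</table>"
```

Keeping the report free of preferences and formatting is what lets a non-interactive tool consume it directly, while only the last stage knows about HTML.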

To support a new report, a developer should add that report to the report type, and add the command for launching it to the cmd type. They will need to add support for it to create_report, report_to_display, string_to_maybe_cmd and query_to_string. They may need to modify the display structure in order to support displaying information of a different type (for instance, a differently formatted number). They should also modify report_to_display for some existing reports to create links or buttons that perform the new query.


The modules of the deep profiler

mdprof_cgi.m
This file contains the program that is executed by the web server to handle each web page request.
mdprof_dump.m
This is the main module of a program for dumping out the contents of Deep.data files, for use in debugging.
mdprof_test.m
This is the main module of a test program for checking that all the web pages can be created without runtime aborts. Its output can be gigabytes in size.
mdprof_procrep.m
This is the main module of a test program used for reading and displaying the byte-code representation of the procedures of a Mercury program.
array_util.m
This module contains utility predicates for handling arrays.
callgraph.m
This module constructs an explicit representation of the call graph, so we can find its cliques.
canonical.m
This module has code to canonicalize call graphs (i.e. ensure that no clique contains more than one ProcDynamic from a given procedure). It also has code that uses canonicalization to merge two call graphs. This module is not complete yet.
cliques.m
This module allows you to build a description of a directed graph (represented as a set of arcs between nodes identified by dense small integers) and then find the strongly connected components of that graph.
conf.m
This module contains primitives whose parameters are decided by the configure script. This module picks them up from the #defines put into runtime/mercury_conf.h by the configure script.
coverage.m
This module contains code that produces the coverage profiling reports. It also infers coverage throughout a procedure based on partial information and execution rules for Mercury programs.
create_report.m
This module contains the create_report predicate which takes a command and preprocessed deep profiling data and creates a report data-structure.
dense_bitset.m
This module provides an ADT for storing dense sets of small integers. The sets are represented as bit vectors, which are implemented as arrays of integers. This is used by cliques.m.
display.m
This module defines the display structure. This structure represents information to be displayed to the user. The information in a display structure is format-neutral.
display_report.m
This module contains the report_to_display predicate. This predicate takes a report structure and produces a display structure.
dump.m
This module provides a mechanism for dumping out some of the deep profiler's data structures for debugging.
exclude.m
This module implements contour exclusion, which is a mechanism for propagating measurements from regions in the call graph below a given line (the contour) to that line.
html_format.m
This module contains code that creates HTML output from a display structure for use by mdprof_cgi.
interface.m
This module defines the type of the commands sent from clients to servers, as well as utility predicates for manipulating commands and responses.
io_combinator.m
This module defines a set of I/O combinators for use by read_profile.m.
measurements.m
This module defines the data structures that store deep profiling measurements and the operations on them.
measurement_units.m
This module defines data types and predicates for various units of measurement, including percentages and time.
profile.m
This file defines the main data structures of the server, and predicates for accessing them.
program_representation_utils.m
This module provides predicates that operate on program representation structures, including formatting such structures as text.
query.m
This module contains the top level predicates for servicing individual queries.
read_profile.m
This module contains code for reading in a deep profiling data file.
report.m
This module contains the report structure. A sub-structure is defined for each type of report that may be generated. The report structure represents the information contained in a report in a format that is easy to generate, and easy for a computer program to analyse. This module also contains common structures that multiple reports make use of.
startup.m
This module contains the code for turning the raw list of nodes read in by read_profile.m into the data structure that the server needs to service requests for web pages.
timeout.m
This module implements the timeouts that mdprof_cgi uses to shut down after it hasn't received any queries for a while.
top_procs.m
This module contains code to find the top procedures by several criteria.
util.m
This module defines utility predicates for both mdprof_cgi and mdprof_server.
var_use_analysis.m
This module contains predicates for analysing how soon or late a variable is used (produced or consumed) by a procedure.
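
As an illustration of the task that cliques.m performs, the sketch below finds the strongly connected components of a graph whose nodes are dense small integers. It uses Tarjan's algorithm, which is an assumption made for the example; cliques.m may use a different algorithm, and the function name and signature here are hypothetical.

```python
from typing import List, Tuple


def strongly_connected_components(
    num_nodes: int, arcs: List[Tuple[int, int]]
) -> List[List[int]]:
    """Return the SCCs of a graph whose nodes are numbered 1..num_nodes."""
    succs: List[List[int]] = [[] for _ in range(num_nodes + 1)]
    for src, dst in arcs:
        succs[src].append(dst)

    index = [0] * (num_nodes + 1)  # visit order; 0 means not yet visited
    low = [0] * (num_nodes + 1)    # lowest index reachable from the node
    on_stack = [False] * (num_nodes + 1)
    stack: List[int] = []
    components: List[List[int]] = []
    counter = 0

    def visit(v: int) -> None:
        # Recursive for brevity; very deep graphs would need an explicit stack.
        nonlocal counter
        counter += 1
        index[v] = low[v] = counter
        stack.append(v)
        on_stack[v] = True
        for w in succs[v]:
            if index[w] == 0:
                visit(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in range(1, num_nodes + 1):
        if index[v] == 0:
            visit(v)
    return components
```

Dense integer node ids are what make the plain arrays above possible; with arbitrary keys, each of them would have to be a hash table.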