Module Pxp_ev_parser

module Pxp_ev_parser: sig .. end
Calling the parser in event mode


In event parsing, the parser generates a stream of events while reading XML text (one or several files). In "push mode" the user provides a callback function, and the parser invokes this function for every event. In "pull mode", the parser creates a function that fetches the next event of the stream; the user calls this function repeatedly to get one event after the other.
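The difference between the two modes is mainly about who drives the loop. The following sketch models it without PXP itself: the event type is a hypothetical, heavily simplified stand-in for Pxp_types.event, and a fixed list of events replaces a real parser. In push mode the producer iterates and calls the callback; in pull mode the consumer repeatedly calls a function returning Some ev until None.

```ocaml
(* Hypothetical, simplified stand-in for Pxp_types.event; the real
   type has more constructors and richer payloads. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string
  | E_end_of_stream

(* A toy "document": events a parser might emit for <a>hi</a>. *)
let toy_events =
  [ E_start_tag "a"; E_char_data "hi"; E_end_tag "a"; E_end_of_stream ]

(* Push mode: the producer drives the loop and calls [eh] for every event. *)
let push_events (eh : event -> unit) : unit = List.iter eh toy_events

(* Pull mode: the consumer drives the loop by calling the returned
   function, which yields [Some ev] until the events are exhausted. *)
let make_pull () : unit -> event option =
  let rest = ref toy_events in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

(* Collect all events in push mode using an accumulating callback. *)
let collect_push () =
  let acc = ref [] in
  push_events (fun ev -> acc := ev :: !acc);
  List.rev !acc

(* Collect all events in pull mode by calling [next] until None. *)
let collect_pull () =
  let next = make_pull () in
  let rec loop acc =
    match next () with
    | None -> List.rev acc
    | Some ev -> loop (ev :: acc)
  in
  loop []
```

Both functions observe the same sequence of events; only the control flow differs.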

For an introduction to this type of parsing, see Intro_events. In Pxp_event you find functions for composing, analyzing and transforming streams of events (for pull mode). For converting a stream of events into a tree, see Pxp_document.solidify. For converting a tree into a stream of events, see Pxp_document.liquefy.

val create_entity_manager : ?is_document:bool ->
Pxp_types.config -> Pxp_types.source -> Pxp_entity_manager.entity_manager
Creates an entity manager that is initialized with the toplevel entity referenced by the source argument. The entity manager can be used by process_entity below.

The following configuration options are interpreted:

is_document: If true (the default), the entity to read is treated as a complete document; if false, it is treated as a fragment only. The value true enforces several restrictions on document entities, e.g. that <![INCLUDE[..]]> and <![IGNORE[..]]> are not allowed and that parameter entities must respect additional nesting rules.

val process_entity : Pxp_types.config ->
Pxp_types.entry ->
Pxp_entity_manager.entity_manager -> (Pxp_types.event -> unit) -> unit
Parses a document or a document fragment in push mode. At minimum, the well-formedness of the document is checked; the flags of the entry argument may request additional checks.

While parsing, events are generated and the passed function is called for every event. The parsed text is read from the current entity of the entity manager, which may be either open or closed.

The entry point to the parsing rules can be specified as follows:

The entry points have options, see Pxp_types.entry for explanations.

Several adjacent E_char_data events may be emitted for the same character data section.

There are filter functions that apply normalization routines to the events, see below.
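To illustrate what such a normalization does for split character data, here is a minimal sketch that merges runs of adjacent E_char_data events into one. It is not PXP's own filter implementation: it works on a plain list rather than a pull-mode event stream, and the event type is a hypothetical stand-in for a few constructors of Pxp_types.event.

```ocaml
(* Hypothetical stand-in for a few constructors of Pxp_types.event. *)
type event =
  | E_start_tag of string
  | E_char_data of string
  | E_end_tag of string

(* Merge runs of adjacent E_char_data events into a single event.
   Pending character data is buffered and flushed when another kind of
   event, or the end of the list, is reached. Empty character data
   ("") produces no output event. *)
let merge_char_data (events : event list) : event list =
  let buf = Buffer.create 64 in
  let flush acc =
    if Buffer.length buf = 0 then acc
    else begin
      let s = Buffer.contents buf in
      Buffer.clear buf;
      E_char_data s :: acc
    end
  in
  let rec loop acc = function
    | [] -> List.rev (flush acc)
    | E_char_data s :: rest -> Buffer.add_string buf s; loop acc rest
    | ev :: rest -> loop (ev :: flush acc) rest
  in
  loop [] events
```

For example, [E_start_tag "p"; E_char_data "Hel"; E_char_data "lo"; E_end_tag "p"] becomes [E_start_tag "p"; E_char_data "Hello"; E_end_tag "p"].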

Only the following config options have an effect:

If an error occurs, the callback function is invoked exactly once with the E_error event. In addition, the exception falls through to the caller. Parsing cannot be resumed after an error.

The idea behind this special error handling is that the callback function should always be notified when the parser stops, whether or not parsing was successful. Hence the last event passed to the callback function is either E_end_of_stream or E_error. Conceptually, process_entity follows this scheme:

 try
   "parse";
   eh E_end_of_stream           (* eh is the callback function *)
 with
   error ->
     "cleanup";
     let pos = ... in
     let e = At(pos, error) in
     eh (E_error e); 
     raise e
 

Note that the exception passed to the caller is always an At(_,_) exception wrapping the exception that originally occurred. This style of exception handling applies both to exceptions generated by the parser and to exceptions raised by the callback function.
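The scheme above can be turned into a small runnable model. In this sketch the At exception and the event type are hypothetical, simplified stand-ins for those in Pxp_types, and the pos function is an assumed placeholder for however the real parser computes the position string. The parse argument plays the role of the actual parsing work.

```ocaml
(* Hypothetical stand-ins: PXP defines its own At exception and event
   type in Pxp_types; these simplified versions only model the scheme. *)
exception At of string * exn

type event =
  | E_char_data of string
  | E_end_of_stream
  | E_error of exn

(* Run [parse], which reports events through [eh]. On success the last
   event is E_end_of_stream. On failure, whether raised by [parse] or
   by the callback [eh], the callback sees exactly one E_error carrying
   the wrapped exception, which is then re-raised to the caller. *)
let run_with_error_scheme ~(pos : unit -> string)
    (parse : (event -> unit) -> unit) (eh : event -> unit) : unit =
  try
    parse eh;
    eh E_end_of_stream
  with error ->
    let e = At (pos (), error) in
    eh (E_error e);
    raise e
```

A caller that wants the original exception must therefore match on At (_, original).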

val process_expr : ?first_token:Pxp_lexer_types.token ->
?following_token:Pxp_lexer_types.token Pervasives.ref ->
Pxp_types.config ->
Pxp_entity_manager.entity_manager -> (Pxp_types.event -> unit) -> unit
This is a special parsing function that corresponds to the entry `Entry_expr, i.e. it parses a single element, processing instruction, or comment. In contrast to process_entity, the current entity is not opened; it is expected to be open already. Likewise, the entity is not closed after parsing (unless an error occurs).


val close_entities : Pxp_entity_manager.entity_manager -> unit
Closes all entities managed by this entity manager, and frees operating system resources like open files.

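A common OCaml idiom for making sure such cleanup happens even when processing stops with an exception is Fun.protect (OCaml 4.08 or later). The sketch below uses a hypothetical toy_manager record and a stand-in close_entities function rather than the real Pxp_entity_manager; with PXP, the real close_entities would take its place.

```ocaml
(* Hypothetical stand-in for an entity manager holding OS resources. *)
type toy_manager = { mutable open_files : int }

(* Stand-in for Pxp_ev_parser.close_entities: release all resources. *)
let close_entities (m : toy_manager) : unit = m.open_files <- 0

(* Run [f] with the manager and always close its entities afterwards,
   whether [f] returns normally or raises; an exception from [f] is
   re-raised after the cleanup has run. *)
let with_entities (m : toy_manager) (f : toy_manager -> 'a) : 'a =
  Fun.protect ~finally:(fun () -> close_entities m) (fun () -> f m)
```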
val create_pull_parser : Pxp_types.config ->
Pxp_types.entry ->
Pxp_entity_manager.entity_manager -> unit -> Pxp_types.event option
Invokes the event parser using the pull model. It is used as:

 let next_event = create_pull_parser cfg entry mng in
 let ev = next_event()


Then next_event should be invoked repeatedly until it returns None, indicating the end of the document. Each event is returned as Some ev.

The function returns exactly the same events as process_entity.

In contrast to process_entity, no exception is raised when an error occurs; instead, the E_error event is generated as the last event before None.
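Because the pull parser reports errors as an event instead of raising, the consumer has to look for E_error itself. The following sketch drains a pull function until None and records whether the stream ended with an error. The event type and the of_list helper are hypothetical stand-ins; in real code the function would come from create_pull_parser.

```ocaml
(* Hypothetical stand-in for a subset of Pxp_types.event. *)
type event =
  | E_char_data of string
  | E_end_of_stream
  | E_error of exn

(* Drain a pull-mode event function until it returns None. Returns the
   events in order, plus [Some exn] if an E_error event was seen. *)
let drain (next_event : unit -> event option) : event list * exn option =
  let rec loop acc err =
    match next_event () with
    | None -> (List.rev acc, err)
    | Some (E_error e as ev) -> loop (ev :: acc) (Some e)
    | Some ev -> loop (ev :: acc) err
  in
  loop [] None

(* Hypothetical helper: build a toy pull function from a fixed list. *)
let of_list (evs : event list) : unit -> event option =
  let rest = ref evs in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev
```

If an exception-based style is preferred, the caller can re-raise the returned exception after draining.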

To create a stream of events, just do:

 let next = create_pull_parser cfg entry mng in
 let stream = Stream.from(fun _ -> next())
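The Stream module was deprecated in OCaml 4.14 and removed from the standard library in OCaml 5.0 (it survives in the separate camlp-streams package). On OCaml 4.14 or later, Seq.of_dispenser accepts exactly a unit -> 'a option function, so it can wrap the pull function directly. In this sketch a toy make_next function and a simplified event type are hypothetical stand-ins for create_pull_parser and Pxp_types.event.

```ocaml
(* Hypothetical stand-in for Pxp_types.event. *)
type event = E_char_data of string | E_end_of_stream

(* Toy pull function standing in for the result of create_pull_parser. *)
let make_next () : unit -> event option =
  let rest = ref [ E_char_data "x"; E_char_data "y"; E_end_of_stream ] in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev

(* Seq.of_dispenser (OCaml >= 4.14) turns a [unit -> 'a option]
   function into a sequence. The result is ephemeral: traversing it
   consumes the underlying parser, so traverse it only once. *)
let events : event Seq.t = Seq.of_dispenser (make_next ())

(* Example traversal: concatenate all character data. *)
let text =
  Seq.fold_left
    (fun acc ev -> match ev with E_char_data s -> acc ^ s | _ -> acc)
    "" events
```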