******************************************************************************
Development of PXP
******************************************************************************


==============================================================================
PXP
==============================================================================

PXP is a validating parser for XML 1.0 written entirely in Objective Caml. 
This page contains development information for PXP; if you are looking for 
the stable distribution, please see [1]. 

==============================================================================
Download
==============================================================================

-  Current stable release: 1.1 [2]
   
-  Current development version: There is currently no development version! 
   
==============================================================================
Version History
==============================================================================

-  1.1: This is the new stable release!
   
-  1.0.99:
   Again a big change. First of all, the directory hierarchy has been modified. 
   You now find all installed modules in "src", all tools in "tools", and so 
   on. The new hierarchy makes it simpler to add optional modules.
   For similar reasons, the package structure has changed, too. Instead of one 
   package "pxp", there are now up to five packages: "pxp-engine", 
   "pxp-lex-iso88591", "pxp-lex-utf8", "pxp-wlex", "pxp". Which packages are 
   selected and compiled depends on the result of the "configure" run (which is 
   new, too).
   The namespace support has been completed. The Pxp_marshal and Pxp_codewriter 
   modules can encode/decode namespace information. Namespace syntax is now 
   fully checked. The namespace_manager class has been moved from Pxp_document 
   to Pxp_dtd, and it is now considered a new property of the DTD. There is 
   a new processing instruction: <?pxp:dtd namespace ...?> (see EXTENSIONS).
   More importantly, some design flaws in Pxp_document have been fixed. In 
   previous versions of PXP, it was a bit unclear which methods of the "node" 
   classes are actually involved in validation and which are not. 
   Pxp_document.mli contains a discussion of this issue (and the changes in 
   detail).
   The "node" classes have some new methods, allowing simpler in-place 
   modification of node trees.
   There are now functions in Pxp_document that strip whitespace and normalize 
   trees.
   The version number of PXP is now 1.0.99, as this version is a release 
   candidate. The manual still needs updates, and some of the regression tests 
   need to be fixed. The release of PXP 1.1 is expected at the end of June.
   
-  1.0.98.6:
   This is a bigger change, including an initial implementation of namespaces 
   and some cleanup in Pxp_document. In particular, the following modifications 
   have been made:
   In Pxp_document, there is now a separate class for every node type. You must 
   now instantiate comment_impl in order to get a comment node (the same 
   applies to super_root_impl and pinstr_impl). It is no longer possible to 
   instantiate element_impl in these cases; doing so leads to an error message 
   saying that the method internal_init_other is not available.
   Namespaces: PXP implements namespaces by a technique called "prefix 
   normalization". This technique simplifies namespaces a lot and makes them as 
   compatible as possible with non-namespace processing. As defined by W3C, 
   namespaces are identified by a namespace URI (a unique identifier) but are 
   referred to by a shorter namespace prefix. The problem is that prefixes 
   need not be unique, even within a single document. To address this 
   problem and to avoid complications, PXP _rewrites_ the prefixes while the 
   document is being parsed such that the application using PXP only sees 
   unique prefixes. This means that every prefix corresponds to exactly one 
   namespace URI once the document has been parsed by PXP. The mapping between 
   the rewritten prefixes (called normprefixes) and the namespace URI is 
   managed by a namespace_manager (defined in Pxp_document). In order to 
   control the names of the normprefixes it is possible to fill the 
   namespace_manager with (normprefix, uri) pairs before the parser is called. 
   This results in a programming style where it is still possible to identify 
   element types by a single string (and not by an expanded_name as suggested 
   in some W3C standards). For example, in order to find out whether node x is 
   an HTML anchor, it is sufficient to check whether x # node_type = T_element 
   "html:a"; it is not necessary to perform the much more complicated check 
   x # localname = "a" && x # namespace_uri = "http://www.w3.org/1999/xhtml".
   Another advantage of prefix normalization is that DTDs can declare element 
   types and attributes using the normalized prefixes.
   To activate namespace processing, the following modifications to existing 
   code are sufficient: (1) create a namespace_manager; (2) set the field 
   enable_namespace_processing of Pxp_yacc.config to the namespace manager 
   object; (3) use namespace_element_impl instead of element_impl. After these 
   steps have been carried out, the application sees normalized element and 
   attribute names (instead of unprocessed ones), and the additional namespace 
   methods of namespace_element_impl are available (e.g. method namespace_uri 
   to get the URI of the namespace).
   The namespace support is currently very experimental; your comments are 
   welcome. There are some known problems: (1) Pxp_marshal and Pxp_codewriter 
   have not yet been updated for namespaces; they may or may not work for your 
   application. (2) It is not checked whether element and attribute names 
   contain only one colon. (3) If you do not set the namespace_manager 
   manually, PXP simply chooses the first occurrence of a prefix as its 
   normalized prefix. If you do not work with explicit prefixes but only with 
   default namespaces (declared by the attribute xmlns="some uri"), PXP maps 
   these to the normprefix "default"; this might not be what you want.
   
-  1.0.98.5:
   Bugfix in Pxp_reader.combine.
   Some changes that could make PXP work under Cygwin.
   
-  1.0.98.4:
   New support for PUBLIC identifiers in Pxp_reader: The functions 
   lookup_public_id_as_file and lookup_public_id_as_string look up PUBLIC 
   identifiers in a catalog (implemented as an associative list). However, there 
   is still no way to load a catalog from a file.
   There are also catalogs for SYSTEM identifiers.
   The behaviour of Pxp_reader.combine can be better controlled by a mode 
   argument.
   Removed the -p switch from ocamlopt invocations.
   
-  1.0.98.3:
   A single fix (again concerning line numbering) that applies only to the 
   ocamlopt version. The symptom was an "array-out-of-bounds" runtime error. 
   
-  1.0.98.2:
   Corrects a line-numbering bug that was present in 1.0.98.1.
   This version contains numerous optimizations that make the parser noticeably 
   faster. I ran a few gprof sessions, which made it possible to reduce the 
   amount of temporarily allocated memory. One important result is that the 
   option errors_with_line_numbers could be removed; line counting is now very 
   cheap. 
   
-  1.0.98.1:
   The memory consumption of the node objects has been reduced.
   There is now a string pool option in type Pxp_yacc.config which makes it 
   likely that equal strings share the same memory block. This option is 
   experimental.
   There is now support for Alain Frisch's lexer generator wlex. This is 
   currently very experimental. To enable wlex you must change the variable 
   LEX_IMPL in Makefile.conf and recompile everything. wlex is available here: 
   http://www.eleves.ens.fr:8080/home/frisch/soft
   Pxp_marshal contains functions to (un)serialize node trees and documents. 
   Loading the binary format is faster than parsing the XML source; typical 
   applications are inter-process communication and the fast loading of 
   constant XML texts.
   Several bug fixes (but no serious bugs have been found yet).
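
The prefix normalization described under 1.0.98.6 can be illustrated by a 
minimal sketch in plain OCaml. It does not use PXP at all: the type manager 
and the functions add, normprefix and normalize_name are hypothetical 
stand-ins for the namespace_manager, and the way clashes are resolved 
(appending a counter to the document's prefix) is an assumption made for the 
example. Only prefixed names are handled; default namespaces are left out.

```ocaml
(* Hypothetical sketch of prefix normalization; this is not PXP's
   namespace_manager, only an illustration of the idea. *)

(* One-to-one mapping between normprefixes and namespace URIs. *)
type manager = {
  prefix_of_uri : (string, string) Hashtbl.t;  (* URI -> normprefix *)
  uri_of_prefix : (string, string) Hashtbl.t;  (* normprefix -> URI *)
}

let create () =
  { prefix_of_uri = Hashtbl.create 16; uri_of_prefix = Hashtbl.create 16 }

(* Register a (normprefix, uri) pair, e.g. before parsing starts. *)
let add mng normprefix uri =
  Hashtbl.replace mng.prefix_of_uri uri normprefix;
  Hashtbl.replace mng.uri_of_prefix normprefix uri

(* Normprefix for [uri]. An unknown URI gets the prefix used in the
   document; if that prefix already denotes another URI, a counter is
   appended (assumed clash strategy) until a free name is found. *)
let normprefix mng ~doc_prefix uri =
  match Hashtbl.find_opt mng.prefix_of_uri uri with
  | Some p -> p
  | None ->
    let rec pick candidate n =
      if Hashtbl.mem mng.uri_of_prefix candidate then
        pick (doc_prefix ^ string_of_int n) (n + 1)
      else candidate
    in
    let p = pick doc_prefix 1 in
    add mng p uri;
    p

(* Rewrite "prefix:local" using the in-scope declarations [scope]
   (document prefix -> URI). Unprefixed names are returned unchanged;
   an undeclared prefix raises Not_found. *)
let normalize_name mng scope qname =
  match String.index_opt qname ':' with
  | None -> qname
  | Some i ->
    let prefix = String.sub qname 0 i in
    let local = String.sub qname (i + 1) (String.length qname - i - 1) in
    let uri = List.assoc prefix scope in
    normprefix mng ~doc_prefix:prefix uri ^ ":" ^ local
```

With ("html", "http://www.w3.org/1999/xhtml") registered beforehand, a 
document that writes h:a for the XHTML namespace yields html:a, while a 
document that binds html to some other URI has that prefix rewritten to 
html1, so every normprefix keeps denoting exactly one URI.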
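
The PUBLIC-identifier catalog added in 1.0.98.4 is, at its core, a search in 
an associative list. The sketch below is hypothetical and does not reproduce 
the Pxp_reader signatures; the type source, the function names, the sample 
entries and the file name are made up for illustration. It also collapses 
whitespace before matching, following the XML 1.0 rule for comparing public 
identifiers; that step is an addition of the sketch.

```ocaml
(* Hypothetical sketch of a PUBLIC-identifier catalog kept as an
   association list; these names are not the Pxp_reader API. *)

(* Where the replacement text of an entity comes from. *)
type source =
  | File of string     (* read the entity from this file *)
  | Literal of string  (* use this string as the entity text *)

(* Collapse runs of XML whitespace to single spaces and trim the ends,
   as XML 1.0 requires before public identifiers are compared. *)
let normalize_pubid s =
  String.map (function '\t' | '\n' | '\r' -> ' ' | c -> c) s
  |> String.split_on_char ' '
  |> List.filter (fun w -> w <> "")
  |> String.concat " "

(* Sample catalog; keys are written in normalized form.
   The file name and the second entry are made up for the example. *)
let catalog = [
  ("-//W3C//DTD XHTML 1.0 Strict//EN", File "dtd/xhtml1-strict.dtd");
  ("-//Example//DTD Note//EN", Literal "<!ELEMENT note (#PCDATA)>");
]

(* None means the identifier is not in the catalog, so the caller
   has to fall back to the SYSTEM identifier. *)
let lookup_public catalog pubid =
  List.assoc_opt (normalize_pubid pubid) catalog
```

Returning an option rather than raising keeps the fallback to the SYSTEM 
identifier an ordinary match at the call site.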
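
The string pool option introduced in 1.0.98.1 rests on string interning: 
equal strings are replaced by one stored copy. A minimal sketch with a 
standard Hashtbl follows; make_pool and intern are hypothetical names, and 
this is not PXP's actual pool.

```ocaml
(* Hypothetical sketch of a string pool (interning); not PXP's code. *)

(* The pool maps each string content to its canonical copy. *)
let make_pool () : (string, string) Hashtbl.t = Hashtbl.create 64

(* Return the canonical copy of [s]: the first string seen with a given
   content is stored, and every later equal string is replaced by it. *)
let intern pool s =
  match Hashtbl.find_opt pool s with
  | Some shared -> shared
  | None ->
    Hashtbl.add pool s s;
    s

let () =
  let pool = make_pool () in
  (* Build two equal but physically distinct strings. *)
  let a = intern pool (String.concat "" [ "ti"; "tle" ]) in
  let b = intern pool (String.concat "" [ "tit"; "le" ]) in
  (* prints: equal: true, shared: true *)
  Printf.printf "equal: %b, shared: %b\n" (a = b) (a == b)
```

This pool never shrinks. The standard library's Weak.Make functor provides a 
merge operation that interns values in the same way while letting unused 
ones be garbage-collected.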
   

--------------------------

[1]   see http://www.ocaml-programming.de/packages/documentation/pxp

[2]   see http://www.ocaml-programming.de/packages/pxp-1.1.tar.gz



