1 March 2007: Deadline for electronic submissions of title and abstract
4 March 2007: Deadline for electronic submissions of full papers
2 April 2007: Notification of acceptance/rejection
13 April 2007: Camera ready copies due
13 May 2007: Early Registration
27–30 June 2007: Conference at RISC, Hagenberg, Austria
e-Science (cyberScience, Grid, etc.) is starting to become a reality, with formalised data resources, services on demand, domain-specific search engines, digital repositories, etc. Increasingly, STM (*) information will be contained in compound XML documents representing scientific communication (articles, theses, repository entries, etc.). In physical sciences such as chemistry, materials science, engineering, physics and earth sciences, these "datuments" normally contain hypertext, graphics, tables, graphs and numerical data, and mathematical objects and relationships. They may also contain domain-specific content such as chemical formulae and reactions, and thermodynamic, mechanical, electric, magnetic and optical properties.
Among the domain-specific languages, CML (Chemical Markup Language) is the oldest and broadest. It is now being actively used for publishing by the Royal Society of Chemistry (Project Prospect), which gives an idea of what chemistry in datuments can look like. CML has had to develop the domain-specific objects (molecules, atoms, bonds, spectra, crystallography, etc.) and the relationships between them. However, due to the text-based nature of early XML, it has also had to design and implement a domain-independent infrastructure which can support much of physical science. Originally called STMML, this infrastructure supports data types (float, integer, complex, etc.), data structures (arrays, lists, matrices, etc.), geometrical concepts (points, planes, lines, etc.) and scientific units of measurement. In addition, CML bases much of its flexibility on user-created dictionaries (ontologies) which are hyperlinked from objects in the datuments.
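As a rough illustration of how a typed value, its units and its dictionary link can travel together in a datument, the following Python sketch reads a small CML-style fragment. The fragment itself, the `ex:meltingPoint` dictionary entry, the unit name and the helper `read_scalar` are assumptions made for this example, not taken from a real CML file or library.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the style of CML/STMML. The dictionary prefix,
# unit name and value are illustrative assumptions only.
FRAGMENT = """
<property xmlns="http://www.xml-cml.org/schema" dictRef="ex:meltingPoint">
  <scalar dataType="xsd:double" units="units:celsius">80.5</scalar>
</property>
"""

CML_NS = "{http://www.xml-cml.org/schema}"

# Small subset of declared data types mapped to Python constructors.
CASTS = {"xsd:double": float, "xsd:integer": int, "xsd:string": str}


def read_scalar(xml_text):
    """Return (dictRef, typed value, units) for the first scalar child."""
    root = ET.fromstring(xml_text.strip())
    scalar = root.find(f"{CML_NS}scalar")
    cast = CASTS[scalar.get("dataType")]
    return root.get("dictRef"), cast(scalar.text.strip()), scalar.get("units")


result = read_scalar(FRAGMENT)
print(result)
```

The point of the sketch is that nothing in it is chemical: the data type, the unit and the dictionary reference are domain-independent machinery of the kind described above.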
It is now clear that the domain-independent parts of CML (and by extension some other markup languages in physical science) are loosely isomorphic with approaches in MathML and OMDOC. If a synthesis can be found, then CML can happily leave the "non-chemistry" to others, knowing that the mathematical and physical science community has a general way forward. In easiest-first order, the following are suggested:
A major challenge for distributed mathematics and science is discovery through search engines. These currently work on "free text" and are optimised to recognise strings. In a few cases domain-specific canonicalisations can be used (e.g. our Google InChI transforms a molecular graph into a string which is recognised by search engines). However, most cases require mathematical operations (arithmetic, transformations, subgraph-matching, etc.). How, and where, can these be performed? A new generation of domain-independent and domain-specific indexing and searching tools needs to be developed.
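The idea of canonicalisation can be sketched with something much simpler than InChI: writing a molecular formula in Hill order (carbon first, hydrogen second, then the remaining elements alphabetically; purely alphabetical when there is no carbon). The function name `hill_formula` is hypothetical, and unlike InChI this string ignores connectivity, so isomers share the same result.

```python
from collections import Counter


def hill_formula(atoms):
    """Canonical formula string in Hill order for a list of element symbols."""
    counts = Counter(atoms)
    if "C" in counts:
        # Carbon first, hydrogen second, remaining elements alphabetically.
        others = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        # Without carbon, all elements (including hydrogen) are alphabetical.
        order = sorted(counts)
    # A count of 1 is left implicit, as in ordinary chemical notation.
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


# Two differently ordered atom lists for ethanol give one searchable string.
a = hill_formula(["O", "H", "C", "H", "H", "C", "H", "H", "H"])
b = hill_formula(["C", "C", "H", "H", "H", "H", "H", "H", "O"])
print(a, b)
```

Once every author emits the same string for the same object, a free-text engine can match it exactly; anything beyond that (substructure or numerical queries) still needs the mathematical operations mentioned above.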
Recently CML has had to evolve a grammar to support fuzzy concepts representing sets of molecules. These have a distinguished mathematical history (see, e.g., enumeration of alkanes and references therein). Polymers and chemicals in patents ("Markush") are often expressed in text when a grammar would be more precise. Chemical searches are also often expressed in a grammar, and evaluation or comparison of representations is a common activity.
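A very reduced picture of such a set-valued representation is a template with variable positions, each drawn from a list of substituents. The sketch below enumerates the members of one such set; the informal text formulae, the slot names `R1` and `R2` and the function `expand` are assumptions for illustration and do not reflect CML's actual grammar.

```python
from itertools import product


def expand(template, choices):
    """Yield every member of the set described by a template and its slots."""
    names = sorted(choices)
    # Cartesian product over the substituent lists, one pick per slot.
    for combo in product(*(choices[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


# Hypothetical Markush-like pattern: a disubstituted benzene ring.
members = list(expand(
    "{R1}-C6H4-{R2}",
    {"R1": ["CH3", "C2H5"], "R2": ["OH", "NH2", "Cl"]},
))
print(len(members), members)
```

Evaluating or comparing two such representations then becomes a question about sets (membership, overlap, size) rather than about text.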
The presentation will give a number of interactive demonstrations. No chemical knowledge is required!
(*) STM: Scientific, Technical (Engineering) and Medical