Motivation, Main Ideas, History, and Current State

About the SymbolicData Project



Motivation for the SymbolicData Project

Algorithms and implementations are usually tested on benchmarks, for a variety of purposes. Although their value is debated, such benchmarks at least provide well-defined environments in which otherwise incomparable approaches, algorithms, and implementations can be compared.

Benchmark suites for symbolic computation are not as well established as in other areas of computer science, probably because there is not yet agreement on the aims of such benchmarking. Nevertheless, various special benchmarks, often of high quality, are scattered through the literature.

In recent years, efforts toward benchmark collections for symbolic computation have intensified. They have focused mainly on creating general benchmarks for different areas of symbolic computation and on collecting such activities on various Web sites.

To build on these efforts, it would be of great benefit to also make the special benchmarks scattered through the literature accessible electronically. This would provide the community with an electronic repository of certified inputs and results that could be referenced and extended during further development.

Since symbolic computations usually involve voluminous data as input, output, or intermediate results, it is not enough to collect data; one also has to develop tools to generate, store, manipulate, present, and maintain these data.

It is the aim of the SymbolicData project to develop such tools, to support users in running authorized benchmarks, and to provide facilities for storing, managing, and presenting the collected data.

In a first stage we concentrated on the development of the main design principles that allow for the necessary flexibility and extensibility. A first Perl implementation is ready for use and was tested on collections of data from two areas of Computer Algebra: Polynomial System Solving and Geometry Theorem Proving.

We intend to apply our tools to collect data from other areas of symbolic computation. For this, we seek the cooperation of persons and groups that have related data collections at their disposal, are willing to share them with the community, and are willing to spend some effort to translate these data into the proposed interchange format.


Main Goals

With this motivation in mind the SymbolicData project has the following three main goals:

It will provide a framework and general tools

  1. to collect data of examples from various areas of Computer Algebra
    • in a systematic and uniform way together with their solutions and other related background information;
    • in a form that makes it convenient to extend, manipulate, and categorize the collected data;
    • such that they can be extracted in a form readable by different Computer Algebra Software;
    • such that interrelations of the collected data can be specified;
  2. to run trusted benchmark computations on these data, i.e.
    • to prepare data for input to different Computer Algebra Software;
    • to set up, start, time, interrupt, and monitor the computations;
    • to collect, analyse, and evaluate output data from these computations;
  3. to present data from the collection and results of benchmark computations, i.e.
    • to extract and combine information from the data base in various ways;
    • to translate such data into other data base formats such as SQL;
    • to create presentations in common representation formats, like HTML;


Main Design Decisions

The Data Base

To accomplish these goals we rely on an object-relational data base concept. For the sake of flexibility we decided (at least for the moment) not to use one of the various data base programs as the main engine but to keep the primary sources in an XML-like ASCII format that is stored and retrieved through Perl tools.

To achieve flexibility and extensibility of this primary data base, all relevant information about its tables is stored in ASCII format in a special directory META. Thus the data base structure may easily be modified or extended with any text editor and without Perl knowledge.

Since the Data directory can be specified with the option -Data, one can even manage different projects with different META structures using the same SymbolicData tools.

Records of the data base are stored internally as a special Perl data type Record based on hashes of strings. This allows for flexible access to individual tag/value pairs of different records. Records are tied to the file location of the underlying data base in a transparent way.
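
The idea of a record as a flat mapping from tags to string values can be illustrated as follows. This is a minimal Python sketch under stated assumptions, not the project's Perl code: the sample record, its tag names, and the helper parse_record are hypothetical, and the actual SymbolicData file syntax may differ.

```python
import re

# Hypothetical sample in an XML-like ASCII format; the tag names are
# illustrative and not taken from the SymbolicData collection.
SAMPLE = """\
<Key>Example_1</Key>
<Type>PolySystem</Type>
<vars>x,y,z</vars>
<basis>
x^2+y+z-1,
x+y^2+z-1
</basis>
"""

# Matches <tag>value</tag>; the backreference \1 requires the closing tag
# to repeat the opening tag name. DOTALL lets a value span several lines.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_record(text):
    """Return a dict mapping each tag to its whitespace-stripped string value."""
    return {tag: value.strip() for tag, value in TAG_RE.findall(text)}


record = parse_record(SAMPLE)
print(record["vars"].split(","))
```

Keeping every value as a plain string, as the hash-of-strings design suggests, leaves the interpretation of a value (for example splitting a variable list) to the tool that consumes it.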

The string handling facilities of Perl are well suited for tag/value output in various formats.

To evaluate semantic aspects of records, SymbolicData has to cooperate with software capable of symbolic manipulation. At the moment we use Singular and MuPAD for this purpose. With more experience, an interface will be specified so that other CAS can also be used as the underlying Computer Algebra engine in the future.

Standard data base programs allow for much more flexible navigation (sorting, indexing, combining different information) in the underlying data pool. SymbolicData provides an interface to SQL (solely ASCII based at the moment) that allows one to define, create, and update different SQL tables derived from the primary data base. In particular, all interrelation information contained in the primary data base may be extracted to SQL relation tables.
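
How tag/value records and their interrelations might be mapped to SQL tables is sketched below. The sketch uses Python's standard sqlite3 module with an in-memory data base; the records, the tag uses, and the table and column names are hypothetical and do not reflect the actual SymbolicData SQL interface.

```python
import sqlite3

# Hypothetical records (tag -> string value); the "uses" tag holds a
# comma-separated list of keys of related records.
records = [
    {"Key": "Sys_A", "Type": "PolySystem", "uses": ""},
    {"Key": "Thm_B", "Type": "GeoProof", "uses": "Sys_A"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id TEXT PRIMARY KEY, kind TEXT)")
conn.execute("CREATE TABLE relation (source TEXT, target TEXT)")

for rec in records:
    conn.execute("INSERT INTO example VALUES (?, ?)", (rec["Key"], rec["Type"]))
    # Each non-empty entry of "uses" becomes one row of the relation table.
    for target in filter(None, rec["uses"].split(",")):
        conn.execute("INSERT INTO relation VALUES (?, ?)", (rec["Key"], target))

# Combine both tables: which kind of record does each record refer to?
rows = conn.execute(
    "SELECT r.source, e.kind FROM relation r JOIN example e ON e.id = r.target"
).fetchall()
print(rows)
```

Once the data are in SQL tables, sorting, indexing, and joins of this kind come for free from the data base engine.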

Computations

SymbolicData's compute environment rests on the GNU time program, which allows flexible monitoring of processes on various computer architectures. Such a common interface defines well-specified conditions for benchmark computations, so that run times and CPU times become more comparable.

To set up a trusted computation the user has to extract the digital data from the primary data base, prepare them for input to the specified Computer Algebra software, create the corresponding input file, start and monitor the computation, and evaluate the output file. Because of the flexibility required, SymbolicData provides only a few tools that support the programmer in this task. A more formalized compute interface to support the different stages of the benchmarking process is under discussion and development.

Hence, using this part of the SymbolicData software requires some familiarity with Perl programming. Several tables in the primary data base store information about different Computer Algebra Software and about computations that were already used for trusted benchmark computations and can be reused by other projects.

Results of computations can be stored in the data base (in special tables) in the same way as examples.
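
The stages of a trusted computation described above (create an input file, start the computation, interrupt it after a time limit, collect the output) can be sketched as follows. This Python sketch is a stand-in, not the SymbolicData compute environment: it does not call GNU time and measures wall-clock time only, the helper run_timed is hypothetical, and the Python interpreter itself plays the role of the Computer Algebra software.

```python
import os
import subprocess
import sys
import tempfile
import time


def run_timed(command, input_text, timeout_s):
    """Write input_text to a temporary file, run command on it, and time the run."""
    with tempfile.NamedTemporaryFile("w", suffix=".in", delete=False) as handle:
        handle.write(input_text)
        input_path = handle.name
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            command + [input_path],
            capture_output=True, text=True, timeout=timeout_s,
        )
        status, output = proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        status, output = None, ""  # the computation was interrupted
    finally:
        os.unlink(input_path)
    elapsed = time.perf_counter() - start
    return {"status": status, "output": output, "wall_s": elapsed}


# Stand-in for a Computer Algebra system: the Python interpreter.
result = run_timed([sys.executable], "print(2**10)\n", timeout_s=30)
print(result["status"], result["output"].strip())
```

The returned dictionary has the same tag/value shape as a record, so a result could be stored alongside the examples it was computed from.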

HTML Presentation

Several tools support the presentation of the data of different tables (i.e., of both examples and results of computations) in HTML format. This is a preliminary implementation at the moment and will, we hope, be elaborated in future releases.
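
As an illustration of this kind of presentation, the following Python sketch renders tag/value records as an HTML table. The function records_to_html and the sample record are hypothetical and are not part of the SymbolicData tools.

```python
from html import escape


def records_to_html(records, tags):
    """Render records (dicts of strings) as an HTML table, one column per tag."""
    head = "".join(f"<th>{escape(tag)}</th>" for tag in tags)
    rows = [f"<tr>{head}</tr>"]
    for rec in records:
        # Missing tags yield empty cells; values are escaped for HTML.
        cells = "".join(f"<td>{escape(rec.get(tag, ''))}</td>" for tag in tags)
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


sample = [{"Key": "Sys_A", "vars": "x,y", "Comment": "assumes a < b"}]
page = records_to_html(sample, ["Key", "vars", "Comment"])
print(page)
```

Escaping matters here because values in a symbolic data collection may contain characters such as < or & that have a special meaning in HTML.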


Current State

In the first development stage of the project we concentrated on the general design principles of the tools and the data collection, thereby trying to achieve a balance between the necessary flexibility/extensibility on the one hand, and simplicity/practicability on the other.

A first application of our tools and concepts was realized on collections of data from two areas of Computer Algebra: Polynomial System Solving and Geometry Theorem Proving.

Further applications of our tools and concepts to collect data from other areas of symbolic computation are intended. For this, we seek the cooperation of persons and groups that have related data collections at their disposal and are willing to spend some effort to enter this data into the SymbolicData data base and provide the respective add-ons to already existing tools.


Some Remarks about the History of the Project

The SymbolicData project grew out of the special session on Benchmarking at the 1998 ISSAC conference in Rostock, which was organized by H. Kredel. Since then, the project has steadily developed from ideas to implementations and data collections and back.

In 1999, the authors joined forces with the symbolic computation groups of the University of Paris VI (J. C. Faugere, D. Lazard), of Ecole Polytechnique (J. Marchand, M. Giusti), and of the University of Saarbrücken (M. Dengel, W. Decker). Furthermore, the project was incorporated into the benchmarking activities of the 'Fachgruppe Computeralgebra' of the Deutsche Mathematiker Vereinigung.

A first prototype was presented at the Meeting of the Fachgruppe Computeralgebra, Kaiserslautern, February 2000.

The first alpha release 0.4 has been available since March 2001.