A Proposal for OpenAI Core Services

Thomas Weber

Table of Contents

1. Introduction

2. Preconditions

3. Overview

3.1. Persistence
3.2. Modules
3.3. Settings/Parameters
3.4. Properties
3.5. Project files

4. Details

4.1. Persistence
4.2. Modules
4.3. Settings
4.4. Properties
4.5. Project files

Note

This proposal is subject to change. I want to bring some issues to your attention. Read it, think about it, discuss it, and let me change it afterwards! Last change: 05/12/2002

1. Introduction

As we aim to build a multi-purpose framework, we need to define and implement some core services for the project.

In this paper I will try to describe my ideas for the different parts. It is meant to be neither a general roadmap nor a how-to, but a basis for starting discussions about these topics.

2. Preconditions

Before I start to write about the core of OpenAI, we should review the current project structure (as of 04/2002).

At the moment, OpenAI is nothing more than a bunch of unrelated, independent CVS modules (sorry, but someone had to say this :) ). I suggest decomposing the java-neuralnet project into three new modules:

Refactoring java-neuralnet

GUI: OpenAI's GUI should become a general tool for all kinds of modules (refer to the GUI ToDo). Therefore it must not depend on any single module (other than the core), nor should it be contained in the package hierarchy of one. A first step (before modularizing it further) is to set up a new package net.openai.ai.gui and move all files from net.openai.ai.nn.gui into it.
core: To finally get to the point: the core classes need their own package (and maybe their own CVS subproject as well). We can start by creating a package net.openai.ai.core and moving the package net.openai.ai.nn.persistence into it. Some parts of the GUI's property system should also go into the core package (core.property).
neuralnet: What is left over will be the beginnings of the java-neuralnet module.

Now that we have a package structure to hold things, we should discuss the organisation of libraries. Every OpenAI module will need some things from the core package. If we decide to start a separate CVS subproject for the core, we could have an Ant buildfile to compile and package an AIcore.jar library. A disadvantage of this strategy is the fragmentation of the (super-)project. To run a module, a user would need:
the module (of course)
AIcore.jar
libraries needed by the core (like Castor, log4j, Xerces)
other libraries (depending on the module)
That is bad because we will certainly run into some dependency problems. In any case, it needs to be maintained.

Important

I'm awaiting some comments regarding this topic!

To be done

Definition of a directory structure guideline for the upcoming modules.

3. Overview

There are several utilities and services we will need across all parts of the project. This section gives a short description of the services I can think of. Later sections describe them in depth.

The architecture suggested by this proposal would look like this:

3.1. Persistence

We need to store and retrieve arbitrary kinds of data. But we can try to classify different types of content:

Object types

Documents: A document is a single data model for a specific module. This can be a neural net or a dataset, for example.
Framework data: The system itself needs to store and retrieve data. One type of data will be the settings (covered in another subsection).

Although we have different data types to store and retrieve, I strongly suggest using one on-disk format to handle them all.

As time goes by, data formats come and go. At present, XML is the thing everyone wants to have. I recommend using XML as our only, native data format. The main advantage is that a whole bunch of well-tested tools exist to create, parse, validate and use XML data files.

So far we have used Castor, and I think it is an easy-to-configure and very useful API for achieving transparent persistence of Java objects. Castor can do a straightforward mapping between Java and XML. To use Castor, one needs to define a mapping file that tells Castor how to store the object.

This brings up one big disadvantage: one has to take care of the mapping files when a class file changes. Because the compiler isn't aware of this external resource, no error is raised at compile time. Therefore it is rather unsafe to make changes without cross-checking them with either manual or (better) JUnit tests.

Changing a mapping file leads to another problem: when the mapping changes, the on-disk XML schema changes as well! Objects that have been stored to disk can't be retrieved anymore with the 'new' version. How to solve this? I don't know... maybe you do? Though this isn't a big deal for now, we will stumble over it when the time comes for the second release.

Anyway, we need a persistence layer. To make things more flexible, we should aim to build a format-independent interface. Even if we stick to XML for now, we don't know what the future brings. A good design can save us a lot of headaches.
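To make the idea of a format-independent interface a bit more concrete, here is a minimal sketch. The names DocumentFormat, PropertiesXmlFormat and RoundTrip are hypothetical and not part of the current code base, and the XML form of java.util.Properties merely stands in for a Castor-based backend. The point is only that calling code depends on the interface, not on the on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Properties;

/** Hypothetical format-independent contract: callers never see the on-disk format. */
interface DocumentFormat<T> {
    void write(T document, OutputStream out) throws IOException;
    T read(InputStream in) throws IOException;
}

/** One possible backend: the XML form of java.util.Properties (a stand-in for Castor). */
class PropertiesXmlFormat implements DocumentFormat<Properties> {
    @Override
    public void write(Properties document, OutputStream out) throws IOException {
        document.storeToXML(out, "OpenAI document", "UTF-8");
    }

    @Override
    public Properties read(InputStream in) throws IOException {
        Properties loaded = new Properties();
        loaded.loadFromXML(in);
        return loaded;
    }
}

/** Helper that depends only on the interface, so the backend can be swapped later. */
class RoundTrip {
    static <T> T copy(DocumentFormat<T> format, T document) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            format.write(document, out);
            return format.read(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```

Swapping XML for another format would then mean adding one more DocumentFormat implementation, without touching the callers.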

3.2. Modules

At first thought, one might think of modules as extensions for the GUI. I would go one step beneath the GUI. What we really need is a module system at the API/framework level. The GUI should be nothing more than a `builder` or (better term) `development environment` for an AI. So the API must be able to chain together different modules without the GUI! Though the GUI can and will have its own modules.

Module classes

API Modules: These are implementations of different AI types (neural nets, agent systems, GA, FSM).
UI Modules: Most of the modules will need a corresponding UI module to be usable in the OpenAI IDE (a neural net TopologyPane, for example). There can be different implementations for the same API module (e.g. different DataSet editors).

3.3. Settings/Parameters

Each part of the system will need some settings at runtime.

We can classify those:

Classes of settings

Developer settings: These are values that are important mainly to the developer. A user doesn't need to know anything about them. The advantage of using constants in a settings file rather than compiled ones (public static final ... ) is that the developer can play around with them without having to recompile parts of the source code.
User settings: When one uses the GUI (and/or the API), one wants a convenient way to both change and store settings.
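The developer-settings idea can be sketched as follows. The class name DeveloperSettings and the keys are made up for illustration; the sketch only assumes that settings live in a plain key=value text (java.util.Properties format) and that the compiled-in constant serves as the fallback.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical holder for developer settings read from a key=value text. */
class DeveloperSettings {
    private final Properties values = new Properties();

    /** Loads key=value lines; in practice the text would come from a settings file. */
    DeveloperSettings(String text) {
        try {
            values.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Cannot read settings", e);
        }
    }

    /** Returns the configured value, or the compiled-in fallback when the key is absent. */
    double getDouble(String key, double fallback) {
        String raw = values.getProperty(key);
        return raw == null ? fallback : Double.parseDouble(raw.trim());
    }
}
```

A developer could then change nn.learningRate in the file and rerun, while code that asks for a key missing from the file keeps working with its built-in default.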

Both classes can be divided further into scopes:

Scopes of settings

Volatile

Not every value needs to be stored on disk. Although we don't have to bother with this type of setting, because the developer has to take care of them (class members), I wanted to mention them here for completeness.

Application

The applications/API of the framework will have several settings that are independent of the project the user is currently working on.

Examples

Window positions/sizes
Keybindings
Paths for several file types
Defaults for project scope settings
Defaults for wizards

Project

One level beneath the application scope is the project scope. Here are the settings that are visible to all modules/plug-ins in one project file.

Module

The 'lowest' scope we can see (at API level) is the module's settings. These are values for a single, well-specified module.

Examples

Zoom level in a GUI module
Random seed for the neuralnet module
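One possible reading of the scopes is a chain in which a lookup starts at the module scope and falls back to the project and then the application scope. The fallback rule is an assumption of this sketch (suggested by the "defaults for project scope settings" example), and the class name SettingsScope is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical settings scope; the parent is null for the application scope. */
class SettingsScope {
    private final String name;
    private final SettingsScope parent;
    private final Map<String, String> values = new HashMap<>();

    SettingsScope(String name, SettingsScope parent) {
        this.name = name;
        this.parent = parent;
    }

    String getName() {
        return name;
    }

    void set(String key, String value) {
        values.put(key, value);
    }

    /** Looks in this scope first, then walks up towards the application scope. */
    String get(String key) {
        for (SettingsScope scope = this; scope != null; scope = scope.parent) {
            String value = scope.values.get(key);
            if (value != null) {
                return value;
            }
        }
        return null; // not defined in any scope
    }
}
```

With this layout a module-level value overrides the project and application values without changing them.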

3.4. Properties

Properties are closely related to the settings issue. But there is a difference: I consider settings to be values at a higher level in the API. Settings are used as (at least) module-wide properties.

The 'properties' I'm talking about here are the values that are directly in the data model of an OpenAI module. They are the neurons, connections, NetworkIterators and weights that together make up what we can call a 'document'. So we have finally got to the bottom layer of the system.

At API level we don't need any special interface to change the values. They are stored as normal class members and accessed (mostly) by bean accessors (setXxxx/getXxxx).

However, the GUI does need a component to both show and modify document properties. This is the (already implemented) PropertyEditor. Later in this paper you will read some details about both the capabilities and the usability of this piece of software.

3.5. Project files

This is closely related to persistence. Both the user and the developer want to be able to store related documents together in one file.

This is not as easy as it might seem at first thought, because each module will bring in its own format. Modules can also depend on other modules. Therefore we can get a whole tree of documents for each module. The GUI complicates this topic even more because it needs to store the settings of a module as well.

My suggestion for this topic is a modular project file. Its main feature would be the ability to store a tree of subprojects of arbitrary complexity. The API has to be smart enough to allow a single module (in stand-alone mode or in a user application) to grab its parts out. This makes it possible to build/train/maintain a project in the GUI and use it in a target application, and removes the need for separate GUI and API project files.

4. Details

In this chapter I will show you some approaches we could use to achieve part of the functionality I've defined in the previous section.

4.1. Persistence

As an extension to the current implementation, I suggest an interface to mark classes that are persistable.

For a first implementation, the following approach could be sufficient:

import java.net.URL;
import org.w3c.dom.Element;

// Persistence (the persistence handler) and PersistenceException are assumed
// to come from the persistence package.
interface PersistAble {

	/**
	 * Registers a persistence handler for this class.
	 */
	public void registerPersistenceHandler(Persistence handler) throws PersistenceException;

	/**
	 * Gets a new instance of this class from a persistent source
	 * using the given URL and the registered persistence handler.
	 */
	public PersistAble retrieve(URL url) throws PersistenceException;

	/**
	 * Gets a new instance of this class from a DOM Element using the registered
	 * persistence handler.
	 */
	public PersistAble instanceFromElement(Element xml) throws PersistenceException;

	/**
	 * Serializes and stores this object to the given URL
	 * using the registered persistence handler.
	 */
	public void store(URL url) throws PersistenceException;

	/**
	 * Serializes this object to a DOM Element using the registered
	 * persistence handler.
	 */
	public Element toElement() throws PersistenceException;

}

This gives us some advantages compared to the current implementation. At the moment, a class called Persistence is the center of the XML subsystem. It is able to store and retrieve object instances based on a Castor mapping file. A module can bring its own mapping file with it and needs to register it with the persistence handler. The handler in turn loads and caches the mapping file to gain speed and to save some typing when one wants to store an object (the path to the mapping file can be omitted). With the new implementation, a user/developer does not need to know about mapping files or even the persistence handler. He simply calls (for example)

network.store(new URL("file:mynetwork.xml"));

to store the object network (which has to implement the PersistAble interface) to the file mynetwork.xml. You might ask: "Why use a URL instead of a filename?" Future implementations of the persistence handler could have different serializer backends. So it might be possible to have a URL of the form jdbc://connection/...., ftp://.... or even agent://.....
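How a handler might pick a serializer backend from the scheme of the location can be sketched like this. StorageBackend, InMemoryBackend and BackendRegistry are hypothetical names. The sketch uses java.net.URI rather than java.net.URL, because URL rejects schemes for which no stream handler is installed (which would be the case for something like agent://).

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical serializer backend; one implementation per scheme. */
interface StorageBackend {
    void store(URI target, String serialized);
    String retrieve(URI source);
}

/** Keeps everything in memory; stands in for a file, JDBC or FTP backend. */
class InMemoryBackend implements StorageBackend {
    private final Map<URI, String> data = new HashMap<>();

    @Override
    public void store(URI target, String serialized) {
        data.put(target, serialized);
    }

    @Override
    public String retrieve(URI source) {
        return data.get(source);
    }
}

/** Chooses the backend by looking at the scheme of the location. */
class BackendRegistry {
    private final Map<String, StorageBackend> byScheme = new HashMap<>();

    void register(String scheme, StorageBackend backend) {
        byScheme.put(scheme, backend);
    }

    StorageBackend backendFor(URI location) {
        StorageBackend backend = byScheme.get(location.getScheme());
        if (backend == null) {
            throw new IllegalArgumentException("No backend for " + location);
        }
        return backend;
    }
}
```

A store(...) implementation could then delegate to backendFor(location) without the caller ever naming a backend.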

It is necessary to call registerPersistenceHandler(...) to initialize the PersistAble->Persistence binding properly. A good place to do this transparently would be the module classloader (see later). The implementation of the register method itself has to take care of the different serializer backends and the settings they need!

There are two methods I have not explained yet:

public PersistAble instanceFromElement(Element xml) throws PersistenceException;

and

public Element toElement() throws PersistenceException;

Perhaps a separate interface would be cleaner for these two methods. However, they are very closely related to the persistence subsystem: they are the link between the persistence system and the project system. You can read more details in the project files chapter.
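To illustrate the round trip through a DOM Element, here is a small self-contained sketch. NamedDataSet is a made-up document class that writes its fields by hand; a real implementation would delegate to the registered persistence handler. Unlike the interface above, the sketch makes instanceFromElement static to keep the example short.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical document class used only to show the Element round trip. */
class NamedDataSet {
    private final String name;
    private final int rows;

    NamedDataSet(String name, int rows) {
        this.name = name;
        this.rows = rows;
    }

    String getName() {
        return name;
    }

    int getRows() {
        return rows;
    }

    /** Serializes this object into a detached DOM element. */
    Element toElement() {
        Element element = newDocument().createElement("dataset");
        element.setAttribute("name", name);
        element.setAttribute("rows", Integer.toString(rows));
        return element;
    }

    /** Rebuilds an instance from an element produced by toElement(). */
    static NamedDataSet instanceFromElement(Element xml) {
        return new NamedDataSet(xml.getAttribute("name"),
                Integer.parseInt(xml.getAttribute("rows")));
    }

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }
}
```

Note that an element created this way belongs to its own Document, so a project tree would have to adopt it with Document.importNode before appending it.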

4.2. Modules

A universal module subsystem will need a central repository from which to get new instances of each module. The framework must be able to load and recognize modules at runtime.

4.2.1. Custom Classloader

The core of the module system would be a specialised classloader. It should be possible to use different sources for modules.

Sources for modules

JARs: This will be the most popular form of a single module. A jar file can include all parts (classes and resources) of a single module.
URL: When the system becomes bigger, it might be interesting to have a module repository that can be accessed by URLs.

To gain maximum flexibility, the classloader itself should be pluggable. So we can start with a simple JARModuleLoader and extend the module subsystem with other loaders later. An interface should be sufficient to achieve this. The different classloader implementations have to be stored in a handler class. This handler is able to decide which classloader to use for different requests.
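A rough sketch of the pluggable loader idea follows. Only the name JARModuleLoader comes from the text above; the ModuleLoader interface, its methods and the ModuleLoaderHandler class are assumptions made for illustration. The jar loader simply wraps the standard URLClassLoader.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract for one kind of module source. */
interface ModuleLoader {
    boolean accepts(URI source);
    ClassLoader open(URI source);
}

/** Handles local jar files via the standard URLClassLoader. */
class JARModuleLoader implements ModuleLoader {
    @Override
    public boolean accepts(URI source) {
        String path = source.getPath();
        return "file".equals(source.getScheme()) && path != null && path.endsWith(".jar");
    }

    @Override
    public ClassLoader open(URI source) {
        try {
            return new URLClassLoader(new URL[] { source.toURL() },
                    getClass().getClassLoader());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable jar location: " + source, e);
        }
    }
}

/** Asks each registered loader in turn and uses the first one that accepts the source. */
class ModuleLoaderHandler {
    private final List<ModuleLoader> loaders = new ArrayList<>();

    void register(ModuleLoader loader) {
        loaders.add(loader);
    }

    ModuleLoader loaderFor(URI source) {
        for (ModuleLoader loader : loaders) {
            if (loader.accepts(source)) {
                return loader;
            }
        }
        throw new IllegalArgumentException("No loader for " + source);
    }
}
```

A later loader for a URL-based module repository would only need to implement ModuleLoader and be registered with the handler.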

4.2.2. Module repository

Every loaded (and checked!) module needs to be registered at a central module repository. If one needs the functions of a specific module, one has to get an instance of it from this repository. This could be either an existing instance or a new one!
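The "existing instance or a new one" distinction could look roughly like this. ModuleRepository and its method names are hypothetical, and modules are represented as plain objects created by a factory, which is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical central repository that hands out module instances by name. */
class ModuleRepository {
    private final Map<String, Supplier<?>> factories = new HashMap<>();
    private final Map<String, Object> shared = new HashMap<>();

    /** Registers a loaded (and checked) module together with a factory for it. */
    void register(String moduleName, Supplier<?> factory) {
        factories.put(moduleName, factory);
    }

    /** Always creates a fresh instance of the module. */
    Object newInstance(String moduleName) {
        Supplier<?> factory = factories.get(moduleName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown module: " + moduleName);
        }
        return factory.get();
    }

    /** Returns the shared instance, creating it on first use. */
    Object sharedInstance(String moduleName) {
        Object existing = shared.get(moduleName);
        if (existing == null) {
            existing = newInstance(moduleName);
            shared.put(moduleName, existing);
        }
        return existing;
    }
}
```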

4.2.2.1. Dependency

How do we handle dependencies? It is possible that a module needs another module to work properly. The repository must be aware of this fact! The classloader also has to know about dependencies and how to check/resolve them. But not only modules can be part of a dependency rule: we have to take care of external libraries/APIs, too. A module is only valid when all dependencies are resolved!
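The validity rule ("a module is only valid when all dependencies are resolved") can be sketched as a recursive check over modules and external libraries. DependencyChecker is a hypothetical name, and treating a cyclic dependency as invalid is an assumption of this sketch, not something the proposal decides.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical checker: modules depend on other modules or on external libraries. */
class DependencyChecker {
    private final Map<String, Set<String>> requires = new HashMap<>();
    private final Set<String> libraries = new HashSet<>();

    void addLibrary(String name) {
        libraries.add(name);
    }

    void addModule(String name, String... dependencies) {
        requires.put(name, new HashSet<>(Arrays.asList(dependencies)));
    }

    /** A module is valid only if every dependency is a known library or a valid module. */
    boolean isValid(String module) {
        return isValid(module, new HashSet<>());
    }

    private boolean isValid(String name, Set<String> visiting) {
        if (libraries.contains(name)) {
            return true;
        }
        Set<String> dependencies = requires.get(name);
        if (dependencies == null) {
            return false; // neither a known module nor a known library
        }
        if (!visiting.add(name)) {
            return false; // cyclic dependency, treated as invalid here
        }
        for (String dependency : dependencies) {
            if (!isValid(dependency, visiting)) {
                return false;
            }
        }
        visiting.remove(name);
        return true;
    }
}
```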

4.2.2.2. Definition of a module

This is the toughest part. We need an easy but powerful way to define modules. I'm not sure which is the perfect solution for this problem. At first thought it might be sufficient to use a bunch of interfaces. But maybe an XML descriptor is better: when the classloader finds a jar file, it could search for a specific file in it (e.g. module.xml). This enables the system to learn about available modules without actually loading them! The XML file can have several sections such as module name and type, capabilities and of course dependencies or even documentation. I'm looking forward to your suggestions! IMHO it is absolutely critical to find a good solution here.
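As a starting point for discussion, the following sketch reads a descriptor with the standard DOM parser. The descriptor layout (a module root element with name and type attributes and depends children) is purely an assumption, as is the class name ModuleDescriptor; no such format has been agreed on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical in-memory form of a module.xml descriptor. */
class ModuleDescriptor {
    final String name;
    final String type;
    final List<String> dependencies = new ArrayList<>();

    private ModuleDescriptor(String name, String type) {
        this.name = name;
        this.type = type;
    }

    /** Parses the descriptor text without loading any class of the module. */
    static ModuleDescriptor parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            ModuleDescriptor descriptor =
                    new ModuleDescriptor(root.getAttribute("name"), root.getAttribute("type"));
            NodeList depends = root.getElementsByTagName("depends");
            for (int i = 0; i < depends.getLength(); i++) {
                descriptor.dependencies.add(((Element) depends.item(i)).getAttribute("module"));
            }
            return descriptor;
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid module descriptor", e);
        }
    }
}
```

In a real loader the text would come from the jar itself, for example via java.util.jar.JarFile and its getInputStream method for the module.xml entry.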

4.3. Settings

Well. First question of all: do we need a settings management that implements all of the different scopes and classes I defined in the overview section?

To be honest: I don't know! Tell me your opinion!

4.3.1. Settings management

Two opposing approaches to dealing with settings

Distributed: The settings are kept and handled in the code that needs them. Each module or even class would therefore need methods to collect/distribute settings from/to the persistence subsystem. This could be achieved by defining and implementing a set of interfaces.
Centralized: We would have one more repository to hold the settings of every participant. It could take care of scopes and provide a set of utility functions for settings (clone/filter/store/retrieve settings). Nevertheless, we need some interfaces for an event system.
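For the centralized variant, a minimal sketch of a repository with a change-event interface might look as follows. SettingListener and SettingsRepository are hypothetical names, and using dotted key prefixes (such as "neuralnet.") to filter by participant is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical event interface: notified whenever a setting really changes. */
interface SettingListener {
    void settingChanged(String key, String oldValue, String newValue);
}

/** Hypothetical central repository holding the settings of every participant. */
class SettingsRepository {
    private final Map<String, String> values = new HashMap<>();
    private final List<SettingListener> listeners = new ArrayList<>();

    void addListener(SettingListener listener) {
        listeners.add(listener);
    }

    String get(String key) {
        return values.get(key);
    }

    /** Stores the value and fires an event only if it differs from the old one. */
    void set(String key, String value) {
        String old = values.put(key, value);
        if (!Objects.equals(old, value)) {
            for (SettingListener listener : listeners) {
                listener.settingChanged(key, old, value);
            }
        }
    }

    /** Utility: returns a sorted copy of all settings whose key starts with the prefix. */
    Map<String, String> filter(String prefix) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }
}
```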

4.4. Properties

I'm proud to say: this is implemented and ready to use. Even so, you should know some in-depth details about the current property subsystem. I'm pretty sure you will have some comments and suggestions regarding it. Review the following class diagram to familiarize yourself with this subsystem:

4.4.1. Capabilities

The main job of this subsystem is to present a defined set of class members to a user to let him either review or change them. The difference from an ordinary bean browser is that my system can be controlled by a description file. Instead of letting the user change all public properties of a JavaBean, one can hide some. So the user can be guided to the properties that are of interest and kept from changing variables that matter not to the user but to the system (IDs, for example). Another goody is the possibility to declare and use selection lists (drop-down lists). However, these lists can only be used to select an object type in a drop-down control (NetworkIterators or ErrorTypes, for example).

4.4.2. Property descriptor

As said before: one needs to define properties. When the GUI initialises, a file (conf/properties.xml) is read and unmarshalled by Castor. The result is an object hierarchy that can be used by the property subsystem. The syntax of the file is very simple and can be learned quickly by simply reviewing it.

Here is a fragment of a descriptor file. It shows how one can define a selection list for a drop-down field and how to use it in a property list.

						
						
<!-- Defines a selectionlist named 'Transferfunctions' with 3 elements -->

<selectionlist name="Transferfunctions">
	<entry name="" value="{NULL}"/>
	<entry name="Sigmoid" value="net.openai.ai.nn.transfer.SigmoidTransferFunction"/>
	<entry name="TanH" value="net.openai.ai.nn.transfer.TanhTransferFunction"/>
</selectionlist>
		
<!-- Propertydefinition for the class 'Layer' (refers to Transferfunctions) -->

<class name="net.openai.ai.nn.network.Layer">

	<property name="ID" get="getID"/>
	
	<property name="Name" get="getName" set="setName"> 
		<info>Name of this Layer</info>
	</property>
	
	<property name="Ready To Learn" get="readyToLearn"> 
		<info>Tells if the layer is ready to learn</info>
	</property>
	
	<property name="Learning rule" get="getLearningRule" set="setLearningRule" select="Learningrules" type="class"> 
		<info>Rule to use while learning</info>
	</property>
	
	<property name="Transferfunction" get="getTransferFunction" set="setTransferFunction" select="Transferfunctions" type="class"> 
		<info>Function to transfer the values to the next layer</info>
	</property>
	
	<property name="Inputfunction" get="getInputFunction" set="setInputFunction" select="Inputfunctions" type="class"> 
		<info>Function to summarize the weighted inputvalues</info>
	</property>
	
	<property name="Neurons" get="getSize">
		<info>Amount of neurons in this layer</info>
	</property>
	
</class>

4.4.3. Coded properties

Note

Not yet available.

It would be a nice feature to be able to define properties in a second way: directly in a Java class. This is easy to achieve: all we need is a new interface and a slightly modified property handler. The interface could look like this:

interface PropertyProvider {

	/**
	 * Returns the properties for this class
	 */
	public PropertyList getProperties();

}

A class implementing this interface has to provide its own property list. By not caching the resulting list, we get a way to define dynamic properties!

4.4.4. API usage

It's pretty easy to display the properties of an object instance: all one has to do is call the static method

AIDesktop.displayProperties(Object yourobject);

Everything else is done by the subsystem itself.

4.4.5. To be added

Inheritance: Though the system is capable of taking care of upward inheritance, it lacks support for inheriting definitions in the descriptor file. However, if you hand an instance of class "a" to the system and only a description for "b" is available (where a extends b), the editor uses it. So you are able to define the properties of an interface/superclass, and the properties of all implementations are covered as well. The catch: if one of the implementations has more properties than the superclass/interface, you have to redefine the whole class.
Auto-discovery: It would be useful to have an 'auto' mode for simpler classes. For now, one is constrained to a property descriptor. The system could make use of reflection to discover objects itself. (In the current implementation, reflection is only used to get/set the values in an instance.)
Class-selector: To show a list of classes that implement a certain interface, one needs to define a selection list in the property descriptor. The classloader of the module subsystem could build dynamic lists for the property handler. This way one would no longer need to extend the property descriptor when a new implementation of a selectable object becomes available.
Cell editors: One should be able to define one's own cell editor for 'special' object types.
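The auto-discovery item could be approached with the standard java.beans.Introspector, as in the following sketch. PropertyDiscovery and the nested SampleLayer bean are hypothetical and only loosely modelled on the Layer example above. A property counts as editable here when the bean exposes a setter for it.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical 'auto' mode: finds bean properties without a descriptor file. */
class PropertyDiscovery {

    /** Maps each readable property name to whether it also has a setter. */
    static Map<String, Boolean> discover(Class<?> type) {
        try {
            // Object.class as stop class keeps getClass() out of the result.
            BeanInfo info = Introspector.getBeanInfo(type, Object.class);
            Map<String, Boolean> result = new TreeMap<>();
            for (PropertyDescriptor property : info.getPropertyDescriptors()) {
                if (property.getReadMethod() != null) {
                    result.put(property.getName(), property.getWriteMethod() != null);
                }
            }
            return result;
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException("Cannot inspect " + type, e);
        }
    }

    /** Sample bean used only to demonstrate discovery. */
    public static class SampleLayer {
        private String name = "hidden";

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public int getSize() {
            return 3;
        }
    }
}
```

The discovered names are the decapitalized bean names ("name", "size"), so a mapping to display names such as "Neurons" would still need the descriptor file or some naming convention.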

4.5. Project files

We need some sort of directory to model a project and let the needed modules connect to it. It is almost certain that a project will hold more than one instance of a document type. This creates the need for a naming system within the tree/directory. Having the GUI in mind, I would suggest using simple strings to name each document (=node). That gives us a nice tree structure to display in (for example) the left border pane of the main window ("explorer-like").

The biggest difficulty regarding project files is the following question: "How do we make a mapping file for unknown complexity and unknown modules?" Answer: I think there is no need for one! As you can read in the persistence section, each persistable document has to know its own mapping file. To put things together into a bigger structure, we can use an ordinary DOM tree. When the system has to store a project file, it traverses each hooked ("mounted") module, asks for a DOM element holding the already serialized data, and saves the resulting structure to disk. Now the other way round: we want to retrieve a project. What we get is a DOM tree. The project system needs the ability to decompose the tree and to reconstruct the module instances(!) with the unmarshalled documents. This will be a really tough part and involves a whole bunch of core services. However, things are cut into small, well-defined pieces. Each module will be responsible for its own data. The failure of one module (e.g. version conflict, missing module, missing mapping) can't destroy the whole project!
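The retrieval side, including the rule that one failing module must not destroy the whole project, might be sketched like this. The project layout (a project root with document children carrying name and type attributes) and the names DocumentReader and ProjectLoader are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical hook: a module turns its own element back into a document. */
interface DocumentReader {
    Object read(Element xml) throws Exception;
}

/** Decomposes a project tree and lets each module rebuild its own part. */
class ProjectLoader {
    private final Map<String, DocumentReader> readersByType = new HashMap<>();
    final Map<String, Object> loaded = new LinkedHashMap<>();
    final List<String> failed = new ArrayList<>();

    void registerReader(String type, DocumentReader reader) {
        readersByType.put(type, reader);
    }

    /** Loads every document; a failure is recorded and the loop continues. */
    void load(Element projectRoot) {
        NodeList children = projectRoot.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (!(child instanceof Element)) {
                continue; // skip whitespace and comments
            }
            Element document = (Element) child;
            String name = document.getAttribute("name");
            try {
                DocumentReader reader = readersByType.get(document.getAttribute("type"));
                if (reader == null) {
                    throw new IllegalStateException("Missing module for " + name);
                }
                loaded.put(name, reader.read(document));
            } catch (Exception e) {
                failed.add(name);
            }
        }
    }

    /** Convenience for the example: parses project XML text into its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid project XML", e);
        }
    }
}
```

After load(...) the caller can inspect the failed list and decide whether to warn the user or keep the unreadable elements untouched for a later save.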

Let's review another demand of the definition: “the API has to be smart enough to allow a single module (...) to grab its parts out.” No problem! We have a project directory (aka DOM tree). The project API will have a method to get a named document (=DOM element). The module class can use the persistence handler to turn this element into a new instance of the module (holding the unmarshalled document)! For convenience it might be handy to have an additional method to get the first (n-th) document of a certain type back from the project tree.
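The two lookup methods mentioned here could be sketched as follows. ProjectTree is a hypothetical name, and the document elements with name and type attributes are an assumed layout rather than an agreed format. The index n is zero-based in this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical read-only view of a project directory held as a DOM tree. */
class ProjectTree {
    private final List<Element> documents = new ArrayList<>();

    ProjectTree(Element root) {
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                documents.add((Element) child);
            }
        }
    }

    /** Returns the document element with the given name, or null if there is none. */
    Element getDocument(String name) {
        for (Element document : documents) {
            if (name.equals(document.getAttribute("name"))) {
                return document;
            }
        }
        return null;
    }

    /** Returns the n-th (zero-based) document of the given type, or null. */
    Element getDocument(String type, int n) {
        int seen = 0;
        for (Element document : documents) {
            if (type.equals(document.getAttribute("type"))) {
                if (seen == n) {
                    return document;
                }
                seen++;
            }
        }
        return null;
    }

    /** Convenience for the example: builds a tree from project XML text. */
    static ProjectTree parse(String xml) {
        try {
            return new ProjectTree(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid project XML", e);
        }
    }
}
```

The element returned by either method is what a module would hand to its persistence handler to rebuild the document.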