Meanderings on AUTOSAR model repositories (and other models)
When working with AUTOSAR models, model storage and management is a topic that has to be solved. In this blog post, I roughly sketch some thoughts. It is intended as a collection of topics that projects using AUTOSAR models have to address, but it is in no way complete, and each of the issues introduced would fill a lot of technical discussion on its own.
Types of Databases
- Relational databases are often the first thing that comes to mind when discussing technologies for model repositories. However, AUTOSAR models (like many other meta-models for engineering data) are actually model elements with a lot of relationships - effectively a complex graph. Relational databases per se do not match this structure well (see impedance mismatch, e.g. in Philip Hauer's post).
- NoSQL databases: There are a number of alternatives to relational databases, known under the term "NoSQL". These include key-value stores, document stores etc. But since the structure of AUTOSAR models is an element graph, the most natural candidate seems to be a graph database.
- Graph databases treat nodes and their relationships as first-class citizens and have a more natural correspondence to engineering models.
Eclipse provides technologies to store EMF-based models in database repositories of various kinds. With CDO, it is quite easy to store EMF-based models in a relational backend (H2, Oracle), and the combination is very powerful: in the IMES project, we run an embedded H2 (relational) database in Eclipse with CDO to store AUTOSAR models, as well as a centralized server to show the replication mechanisms of such a setup. However, there are drawbacks (schema integration) as well as advantages (relational database products are well understood by IT departments). There are also projects that provide other CDO backends or more direct integrations with NoSQL databases (such as Neo4J, OrientDB).
AUTOSAR defines an exchange format for AUTOSAR models based on XML. Using that for storage is an obvious idea and Artop does exactly that by using the AUTOSAR XML format as storage format for EMF models.
Source code control systems
Since the files are stored as text files, models can be managed by putting them into source code control systems, such as SVN or Git.
One criterion for selecting a specific technology is performance.
While working with AUTOSAR BSW configurations for ECUs, during the days of 32-bit machines and JVMs, memory limitations actually posed a problem for the file-based approach. In settings like this, setting up a database server with the full models and clients that access them is one approach. For Robert Bosch GmbH, we did a study of CDO performance that was presented at EclipseCon. In the end, for this particular setting it was considered more economical to provide 64-bit systems to the developers than to set up a more complex infrastructure with a database backend. 64-bit systems can easily hold the data for the BSW configuration of an ECU (and more).
It is often argued that with a remote database server, availability of the model is "instant", since no local loading of models is required. When looking at BSW configurations with tools like COMASSO BSWDT, it turns out that loading times are still reasonable even with a file-based approach.
Specifics of AUTOSAR models
Dangling References / References by fully qualified name
In an AUTOSAR XML file, the model need not be self-contained. A reference to another model element is specified by a fully qualified name (FQN), and that element need not be in the same .arxml. This is useful for model management, since you can combine partial models into a larger model in a very flexible way. However, it means that these references need to be resolved to find the target element. And there is a technical issue: the referring element specifies the reference as a fully qualified name such as 'A/B/C', but to calculate the FQN of a given element, you have to take its entire containment hierarchy into account. That means that if the short name of any element changes, the fully qualified names of all its descendants change as well.
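The effect of a rename on FQN-based references can be illustrated with a minimal sketch. The `Element` class and `build_index` function are hypothetical stand-ins invented for this example; they are not part of Artop, Sphinx or any AUTOSAR API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Element:
    """Hypothetical stand-in for a model element with a short name."""
    short_name: str
    parent: Optional["Element"] = None
    children: list = field(default_factory=list)

    def add(self, child: "Element") -> "Element":
        child.parent = self
        self.children.append(child)
        return child

    def fqn(self) -> str:
        # The FQN is derived from the whole containment chain, not stored.
        parts = []
        node = self
        while node is not None:
            parts.append(node.short_name)
            node = node.parent
        return "/".join(reversed(parts))


def build_index(roots):
    """Map every FQN to its element by walking all containment trees."""
    index = {}
    stack = list(roots)
    while stack:
        node = stack.pop()
        index[node.fqn()] = node
        stack.extend(node.children)
    return index


a = Element("A")
b = a.add(Element("B"))
c = b.add(Element("C"))

reference = "A/B/C"                      # as stored by a referring element
resolved_before = build_index([a]).get(reference)

b.short_name = "B2"                      # rename an ancestor of C
resolved_after = build_index([a]).get(reference)
```

Before the rename the reference resolves to `c`; afterwards the same string no longer matches any element, although `c` itself was never touched. A repository therefore has to either update all such references or treat them as dangling.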
File-based / Sphinx / Artop
Sphinx, the basic infrastructure that comes from Artop, supports this kind of reference resolution. The proxy resolution is based on the fully qualified names and has some optimizations. Unresolved references are cached in a blacklist and will not be traversed until the model changes in a way that makes a retry sensible. New files are automatically detected and added to the model. Newer versions of Sphinx even support improved proxy resolving with the IncQuery project.
While it was mentioned above that it is easy to store AUTOSAR models in a database with CDO, this mainly refers to fully resolved models. Supporting the flexible fully qualified name mechanism needs extra design work by the repository developers.
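The idea of blacklisting unresolved references can be sketched as follows. This is only an illustration of the caching principle, assuming a plain dictionary as the model index; `ProxyResolver` and its methods are hypothetical names and do not reflect the actual Sphinx implementation.

```python
class ProxyResolver:
    """Hypothetical resolver illustrating a blacklist of unresolved FQNs."""

    def __init__(self):
        self._index = {}          # FQN -> element
        self._unresolved = set()  # FQNs that are known to be missing
        self.lookups = 0          # counts real index lookups, for illustration

    def add_elements(self, elements):
        # A model change: only references that may now resolve are retried.
        self._index.update(elements)
        self._unresolved.difference_update(elements)

    def resolve(self, fqn):
        if fqn in self._unresolved:
            return None           # known miss, skip the (expensive) lookup
        self.lookups += 1
        target = self._index.get(fqn)
        if target is None:
            self._unresolved.add(fqn)
        return target


resolver = ProxyResolver()
resolver.add_elements({"A/B": "element-B"})
first = resolver.resolve("A/B/C")     # miss, recorded in the blacklist
second = resolver.resolve("A/B/C")    # answered from the blacklist
lookups_before_change = resolver.lookups

resolver.add_elements({"A/B/C": "element-C"})   # e.g. a new file was detected
third = resolver.resolve("A/B/C")
```

In this sketch the lookup is a cheap dictionary access; in a real workspace it would be a traversal over loaded resources, which is where skipping known misses pays off.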
In AUTOSAR, the contents of an element can be split over different physical files. That means that a part of your SoftwareComponent A can be specified in file x.arxml, while another part can be specified in y.arxml. This is also very handy for model management. E.g., as a tier-1 supplier, you can just keep your parts of a DEM (Diagnostic Event Manager) configuration in specific files and then drop in the DEM.arxml from the OEM, creating a new combined model. And when the DEM.arxml from the OEM changes, it should be convenient to just overwrite the old one. However, that implies additional mechanisms in all persistence technologies to create the merged view / model.
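A merged view over split elements can be sketched as below. The data layout (a dictionary per file, keyed by FQN) and all names such as `merge_fragments`, `localEvents` and `oemEvents` are assumptions made for this illustration; real tools operate on EMF objects rather than dictionaries.

```python
from collections import defaultdict


def merge_fragments(files):
    """Merge element fragments from several files into one view.

    files: mapping file name -> {fqn: {attribute: value}}
    Returns the merged elements and, per attribute, the file it came from.
    """
    merged = defaultdict(dict)
    origin = {}
    for file_name, elements in files.items():
        for fqn, attributes in elements.items():
            for key, value in attributes.items():
                if key in merged[fqn] and merged[fqn][key] != value:
                    raise ValueError(f"conflict for {fqn}.{key} in {file_name}")
                merged[fqn][key] = value
                origin[(fqn, key)] = file_name
    return dict(merged), origin


# Hypothetical fragments of the same element in two files.
tier1 = {"Dem/ConfigSet": {"localEvents": ["E1", "E2"]}}
oem = {"Dem/ConfigSet": {"oemEvents": ["E100"]}}

merged, origin = merge_fragments({"tier1.arxml": tier1, "DEM.arxml": oem})
```

Because the origin of each attribute is remembered, dropping in a new DEM.arxml only requires re-running the merge with the replaced fragment; the tier-1 parts stay untouched.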
File-based / Sphinx / Artop / COMASSO
Different tools use various mechanisms for this:
- Artop has an approach that uses Java Proxies to create the merged view dynamically.
- We have an implementation that is more generic and uses AspectJ on EMF models.
- COMASSO uses a specific meta-model variation that is EMF-like (but slightly different) and allows the merging of splitables based on a dedicated infrastructure.
With a database backend, splitables need additional design considerations. Possible approaches could be:
- Store the fragments of the split objects in the database and merge them while querying / updating
- Store only merged objects in the database and provide some kind of update mechanism (e.g. by remembering the origin of elements)
For the basic software configuration (BSW), the AUTOSAR metamodel provides meta-model elements to define custom parameter containers and types. That means that at this level, we have meta-modeling mechanisms within the AUTOSAR standard. This has some impact on the persistence. The repository could just decide to store the AUTOSAR meta-model elements (i.e. ParameterDefinitions and ParameterValues), thus requiring additional logic to find values for a given parameter definition - a similar problem exists when accessing those parts of AUTOSAR from code, and solutions exist for that. Representing the parameter definitions directly in the database might not be simple - such dynamic schema definitions are difficult to realize in relational databases. Schema-free databases from the NoSQL family seem better suited for this job.
The persistence technology will also affect the possibility to work with the data without a network connection to the repository. Although world-wide connectivity is continuously improving, you might not want to rely on a stable network connection while doing winter testing in a car on a frozen lake in the northern hemisphere. In these cases, at least read access might be required. Some repository technologies (e.g. CDO with some DB backends) provide cloning and offline access, and so do file-based solutions.
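The "additional logic to find values for a given parameter definition" can be sketched roughly like this. The two classes are heavily simplified, hypothetical stand-ins for the AUTOSAR definition and value elements, and the FQNs are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterDefinition:
    fqn: str
    kind: str            # e.g. "float" or "boolean"; simplified assumption


@dataclass(frozen=True)
class ParameterValue:
    definition_ref: str  # FQN of the definition this value belongs to
    value: object


def index_by_definition(values):
    """Group stored values by the FQN of their parameter definition."""
    index = {}
    for v in values:
        index.setdefault(v.definition_ref, []).append(v.value)
    return index


# Hypothetical definitions and values.
definitions = [
    ParameterDefinition("Demo/Module/TaskTime", "float"),
    ParameterDefinition("Demo/Module/ErrorDetect", "boolean"),
]
values = [
    ParameterValue("Demo/Module/TaskTime", 0.01),
    ParameterValue("Demo/Module/ErrorDetect", True),
]

by_definition = index_by_definition(values)
task_time = by_definition.get(definitions[0].fqn, [])
```

The repository schema only needs to know the two generic types; the concrete parameters live in the data, which is why a schema-free store maps onto this more directly than a fixed relational schema.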
Long running transactions
Work on engineering models is often conceptual work that is not done within (milli-)seconds; it frequently takes days, weeks or more. E.g., consider a developer who wants to introduce some new signals in a car's network. He will first get a working copy of the current network database. Then he will work on his task until he has a concept that he thinks is feasible. Often, he cannot change / add new signals on his own, but has to discuss them with a company-wide CCB. Only after every affected partner has agreed on the change (and that will involve a lot of amendments) can he finalize his changes and publish them as an official change. The infrastructure must support this kind of "working branches". CDO supports this for some database backends and works very nicely - but it is also very obvious that the workflow above is very similar to the workflow used in software development. Storing the file-based models in "code repositories" such as SVN or Git is also a feasible approach. The merging of conflicting changes can then be comfortably supported with EMF Compare (which is the same technology as used by infrastructure such as CDO).
These offline / working branches must support saving / storage of an inconsistent state of the model(!). As the development of such a change takes quite some time, it must be possible to save and restore intermediate increments, even when they are not consistent. A user must be able to save his incomplete work when leaving in the evening and resume next morning. Anything else will immediately kill user acceptance.
Authorization / Authentication
Information protection is a central feature of modeling tools. Even if a malevolent person does not get access to full engineering data, information can be inferred from small bits of data, such as names of new signals introduced into the boardnet. If you find a signal name like SIG_NUCLEAR_REACTOR_ENGINE_OFF, that would be very telling in itself. Let's have a look at two aspects of information protection:
- Prevent industrial espionage (control who sees what)
- Prevent sabotage (invalid modification of data)
Authentication is a prerequisite for both, but supported by most technologies anyway.
If models are served to the user only "partially" (i.e. through filtered queries in a 3-tier architecture), it seems easier to provide/deny access to fine-grained elements. But it also involves a lot of additional considerations and tool adaptations. E.g., if a certain user is not allowed to learn about a specific signal, how do we deal with the PDUs that refer to that signal? We should not just filter that specific signal out, since that would actually cause misinformation. We could create an anonymous representation (we have done this with CDO's access control mechanism, showing "ACCESS-DENIED" everywhere such a signal is used), but that also has a lot of side-effects, e.g. for validation. CDO has fine-grained access control on the elements mapped to the database (and I am proud to say that this was in part sponsored by the company I work for within the IMES project, and Eike Stepper has done some very nice work on that). In addition, if you support offline work, you must make sure that this does not pose a problem, since now a copy of the data is on the disk - off-the-shelf replication copies the data and would hold a copy of the original, unfiltered data (nowadays you would definitely only store on encrypted disks anyway). One approach would be to define access control as coarse-grained as possible, e.g. by introducing (sub-)model branches to which access can be granted as a whole. This would also be the approach to take for file-based storage. Access to the model files can be supported through authorization of the infrastructure.
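The anonymous representation described above can be sketched in a few lines. The function `filtered_view`, the dictionary layout and the signal names are assumptions for illustration only; CDO's access control works on model objects and is not shown here.

```python
ACCESS_DENIED = "ACCESS-DENIED"


def filtered_view(pdus, allowed_signals):
    """Return a copy of the PDU mapping with restricted signals anonymized.

    pdus: mapping PDU name -> list of signal names
    """
    return {
        pdu: [s if s in allowed_signals else ACCESS_DENIED for s in signals]
        for pdu, signals in pdus.items()
    }


# Hypothetical data: one PDU carrying a public and a restricted signal.
pdus = {"PDU_1": ["SIG_SPEED", "SIG_SECRET"]}
view = filtered_view(pdus, allowed_signals={"SIG_SPEED"})
```

The PDU keeps the same number of entries, so the user can see that something occupies the slot without learning its name. Any validation that runs on such a view has to cope with the placeholder.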
Control data modification
Another risk is that someone actively modifies the data to sabotage the systems, e.g. by changing a signal timing. The motivation for such actions can range from negligence through paid sabotage to a disgruntled employee. Similar considerations apply to this use case as to the read access described above. In the three-tier architecture, access validation can be done by the backend. But how about the file-based approach? Can the user not manipulate the files before submitting them? Yes, he could, but a verification is still possible when the model is submitted back into the repository. A verification hook can be used that checks what has been modified. And with the API of EMF Compare, this verification could be made on the actual meta-model API (not on some file change set) to see if any invalid changes have been made in the models.
AUTOSAR models can be stored in a number of ways. Using Artop and the Eclipse-based framework, a number of options are available. Choosing the right approach depends on the specific use cases and on the functional and non-functional requirements.
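Such a verification hook could look roughly like the following sketch. It compares two model states represented as plain dictionaries and does not use the EMF Compare API; the function names, the prefix-based permission scheme and the sample data are all assumptions made for this example.

```python
def changed_elements(before, after):
    """Return the FQNs of elements that were added, removed or modified."""
    keys = before.keys() | after.keys()
    return sorted(k for k in keys if before.get(k) != after.get(k))


def verify_submission(before, after, writable_prefixes):
    """Return the changed FQNs the submitting user was not allowed to touch."""
    return [
        fqn
        for fqn in changed_elements(before, after)
        if not any(fqn.startswith(prefix) for prefix in writable_prefixes)
    ]


# Hypothetical model states: FQN -> attributes.
before = {
    "Net/SigA": {"cycle_ms": 10},
    "Net/SigB": {"cycle_ms": 20},
}
after = {
    "Net/SigA": {"cycle_ms": 10},
    "Net/SigB": {"cycle_ms": 5},      # timing changed outside the user's area
    "Own/SigC": {"cycle_ms": 100},    # new element in the user's own area
}

violations = verify_submission(before, after, writable_prefixes=["Own/"])
```

A repository hook would reject the submission if `violations` is non-empty. Working on model elements instead of text diffs means that reformatting or reordering the file does not hide or fake a change.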