
Requirements management can be a challenging and time-consuming task in large systems-engineering projects. From elicitation to management, setting up a quality requirements framework can take more than a year. Even after this essential phase is finalized, issues in requirements management can carry over to subsequent phases of the development life cycle, where fixing them consumes additional resources. Pulitzer-nominated IT consultant and author James Martin found that 56% of errors in systems engineering projects originate during the requirements engineering phase, and that the cost of fixing them rises exponentially when they are addressed in later project phases. This affects the number of employees needed to fix those issues in a timely manner, the overall time needed to finalize the project, and the amount of refactoring required to fix the code.

This issue is especially prevalent in traditional waterfall-based setups. The drawbacks seem to be considerably smaller in agile workflows such as Scrum. However, some of the statistics behind such comparisons still do not use the right metrics to properly measure and define success.

Figure 1: Benchmark for Requirements Work on large projects

Source: https://www.jamasoftware.com/requirements-management-guide/requirements-gathering-and-management-processes/how-long-do-requirements-take

At itemis, we have been thoroughly investigating these issues and thinking about ways to solve them. In this article, I will talk about five ways NLP can be used to solve requirements engineering challenges, based on applications we have implemented and tested. We evaluated the performance of those applications and developed a working solution to address some of the known challenges.

Requirements engineering demands both high-quality requirements and a large number of them to ensure project success. At the same time, fixing issues costs much less during this phase than in subsequent phases. With systems moving towards the cloud and AI models becoming system components, meeting industry standards has become more challenging. It is more important than ever to guarantee a high standard in requirements engineering, as new technologies add complexity to many systems engineering projects and further increase the average number of requirements needed. Fortunately, NLP, a subfield of AI, can help. Engineering requirements are usually written in natural language, and NLP has matured significantly with the dawn of transformer models, making it easier to implement innovative solutions that assist requirements engineering.

Without further ado, let's dive into the five ways NLP can benefit systems engineering.

1. Requirements Classification

By using machine learning techniques (including deep learning), a model can be trained to automatically categorize requirements. In requirements engineering, we deal with several different high-level categories:

The first one is the group of functional requirements, which could entail more specific categories such as user interface or communication.

We then have the non-functional requirements, which could contain sub-categories such as security, availability, and usability. 

We also have fine-grained topics, which are very specific to a particular domain and the specific system we are building for the domain. In automotive, this could pertain to braking, speed, or automated wiper washing, whereas in robotics this could pertain to certain behaviors of the robot in certain circumstances. 

A machine learning model can automate the process of giving our requirements the specific category tags they need. Done manually, this could take weeks or months, depending on the scope of the project. With the models now available in the field of transfer learning, such as BERT and Google's T5, we can build highly accurate classifiers. They apply the weights of a pre-trained language model, which has been trained on very large corpora and therefore has an accurate general understanding of the language it deals with, to a specific downstream task such as classification. This new family of models has enabled the field of NLP to reach a new level of technological readiness in industry settings.
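To make the classification task concrete, here is a minimal sketch in Python. It is deliberately not a fine-tuned transformer: it swaps in a small bag-of-words naive Bayes classifier so that it runs with the standard library alone. The class name, the category labels, and the training sentences are all hypothetical and hand-written for illustration; a production system along the lines described above would fine-tune a pre-trained model on a much larger labeled set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training examples (not taken from a real project).
TRAIN = [
    ("The system shall display the current vehicle speed on the dashboard.", "functional"),
    ("The controller shall send a brake signal to the actuator.", "functional"),
    ("The user shall be able to activate the wiper from the steering wheel.", "functional"),
    ("The system shall encrypt all stored user data.", "security"),
    ("Access to the admin console shall require two-factor authentication.", "security"),
    ("The service shall be available 99.9 percent of the time.", "availability"),
    ("The system shall recover from a failure within five seconds.", "availability"),
]

texts, labels = zip(*TRAIN)
model = NaiveBayesClassifier().fit(texts, labels)
print(model.predict("All passwords shall be encrypted before they are stored."))
```

The interface is the part that carries over to a transformer-based setup: labeled requirement sentences go in, and a category tag comes out for every new requirement.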

2. Requirements Defect Detection

When eliciting and consolidating a large set of engineering requirements, we always run the risk of not specifying our requirements in a clear, concise, and well-formed manner. This can lead to a large number of issues later during development. Reviewing and editing such requirements is therefore a very time-intensive undertaking. We also deal with ambiguity in natural language, which of course also applies to writing requirements. If there is any form of semantic ambiguity in a requirements sentence, the developers who implement that requirement run the risk of writing code that does not match its original intention. There are a few advancements in NLP now that can help with disambiguation. The previously mentioned transfer learning models, which are based on the transformer architecture and are the state of the art in deep learning for NLP, can help with that.

When doing defect detection, there are four main causes for concern:

The first one is ambiguity, which occurs when a requirement can be misinterpreted due to an imprecise writing style.

Then there is vagueness, which occurs when a requirement lacks a necessary piece of information.

There is also the issue of weak verbs, that is, imprecise verbs such as "support" or "handle" that leave open what exactly the system must do. By making sure we use strong verbs, we can keep our requirements specifications clear.

Last on the list are passive forms. It is highly advised to specify sentences with verbs in active form. This will make the requirement more concise and clarify the intention better.

To solve these issues, a two-step approach is required. First, an NLP system needs to identify them. This can be done with a classifier that analyses each sentence and tags the affected section with one or more of the labels above. Second, the defect-labeled requirements are passed on to a requirements analyst, who can now fix them more quickly. This saves a lot of time in the initial defect identification stage, as it is now automated, and makes it easier to improve the language of engineering requirements.
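The first step, automatic defect labeling, can be sketched as follows. This is only a rule-based stand-in for the trained classifier described above: the word lists and the passive-voice pattern are hypothetical, deliberately small, and will produce false positives and misses. It is meant to show the shape of the output that gets handed to the analyst, not a production detector.

```python
import re

# Illustrative word lists (assumptions); a real project would curate or learn these.
AMBIGUOUS_TERMS = {"it", "they", "this", "and/or", "some", "several"}
VAGUE_TERMS = {"appropriate", "adequate", "fast", "user-friendly", "sufficient", "etc"}
WEAK_VERBS = {"support", "handle", "process", "manage", "provide"}

# Rough passive-voice heuristic: a form of "to be" followed by a word ending in -ed/-en.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b", re.IGNORECASE)


def detect_defects(requirement):
    """Return the sorted list of defect labels found in one requirement sentence."""
    # Keep hyphenated and slashed tokens such as "user-friendly" and "and/or" intact.
    tokens = set(re.findall(r"[a-z]+(?:[-/][a-z]+)*", requirement.lower()))
    labels = set()
    if tokens & AMBIGUOUS_TERMS:
        labels.add("ambiguity")
    if tokens & VAGUE_TERMS:
        labels.add("vagueness")
    if tokens & WEAK_VERBS:
        labels.add("weak_verb")
    if PASSIVE.search(requirement):
        labels.add("passive")
    return sorted(labels)


examples = [
    "The data shall be processed by the system in an appropriate time.",
    "The system shall handle sensor errors.",
    "The controller shall close the valve within 200 ms of receiving a stop signal.",
]
for sentence in examples:
    print(detect_defects(sentence), sentence)
```

A requirement can carry several labels at once, which is why the function returns a list rather than a single tag.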

3. Information Extraction

Next, we have information extraction. Information extraction (IE) is the process of creating structured information from unstructured natural language text. One popular IE technique is named entity recognition (NER), which uses machine learning to identify mentions of real-world entities within text. NER models can be trained on domain-specific data to identify custom entity tags within a given scope, so the technique can be used in requirements engineering as well. We can also use IE to automatically extract a full glossary of terminology from our requirements texts. These processes help us analyze our requirements and identify their structure, which can later serve as the basis for a digestible visual model of the requirements.

At itemis AG, I have already implemented such a system. A user feeds requirements text into a processing pipeline and receives a relational map: essentially a file consisting of nodes and edges, with every node being a requirement along with its semantic properties and every edge being based on the semantic relatedness of the two nodes it connects. The file can then be visualized by our in-house graph visualization tool. This visualization can help during project management, requirements management, and the development increments of the project. This is a state-of-the-art, end-to-end information management system that requires no human intervention. By grouping together requirements that are highly related based on their semantics, it helps us structure development projects and plan Epics and Sprints in an agile workflow more rapidly.
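The idea of a relational map of nodes and edges can be illustrated with a small sketch. This is not the itemis pipeline: it approximates "semantic properties" with the set of non-stopword terms in each requirement and "semantic relatedness" with Jaccard overlap of those term sets. The function names, the stopword list, the threshold, and the sample requirements are all assumptions made for the example.

```python
import json
import re
from itertools import combinations

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "shall", "be", "to", "of", "and", "in", "on", "when", "is", "at", "for"}


def key_terms(text):
    """Crude stand-in for extracted semantic properties: the non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Share of terms two sets have in common, between 0.0 and 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0


def build_relational_map(requirements, threshold=0.2):
    """Turn {id: text} into nodes (requirements) and weighted edges (relatedness)."""
    nodes = [
        {"id": rid, "text": text, "terms": sorted(key_terms(text))}
        for rid, text in requirements.items()
    ]
    edges = []
    for left, right in combinations(nodes, 2):
        weight = jaccard(set(left["terms"]), set(right["terms"]))
        if weight >= threshold:
            edges.append({"source": left["id"], "target": right["id"], "weight": round(weight, 2)})
    return {"nodes": nodes, "edges": edges}


# Hypothetical requirements for illustration.
REQS = {
    "R1": "The wiper shall start when rain is detected.",
    "R2": "The wiper shall stop when rain is no longer detected.",
    "R3": "The brake light shall turn on when the brake pedal is pressed.",
}
graph = build_relational_map(REQS)
print(json.dumps(graph["edges"], indent=2))
```

Here the two wiper requirements end up connected while the brake requirement stays isolated, which is the kind of grouping that could feed the planning of Epics and Sprints. A real system would replace the term overlap with learned semantic representations.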

4. Information Retrieval

Information retrieval (IR) is the task of querying information and receiving results based on that query. An IR system can be developed either as a standalone system or as one that complements the analytic data generated via information extraction. With IR, a developer can query a requirements database, and the results from IE can augment what the search engine shows for that query. This combined approach enables developers to make data-driven decisions. It can also help with merging user requirements into an existing IE-powered database in order to quickly create products tailored to a particular user. To extend our NLP pipeline service at itemis AG, we are planning new services that apply methods from vector-space modeling and semantic search to the output of our pipeline in order to create computationally efficient, meaningful, task-related graphs based on a query. Developers often don't need a full-fledged graph consisting of all requirements; they only need to see a specific subset for their current task. This approach should help them achieve that.
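As a rough illustration of vector-space retrieval over a requirements database, the sketch below ranks requirements against a free-text query using TF-IDF weights and cosine similarity. The class name `RequirementIndex`, the sample requirements, and the query are hypothetical, and plain TF-IDF only matches literal terms; the semantic search mentioned above would typically use learned embeddings instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class RequirementIndex:
    """Tiny TF-IDF vector-space index over {requirement_id: text}."""

    def __init__(self, requirements):
        self.requirements = requirements
        docs = {rid: Counter(tokenize(text)) for rid, text in requirements.items()}
        n = len(docs)
        doc_freq = Counter(term for counts in docs.values() for term in counts)
        # Smoothed inverse document frequency: common terms weigh less.
        self.idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in doc_freq.items()}
        self.vectors = {rid: self._weigh(counts) for rid, counts in docs.items()}

    def _weigh(self, counts):
        # Terms unknown to the index are dropped.
        return {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    def search(self, query, top_k=3):
        """Return up to top_k (id, score) pairs, best match first, zero scores omitted."""
        qvec = self._weigh(Counter(tokenize(query)))
        scored = [(rid, self._cosine(qvec, vec)) for rid, vec in self.vectors.items()]
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        return [(rid, round(score, 3)) for rid, score in ranked[:top_k] if score > 0]


# Hypothetical requirements database.
index = RequirementIndex({
    "R1": "The brake controller shall apply the brakes when an obstacle is detected.",
    "R2": "The wiper shall start when rain is detected.",
    "R3": "The system shall log every brake event.",
})
for rid, score in index.search("brake controller"):
    print(rid, score, index.requirements[rid])
```

Returning only the matching subset, rather than every requirement, mirrors the goal of showing developers just the part of the graph that is relevant to their current task.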

5. NLP-based Requirements Traceability

Requirements traceability consists of tracking requirements across the entire system or software development life cycle and connecting each requirement to all its artifacts in the different development phases. This helps minimize the risk of failure and maximize productivity. It also helps connect engineering requirements to regulatory requirements, which helps project members stay in line with non-negotiable regulations without sacrificing project velocity and without adding much refactoring work. By combining NLP with methods from graph analysis, the trace links can be established automatically and then passed on to a requirements analyst, who confirms or edits them. This takes a huge load off the analyst because NLP can help identify the deeper semantic structure of inter-related requirements artifacts, further reducing the amount of time that needs to be invested in the requirements engineering phase. Our in-house NLP pipeline is capable of tracing requirements back to their original source once the requirements are stored in an information index. It also creates a vector space model of the requirements' semantics in order to compare them and create a knowledge graph based on relatedness.
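The workflow of proposing trace links automatically and leaving the final decision to an analyst might look roughly like the sketch below. It is an assumption-laden simplification, not our pipeline: relatedness is approximated by cosine similarity over raw term counts, and the `TraceLink` record, the artifact identifiers, the stopword list, and the threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "shall", "must", "be", "is", "when", "that", "to", "of"}


@dataclass
class TraceLink:
    requirement_id: str
    artifact_id: str
    score: float
    status: str = "proposed"  # an analyst later sets this to "confirmed" or "rejected"


def vectorize(text):
    """Bag-of-words term counts without stopwords."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def propose_trace_links(requirements, artifacts, threshold=0.3):
    """Propose a link for every requirement/artifact pair that is similar enough."""
    links = []
    for rid, rtext in requirements.items():
        rvec = vectorize(rtext)
        for aid, atext in artifacts.items():
            score = cosine(rvec, vectorize(atext))
            if score >= threshold:
                links.append(TraceLink(rid, aid, round(score, 2)))
    return sorted(links, key=lambda link: link.score, reverse=True)


# Hypothetical requirements and downstream or regulatory artifacts.
requirements = {
    "R1": "The system shall encrypt stored personal data.",
    "R2": "The wiper shall start when rain is detected.",
}
artifacts = {
    "REG-1": "Personal data must be encrypted when stored.",
    "TEST-7": "Verify that the wiper starts after rain is detected.",
}
links = propose_trace_links(requirements, artifacts)
for link in links:
    print(link)
```

Every link starts out as "proposed", so nothing enters the trace matrix without the analyst's confirmation; the automation only narrows down which pairs are worth a look.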

In one of my previous articles, I mentioned the development of an NLP system we have been building over the past year, which you can find in this article if you want a sneak peek at what we are working on. There are many new features we have worked on since then. What the NLP pipeline and its new features can do for your business will be discussed in an upcoming article.

There are many more challenges in requirements engineering, and we are currently widening the scope of our investigation in order to improve our applications and create new ones, each able to solve one part of the challenge. Our aim is to eventually combine these applications into a service family that connects to machine learning, machine learning operations, and new internal developer tooling, in order to provide highly scalable, customizable services to a diverse range of clients within a foreseeable timeframe.

So stay tuned for my upcoming blog posts if you want to stay up to date on our research endeavors in the field of NLP and machine learning for requirements engineering.