
Natural language processing (NLP) is a subfield of artificial intelligence (AI) that is being used in an increasing number of applications. These include making research in fields such as law and medicine more efficient, automatically extracting and retrieving information, and simplifying everyday tasks. In recent years, NLP has undergone a significant transformation thanks to a new set of algorithms known as transformer models, including the well-known BERT algorithm. These algorithms bring together the best features of previous neural network architectures and overcome their shortcomings, resulting in unprecedented advancements in language technology.

One area where NLP can be particularly beneficial is requirements engineering, the discipline of managing engineering requirements. This is often a manual task that is time-consuming, seemingly never-ending, and prone to errors. It can slow down project progress, because people spend too much time on quantitative tasks instead of focusing on qualitative tasks, which take less time and are a more accurate measure of progress.


In this article, we will discover how NLP can be applied effectively during the design phase of requirements engineering, and how it can save businesses many hours that can be allocated to other tasks, increasing the velocity of projects and mitigating the risk of failure.

Transformer models such as the aforementioned BERT, OpenAI’s GPT-3 or Google’s T5 are the state of the art in machine learning for NLP. They kicked off what some describe as “the golden age of NLP”, a development of similar importance to how ImageNet elevated computer vision to the next level. Natural languages are characterized by multiple types of ambiguity, so teaching a machine how to parse them is a challenging task.

Tonality, several distinct kinds of syntactic or terminological ambiguity, unclear semantics at the sentence level, lack of structure, and scientific collocations are all factors that a computer cannot comprehend on its own. Around 80% of the available data on the web is commonly estimated to be unstructured (in other words, natural language text), which makes it a goldmine of information and untapped potential. To utilize this information, NLP algorithms are needed to extract the data, structure it, and identify its semantics in order to gather information and insights from text. NLP is the programmatic component of a larger field called computational linguistics, which in turn draws on traditional linguistics (the study of language) and computer science.

Requirements engineering (RE) is a field concerned with defining, documenting and managing engineering requirements during the design steps of engineering projects, most commonly in systems or software engineering. The requirements may live in databases managed by tools such as IBM Rational DOORS, or in text files such as specification documents, the latter usually being written in unstructured, natural language text. In practice, there is a mix of unstructured, semi-structured and fully structured documentation. Requirements can be contextually related to one another: for example, one requirement may further specify, block, or enclose another requirement.

There are also different requirement types: they can be specified as software requirements, system requirements, stakeholder requirements or test requirements. There are often links between requirements of different types. For instance, a software requirement can be linked to the test requirement that is used to test it. Establishing meaningful connections between these requirements is essential for the system or software being designed, but also for the project’s velocity and productivity.
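To make the structure described above more tangible, here is a minimal sketch of how typed requirements and labelled links between them could be modelled. The class names, the ID scheme and the example texts are illustrative assumptions, not the schema of any particular requirements tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class RequirementType(Enum):
    SOFTWARE = "software"
    SYSTEM = "system"
    STAKEHOLDER = "stakeholder"
    TEST = "test"


class Relation(Enum):
    SPECIFIES = "specifies"
    BLOCKS = "blocks"
    ENCLOSES = "encloses"


@dataclass(frozen=True)
class Requirement:
    req_id: str  # hypothetical ID scheme, e.g. "SYS-1"
    req_type: RequirementType
    text: str


@dataclass(frozen=True)
class Link:
    source_id: str
    relation: Relation
    target_id: str


@dataclass
class RequirementStore:
    requirements: Dict[str, Requirement] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def add(self, requirement: Requirement) -> None:
        self.requirements[requirement.req_id] = requirement

    def link(self, source_id: str, relation: Relation, target_id: str) -> None:
        # Refuse links that point at requirements we do not know about.
        for req_id in (source_id, target_id):
            if req_id not in self.requirements:
                raise KeyError(f"unknown requirement: {req_id}")
        self.links.append(Link(source_id, relation, target_id))

    def outgoing(self, source_id: str) -> List[Link]:
        return [link for link in self.links if link.source_id == source_id]


# Made-up example data for illustration only.
store = RequirementStore()
store.add(Requirement("SYS-1", RequirementType.SYSTEM,
                      "The system shall report sensor faults."))
store.add(Requirement("SW-1", RequirementType.SOFTWARE,
                      "The controller software shall log every sensor fault."))
store.link("SW-1", Relation.SPECIFIES, "SYS-1")
```

Here the software requirement `SW-1` further specifies the system requirement `SYS-1`. Every such link is exactly the kind of connection that normally has to be created and labelled by hand.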

Improving Requirement Linkage Through NLP

Connecting requirements is usually a human endeavour, albeit a time-consuming and frustrating one. There is also much room for error, even for domain experts, especially when labelling the relations between requirements. Assigning relations (such as ‘blocks,’ ‘encloses’ or ‘specifies’) to pairs of linked requirements is not always straightforward and is known to be a cognitively challenging task. The lines between relation types can be blurry, and it takes the combined effort of several domain experts to guarantee a high labelling standard when managing and linking requirements semantically.

In supervised learning, this is considered a labelling task, and not a simple one: deciding how to label a specific entry is often far from obvious, and even experienced domain experts may run into decision-making issues during the labelling process. Assigning the task to a single domain expert would risk reducing the ground truth data to the subjective judgment of one person, shaped by cognitive biases that none of us are fully aware of. That is not what we want when managing knowledge and information about sensitive engineering data. Teams handling information this way run an elevated risk of distorting data that needs to be precise, clean, and reliable enough to be used, for instance, for training a machine learning model.

For engineers it is essential to have a precise knowledge of all the requirements for implementing a system or software. But this can be very cumbersome. Requirements documents are extensive, complex sources of information. Wouldn’t it be much better to have some helper system which can parse all those documents and create a semantic representation that is both machine-readable and at the same time more digestible for project members? The result will certainly still have to be checked by domain experts, but far less time will have to be invested into managing requirements altogether. This can save a lot of people a lot of trouble. But then again, creating such a solution is a challenge of its own. 

In NLP, named entities in text (such as people, organizations, products, or domain-specific entity types) are found by training a model on a task called named entity recognition (NER). These named entities are then connected by training another NLP model on a task called relation extraction, one of the core challenges of NLP. Two connected entities and their relation can then be stored as an information triplet, with the entities as nodes and the relation as the edge between them. This technique can be used to bootstrap a graph representation of the information found within text, transforming textual information into a graph in an end-to-end approach with minimal human intervention.

NLP usually applies this approach at the term level, for example creating the triplet Tim Cook (PERSON) – CEO_of – Apple (ORG) from the example sentence ‘Tim Cook is the CEO of Apple.’ The tags ‘PERSON’ and ‘ORG’ are automatically assigned to the named entities within the sentence by an NLP model that has been trained to perform named entity recognition with the given labels.
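As a small illustration of the triplet idea, the following sketch stores triplets as a simple graph in plain Python. The entity tags and triplets are written by hand here. In a real pipeline they would come from NER and relation extraction models, and the second triplet is only added for illustration.

```python
from collections import defaultdict

# Entity labels as a NER model might assign them (hand-written here).
entity_labels = {
    "Tim Cook": "PERSON",
    "Apple": "ORG",
    "Cupertino": "LOC",
}

# (head entity, relation, tail entity); the second triplet is illustrative.
triplets = [
    ("Tim Cook", "CEO_of", "Apple"),
    ("Apple", "headquartered_in", "Cupertino"),
]


def build_graph(triplets):
    """Store entities as nodes and relations as labelled, directed edges."""
    graph = defaultdict(list)
    for head, relation, tail in triplets:
        graph[head].append((relation, tail))
        graph[tail]  # touching the key makes sure the tail exists as a node
    return dict(graph)


graph = build_graph(triplets)

for node, edges in graph.items():
    for relation, target in edges:
        print(f"{node} ({entity_labels[node]}) -{relation}-> "
              f"{target} ({entity_labels[target]})")
```

A dictionary of adjacency lists is enough to show the principle. A graph database or a dedicated graph library would be the more natural choice once the graph grows.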

The Use of Transformer Models in Requirements Engineering

In requirements engineering, however, our problem set looks different: we often deal with entire sentences as the smallest possible unit of information, usually a brief textual description of a software or system requirement. For many of these use cases, we do not want to decompose the sentence.

Although it can make sense to have an NLP system tag every word in a sentence with its linguistic features, requirements engineering is about setting up connections between these sentences, or requirement descriptions, which is a fundamentally different task from a machine learning point of view. Therefore, a lot of the standard approaches in NLP are not going to help us get ahead of the curve.

With transformer models, we can take a pretrained model that already has a quite comprehensive understanding of language and use it for inference on domain-specific text, as found in requirements engineering or legal documents, to predict a numerical representation of the incoming text. These numerical representations, called embeddings (another word for vectors), can then be used for more advanced AI tasks.

My findings at Itemis AG have shown that even for many domain-specific scenarios such as requirements engineering, the models seem to perform quite well in representing the semantics of the input data. 

A defining characteristic of transformer models, though, is that they can be further fine-tuned for more specific downstream NLP tasks, such as question answering (used in chatbots and search), text summarization and generation, next-word prediction, or machine translation. This approach of adding a more task-specific supervised stack on top of the output of a more task-agnostic unsupervised stack is known as transfer learning.

In this case, however, the downstream tasks commonly used with transformer models cannot directly help us do what we intend to do. We also cannot create a preliminary model for named entity recognition, as we treat entire sentences as single entities (each sentence is a description of a specific engineering requirement), which is a quite unusual way of representing information within NLP as we know it.

What we can do instead is develop a standalone classifier, for instance, that takes the data representation inferred by the pretrained transformer model and uses it for our own custom-defined downstream task. This task is then handled by a separate, custom model rather than by a fine-tuned transformer model, which gives us more decision-making power regarding the nature of the model. All we need is the output of the transformer model on our input data, which serves as the input to our own model.
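To give a rough idea of what such a standalone classifier could look like, here is a minimal PyTorch sketch. It is not the model we built. The architecture, the pair features (both vectors plus their absolute difference), the extra ‘unrelated’ class and the random tensors standing in for real embeddings are all assumptions made for illustration.

```python
import torch
from torch import nn


class RelationClassifier(nn.Module):
    """Small feed-forward classifier on top of frozen sentence embeddings."""

    def __init__(self, embedding_dim, num_relations, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_relations),
        )

    def forward(self, emb_a, emb_b):
        # Assumed pair features: both vectors and their absolute difference.
        features = torch.cat([emb_a, emb_b, torch.abs(emb_a - emb_b)], dim=-1)
        return self.net(features)  # raw logits, one per relation type


torch.manual_seed(0)
embedding_dim = 768  # hidden size of BERT-base; adjust to the encoder in use
relations = ["specifies", "blocks", "encloses", "unrelated"]  # last one assumed

model = RelationClassifier(embedding_dim, len(relations))

# Random stand-ins for precomputed embeddings of four requirement pairs.
emb_a = torch.randn(4, embedding_dim)
emb_b = torch.randn(4, embedding_dim)
labels = torch.tensor([0, 1, 2, 3])  # made-up expert labels

# A single training step; the transformer itself is never updated.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
logits = model(emb_a, emb_b)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because only the small classifier is trained, the expensive transformer has to run just once per requirement, to produce the embeddings.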

As mentioned, transformers create word embeddings (or word vectors) that represent the words of the input text in vector space, using a state-of-the-art approach that takes context into account. But we want to represent entire sentences as vectors, not individual words. It turns out that we can combine all word vectors within a sentence into a single summary vector that accurately represents the sentence, in this case an engineering requirement, and that can be repurposed for many future deep learning endeavours.
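One common way to build such a summary vector is mean pooling: averaging the word vectors of a sentence while ignoring padding positions. The sketch below assumes this technique, which is not necessarily the exact approach used in our system, and it uses tiny hand-made tensors in place of real transformer output.

```python
import torch


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, skipping padded positions.

    token_embeddings: (batch, seq_len, dim) output of a transformer encoder
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # guard against division by zero
    return summed / counts


# Two toy "sentences" with three token slots and 2-dimensional vectors.
token_embeddings = torch.tensor([
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],  # third slot is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
])
attention_mask = torch.tensor([
    [1, 1, 0],
    [1, 1, 1],
])

sentence_vectors = mean_pool(token_embeddings, attention_mask)
print(sentence_vectors)  # one vector per sentence: [2, 3] and [4, 2]
```

The padded slot in the first sentence does not affect the result, which is the whole point of passing the attention mask along.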

One of the key innovations of models like BERT is that instead of randomly initializing vectors for every input instance and using these as input to train a classifier, we start from a much more accurate, pretrained representation when training a new classifier, which improves model performance at inference time. Transformer models are also more dynamic and sensitive to context than preceding deep learning models such as RNNs. Their embeddings come from a more elaborate architecture that stores more information about the text they represent than word2vec’s static embeddings do, which is why these vectors are commonly referred to as contextual word embeddings.

Once the system has computed the embeddings for our requirements texts, we calculate a semantic similarity score for each vector pair, pairing every vector with every other vector but not with itself.

Assume we have a database with ten thousand requirements. Without AI, engineers would have to connect or group requirements by hand. Now we have a system that computes an embedding for each requirement, stacks the embeddings into a matrix and, by means of efficient matrix multiplication, computes a tensor that holds the semantic similarity scores for all pairs of embeddings.
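The following sketch shows how all pairwise scores can be computed with a single matrix multiplication. It assumes cosine similarity as the similarity measure, which is a common choice but only one of several options, and it uses three tiny made-up vectors instead of real requirement embeddings.

```python
import torch
import torch.nn.functional as F


def similarity_matrix(embeddings):
    """Cosine similarity between all rows of an (n, dim) embedding matrix."""
    normalized = F.normalize(embeddings, p=2, dim=1)  # unit-length rows
    return normalized @ normalized.T  # (n, n) matrix of pairwise scores


# Three toy "requirement embeddings" stacked into one matrix.
embeddings = torch.tensor([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])

scores = similarity_matrix(embeddings)
print(scores)
```

Entry `scores[i, j]` holds the similarity between requirement `i` and requirement `j`. The diagonal is the similarity of each requirement with itself and is ignored afterwards, and the matrix is symmetric, so only one triangle needs to be read out.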

Just think about combining ten thousand unique engineering requirements pairwise: done manually, this entails arduous, time-consuming labour. It is impossible for a human to pair up all requirements, calculate a similarity score for each pair, decide which requirements may or may not belong together contextually, and then rank them by score, all within a matter of minutes!
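To put a number on it: n requirements can form n · (n − 1) / 2 unordered pairs. The short sketch below lists the pairs for a handful of made-up requirement IDs and then counts them for ten thousand requirements.

```python
import math
from itertools import combinations

# Hypothetical requirement IDs, just to show what "pairwise" means.
requirement_ids = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]
pairs = list(combinations(requirement_ids, 2))
print(pairs)  # 6 unordered pairs for 4 requirements


def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return math.comb(n, 2)


print(pair_count(10_000))  # 49995000
```

That is almost fifty million candidate pairs for a single database of ten thousand requirements.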

As mentioned, the output similarity scores can be represented as tensors using PyTorch’s tensor data structure. We then extract the values belonging to each pair and store them together as one unified piece of information. The individual score for each requirements pair is kept as a single-value tensor rather than being converted to a plain numerical type such as a float, with the aim of preserving numerical precision and avoiding issues such as numerical underflow for very small values.
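A minimal sketch of this extraction step could look as follows. The function name, the tuple layout and the hand-written score matrix are assumptions for illustration. The scores stay single-value tensors, as described above.

```python
import torch


def rank_pairs(scores, top_k):
    """Return the top_k most similar pairs as (i, j, score) tuples.

    scores is a symmetric (n, n) similarity matrix. Only the upper triangle
    above the diagonal is read, so self-pairs and duplicates are skipped.
    """
    n = scores.shape[0]
    rows, cols = torch.triu_indices(n, n, offset=1)
    pair_scores = scores[rows, cols]
    order = torch.argsort(pair_scores, descending=True)[:top_k]
    # Each score stays a zero-dimensional tensor instead of a Python float.
    return [(rows[i].item(), cols[i].item(), pair_scores[i]) for i in order]


# Hand-written similarity matrix for three requirements.
scores = torch.tensor([
    [1.00, 0.20, 0.90],
    [0.20, 1.00, 0.55],
    [0.90, 0.55, 1.00],
])

ranked = rank_pairs(scores, top_k=2)
for i, j, score in ranked:
    print(f"requirement {i} <-> requirement {j}: {score}")
```

In this toy example, requirements 0 and 2 come out as the most similar pair, followed by requirements 1 and 2.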

Keeping everything in PyTorch tensors also keeps our calculations robust. For exceptionally large requirements databases, the calculations may take a while depending on the data, so we store our vectors and compute them only once, on a CUDA-enabled GPU if necessary. The NLP system we are building, which so far combines several AI algorithms across its components, also seems to perform well on a CPU, but that depends on how much client data it needs to process.
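Computing the vectors only once can be sketched as a small cache on disk. The helper name `load_or_compute`, the file name and the fake encoder below are assumptions for illustration. In a real system, the encoder would be the transformer model.

```python
import os
import tempfile

import torch

# Use a CUDA-enabled GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def load_or_compute(cache_path, compute_fn):
    """Load cached embeddings if present, otherwise compute and store them."""
    if os.path.exists(cache_path):
        return torch.load(cache_path, map_location=device)
    embeddings = compute_fn().to(device)
    torch.save(embeddings.cpu(), cache_path)  # store a device-independent copy
    return embeddings


calls = {"count": 0}


def fake_encoder():
    # Stand-in for the expensive transformer forward pass.
    calls["count"] += 1
    return torch.ones(3, 4)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "embeddings.pt")  # hypothetical file name
    first = load_or_compute(path, fake_encoder)   # computes and caches
    second = load_or_compute(path, fake_encoder)  # served from the cache

print(calls["count"])  # the encoder ran only once
```

The second call never touches the encoder, which is what matters once the encoder is a large model and the database holds thousands of requirements.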

It’s an exciting journey to build a system that can read all your documents and represent the key information as a digestible data structure of interconnected pieces of information: something like an auto-generated, interactive mind map that can also be visualized and in which things are logically connected to one another. That beats scouring endless requirements documents and doing tons of work just so that your team can start with the actual project tasks in the first place.


Stay tuned for our future posts, where we will talk more about NLP, including posts that are more accessible to business professionals, to draw the big picture of NLP for anyone unfamiliar with the engineering concepts mentioned here.

If you enjoyed this article or want to share your opinion, give us a holler in the comments section.