Many real-world systems consist of network devices communicating with each other. This communication is a rich source of data, but extracting information from it is hard, and automating that extraction is harder still, because most of the communication takes the form of device logs or command outputs.
Each device can have its own way of expressing itself, which is why it is hard to create a universal solution, e.g. for extracting IP addresses. Extracting data from a single log is manageable for a human, but most networking systems contain thousands of routers and other devices. Managing systems of that size is difficult in its own right. This article shows how WASKO approached these problems and what its solution offers the industry.
NEDAPS is an innovative, open platform, developed by WASKO, that supports the management of distributed network and service infrastructure. The system is intended for telecommunications operators (ISP, PSTN, VSP) and for companies with extensive IT infrastructure. NEDAPS simplifies operations on the infrastructure, whether they are performed manually by users or by BSS business systems, which significantly reduces the cost and time of implementing new services. NEDAPS lets users graphically define the actions (operations, configuration, exchange of information) to be performed on devices and systems in order to carry out a specific task. Defining an operation consists of selecting functional blocks from the provided libraries and combining them appropriately. Its simple graphical interface makes managing networking systems easy for the end user.
As mentioned in the introduction, extracting information from the logs of network devices can be a very difficult task. NEDAPS tries to help its end users with this problem. Because logs are text, data can be extracted from them using regular expressions. A regular expression is a sequence of characters that specifies a search pattern. The concept originates in theoretical computer science and formal language theory.
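To make the idea concrete, here is a minimal Python sketch of extracting IPv4 addresses from a log line with a regular expression. The log line is invented for illustration, and the pattern is deliberately simplified (it does not check that each octet is at most 255); neither is taken from NEDAPS.

```python
import re
from typing import List

# Simplified IPv4 pattern: four groups of 1-3 digits separated by dots.
# Assumption: octet ranges (0-255) are not validated here.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical log line, invented for this example.
log_line = (
    "Jan 12 10:15:32 router-7 %LINK-3-UPDOWN: Interface Gi0/1 "
    "(10.20.30.40) changed state to up, peer 192.168.1.1"
)


def extract_ipv4(text: str) -> List[str]:
    """Return every substring of `text` that looks like an IPv4 address."""
    return IPV4_PATTERN.findall(text)


print(extract_ipv4(log_line))
```

The timestamp and the interface name contain digits as well, but they are not matched because the pattern requires three dot-separated groups before the last one.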
Once a regular expression is defined, it can be used to extract information from texts of similar structure. The user can therefore define a regular expression for a set of devices and use it inside NEDAPS. NEDAPS also has a database of regular expressions created by professional programmers, so users can often find a ready-made expression that fits their needs. Such regular expressions can be readable and easy to understand. Unfortunately, this solution has two drawbacks. First, if the database contains no suitable regular expression, the user must either know how to write one or employ a programmer who specializes in them. Second, regular expressions are written to fit the data the user currently has. If a software update changes the way a device reports information, the regular expression may stop working.
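The second drawback can be illustrated with a small Python sketch. Both log formats below are hypothetical and were made up for this example; the point is only that a pattern fitted tightly to one format silently returns nothing once the format changes.

```python
import re
from typing import Optional

# Pattern fitted to the "old" format only: the whole line must be
# "login from <address>". Both formats here are hypothetical.
OLD_FORMAT_PATTERN = re.compile(r"^login from (\d{1,3}(?:\.\d{1,3}){3})$")

old_line = "login from 10.0.0.5"           # format before the update
new_line = "login src=10.0.0.5 proto=ssh"  # format after the update


def extract_address(line: str) -> Optional[str]:
    """Return the address captured by the old-format pattern, or None."""
    match = OLD_FORMAT_PATTERN.search(line)
    return match.group(1) if match else None


print(extract_address(old_line))  # the address is found
print(extract_address(new_line))  # no match: the pattern no longer fits
```

No error is raised in the second case, which is what makes this kind of breakage easy to miss in an automated pipeline.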
As mentioned in the previous section on log data extraction, such logs usually follow a concrete structure. It is not hard for a person to extract the needed information if they know where to look, and they can usually formulate a rule for locating the part of interest. Things become much harder when that rule, along with all its exceptions, has to be programmed. It is not impossible, and a talented programmer can come up with a regular expression that extracts the information and does not break on unusual examples. This process is, however, time-consuming, has to be performed for each individual case, and has to be repeated every time the format of the output log changes. That is why the REGOS algorithm was created. It combines approaches based on probabilistic programming, genetic programming, machine learning, and exhaustive search. The results are automatically generated regular expressions that match the patterns normally devised and programmed by humans. The inputs to the algorithm are examples of log files together with the positions of the parts of interest. The algorithm can generate a regular expression from even a single example, but accuracy and generalization rise significantly as more inputs are provided. REGOS can also generate artificial examples based on those provided by humans, which in turn helps it create more general regular expressions. This is achieved through data augmentation, the process of generating new training examples based on the ones provided by the user. As a result, regular expressions generated by REGOS are better prepared for future changes in the logs.
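The following Python sketch illustrates two of the ideas above: a labeled example (text plus the position of the part of interest) and data augmentation. It is not the REGOS implementation, whose internals are not described here. The augmentation rule (replace every digit with a random digit), the scoring function, and all names are assumptions chosen to keep the example short.

```python
import random
import re
from typing import List, Tuple

# A labeled example: log text plus the (start, end) span to extract.
Example = Tuple[str, Tuple[int, int]]

DIGITS = "0123456789"


def augment(example: Example, copies: int, seed: int = 0) -> List[Example]:
    """Assumed augmentation rule: replace each digit with a random digit.

    The text length is unchanged, so the labeled span stays valid.
    """
    rng = random.Random(seed)
    text, span = example
    variants = []
    for _ in range(copies):
        new_text = "".join(
            rng.choice(DIGITS) if ch in DIGITS else ch for ch in text
        )
        variants.append((new_text, span))
    return variants


def accuracy(pattern: str, examples: List[Example]) -> float:
    """Fraction of examples whose first match is exactly the labeled span."""
    compiled = re.compile(pattern)
    hits = 0
    for text, span in examples:
        match = compiled.search(text)
        if match is not None and match.span() == span:
            hits += 1
    return hits / len(examples)


# Hypothetical log line with the address marked as the part of interest.
text = "eth0 up, address 10.0.0.5 assigned"
start = text.index("10.0.0.5")
original: Example = (text, (start, start + len("10.0.0.5")))

augmented = augment(original, copies=20)

SPECIFIC = r"10\.0\.0\.5"               # fits only the original example
GENERAL = r"\d{1,3}(?:\.\d{1,3}){3}"    # describes the shape of the value

print(accuracy(SPECIFIC, [original]), accuracy(SPECIFIC, augmented))
print(accuracy(GENERAL, [original]), accuracy(GENERAL, augmented))
```

Both patterns are perfect on the single original example, so that example alone cannot tell them apart. Scoring on the augmented set favors the general pattern, which is the sense in which augmentation pushes a search toward expressions that survive changes in the data.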
The REGOS system is fully integrated with NEDAPS and can be used directly from it to generate new regular expressions. These patterns can then be used in NEDAPS-MODELER, directly in a networking schema.
We can use REGOS to visualize all generated regular expressions.
We can also train a new one by providing some example texts and marking what should be extracted from them, e.g. an IP address.
REGOS is nevertheless an independent system that can work not only within NEDAPS and networking but also, for example, in extracting data from Wikipedia articles. Thanks to its scalability, it can be integrated into other systems.
Thanks to the use of AI, NEDAPS has become a powerful tool for analyzing networking data. It can not only use patterns in the form of regular expressions but also generate them. As a result, the end user does not need experience with regular expressions or programming to use them. It also helps when the format of the logs changes and the regular expression needs to be updated. In that case, only a few examples of the new log output, along with the selected parts of interest, have to be prepared. The algorithm will then create a new regular expression, or modify the previous one, so that it matches the parts of interest in both log formats.