When a Butler Becomes a Victim
Today we are going to depart from the classic cliché of second-rate detective stories and tell you about a case from our own experience, in which the butler himself became the victim and the search for the culprits led to an unexpected result. But don't be scared: the characters are programs, not real people. Why are we not afraid to admit our own mistakes? Because we know that no team is immune to them.
Introduction
Let me start with a little bit of theory. We develop PVS-Studio, a static analyzer for C, C++, and C# code. For internal testing of the product we use a variety of tools and techniques: unit tests, Visual Studio UI tests, joint code review, and specialized tester applications. I would like to comment on the last point in more detail.
When we develop new diagnostic rules or change the internal mechanisms of the analyzer, we always need to understand how these changes have affected the quality of analysis. To find out, we run the analyzer on a set of large open source projects. We use about 150 projects for C/C++ code and 52 projects for C#. The tester app that analyzes C/C++ projects under Windows is called SelfTester, and it is this tool that we will talk about. There is also a specialized system for Linux, but for now it will remain behind the scenes.
All the work is performed on a build server running Windows 10. We use Jenkins as the build system; among other things, it is configured to run the testers every night. Jenkins itself is launched from a cmd file with the command:
java -jar jenkins.war > %JENKINS_PROJECTS%\Logs\%YYYY%.%MM%.%DD%_%HH%.%MI%.log 2>&1
The JENKINS_PROJECTS variable specifies the location of the Jenkins project folder on the local disk. The variables YYYY, MM, DD, HH and MI contain the year, month, day, hour and minute at the moment of the launch. Thus, after Jenkins is launched, its console output is redirected to a file with a name of the form:
2017.11.08_17.58.log
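The naming scheme above can be illustrated with a small Python sketch. It is not the cmd file from our build server, and the function name jenkins_log_name is made up for the example; it only shows how a launch time maps to a log file name of that form.

```python
from datetime import datetime


def jenkins_log_name(moment: datetime) -> str:
    """Build a log file name of the form YYYY.MM.DD_HH.MI.log.

    Hypothetical helper: it mirrors the naming scheme described in the
    text, not the actual cmd file.
    """
    # %Y, %m, %d, %H and %M are zero-padded, matching the example name.
    return moment.strftime("%Y.%m.%d_%H.%M") + ".log"


if __name__ == "__main__":
    # A launch on November 8, 2017 at 17:58.
    print(jenkins_log_name(datetime(2017, 11, 8, 17, 58)))
```

Because every field is zero-padded and ordered from year to minute, sorting such file names alphabetically also sorts them chronologically.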
The cmd file is started by the Windows Task Scheduler. Running Jenkins as a Windows service turned out not to work for us, because it caused problems with the Visual Studio UI tests: a service does not have a desktop. Therefore, the invocation chain looks like this:
Windows Task Scheduler -> cmd.exe -> java.exe
With this short introduction out of the way, let's move on to what went wrong.
Problem
About six months ago, we began to notice that Jenkins was shutting down from time to time during the nightly builds. As a result, we had to restart it manually, check the logs, and rerun some of the tasks. The problem occurred fairly regularly. We started the investigation with Jenkins itself, because it is Jenkins that does all the "dirty work" and is therefore the first to fall under suspicion. Yes, that very butler.
Jenkins
First of all, we checked the Jenkins configuration and the settings of its tasks. The check found nothing serious. Moreover, if there had been an error in, say, the order, priority, or start time of the tasks, the problem would probably have shown itself differently. It is also unlikely that Jenkins would crash because of configuration errors.
As a temporary measure, we reconfigured the Windows Task Scheduler task to restart Jenkins every 30 minutes. To avoid starting a second instance, the task also checked whether a Jenkins process was already running.
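The guard against a second instance can be sketched roughly as follows. This is an illustration in Python, not the check we actually used: the names is_jenkins_running and ensure_jenkins are hypothetical, the way the list of process command lines is obtained is left to the caller, and matching on "jenkins.war" is an assumption based on the launch command shown earlier.

```python
from typing import Callable, Iterable, List

# Assumption: a running Jenkins can be recognized by this fragment of its
# command line, as in "java -jar jenkins.war".
JENKINS_MARKER = "jenkins.war"


def is_jenkins_running(command_lines: Iterable[str]) -> bool:
    """Return True if any command line looks like a Jenkins java process."""
    for line in command_lines:
        lowered = line.lower()
        if "java" in lowered and JENKINS_MARKER in lowered:
            return True
    return False


def ensure_jenkins(list_command_lines: Callable[[], List[str]],
                   start: Callable[[], None]) -> bool:
    """Start Jenkins only when no instance is found.

    Both callables are supplied by the caller (hypothetical hooks).
    Returns True if a start was triggered, False otherwise.
    """
    if is_jenkins_running(list_command_lines()):
        return False
    start()
    return True
```

A scheduled task that calls ensure_jenkins every 30 minutes does nothing while Jenkins is alive and brings it back within half an hour after it disappears.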
The next step was to examine the Jenkins console logs. As you probably remember, they are saved in the %JENKINS_PROJECTS%\Logs folder, and each file name contains the time stamp of its creation (in effect, the time Jenkins was restarted). After a while, we looked through this folder and, as expected, found a fairly large number of files, which showed that Jenkins was still being restarted. At the same time, the Jenkins console logs contained nothing suspicious. The problem had changed shape, though: some of the nightly tests were no longer launched at all, and those that did start often ended with errors.
All this suggested that Jenkins was probably not to blame. Someone from the outside was interfering with its work by "killing" the java.exe process.
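Since each log file name encodes a restart time, the restart history can be recovered from the folder listing alone. The Python sketch below shows one way to do that; it is only an illustration, the function name restart_times is hypothetical, and it assumes the YYYY.MM.DD_HH.MI.log naming described earlier.

```python
from datetime import datetime
from typing import Iterable, List


def restart_times(file_names: Iterable[str]) -> List[datetime]:
    """Extract restart time stamps from log file names, oldest first.

    Names that do not follow the YYYY.MM.DD_HH.MI.log pattern are skipped.
    """
    times = []
    for name in file_names:
        if not name.endswith(".log"):
            continue
        try:
            # Drop the ".log" suffix and parse the remaining time stamp.
            times.append(datetime.strptime(name[:-4], "%Y.%m.%d_%H.%M"))
        except ValueError:
            continue  # not a Jenkins console log name
    return sorted(times)
```

Passing the result of os.listdir for the Logs folder to this function gives a list whose length is the number of restarts and whose neighboring entries show how long Jenkins survived each time.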
But who is the murderer? Read more in the full article: https://www.viva64.com/en/b/0546/