This article is a spiritual continuation of A Tale of Installers. We won't talk about installation processes specifically here, but rather about tackling the problem of making sense of A LOT of automated reports, namely failed installation reports. Along the way we'll talk about search engines, about using NoSQL and other technologies, and about ending up with something useful beyond its original purpose.
As we've seen previously, an installation process can fail for a very wide range of reasons, and predicting them all is, in practice, impossible. From Skype blocking port 80 (and disabling your IIS) to people who've encrypted their hard disks (thus preventing SQL Server from installing), the possibilities are endless.
With the release of our Community Edition, installations started numbering in the thousands, and with them came reports of failures due to unexpected conditions. And although we knew we couldn't shield the installer from every edge case, we attempted to automatically solve the most common situations and present useful messages for the remaining ones.
Looking at all error reports, grouping them, and identifying the largest group is a task better done automatically. But it isn't something that can be done with a simple database query.
In the end, this problem had a somewhat simple solution: “all” we had to do was build a custom search engine!
Most search engines' architecture can be split into three stages: crawling, indexing, and providing a search interface.
The crawling step is usually the easy one. If you're crawling the web, you probably have to deal with issues such as duplicate content or never-ending, automatically generated content. In our case, since we're crawling a limited and well-defined bug report database, our task is comparatively trivial.
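In our setting, the crawling stage reduces to iterating over reports that haven't been indexed yet. A minimal sketch, assuming an in-memory report store and illustrative field names (`id`, `component`, `log` are not from the original system):

```python
# Hypothetical sketch of the crawling stage: visit each report not yet
# seen and hand it to the indexer. The report store is simulated with a
# plain list so the example is self-contained.

def crawl(report_store, indexer, seen_ids):
    """Feed unindexed reports to the indexer; return how many were new."""
    new_reports = [r for r in report_store if r["id"] not in seen_ids]
    for report in new_reports:
        indexer(report)
        seen_ids.add(report["id"])
    return len(new_reports)

# Example run against an in-memory "database" of two reports:
reports = [
    {"id": 1, "component": "IIS", "log": "port 80 already in use"},
    {"id": 2, "component": "SQL Server", "log": "disk is encrypted"},
]
indexed = []
crawl(reports, indexed.append, seen_ids=set())
```

Because the report database is bounded and well-defined, there is no need for deduplication or politeness logic that a web crawler would require.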
Indexing content is usually the trickiest part. Here you want to filter out the uninteresting information and keep track of only the interesting stuff, which usually requires some intelligence about the format of the information. If you're crawling HTML documents, you might want to strip the HTML tags and keep just the text. In our case, we built an indexing engine with some knowledge about the installation reports: if it sees an installation that failed due to a problem in IIS, it can discard the SQL logs when indexing that report. This lets us keep the index uncompressed, so we can search it faster, and, as an added bonus, cuts the data down to a reasonable size by getting rid of superfluous information.
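The idea of component-aware indexing can be sketched in a few lines. This is an illustration only; the field names (`failed_component`, `logs`) and log contents are assumptions, not the actual report schema:

```python
# Minimal sketch of component-aware indexing, assuming each report
# carries one log per sub-installer plus a field naming the step that
# failed. Only the failing component's log makes it into the index.

def index_report(report):
    """Build an index entry keeping only the relevant log."""
    failed = report["failed_component"]
    return {
        "id": report["id"],
        "component": failed,
        # Discard e.g. the SQL Server logs when the failure was in IIS.
        "log": report["logs"][failed],
    }

report = {
    "id": 42,
    "failed_component": "IIS",
    "logs": {
        "IIS": "HTTP could not register URL: port 80 in use",
        "SQL Server": "setup completed successfully",
    },
}
entry = index_report(report)
```

Dropping the logs of the sub-installers that succeeded is what keeps the index small enough to search uncompressed.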
Providing a search interface is a trivial task, as long as the indexing stage did its job right.
Besides being delicious, BACON is our Better Automated Categorizer Of New reports!
We used Python for crawling and indexing because it's a very good language for performing operations with strings, regular expressions and dictionaries. And because we like to use the right tool for the right job.
We used MongoDB for the data store because our use case was better suited to a NoSQL document-store engine than to a standard relational database: we do mostly inserts and reads, so we need atomicity but not transactions; we have a lot of data, and many searches require full scans; our data fits documents (with the logs from the several sub-installers as attributes) better than rigidly structured tables; and, given the size of our data and our search patterns, database joins would be painful. Again, the right tool for the right job (although we could have used other document-store NoSQL engines).
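To make the document-fits-better point concrete, here is a sketch of what such a report document and a full-scan search might look like. The field names are assumptions; with pymongo the search would be something like `collection.find({"logs.IIS": {"$regex": pattern}})`, but to keep the example self-contained we emulate the scan over an in-memory list:

```python
# Illustrative report documents: one document per installation, with the
# per-sub-installer logs nested as attributes rather than spread across
# joined tables. Field names and log text are made up for the example.
import re

documents = [
    {"_id": 1, "version": "5.0",
     "logs": {"IIS": "port 80 already in use by another process"}},
    {"_id": 2, "version": "5.0",
     "logs": {"SQL Server": "cannot install on an encrypted volume"}},
]

def find(docs, component, pattern):
    """Full scan: return documents whose log for `component` matches."""
    rx = re.compile(pattern)
    return [d for d in docs
            if component in d["logs"] and rx.search(d["logs"][component])]

hits = find(documents, "IIS", r"port 80")
```

Because each document already carries everything about one installation, a search like this never needs a join; it is a single pass over the collection.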
We used the Agile Platform to develop the web application that lets users search MongoDB through a browser without having to learn mongo's specific query commands, and where search patterns can be stored to automatically group similar reports. The web controller also orchestrates the crawler runs. Building web apps with the Agile Platform is easy and quick, and integrating them with MongoDB was no trouble. Again, the right tool for the job.
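The stored search patterns are what turn a search engine into a categorizer. One plausible shape for that logic, with made-up patterns and category names standing in for the real stored ones:

```python
# Hedged sketch of pattern-based grouping: each known failure mode is a
# stored regex, and the first matching pattern names the report's
# category. Reports matching no pattern surface as "Uncategorized",
# flagging new failure modes worth a fresh stored pattern.
import re

PATTERNS = [
    (re.compile(r"port 80"), "Skype/IIS port conflict"),
    (re.compile(r"encrypt"), "Encrypted disk blocks SQL Server"),
]

def categorize(log):
    for rx, category in PATTERNS:
        if rx.search(log):
            return category
    return "Uncategorized"
```

Grouping this way makes "identify the largest group of failures" a simple count over categories instead of a manual read-through of raw logs.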
We've been so happy with the performance of the system that we've extended its usage to index not only installation error reports, but all error reports for all Agile Platform components, so BACON has been helping our maintenance team in solving all the problems that are reported to us (many of which you can see by checking the release changelogs).
In closing, it is worth mentioning that the BACON project was fully built under OutSystems' R&D myFriday initiative. If you've heard of Google's "20 percent time", you may already be guessing what myFriday is all about: we step away from our roadmap for a moment and use that time to experiment with things that may lead to improvements in our product or internal processes. It took about two days to bring BACON to life. Originally meant to tackle an annoyance, the system became meaningful and useful for the whole R&D team, ultimately allowing us to better serve our customers.
This goes to show that if you're faced with a daily hurdle and decide to take some time to fix it rather than live with it, you may arrive at a surprisingly powerful solution in a fairly short time. All the flavor with 0% fat.