When working in a classic IT infrastructure you often face the problem that developers only have access to test or development environments, but not to production. In order to fix bugs or to have a glance at the system running in production, log file access is needed. This is often not possible due to security requirements. The result of this situation is that the operation guys need to provide these files to the developers, which can take a certain amount of time.
A solution to these problems is to provide a Log Management Server and grant access to the developers via a UI. Despite some commercial tools like Splunk, which is the de-facto market leader in this area, there are some quite promising open source solutions which do scale very well and may provide enough features to get the job done.
The advantage of using open source technology is that you can – but do not have to – buy subscriptions. Furthermore, software like Splunk and Log Analysis have pricing plans, which depend on the amount of logs you ship daily. The problem is that you have to pay more if the volume of logs increases either due to a raised log level to help analyze some bugs in production or simply as more services are deployed.
Last but not least, there are of course cloud solutions like Loggly. You can basically ship your log events to a cloud service, which then takes care of the rest. You do not have to provide any infrastructure yourself. This is a very good solution unless the security policy of your organization prohibits shipping data to the cloud.
Of course this overview is incomplete. I just picked some tools for a brief introduction. If you think something is missing, feel free to blog or comment about it.
At the moment, the probably most famous open source log management solution is the ELK-Stack. It is called a stack because it is not one software package but a combination of well-known open source tools. The components are:
Despite all the good things about the ELK-Stack there are some drawbacks, which would make it not the optimal choice under some circumstances.
Kibana has no user management. If you want user management you have to purchase commercial support from Elastic to get a license for Shield.
Next, there is no housekeeping for the Elasticsearch database. Logstash creates an index for each day. You have to remove it manually if you do not need it anymore.
Graylog is an alternative log management platform that addresses the drawbacks of the ELK stack and is quite mature. It provides an UI and a server part. Moreover, Graylog uses Elasticsearch as database for the log messages as well as MongoDB for application data.
The UI does basically what a UI does. It makes the data accessible in a web browser.
The server part provides a consistent management of the log files. The Graylog server has the following features:
Moreover, Graylog can easily be deployed in a clustered environment, so that you get high availability and load distribution.
In order to create a full solution it is suitable to combine Graylog with Logstash with a little patching of Logstash and a custom Graylog Plugin.
As a standard for log events, Graylog promotes usage of the Graylog Extended Log Format (GELF). This is basically a JSON format containing the following information:
A GELF message can contain many other optional fields as well as user-defined fields. The timestamp is really important to see the log messages ordered by log message creation time and not at the time when entering the system.
Unfortunately it’s a little bit challenging to make Logstash talk to Graylog and vice versa. The main problem is that Graylog wants the end of a message with a NULL delimiter whereas Logstash creates \n. Logstash also expects \n when receiving log messages as well as Graylog sends log messages with the NULL delimiter.
1. Use a message broker like RabbitMQ. Logstash can write to RabbitMQ, Graylog can read. This solution decouples both applications, so that the Graylog server can be shut down while Logstash is still producing log messages.
2. Use the HTTP input in Graylog to receive messages from Logstash. This solution has some drawbacks. The biggest might be that if Graylog is down, Logstash discards the message after a failed send attempt.
3. Use the GELF TCP input and patch Logstash. Unfortunately, there is no possibility to change the line separator in the Logstash “json_lines” codec. This could be done in a patch which is currently open as a pull request. Hopefully, it will be merged soon. The big advantage in using the Logstash TCP output is that Logstash queues messages which cannot be send and retries sending them.
Sending messages from Graylog to Logstash might not make sense in the first place. But if you think of creating a file-based archive of log files on a NAS or in AWS S3 it might make sense though.
As mentioned above, even there is a problem with the line ending. Fortunately, Graylog provides a plugin API. So I created a plugin which can forward log messages to a Logstash instance. This instance can write the log files then.
The plugin is hosted on Github and licensed under the APL 2.0.
As described in the article, you can combine Logstash and Graylog with little effort in order to build an enterprise-ready flexible, scalable and access controlled log management system. Graylog and Elasticsearch as central components are able to scale out the described setup and can handle a huge load of data.
Graylog, Logstash and Elasticsearch are all three high-quality open source tools with a great community and many users. All these products are also commercially supported by companies behind them.
Finally there is one important note for all the Kibana lovers. Of course it is possible to also deploy Kibana in parallel to Graylog. Then you can build nice dashboards with Kibana and have the features like User Management and Elasticsearch Housekeeping in Graylog.