Sunday, August 28, 2011

How To Install and Configure Splunk For Log Analysis

Splunk, in simple words, is a powerful log analyzer. It understands machine data like nothing else, makes it searchable, and presents it in an easy-to-understand way through a web dashboard. This data is important not only for searches but also for investigations, monitoring, and making decisions about your infrastructure. Large organizations like Motorola and Vodafone use Splunk to keep an eye on their servers, and now I am going to show you how you can do the same.

How to install Splunk?
To install Splunk, just download the relevant installer from this page and use a suitable package manager. I'll use CentOS 5.6 for this post, so the RPM is the one to grab. Install it with "rpm -ivh splunk-4.2.3-105575.i386.rpm" (a plain "yum install" expects a repository package, so for a local file use rpm or "yum localinstall"). Splunk will be installed in /opt/splunk. Chances are that it won't be in your PATH, so either add it or just use the full path. If Splunk is not in /opt, run a find for it: "find / -name splunk". To start the Splunk server, just issue the command "/opt/splunk/bin/splunk start" and accept the license. Don't forget to allow the port, usually 8000, through your firewall.
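The whole install-and-start sequence boils down to a few commands. This is a sketch: the RPM filename will differ for other releases, and the firewall lines assume the stock iptables setup on CentOS 5.

```shell
# Install the downloaded RPM (filename varies by release)
rpm -ivh splunk-4.2.3-105575.i386.rpm

# Start Splunk; --accept-license skips the interactive license prompt
/opt/splunk/bin/splunk start --accept-license

# Open the web port (8000 by default) -- assumes stock iptables on CentOS 5
iptables -I INPUT -p tcp --dport 8000 -j ACCEPT
service iptables save
```

If you want Splunk to survive a reboot, "/opt/splunk/bin/splunk enable boot-start" installs an init script for you.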

Adding logs for analysis
Let us add the Apache logs to Splunk for analysis. These logs are located at /var/log/httpd/. Log in to the web console at http://localhost:8000 with the default credentials, id: admin and password: changeme. Click on the "Add data" button and then on the Apache logs link. Choose "Consume Apache logs on this Splunk server", click Next, specify the path to the Apache logs directory, and you are done.
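If you prefer the command line, the same thing can be done with Splunk's CLI. The access_log path and the access_combined sourcetype here are my assumptions for a standard combined-format Apache setup; adjust them to match your httpd.conf.

```shell
# Tell Splunk to watch the Apache access log continuously,
# tagging events with the standard combined-format sourcetype
/opt/splunk/bin/splunk add monitor /var/log/httpd/access_log -sourcetype access_combined
```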

Searching through the logs
From the home page, go to the welcome tab and launch the search app. Now you just need to type a search query and the relevant parts of the logs will be presented to you.
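Searches can also be run from the CLI. A couple of examples against the Apache logs, assuming they were indexed with the access_combined sourcetype as above:

```shell
# All 404 responses seen in the indexed Apache logs
/opt/splunk/bin/splunk search 'sourcetype=access_combined status=404'

# Top 10 requested URIs, using Splunk's reporting commands
/opt/splunk/bin/splunk search 'sourcetype=access_combined | top limit=10 uri'
```

The same queries work unchanged in the web search app, which is where you'd normally run them.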

Building Reports
From within the search app, click on the "Build Report" link, specify the criteria, and that is it. You'll get the report in no time.

And there it is. Splunk is good to go. Explore more and comment here. :)


  1. I'm surprised to see a proprietary tool with an astronomical cost being recommended on a Libre OS.

    Really, if you want to help promote Free/Libre software, you should instead look at how to achieve this same thing using rsyslogd.

  2. Point well taken. I will look into rsyslogd for sure. I just posted this one because I had to learn how to use this recently.

As far as cost is concerned, splunk is free to use till you parse 500 MB of data per day, which is enough for small organizations and startups.

    rsyslogd post coming up next :)

  3. >>splunk is free to use till you parse 500 MB

That isn't the point. In most cases where you need this capability (clustered deployments), the logs are not going to be less than 500 MB. For non-clustered deployments, the usefulness of splunk is questionable - there are many Libre log analysis tools. So that's a red herring.

    And, in any case, what would be really useful, would be to look into what Libre components exist, and how they can be brought together to achieve the same functionality that splunk offers.

  4. The data is not 500 MB overall. You can index 500 MB per day.

I do accept that one should look into libre components and try to integrate them. That is what I am going to do (and that is what I do most of the time).

  5. >>You can index 500 MB per day

    I know. I've used Splunk. And I've done medium to large clustered projects. Believe me, in the real world, when you have a problem in a clustered project, you have to turn up logging on suspect components to DEBUG or TRACE. And that's when you can very easily exceed 1GB per node per day.