We Crunched 1 Billion Java Logged Errors – Here Is What Causes 97% of Them

It is 2016 and one thing hasn't changed in 30 years: Dev and Ops teams still rely on log files to troubleshoot application issues. For some unknown reason we trust log files completely, because we think the truth is hidden inside them. If you just grep hard enough, or write the perfect regex query, the answer will magically present itself.

Yes, tools like Splunk, ELK and Sumologic have made it faster to search logs, but they all suffer from one thing: operational noise. Operational noise is the silent killer of IT and of your business today. It is the reason application issues go undetected and take days to resolve.

Log Reality

Here’s a dose of reality: you will only log what you think will break an application, and you’re constrained by how much you can log without incurring unnecessary overhead on your application. This is why debugging through logging doesn’t work in production and why most application issues go undetected.

Let’s assume you do manage to find all the relevant log events. That’s not the end of the story. The data you need usually isn’t in there, which leaves you adding more logging statements, creating a new build, testing, deploying and hoping the error happens again. Ouch.


Time for Some Analysis

We capture and analyze every error or exception that is thrown by Java applications in production. Using some cheeky data science, this is what I found from analyzing over 1,000 monitored applications.

High-level aggregate findings:

Avg. Java application throws 9.2 million errors / month
Avg. Java application generates approximately 2.7TB of storage / month
Avg. Java application contains 53 unique errors / month

Top 10 Java Errors by Frequency
1. NullPointerException
2. NumberFormatException
3. IllegalArgumentException
4. RuntimeException
5. IllegalStateException
6. NoSuchMethodException
7. ClassCastException
8. Exception
9. ParseException
10. InvocationTargetException

So there you have it: the pesky NullPointerException is to blame for most of what is broken in log files. Ironically, checking for null was the first piece of feedback I got in my first code review back in 2004, when I was a Java developer.
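The two most frequent errors in the list above, NullPointerException and NumberFormatException, can often be avoided with a guard at the point where untrusted input enters the code. Here is a minimal sketch of that idea. The class `SafeParsing` and the method `parseQuantity` are hypothetical names made up for illustration; they are not taken from any of the applications analyzed.

```java
import java.util.Optional;

// Hypothetical helper, for illustration only.
class SafeParsing {

    // Returns the parsed integer, or an empty Optional when the input is
    // null, blank, or not a valid integer, instead of throwing.
    static Optional<Integer> parseQuantity(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty(); // null check: no NullPointerException
        }
        try {
            return Optional.of(Integer.parseInt(raw.trim()));
        } catch (NumberFormatException e) {
            return Optional.empty(); // bad input handled once, at the edge
        }
    }

    public static void main(String[] args) {
        System.out.println(parseQuantity("42").orElse(0));
        System.out.println(parseQuantity(null).orElse(0));
        System.out.println(parseQuantity("4x2").orElse(0));
    }
}
```

Whether to return an empty value, a default, or a validation error to the caller depends on the application. The point is only that the bad input is handled once instead of surfacing as the same exception over and over.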

For one production application over the past 30 days, the top Java errors by frequency were:

1. NumberFormatException
2. NoSuchMethodException
3. Custom Exception
4. StringIndexOutOfBoundsException
5. IndexOutOfBoundsException
6. IllegalArgumentException
7. IllegalStateException
8. RuntimeException
9. Custom Exception
10. Custom Exception

Time for Trouble (shooting)

So, you work in development or operations and you’ve been asked to troubleshoot the above application, which generates a million errors a day. What do you do? Well, let’s zoom in on when the application had an issue, right?
Let’s pick, say, a 15-minute period. However, that’s still 10,416 errors you’ll be looking at for those 15 minutes. You now see this problem called operational noise? No wonder people struggle to detect and troubleshoot application issues today … and it’s not going to get any easier.
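The 10,416 figure is simple arithmetic: one million errors a day divided across the 96 fifteen-minute windows in a day. A minimal sketch, assuming errors are spread evenly over the day (a simplification; real traffic is burstier), with a hypothetical class name:

```java
// Back-of-the-envelope arithmetic for the 15-minute window described above.
class NoiseWindow {

    // Average number of errors in one window, assuming an even spread
    // of errors across the day. Integer division drops the fraction.
    static long errorsPerWindow(long errorsPerDay, int windowMinutes) {
        int minutesPerDay = 24 * 60;
        return errorsPerDay * windowMinutes / minutesPerDay;
    }

    public static void main(String[] args) {
        System.out.println(errorsPerWindow(1_000_000L, 15)); // prints 10416
    }
}
```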

What if we Just Fixed 10 Errors?

Now, let’s say we fixed 10 errors in the above application. What percent reduction do you think these 10 errors would have on the error count, storage and operational noise that this application generates every month?

1%, 5%, 10%, 25%, 50%?

How about 97.3%? Yes, you read that right. Fixing just 10 errors in this application would reduce the error count, storage and operational noise by 97.3%.

The top 10 errors in this application by frequency are responsible for 29,170,210 errors out of the total 29,965,285 errors thrown over the past 30 days.
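The 97.3% follows directly from those two totals. The sketch below reproduces the figure and shows one way to compute the same share from per-type counts. The class and method names are hypothetical, and the way counts are collected into a map is an assumption for illustration, not the analysis pipeline behind the numbers in this post.

```java
import java.util.Comparator;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
class TopErrorShare {

    // Percentage of the total that a given part represents.
    static double share(long part, long total) {
        return total == 0 ? 0.0 : 100.0 * part / total;
    }

    // Percentage of all errors contributed by the n most frequent types,
    // given a map from error type to its occurrence count.
    static double topShare(Map<String, Long> countsByType, int n) {
        long total = countsByType.values().stream()
                .mapToLong(Long::longValue).sum();
        long top = countsByType.values().stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .mapToLong(Long::longValue)
                .sum();
        return share(top, total);
    }

    public static void main(String[] args) {
        // The two totals quoted in the text above.
        double pct = share(29_170_210L, 29_965_285L);
        System.out.printf(Locale.ROOT, "%.1f%%%n", pct); // prints 97.3%
    }
}
```

Running `topShare` over your own error counts for a few values of `n` is a quick way to check whether a handful of errors dominates your logs too.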

Take the Crap Out of Your App

The vast majority of application log files contain duplicated crap which you’re paying to manage every single day in your IT environment.

You pay for:

  • Disk storage to host log files on servers
  • Log management software licenses to parse, transmit, index and store this data over your network
  • Servers to run your log management software
  • Humans to analyze and manage this operational noise

The easiest way to solve operational noise is to fix application errors rather than ignore them. Not only will this dramatically improve the operational insight of your teams, you’ll also help them detect more issues and troubleshoot much faster, because they’ll actually see the things that hurt your applications and business.


Final Thoughts

We see time and time again that the top few logged errors in production are pulling away most of the time and logging resources. The damage these top few events cause, each happening millions of times, is disproportionate to the time and effort it takes to solve them.

 
