Software Reliability

Software reliability is a field of software engineering, not a matter of sentiment about a piece of software. We often see commercials claiming high reliability for a product, whether hardware or software. When the product is hardware, that claim may sound plausible, because such systems are mostly mechanical or physical with electronic circuits.

However, when the product is software, we need to ask some very important questions. Engineering is not a social science: when someone says “reliable”, an engineer should ask how reliable, and should expect numbers describing the probability of a failure within a certain period of time.

Well, software reliability offers some models to answer these questions. Let’s have a look at the current practices and some handy formulas which may be useful for practitioners.

Some definitions and formulas from conventional reliability theory:
The following definitions and formulas are good for a basic understanding (a short numerical sketch follows this list). However, practitioners should refer to the reliability models I provide in the following section.

  • Reliability is the probability of functioning correctly for a given period of time.
    R(t) = n(t) / N

    N is the number of identical components and n(t) is the number of components functioning correctly at time t.

  • Unreliability is the probability of not functioning correctly for a given period of time.
    Q(t) = nf(t) / N = 1 − R(t)
    N is the number of identical components and nf(t) is the number of components functioning incorrectly at time t.
  • Failure rate of a device or a system is the number of failures within a given period of time.
    λ(t) = (1 / n(t)) · dnf(t)/dt
  • MTTF (Mean time to failure) is the expected time that a system will operate before the
    first failure occurs.
    MTTF = ∫₀^∞ R(t) dt
    For a constant failure rate λ (so that MTTF = 1/λ), any given system will have only about a 37% chance (e^(−1) ≈ 0.37) of functioning correctly for an amount of time equal to the MTTF, and naturally about a 63% chance of not functioning correctly for the same period.
  • MTBF (Mean Time Between Failures) will be the total of MTTF and MTTR (Mean Time To Repair):
    MTBF = MTTF + MTTR
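
    To make these conventional definitions concrete, here is a minimal Python sketch; the function names and the numbers in the example are my own, chosen purely for illustration:

    import math

    def reliability(n_working, n_total):
        # R(t): fraction of N identical components still functioning correctly at time t.
        return n_working / n_total

    def unreliability(n_failed, n_total):
        # Q(t): fraction of N identical components that have failed by time t.
        return n_failed / n_total

    def mttf_constant_rate(failure_rate):
        # MTTF for a constant failure rate: MTTF = 1 / lambda.
        return 1.0 / failure_rate

    # Example: 1000 identical components, 37 of them failed within 500 hours.
    print(reliability(963, 1000))     # R(500 h) = 0.963
    print(unreliability(37, 1000))    # Q(500 h) = 0.037

    # With an assumed constant failure rate of 0.001 failures/hour:
    lam = 0.001
    print(mttf_constant_rate(lam))                    # MTTF = 1000 hours
    print(math.exp(-lam * mttf_constant_rate(lam)))   # ~0.37: chance of surviving one MTTF
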
Current State of the Practice of Software Reliability

    Achieving mature software reliability requires experts to deal with an immature area: software testing. One of the crucial concepts is automated testing: with automated testing, on the order of a million test cases may be generated and applied to the software.

    Moreover, because the test cases are generated on sound statistical principles, the resulting test suites stay fresh instead of being reused verbatim.

    There is also a clear effort to standardize the metrics used for software reliability. This is a crucial improvement, considering that reliability models must be evaluated using the same reliability metrics.
    As the complexity of software platforms increases, it is not always possible to test the software and determine reliability parameters on all platforms. In this case, automated system reporting has become a useful practice.

    Software development for ultra-reliable systems is leading the improvements in software reliability, as expected: these developers are under great pressure to create failure-free systems, since in such industries a single failure usually yields catastrophic results.

    Moreover, looking at the methods adopted by ultra-reliable system designers, we can see the deterministic measures they apply before deploying the software.

    What are the models used for evaluating the reliability of software?

    Basically, these models are based on the data gathered while testing the software system. In the Agile approach, there is no distinct testing phase as in Waterfall or Spiral. However, it is still possible to generate useful data about the bugs found during the different phases of an Agile project, thanks to the systematic software tools used for monitoring progress.

    SR (Software Reliability) models are classified into two groups:

  • Reliability Growth Models: as the name suggests, these involve repeated cycles of testing, failure, and correction. They relate to the execution of reliability procedures during the debugging period.
  • Reliability Models: we can’t always trust a reliability model just because testing shows zero failures for enough samples and the model predicts an acceptable MTTF (Mean Time To Failure). With this motivation, these models address the period after the debugging phase.
  • Reliability Growth Models are found to be promising and are preferred by the majority of software reliability practitioners. The most popular SRGMs (Software Reliability Growth Models) are:

    Basic Execution Time (BET) Model:

    Dr. Musa pointed out the importance of using execution time rather than calendar time for the testing/debugging period. This model also includes refinements to the earlier NHPP (Non-Homogeneous Poisson Process) model.
    λ(μ) = λ₀ · (1 − μ/ν₀)
    λ(τ) = λ₀ · e^(−(λ₀/ν₀)·τ)
    μ(τ) = ν₀ · (1 − e^(−(λ₀/ν₀)·τ))
    where λ₀ is the initial failure intensity, ν₀ is the total number of failures expected, τ is the execution time, and μ is the expected number of failures experienced by time τ.

    I have implemented the following functions to calculate the failure rate or the time needed to achieve a desired failure rate.

    BET failure rate:
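
    A rough Python sketch of such a function, assuming the BET parameters lambda0 (initial failure intensity) and nu0 (total expected number of failures) are already known; the names are illustrative:

    import math

    def bet_failure_rate(lambda0, nu0, tau):
        # BET failure intensity after tau units of execution time:
        # lambda(tau) = lambda0 * exp(-(lambda0 / nu0) * tau)
        return lambda0 * math.exp(-(lambda0 / nu0) * tau)

    # Example: lambda0 = 10 failures/CPU-hour, nu0 = 100 expected failures in total.
    print(bet_failure_rate(10.0, 100.0, 20.0))   # ~1.35 failures/CPU-hour after 20 CPU-hours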

    Calculating the total testing time needed to achieve a failure rate:
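
    A matching sketch for the required execution time, using the BET relation delta_tau = (nu0 / lambda0) · ln(lambda_present / lambda_target); again the names are illustrative:

    import math

    def bet_time_to_target(lambda0, nu0, lambda_present, lambda_target):
        # Additional execution time needed to bring the failure intensity
        # down from lambda_present to lambda_target under the BET model.
        return (nu0 / lambda0) * math.log(lambda_present / lambda_target)

    # Example: reduce the intensity from 1.35 to 0.1 failures/CPU-hour
    # with lambda0 = 10 and nu0 = 100.
    print(bet_time_to_target(10.0, 100.0, 1.35, 0.1))   # ~26 additional CPU-hours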

    Logarithmic Poisson Model:
    This model is an NHPP with an exponentially decreasing failure intensity function, where θ (“theta”) is the failure intensity decay parameter.
    λ(μ) = λ₀ · e^(−θ·μ)
    λ(τ) = λ₀ / (λ₀·θ·τ + 1)
    μ(τ) = (1/θ) · ln(λ₀·θ·τ + 1)
    where λ₀ is the initial failure intensity and θ is the failure intensity decay parameter.

    Similarly, the following functions will be useful for calculating necessary values.

    Calculating failure rate:
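
    A rough Python sketch, assuming lambda0 and theta are known (names are illustrative):

    def log_poisson_failure_rate(lambda0, theta, tau):
        # Musa-Okumoto logarithmic Poisson failure intensity:
        # lambda(tau) = lambda0 / (lambda0 * theta * tau + 1)
        return lambda0 / (lambda0 * theta * tau + 1.0)

    # Example: lambda0 = 10 failures/CPU-hour, theta = 0.05 per failure.
    print(log_poisson_failure_rate(10.0, 0.05, 20.0))   # ~0.91 failures/CPU-hour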

    And calculating the needed execution time:
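
    A matching sketch, using the logarithmic Poisson relation delta_tau = (1/θ) · (1/lambda_target − 1/lambda_present):

    def log_poisson_time_to_target(theta, lambda_present, lambda_target):
        # Additional execution time needed to bring the failure intensity
        # down from lambda_present to lambda_target under this model.
        return (1.0 / theta) * (1.0 / lambda_target - 1.0 / lambda_present)

    # Example: reduce the intensity from 0.91 to 0.1 failures/CPU-hour with theta = 0.05.
    print(log_poisson_time_to_target(0.05, 0.91, 0.1))   # ~178 additional CPU-hours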

    Following is the calculation of the failure rate from the number of faults repaired.
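
    A rough sketch of that calculation: for the logarithmic Poisson model the relation is λ(μ) = λ₀ · e^(−θ·μ), and the BET counterpart λ(μ) = λ₀ · (1 − μ/ν₀) is included for comparison (parameter values in the example are illustrative):

    import math

    def log_poisson_rate_from_faults(lambda0, theta, mu):
        # Failure intensity after mu faults have been experienced and repaired.
        return lambda0 * math.exp(-theta * mu)

    def bet_rate_from_faults(lambda0, nu0, mu):
        # BET counterpart: intensity decreases linearly with repaired faults.
        return lambda0 * (1.0 - mu / nu0)

    # Example: 30 faults repaired, lambda0 = 10, theta = 0.05, nu0 = 100.
    print(log_poisson_rate_from_faults(10.0, 0.05, 30))   # ~2.23 failures/CPU-hour
    print(bet_rate_from_faults(10.0, 100.0, 30))          # 7.0 failures/CPU-hour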

    How can we find the parameters used in the formulas?

    Well, we have some unknown parameters such as theta, beta, and nu. We do not know these for certain: each software development/testing team will have different parameters, and even a single experienced programmer may affect them greatly. The engineer/manager responsible for reliability should collect data from previous projects and use a regression model to estimate these parameters.

    A transformation to make life better:

    The BET model, λ(τ) = λ₀ · e^(−(λ₀/ν₀)·τ), can be transformed by taking the natural logarithm into:
    ln λ(τ) = β₀ + β₁·τ, with intercept β₀ = ln λ₀ and slope β₁ = −λ₀/ν₀.
    Now collect the data as the log of the failure rate (in terms of number of bugs per unit of execution time) against time, and calculate the intercept and beta1. Your reliability growth model (BET) is ready, and you will be able to draw graphs like the ones below (a short regression sketch follows the plots).
    [Plots: fitted BET reliability growth graphs]
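
    To make the parameter-estimation step concrete, here is a rough Python/numpy sketch of the regression described above; the observation values are invented purely for illustration:

    import numpy as np

    # Hypothetical observations: execution time (CPU-hours) and the observed
    # failure rate (bugs per CPU-hour) in each testing interval.
    tau = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
    rate = np.array([8.2, 6.5, 5.4, 4.4, 3.7, 3.0])

    # Fit ln(lambda) = beta0 + beta1 * tau by least squares.
    beta1, beta0 = np.polyfit(tau, np.log(rate), 1)

    # Recover the BET parameters: lambda0 = exp(beta0), nu0 = lambda0 / (-beta1).
    lambda0 = np.exp(beta0)
    nu0 = lambda0 / -beta1

    print(f"intercept beta0 = {beta0:.3f}, slope beta1 = {beta1:.4f}")
    print(f"lambda0 = {lambda0:.2f} failures/CPU-hour, nu0 = {nu0:.1f} expected failures")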