Year: 2018 | Volume: 19 | Issue: 1 | Page: 62-70
Statistical analysis in nursing research
Grace Rebekah¹, Vinitha Ravindran²
¹ Lecturer, Biostatistics, CMC, Vellore, India
² Professor, College of Nursing, CMC, Vellore, India
Date of Web Publication: 11-Jun-2020
Source of Support: None, Conflict of Interest: None
The word statistics and the process of statistical analysis induce anxiety and fear in many researchers, especially students. Unfamiliar terminology, complex calculations and the expectation of choosing the right statistics are often daunting. However, it is well recognized that statistics play a key role in health and human-related research. As it is not possible to study every human being, research studies involving humans select a representative group of the population. Statistical analysis helps researchers arrive at correct conclusions, which in turn supports generalization, or application of the findings, to the whole population of interest. This article attempts to articulate some basic steps and processes involved in statistical analysis.
Keywords: statistics, key role, population, analysis
How to cite this article:
Rebekah G, Ravindran V. Statistical analysis in nursing research. Indian J Cont Nsg Edn 2018;19:62-70
Introduction
Researchers attempt to answer specific questions on human behaviour and response by collecting pertinent data. In a quantitative research design, data are collected from a representative sample of the population, and conclusions about the population are drawn from these data. The group of individuals who represent the population and are studied is called the ‘sample’, and the term ‘statistic’ is used to describe a characteristic of this group (Munro, 2005). An individual who is part of the sample is called a ‘subject’. Researchers use various methods, such as rating scales, observations and questionnaires, to collect information relevant to their question; the information thus collected is termed data. The choice of data collection method should be based on ethical guidelines, cost, time constraints, appropriateness for the population and the availability of research assistants to collect data (Sadan, 2017). When data collection is complete, a large amount of data is available in many pieces and sections, and statistics involves extracting meaning from these seemingly incomprehensible data. Statistics as a discipline is defined as “a method of collecting, organizing, analysing and interpreting the numerical data” (Antonisamy, Christopher, & Samuel, 2010). Large data sets can be complex, and understanding what they mean requires advanced analytical tools; statistics is a set of tools that can inform experts dealing with complex information.
What is Statistical Analysis and why is it needed?
Statistical analysis is the science of organizing, exploring, summarizing and presenting large amounts of data to discover underlying patterns and trends (Daniel & Cross, 2013). Data are best understood when they are analysed using appropriate and valid statistical tests, so that the truth in the data is revealed; this procedure is known as statistical analysis. Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used (Ali & Bhaskar, 2016). Statistical analysis serves three purposes: (1) to describe and summarize information, (2) to make predictions about occurrences and (3) to identify associations, relationships or differences between variables that are observed or measured (Munro, 2005).
Types of Statistics
There are two broad types of statistics. The first is ‘descriptive’ statistics, which describe and summarize data in more understandable terms without distorting the information (Munro, 2005); they are usually presented as percentages or averages, which are easily understood. The second is ‘inferential’ statistics, in which specific statistical techniques are used to draw conclusions about the whole population from a representative sample (Antonisamy et al., 2010). Inferences are made by finding associations or relationships among the variables under study.
Variables and Types of Data
In quantitative research, specific characteristics of each subject are measured or observed, and these constitute the data. Each characteristic that is measured varies between subjects in the study and is therefore called a ‘variable’. The nature of measurement also differs from variable to variable (Antonisamy et al., 2010). For example, body weight varies between subjects and is measured in kilograms. In contrast, a person’s blood type, which also varies, is not expressed in numerical terms but as a group (A, B, AB or O). When the measurement is expressed as a number, the data are called numerical or quantitative; when it is expressed as a category, with individuals observed as belonging to specific groups (for example, male/female), the data are called categorical or qualitative (Polit & Beck, 2008). Understanding the type of data helps in selecting the appropriate statistical techniques.
Statistical Analysis Software Packages
Performing statistical analysis was very challenging and time-consuming in earlier days: data were written, tabulated and analysed manually, which demanded dedicated time and accurate calculation. Advances in techniques and technology have since made organizing information easier and obtaining results faster, and new software packages continue to be developed. The most common and widely used statistical packages include Statistical Analysis System (SAS), STATA and Statistical Package for the Social Sciences (SPSS); there is also free, open-source software such as R, which can be downloaded from the internet. The challenge is to write appropriate code to run the required analysis. Other software such as EpiInfo, PASW (Predictive Analytics Software), Excel and Access is also available. Some packages, such as EpiData, are used for cleaning, sorting and organizing data rather than for analysing it. Once the data are organized, they are transferred to an appropriate statistical package for analysis.
Data Preparation and Cleaning
The most essential part of data analysis is preparing the data by coding and cleaning them. Coding the data saves time and helps avoid unnecessary data entry errors (Pallant, 2011). Coding involves giving each variable a short name and assigning a number to each possible response for every variable studied. For example, the variable daytime sleep disturbance can be named SD_Day, with the responses ‘always’ coded as 2, ‘sometimes’ as 1 and ‘never’ as 0. Numerical representation of data is necessary for analysis. Some software packages support efficient, error-free data entry by building checks into the entry process. One widely used package, EpiData, allows the researcher to create ‘checks’ and ‘must enter’ fields, which prevent missing information (EpiData Software Association, 2018). Once the data are entered and organized, they can be imported into SPSS, STATA or any other software package to complete the analysis.
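As an illustration of the coding step, the short Python sketch below maps text responses for the daytime sleep-disturbance item onto the numeric codes given above and rejects anything outside the codebook, loosely imitating a data-entry check. The dictionary `SD_DAY_CODES` and the function `code_response` are hypothetical names chosen for this example; this is not how EpiData or SPSS implement their checks.

```python
# Hypothetical codebook for the variable SD_Day (daytime sleep disturbance).
SD_DAY_CODES = {"never": 0, "sometimes": 1, "always": 2}


def code_response(raw):
    """Convert one raw text response into its numeric code.

    Blank answers become None (missing); anything not in the codebook
    raises ValueError, imitating a simple data-entry check.
    """
    cleaned = raw.strip().lower()
    if cleaned == "":
        return None
    if cleaned not in SD_DAY_CODES:
        raise ValueError(f"Invalid response for SD_Day: {raw!r}")
    return SD_DAY_CODES[cleaned]


responses = ["Always", "never", " Sometimes ", ""]
coded = [code_response(r) for r in responses]
print(coded)  # [2, 0, 1, None]
```

Normalizing case and whitespace before the lookup means that ‘Always’ and ‘ always ’ receive the same code, which is one of the inconsistencies cleaning is meant to catch.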
Descriptive Statistics
People differ with respect to physiological, biochemical and other parameters; most importantly, the response to treatment varies between individuals. It is therefore important to identify this variability and report it using descriptive statistics such as (i) frequency, (ii) central tendency and (iii) relationships (Altman, 1990; Altman & Bland, 2005).
• Frequency, spread and distribution: Categorical data are tallied and reported as frequencies and percentages. The distribution of continuous variables is displayed using a histogram or a stem-and-leaf plot.
Example: [Table 1] showing frequency distribution and percentage
Table 1: Demographic and Clinical Variables of Children after Selective Urological Surgeries
Example: Bar diagram showing the distribution of subjects according to knowledge score in an experimental study (see [Figure 1])
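The frequencies and percentages described above are simple to compute. The minimal sketch below tallies a categorical variable with Python's `collections.Counter`; the list of genders is invented for illustration and is not the data behind [Table 1], and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Invented example data: gender of eight participants.
gender = ["Male", "Female", "Female", "Male", "Female", "Female", "Male", "Female"]


def frequency_table(values):
    """Map each category to (frequency, percentage rounded to one decimal)."""
    counts = Counter(values)
    n = len(values)
    return {cat: (freq, round(100 * freq / n, 1)) for cat, freq in counts.most_common()}


table = frequency_table(gender)
for category, (freq, pct) in table.items():
    print(f"{category}: {freq} ({pct}%)")
# Female: 5 (62.5%)
# Male: 3 (37.5%)
```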
The simplest measure of central tendency is the average, or mean. It is the sum of all the observations divided by the number of observations, and is denoted as follows:
$$\bar{x} = \frac{\sum x}{n}$$
where Σ denotes the sum of a set of values,
x denotes the variable, usually representing the individual data values, and
n represents the number of values in the sample (Daniel & Cross, 2013; Rosner, 2000; Triola & Triola, 2006).
The average amount of lead in air needs to be monitored because lead is known to cause serious adverse effects in human beings. The following data are measurements of lead in air, expressed in micrograms per cubic metre (μg/m³). They come from Building 5 of the World Trade Center site, which was monitored on different days soon after the terrorist attack of 11 September 2001.
On average, the level of lead was found to be 1.538 μg/m³. The value on Day 1 was much higher, possibly because of emissions from the vehicles that rushed to the site; such a value can be considered an extreme value, or outlier (Triola & Triola, 2006).
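The mean formula can be checked against the six lead-in-air values listed in the median example (Triola & Triola, 2006). The minimal Python sketch below computes Σx / n directly, compares it with the standard library's `statistics.mean`, and, as an added illustration, shows how much the mean falls when the Day 1 outlier is left out.

```python
from statistics import mean

# Lead in air (μg/m³), Building 5, WTC site (Triola & Triola, 2006).
lead = [0.42, 0.48, 0.73, 1.10, 1.10, 5.40]

x_bar = sum(lead) / len(lead)  # Σx / n written out explicitly
print(round(x_bar, 3))         # 1.538
print(round(mean(lead), 3))    # 1.538, same result from the standard library

# Dropping the Day 1 outlier (5.40) shows how strongly it pulls the mean upward.
without_outlier = [v for v in lead if v != 5.40]
print(round(sum(without_outlier) / len(without_outlier), 3))  # 0.766
```

A single extreme value roughly doubles the mean here, which is the disadvantage of the mean discussed next.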
The median is an alternative measure of central tendency, most often used for skewed data. The observations are arranged in ascending order and the median is the value at the midpoint; when there is an even number of observations, it is the average of the two middle values. The main disadvantage of the mean is that it is affected by extreme observations. The median is unaffected by them and can therefore be used when the data are highly variable or skewed towards one side of the distribution instead of following a normal curve. For the lead data above, the median is obtained as follows (Daniel & Cross, 2013; Rosner, 2000; Triola & Triola, 2006):
Ordered values (μg/m³): 0.42, 0.48, 0.73, 1.10, 1.10, 5.40
Median = (0.73 + 1.10)/2 = 1.83/2 = 0.915 μg/m³ (Source: Triola & Triola, 2006)
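The median rule (sort, then take the middle value or the average of the two middle values) can be sketched in a few lines of Python and checked against `statistics.median`. The `inflated` list, in which the outlier is made ten times larger, is a hypothetical variation added only to show that the median does not move.

```python
from statistics import median as stdlib_median


def median(values):
    """Middle value of the sorted data; for an even count, the average
    of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


lead = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]   # deliberately unsorted
print(round(median(lead), 3))                 # 0.915
print(round(stdlib_median(lead), 3))          # 0.915

# Hypothetical: make the outlier ten times larger; the median does not move.
inflated = [0.42, 0.48, 0.73, 1.10, 1.10, 54.0]
print(round(median(inflated), 3))             # 0.915
```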
The mode is also widely used. It is the value that occurs most frequently in the data set. When a data set has two modes it is called bimodal, and when it has more than two it is called multimodal. When no value is repeated, there is no mode.
The mode for the lead-in-air data is 1.10 μg/m³ (Triola & Triola, 2006).
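A minimal sketch of the mode, following the convention above that a data set with no repeated values has no mode. The helper `modes` is hypothetical. Python's own `statistics.multimode` behaves differently here: when no value repeats it returns every value, so the explicit check below is needed.

```python
from collections import Counter


def modes(values):
    """All most-frequent values, sorted; an empty list means no mode
    (no value occurs more than once)."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([0.42, 0.48, 0.73, 1.10, 1.10, 5.40]))  # [1.1]  lead-in-air data
print(modes([2, 2, 3, 5, 5, 7]))                    # [2, 5] bimodal
print(modes([1, 4, 6, 9]))                          # []     no mode
```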
Inferential Statistics
Inferential statistics allows groups to be compared, in line with the research question, by formulating a hypothesis. A hypothesis is a statement about the expected outcome of the research, that is, a proposed explanation of the relationship between variables based on an assumption (Ali & Bhaskar, 2016; Altman & Bland, 1996; Altman & Bland, 2005). For example, we can hypothesise that there is an increased incidence of musculoskeletal injuries among health care professionals. Statistical tests are then carried out on the collected data to accept or reject this assumed relationship.
Hypothesis testing is the use of statistics to evaluate how consistent the sample data are with a given hypothesis (Munro, 2005). It can be summarized in four steps (see [Figure 2]). The first step is to state the null hypothesis and the alternative hypothesis. The second is to set the criterion for a decision, that is, the level of significance chosen by the researcher. The third step is to compute the test statistic, and the final step is to make a decision.
Figure 1: Distribution of subjects according to their knowledge on antenatal assessment before and after video assisted teaching (Note: Adapted from “Knowledge and practice of nursing personnel on antenatal fetal assessment before and after video assisted teaching”, by M. Jenifer, A. Sony, D. Singh, J. Lionel, V. Jayaseelan, 2017, Indian Journal of Continuing Nursing Education, 18(2), 87-91)
Figure 2: Flow chart of steps involved in Hypothesis Testing (Source: Baber, 2012)
1. Null and Alternative Hypothesis
A null hypothesis (H₀), also referred to as the statistical hypothesis, is used for statistical testing and for interpreting statistical outcomes. The null hypothesis states that there is no relationship, association or difference between the variables, and may be used when there is a lack of theoretical and empirical research on which to base a research hypothesis (Burns & Grove, 2011). For example, the null hypothesis can be stated as follows: there is no difference in the proportion of cases presenting with musculoskeletal disorders between health care professionals and other professionals. When the null hypothesis is rejected, the alternative hypothesis is supported.
The alternative hypothesis is also known as the research hypothesis and is denoted H₁. The research hypothesis is scientifically postulated and states that there is a relationship between the variables (Burns & Grove, 2011). A research hypothesis can be non-directional, stating only that a relationship or difference exists, or directional, also stating its direction. An example of a directional research hypothesis would be ‘the proportion of cases with musculoskeletal disorders is higher among health professionals than among other professionals.’
2. Criteria (Levels of significance)
The level of significance is the criterion of judgement upon which a decision is made regarding the value stated in the null hypothesis. The criterion is based on the probability of obtaining the statistic measured in a sample if the value stated in the null hypothesis were true (Daniel & Cross, 2013).
3. Test statistics
Hypothesis tests can be broadly classified into parametric and non-parametric tests. Parametric tests make inferences about population parameters, such as the mean, from sample statistics, and they demand certain assumptions about the data. One is that the observed data are normally distributed, following a bell-shaped curve; another is that the groups have approximately equal variances. Parametric analysis therefore relies on the data being normally distributed with similar variability, so that the parameter can be estimated (Pallant, 2011). Examples of parametric tests that compare means for a single sample or for groups are (i) the one-sample t-test, (ii) the unpaired (independent two-sample) t-test and (iii) the paired t-test. When the conditions for parametric tests are not met, non-parametric tests are used instead: for example, the Wilcoxon signed-rank test in place of the paired t-test and the Mann-Whitney U test instead of the t-test for independent samples. Each statistical test also has its own assumptions, which need to be checked carefully before the hypothesis is tested. For example, the t-test assumes that the scores are obtained from a random sample of the population, that observations on subjects are independent (one subject’s performance should not influence another’s) and that the variability of the measurements is approximately the same (Pallant, 2011).
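The four hypothesis-testing steps described earlier can be traced in a one-sample t-test. The sketch below is illustrative only: the scores and the hypothesized mean of 5 are invented, and because Python's standard library has no t distribution, the two-tailed critical value for df = 9 at α = 0.05 (2.262, from a standard t table) is hard-coded rather than computed. `one_sample_t` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (mean(sample) - mu0) / se, n - 1


# Step 1: state hypotheses. H0: mean score = 5; H1: mean score != 5 (non-directional).
scores = [6, 7, 5, 8, 6, 7, 9, 6, 7, 8]  # invented data
mu0 = 5

# Step 2: set the criterion. alpha = 0.05, two-tailed; critical t for df = 9
# taken from a t table (hard-coded: the stdlib has no t distribution).
alpha, t_critical = 0.05, 2.262

# Step 3: compute the test statistic.
t, df = one_sample_t(scores, mu0)

# Step 4: make a decision.
decision = "reject H0" if abs(t) > t_critical else "fail to reject H0"
print(f"t({df}) = {t:.3f}; {decision}")  # t(9) = 5.019; reject H0
```

In practice the normality assumption would be checked first, and statistical software would report an exact P value instead of a table comparison.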
Types of errors
Although statistical tests help us decide whether to accept or reject the null hypothesis, conclusions have to be drawn carefully, with possible errors in mind, before the findings can be generalized to the population. Because we observe only a sample and not the entire population, the decision to retain or reject the null hypothesis may be wrong. Two main types of error have to be considered to evaluate the uncertainty of the conclusions numerically (Antonisamy et al., 2010): Type I and Type II errors. Rejecting the null hypothesis when it is actually true is a Type I error; failing to reject the null hypothesis when it is actually false is a Type II error. The probabilities of committing Type I and Type II errors are denoted α and β respectively. There are four possible outcomes of a decision about the null hypothesis (Daniel & Cross, 2013; Triola & Triola, 2006; Rosner, 2000); they are listed below and depicted in [Figure 4].
Figure 4: Types of error (Source: Hypothesis testing and types of errors, n.d.)
Figure 3: Correlation between perceived QOL and parental coping (Note: Adapted from “Perceived quality of life and coping in parents of children with chronic kidney disease”, by E. Kanthi, M.A. Johnson, & I. Agarwal, 2017, Indian Journal of Continuing Nursing Education, 18(1), 27-34)
- The decision to retain the null hypothesis could be correct (H₀ is true).
- The decision to retain the null hypothesis could be incorrect (H₀ is false: a Type II error).
- The decision to reject the null hypothesis could be correct (H₀ is false).
- The decision to reject the null hypothesis could be incorrect (H₀ is true: a Type I error).
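The meaning of α and β can be made concrete with a small simulation. The sketch below assumes a simplified setting that the article does not describe: normally distributed scores with a known standard deviation of 10, samples of 25, and a two-sided z-test of H₀: mean = 50 at α = 0.05. When the true mean really is 50, every rejection is a Type I error, so the rejection rate estimates α; when the true mean is 55, every failure to reject is a Type II error, so one minus the rejection rate estimates β. All numbers are hypothetical.

```python
import random
from statistics import NormalDist


def rejection_rate(true_mean, mu0=50, sigma=10, n=25, alpha=0.05, trials=10_000, seed=1):
    """Fraction of simulated studies in which a two-sided z-test rejects H0: mean == mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


type_i = rejection_rate(true_mean=50)  # H0 true: each rejection is a Type I error
power = rejection_rate(true_mean=55)   # H0 false: each non-rejection is a Type II error
print(f"Estimated alpha: {type_i:.3f}")     # close to 0.05
print(f"Estimated beta:  {1 - power:.3f}")  # roughly 0.3 for these settings
```

Increasing `n` in this sketch lowers β while α stays near 0.05, which is why sample size matters for avoiding Type II errors.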
The P value is defined as the probability of obtaining a result as extreme as, or more extreme than, the one observed if the null hypothesis is true. When a study yields P = 0.05, a difference at least as large as the one found would be expected by chance alone about 5 times in 100 if the null hypothesis were true. Such a result is considered unlikely to have occurred by chance, and the difference found in the sample probably exists in the population from which it was drawn. By convention, P < 0.05 is considered statistically significant and P ≥ 0.05 indicates no statistically significant difference; P < 0.001 is considered highly significant.
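To connect a test statistic to a P value and to the conventional thresholds above, the sketch below assumes a large-sample z-test, so the two-sided P value can be obtained from the standard normal distribution with `statistics.NormalDist`. The helpers `two_sided_p` and `interpret` and the example z values are hypothetical; t, chi-square and F tests use their own distributions.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def interpret(p, alpha=0.05):
    """Apply the conventional thresholds described in the text."""
    if p < 0.001:
        return "highly significant"
    if p < alpha:
        return "statistically significant"
    return "not statistically significant"


for z in (1.0, 2.2, 3.5):
    p = two_sided_p(z)
    print(f"z = {z}: P = {p:.4f} -> {interpret(p)}")
```

A z value of 1.96 gives a two-sided P value of about 0.05, which is where the familiar 1.96 cut-off comes from.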
Choosing the Right Statistics
One of the most challenging parts of research for most researchers, especially students, is choosing the right statistical method to obtain correct findings (Pallant, 2011). Some, but not all, statistical techniques are discussed here, as explained by Pallant (2011). [Table 1] explains the characteristics of statistical tests with examples.
In nursing, researchers are often interested in testing whether there is a significant difference between groups when an intervention is tried.
t-tests are used to compare two sets of data, either from the same group or from two different groups, and there are two types. The ‘paired-sample t-test’ is used when data are collected from the same group of subjects at two different time points, for example before (time 1) and after (time 2) a structured education programme. The researcher’s question in this instance is whether the measurement differs between time 1 and time 2; the data are related because the subjects are the same at both measurements. The ‘independent-sample t-test’, as the name suggests, is used when data come from two different (independent) groups of subjects, such as control and experimental groups, and the researcher wants to test the difference in measurements or scores between the two groups.
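The difference between the two t-tests lies in how the standard error is formed. In the paired test, the analysis works on each subject's change score; in the independent test, two separate group variances are combined. The sketch below computes both statistics from invented knowledge scores. It uses the pooled-variance form of the independent test, which matches the equal-variance assumption mentioned earlier; Welch's unequal-variance version is not shown. `paired_t` and `independent_t` are hypothetical helper names.

```python
import math
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired-sample t statistic and df: same subjects at time 1 and time 2."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def independent_t(group1, group2):
    """Independent-sample t statistic and df, pooling the two variances
    (i.e. assuming approximately equal variances)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


# Invented knowledge scores for illustration only.
before = [10, 12, 9, 11, 13, 8]
after = [14, 15, 11, 14, 16, 12]
t_paired, df_paired = paired_t(before, after)
print(f"Paired t({df_paired}) = {t_paired:.2f}")  # Paired t(5) = 10.30

experimental = [14, 15, 13, 16, 12]
control = [10, 12, 11, 9, 13]
t_ind, df_ind = independent_t(experimental, control)
print(f"Independent t({df_ind}) = {t_ind:.2f}")   # Independent t(8) = 3.00
```

For a two-tailed test at α = 0.05, t-table critical values are 2.571 for df = 5 and 2.306 for df = 8, so both invented differences would be significant.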
One-Way Analysis of Variance (ANOVA)
One-way ANOVA is similar to the t-test but is used when one group is measured at more than two time points or when there are more than two groups. ‘Repeated-measures ANOVA’ is used when the same group of subjects is observed at more than two time points [pretest (time 1), post-test 1 (time 2), post-test 2 (time 3)].
‘Between-groups ANOVA’ is used when mean scores are compared to look for differences between more than two groups (for example, doctors, nurses and allied health professionals). One-way ANOVA indicates whether the groups differ but not where the significant difference lies (doctors/nurses or nurses/allied health professionals); further (post-hoc) tests have to be performed to find this out.
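A between-groups one-way ANOVA compares the variability between the group means with the variability within the groups. The F ratio is the between-groups mean square divided by the within-groups mean square. The minimal sketch below computes F for three groups of invented scores; `one_way_anova` is a hypothetical helper, and the P value and post-hoc comparisons that statistical software would add are not computed.

```python
from statistics import mean


def one_way_anova(*groups):
    """F statistic and degrees of freedom for a between-groups one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n_total = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Invented scores for three professional groups.
doctors = [4, 5, 6]
nurses = [6, 7, 8]
allied = [8, 9, 10]
f, df1, df2 = one_way_anova(doctors, nurses, allied)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(2, 6) = 12.00
```

The F-table critical value for (2, 6) degrees of freedom at α = 0.05 is 5.14, so these invented groups differ significantly. The F test alone does not say which pairs differ, so a post-hoc procedure such as Tukey's HSD would follow.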
Two-Way Analysis of Variance
Two-way ANOVA allows the researcher to test the effect of two independent variables on one dependent variable. It can also be performed as a between-groups or a repeated-measures ANOVA. Its advantage is that it tests the main effect of each independent variable on the dependent variable as well as the interaction effect between the two.
Statistical Techniques Used to Explore Relationships
Nursing researchers are also often interested in relationships between variables rather than in differences between groups.
Pearson or Spearman correlation is used when the focus is on the strength of the relationship between two variables. Pearson correlation is the parametric option for continuous data, whereas Spearman correlation is based on ranks and does not assume normality. The correlation coefficient, which ranges from -1 to +1, reveals both the direction (positive or negative) and the strength (strong or weak) of the relationship. A positive correlation indicates that as one variable increases the other also increases, and a negative correlation indicates that as one variable increases the other decreases.
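The relationship between the two coefficients can be shown directly: Spearman's rho is Pearson's r computed on the ranks of the data, with tied values sharing their average rank. The sketch below implements both from their definitions. The data are invented so that the relationship is curved but perfectly monotonic. In that case Spearman gives exactly 1 while Pearson, which measures linear association, gives slightly less. All helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Invented data with a curved but perfectly monotonic relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson_r(x, y), 3))     # 0.981
print(round(spearman_rho(x, y), 3))  # 1.0
```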
A scatter plot of the correlation between perceived QOL and parental coping is shown in [Figure 3].
Multiple regression is used when the researcher wants to go beyond examining a relationship and find out which independent variables best predict a dependent variable. Different types of regression analysis are used to analyse the predictive ability of one independent variable, or of a group of variables, on a dependent variable.
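Under the hood, ordinary least-squares multiple regression finds the intercept and coefficients b that solve the normal equations (XᵀX)b = Xᵀy. The sketch below solves them with plain Gaussian elimination for a small invented data set that was generated exactly from y = 2 + 3·x1 − 1·x2, so the fitted coefficients can be checked by eye. `multiple_regression` is a hypothetical helper. Statistical software uses more numerically stable methods and also reports standard errors, P values and R², none of which are computed here.

```python
def multiple_regression(X, y):
    """Ordinary least-squares coefficients via the normal equations (X'X)b = X'y.

    X is a list of rows (one per subject) of predictor values; the result is
    [intercept, b1, b2, ...]. Plain Gaussian elimination is adequate for small
    teaching examples but is not as numerically robust as statistical software.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]

    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    coef = [0.0] * p
    for i in reversed(range(p)):
        known = sum(aug[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (aug[i][p] - known) / aug[i][i]
    return coef


# Invented data generated exactly from y = 2 + 3*x1 - 1*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3, 7, 7, 11, 12]
print([round(b, 6) for b in multiple_regression(X, y)])  # [2.0, 3.0, -1.0]
```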
Whereas Pearson correlation and multiple regression are parametric techniques, a common non-parametric test used to assess relationships is the chi-square test.
Chi-square test
The chi-square test is used extensively in nursing research to determine the relationship between two categorical variables. It compares the frequency of occurrences in the categories of one variable against the categories of the other variable and indicates whether there is a significant association between them. The expected frequency in each cell of the table should be 5 or greater (minimum expected cell frequency) for the association to be computed.
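The chi-square statistic compares each observed cell count with the count expected if the two variables were unrelated. The expected count is (row total × column total) / grand total, and the statistic is the sum of (observed − expected)² / expected over all cells. The sketch below applies this to an invented 2 × 2 table built around the musculoskeletal-disorder example. It also reports the smallest expected count so that the minimum-expected-frequency rule can be checked. The critical value 3.841 (df = 1, α = 0.05) is taken from a chi-square table. `chi_square_independence` is a hypothetical helper, and no continuity correction is applied (some software also reports a Yates-corrected value for 2 × 2 tables).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for a contingency table given as rows of
    observed counts. Returns (chi2, df, smallest expected cell frequency)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected


# Invented counts: rows = health professionals / other professionals,
# columns = musculoskeletal disorder present / absent.
observed = [[30, 70],
            [15, 85]]
chi2, df, min_expected = chi_square_independence(observed)
print(f"chi-square({df}) = {chi2:.2f}; smallest expected count = {min_expected}")
# chi-square(1) = 6.45; smallest expected count = 22.5

critical_value = 3.841  # chi-square table value for df = 1, alpha = 0.05
print("significant association" if chi2 > critical_value else "no significant association")
```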
Conclusion
Statistical analysis is the backbone of research: unless the data are correctly entered and analysed with appropriate statistics, the true essence of the research findings will go unnoticed. Most parametric analyses have a non-parametric alternative. It is essential to check assumptions, use appropriate statistics and arrive at correct conclusions to enhance the generalizability of results.
Conflicts of Interest: The authors have declared no conflicts of interest.
References
Ali, Z., & Bhaskar, S. B. (2016). Basic statistical tools in research and data analysis. Indian Journal of Anaesthesia, 60(9), 662-669. https://doi.org/10.4103/0019-5049.190623
Altman, D. G. (1990). Practical Statistics for Medical Research. USA: CRC Press.
Altman, D. G., & Bland, J. M. (1996). Statistics notes: Presentation of numerical data. BMJ, 312(7030), 572.
Altman, D. G., & Bland, J. M. (2005). Standard deviations and standard errors. BMJ, 331(7521), 903. https://doi.org/10.1136/bmj.331.7521.903
Antonisamy, B., Christopher, S., & Samuel, P. P. (2010). Biostatistics: Principles and Practice. Gurgaon: Tata McGraw Hill Education.
Burns, N., & Grove, S. K. (2010). Understanding Nursing Research-eBook: Building an Evidence-Based Practice. Philadelphia: Saunders.
Daniel, W. W., & Cross, C. L. (2013). Biostatistics: A Foundation for Analysis in the Health Sciences (10th ed.). Hoboken, NJ: Wiley.
Driscoll, P., & Lecky, F. (2001). An introduction to hypothesis testing: Parametric comparison of two groups 1. Emergency Medicine Journal, 18(2), 124-130.
Jenifer, M., Sony, A., Singh, D., Lionel, J., & Jayaseelan, V. (2017). Knowledge and practice of nursing personnel on antenatal fetal assessment before and after video assisted teaching. Indian Journal of Continuing Nursing Education, 18(2), 87-91.
Kanthi, E., Johnson, M. A., & Agarwal, I. (2017). Perceived quality of life and coping in parents of children with chronic kidney disease. Indian Journal of Continuing Nursing Education, 18(1), 27-34.
Munro, B. H. (2005). Statistical Methods for Health Care Research. Philadelphia: Lippincott Williams & Wilkins.
Pallant, J. (2011). SPSS survival manual: A step by step guide to data analysis using the SPSS program. Philadelphia: Open University Press.
Polit, D. F., & Beck, C. T. (2008). Nursing research: Generating and assessing evidence for nursing practice. Philadelphia: Lippincott Williams & Wilkins.
Priyadarsini, I. S., Manoharan, M., Mathai, J., & Antonisamy, B. (2017). Psychosocial Behaviour in children after selective urological surgeries. Indian Journal of Continuing Nursing Education, 18(1)
Sadan, V. (2017). Data Collection Methods in Quantitative Research. Indian Journal of Continuing Nursing Education, 18.
Triola, M. M., & Triola, M. F. (2006). Biostatistics for the biological and health sciences. Boston: Pearson.
[Figure 1], [Figure 2], [Figure 3], [Figure 4]
[Table 1], [Table 2], [Table 3]