Question 1: What are the assumptions for one way ANOVA? Why are they important for assumption checking? How do you do the assumption checking? (2 references).
The analysis of variance, or ANOVA, was developed by R. A. Fisher and is a test of the significance of differences between or among means (Keppel & Wickens, 2004). Its purpose is to compare two or more group means and determine whether the differences between them are larger than would be expected by chance. One-way ANOVA is used when the design contains only one independent variable (factor). For example, suppose one wants to test whether college freshmen's SAT math scores differ significantly across groups. The math score is the dependent variable, and the grouping factor, such as gender, course or even IQ level, is the independent variable. ANOVA is a powerful test for comparing groups and determining whether the differences between them are significant. However, how reliably a one-way ANOVA detects true differences between group means depends on whether the data satisfy all of its assumptions.
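As a minimal sketch of the calculation behind a one-way ANOVA, the Python below (standard library only) computes the F ratio: the variance between group means divided by the variance within groups. The helper name `one_way_anova` and the math scores are hypothetical, invented for illustration. The sketch stops at the F ratio; a statistics package such as SciPy (`scipy.stats.f_oneway`) also returns the p-value.

```python
from statistics import fmean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for two or more independent groups.

    Hypothetical helper for illustration; it computes the F ratio only, not a p-value.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Between-groups sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of each score around its own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical math scores for low, average and high IQ groups
low = [70, 72, 68, 75, 71]
average = [78, 80, 76, 82, 79]
high = [85, 88, 84, 90, 86]
f_ratio, df_b, df_w = one_way_anova(low, average, high)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

A large F relative to the F distribution with (k − 1, N − k) degrees of freedom suggests that at least one group mean differs from the others. If all group means were identical, the between-groups sum of squares and therefore F would be zero.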
The data subjected to ANOVA must be scrutinized to ensure that the following assumptions are met: independence, measurement scale, normality and homogeneity of variance (Kirk, 1995). A one-way ANOVA can be carried out confidently only if the observations are independent. This means that each subject contributes one score and that the groups being compared (two or more) consist of different, unrelated subjects. For example, if we want to know the difference in math scores among low, average and high IQ students, the three groups are made up of distinct students, so they are independent. The dependent variable should also be measured on at least an interval scale, which can be determined from the kind of data-gathering instrument used. The normality assumption states that the scores within each group are normally distributed in the population. Using a large number of respondents does not make the scores themselves normal. However, because the sampling distribution of the group means approaches normality as sample size grows, ANOVA is fairly robust to moderate departures from normality when samples are large. The assumption of homogeneity of variance requires that the population variances of all groups be equal. The groups do not need the same number of observations, although equal group sizes make ANOVA more robust when the variances differ. It is important to check whether the data satisfy the four assumptions because violating them can reduce the power of the test or inflate the Type I error rate, so that the ANOVA result becomes false and misleading (Keppel & Wickens, 2004).
The data can be checked against the assumptions of one-way ANOVA using a number of tests and observations. Independence can be checked by examining the design, that is, whether the groups are related or whether the same subjects were measured in different situations. The measurement scale can be checked by examining the instruments used in the study and the kind of measurement scale they produce (Kirk, 1995). One-way ANOVA requires interval or ratio data, so if the observation values are non-numerical categories, one-way ANOVA clearly cannot be used. Normality and homogeneity of variance can be assessed with a number of statistical measures. When sample sizes are large and roughly equal, ANOVA tolerates moderate violations, but the checks should still be made. The skewness of each group's distribution, together with histograms or normal probability plots, shows how far the data depart from the normal curve, and formal tests such as the Shapiro-Wilk test can also be used. Outliers should also be identified, because a few extreme scores can distort both the group means and the group variances. Outliers can be detected by comparing the highest and lowest scores with the mean, for example by flagging scores more than about three standard deviations from it, or with the boxplot rule of scores more than 1.5 interquartile ranges beyond the quartiles. Side-by-side boxplots also show graphically whether the groups have similar spreads. Homogeneity is then assessed by comparing the groups' standard deviations or variances with one another. What matters is not whether the variances are small but whether they are similar across groups, which can be tested formally with Levene's test.
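Several of these checks can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, skewness is the simple moment-based coefficient, outliers use the 1.5 × IQR boxplot rule, and Levene's statistic is computed in its original form as a one-way ANOVA F on the absolute deviations from each group mean. The sketch does not convert W to a p-value (it would be compared with an F distribution on k − 1 and N − k degrees of freedom); packages such as SciPy provide `scipy.stats.levene` and `scipy.stats.shapiro` for that.

```python
from statistics import fmean, quantiles, variance

def skewness(xs):
    """Moment-based skewness: near 0 for symmetric data, positive for a long right tail."""
    m = fmean(xs)
    m2 = fmean((x - m) ** 2 for x in xs)
    m3 = fmean((x - m) ** 3 for x in xs)
    return m3 / m2 ** 1.5

def iqr_outliers(xs):
    """Scores more than 1.5 interquartile ranges beyond the quartiles (boxplot rule)."""
    q1, _, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    return [x for x in xs if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

def anova_f(groups):
    """One-way ANOVA F ratio for a list of groups."""
    k, n = len(groups), sum(len(g) for g in groups)
    grand = fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def levene_w(groups):
    """Levene's W: the ANOVA F computed on absolute deviations from each group mean."""
    return anova_f([[abs(x - fmean(g)) for x in g] for g in groups])

def variance_ratio(groups):
    """Largest sample variance divided by the smallest; values near 1 suggest homogeneity."""
    vs = [variance(g) for g in groups]
    return max(vs) / min(vs)

# Hypothetical scores for three independent groups
groups = [[70, 72, 68, 75, 71], [78, 80, 76, 82, 79], [85, 88, 84, 90, 86]]
for g in groups:
    print("skewness:", round(skewness(g), 2), "outliers:", iqr_outliers(g))
print("variance ratio:", round(variance_ratio(groups), 2))
print("Levene W:", round(levene_w(groups), 2))
```

Levene's test is used here because it reuses the ANOVA machinery: if the groups spread out equally, their mean absolute deviations are similar and W stays small.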
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed.). NJ: Prentice Hall.
Kirk, R. (1995). Experimental design: Procedures for the behavioral sciences (3rd ed.). Pacific Grove, CA: Brooks/Cole.
Question 2: What is the difference between multiple regression and logistic regression? Could you think of an example for each method? (2 references)
Multiple regression is one of the oldest statistical techniques. It is used to predict the value of an interval-level dependent variable from a set of independent variables, and hence to explain the variance in that dependent variable. Multiple regression can establish how much of the variance in the dependent variable the independent variables account for, whether that contribution is statistically significant, and how well each independent variable predicts the outcome (Berk, 2003). The assumptions of multiple regression are linear relationships, homoscedasticity, interval data, the absence of outliers and an untruncated data range. Multiple regression should therefore only be used on data that satisfy these assumptions. The linearity of the relationships between the variables has to be established, as it is the most important criterion determining the validity of the regression results. For example, a sports recruiter may want to determine which characteristics of an athlete best predict success. The researcher could gather and quantify measures such as health, lifestyle, status and other variables, and then test which of them predict success in professional sports, assuming that the relationship between each characteristic and success is linear (follows a straight line). In this way, the recruiter would be able to choose the athlete with the greatest chance of succeeding in professional sports.
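To make the idea of predicting an interval dependent variable from several independent variables concrete, here is a minimal ordinary least squares sketch in standard-library Python. It solves the normal equations (X'X)b = X'y by Gaussian elimination and reports R², the proportion of variance in the dependent variable that the model explains. The function names and data are hypothetical. The significance tests for each coefficient, which a package such as statsmodels (`statsmodels.api.OLS`) reports, are not computed here.

```python
def ols_fit(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, p):
            factor = a[i][col] / a[col][col]
            for j in range(col, p + 1):
                a[i][j] -= factor * a[col][j]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b

def r_squared(X, y, b):
    """Proportion of variance in y explained by the fitted model."""
    preds = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - yhat) ** 2 for yi, yhat in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data built so that y = 2 + 3*x1 - 1*x2 exactly
X = [[1, 0], [2, 1], [3, 1], [4, 3], [5, 2], [0, 4]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
b = ols_fit(X, y)
print([round(v, 6) for v in b], round(r_squared(X, y, b), 6))
```

Because the hypothetical data contain no noise, the fit should recover coefficients of about 2, 3 and −1 with R² equal to 1. With real data, each slope estimates the change in the dependent variable for a one-unit change in that predictor, holding the others constant.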
Logistic regression, on the other hand, is used when the dependent variable is categorical rather than continuous, and it can be binomial or multinomial. Binomial logistic regression is used when the dependent variable is dichotomous, while multinomial logistic regression is used when the dependent variable has more than two categories. In both cases, the independent variables can be continuous or categorical (Pampel, 2000). Logistic regression models the log-odds of the outcome as a linear function of the independent variables, so it predicts the probability of an outcome, or of membership in a category, from those variables. In general, multiple regression is used to predict a continuous dependent variable from a number of independent variables, while logistic regression is the more specialized form used when the dependent variable is an attribute or category.
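A minimal sketch of binomial logistic regression, again in standard-library Python: a linear combination of the predictors (the log-odds) is passed through the logistic function to give a probability between 0 and 1. Here the coefficients are fitted by plain gradient ascent on the log-likelihood. This fitting method, the learning rate and the iteration count are illustrative assumptions (statistical packages usually use Newton-type maximum likelihood), and the helper names and data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit [b0, b1, ..., bp] for P(y = 1) = sigmoid(b0 + b1*x1 + ... + bp*xp).

    Uses batch gradient ascent on the mean log-likelihood (illustrative, not optimized).
    """
    n, p = len(y), len(X[0]) + 1
    b = [0.0] * p
    for _ in range(epochs):
        grad = [0.0] * p
        for row, yi in zip(X, y):
            xs = [1.0] + [float(v) for v in row]  # prepend intercept term
            residual = yi - sigmoid(sum(bj * xj for bj, xj in zip(b, xs)))
            for j in range(p):
                grad[j] += residual * xs[j]
        b = [bj + lr * gj / n for bj, gj in zip(b, grad)]
    return b

def predict_proba(b, row):
    """Predicted probability that the outcome equals 1 for one row of predictors."""
    return sigmoid(b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)))

# Hypothetical data: one predictor (e.g. a workload score) and a 0/1 outcome
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b = fit_logistic(X, y)
print("coefficients:", [round(v, 2) for v in b])
print("odds ratio per unit:", round(math.exp(b[1]), 2))
print("P(y=1 | x=6):", round(predict_proba(b, [6]), 2))
```

Exponentiating a slope gives the odds ratio: the factor by which the odds of the outcome change for a one-unit increase in that predictor. A categorical predictor such as gender would be entered as a 0/1 code.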
Multiple regression can be used, for example, when a researcher wants to predict teacher burnout. The researcher suspects that burnout can be affected by a host of factors, and knowing which of them significantly lead to burnout would be useful to the school management. The researcher therefore identifies the variables or factors that may affect and predict burnout, such as poor working conditions, demotivated students, lack of resources or workload. Teachers then complete a self-report survey that measures both their level of burnout and each of these factors. Multiple regression tests which factors significantly predict burnout and how strongly. If, for instance, lack of resources emerged as the strongest predictor, the school management would know where to intervene first. An example of when to use logistic regression is when a researcher wants to predict whether or not teachers experience burnout based on their age and gender. Here the dependent variable is dichotomous (burnout present or absent), and the predictors are gender (male or female) and age (entered in years or as a category such as young or old). The goal is to test whether group membership, or sharing a certain attribute or category, is associated with a higher probability of the outcome. In this case, logistic regression is used to determine whether being male or female, and whether being young or old, predisposes teachers to experience burnout.
Berk, R. (2003). Regression analysis: A constructive critique. Thousand Oaks, CA: Sage Publications.
Pampel, F. (2000). Logistic regression: A primer (Sage Quantitative Applications in the Social Sciences Series No. 132). Thousand Oaks, CA: Sage Publications.