Correlation analysis differs from regression analysis in that it 1) determines whether a relationship exists between two variables and 2) if one exists, measures its direction and strength. Regression analysis, on the other hand, attempts to predict the value of a dependent variable from one or more independent variables.
The correlation coefficient measures the strength of the linear relationship between two variables (Doanne and Seward, 2007, p. 490). Its sign gives the direction: a positive value (e.g. 0.1) indicates a positive relationship, and a negative value (e.g. -0.1) indicates a negative relationship.
The quick rule for the significance of a correlation at α = 0.05 is |r| > 2/√n. The limitation of this rule is that it can only be used for α = 0.05.
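The quick rule can be checked in a few lines of Python. This is only a minimal sketch: the function name and the sample values of r and n are illustrative assumptions, not taken from the question.

```python
import math

def quick_rule_significant(r, n):
    """Quick rule for alpha = 0.05: significant when |r| > 2 / sqrt(n)."""
    return abs(r) > 2 / math.sqrt(n)

# Hypothetical values chosen only to illustrate the rule.
# With n = 35 the threshold is 2 / sqrt(35), roughly 0.338.
print(quick_rule_significant(0.45, 35))  # True
print(quick_rule_significant(0.20, 35))  # False
```

Because the constant 2 approximates the critical t value at α = 0.05 only, the function deliberately takes no significance-level argument.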
For i = 1…n, the sums needed to calculate a correlation coefficient are: 1) the sum of the products xiyi, 2) the sum of xi, 3) the sum of yi, 4) the sum of xi², and 5) the sum of yi².
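The computational formula for r can be sketched in Python from the raw sums: Σxiyi, Σxi, Σyi, and the sums of squares Σxi² and Σyi². The function name and the small data set are hypothetical and serve only as an illustration.

```python
import math

def correlation_from_sums(xs, ys):
    """Pearson r from raw sums (computational formula)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, not from the assignment.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_from_sums(xs, ys)
print(round(r, 4))  # 0.7746
```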
The two ways of testing the significance of a correlation coefficient are: 1) the t-test, where the test statistic is t = r√((n – 2)/(1 – r²)) with n – 2 degrees of freedom, and 2) the Z-test based on the Fisher transformation, where Z = ln[(1 + r)/(1 – r)]/2.
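Both test statistics can be sketched in Python using the standard textbook forms t = r√((n – 2)/(1 – r²)) and the Fisher transformation Z = ½ ln((1 + r)/(1 – r)). The function names and the values r = 0.5, n = 27 are hypothetical. Scaling Z by √(n – 3) to obtain an approximately standard normal statistic under a zero-correlation null is an assumption added here, not something stated in the answer above.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def fisher_z(r):
    """Fisher transformation: Z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

# Hypothetical sample correlation and sample size.
r, n = 0.5, 27
t = t_statistic(r, n)
z = fisher_z(r)
# Under a null of zero correlation, Z has standard error of about 1 / sqrt(n - 3).
z_stat = z * math.sqrt(n - 3)
print(round(t, 3), round(z, 3), round(z_stat, 3))
```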
From the question, let x = weekly pay, y = income tax, and n = 35. From the ANOVA table, the fitted regression equation is y = 30.7963 + 0.0343x.
From the output table, the degrees of freedom are 33 and the critical value at α = 0.05 from appendix D is 2.035.
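Since the p-value = 0.0161 is less than α = 0.05, we reject the hypothesis that the slope is zero. We conclude that the slope, and hence the correlation coefficient r, is significantly different from zero, and we accept the slope value of 0.0343.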
With 95% confidence, the true slope lies within the interval from 0.0101 to 0.0584.
Now, t = 2.889 and t² = 8.35 = F, which verifies that F = t² for the slope.
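The verification can be reproduced numerically. The sketch below uses the values reported in this answer (t = 2.889, n = 35, r² = 0.202). The identity F = r²(n – 2)/(1 – r²) for simple regression is a standard result added here as a cross-check, and the variable names are illustrative.

```python
# Values reported in this answer.
t = 2.889
n = 35
r_squared = 0.202

# In simple regression, F equals the square of the slope's t statistic.
f_from_t = t ** 2

# Cross-check: F can also be recovered from r-squared with n - 2 error df.
f_from_r2 = r_squared * (n - 2) / (1 - r_squared)

print(round(f_from_t, 2), round(f_from_r2, 2))  # 8.35 8.35
```

The two routes differ slightly beyond the second decimal place because t and r² are themselves rounded.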
From the table, r² = 0.202, so the regression equation, y = 30.7963 + 0.0343x, explains 20.2% of the variation in y (income tax).
From the question, X = total assets ($ billions), Y = total revenue ($ billions), and n = 64.
From the regression output, the fitted regression equation is y = 6.5763 + 0.0452x.
The regression output shows that the degrees of freedom are 62. The critical value at α = 0.05 is 1.999.
Now, from the regression output, t = 8.183, which is much greater than the critical value of 1.999. Thus, the conclusion is that the true slope ≠ 0.
With 95% confidence, the true slope lies within the interval from 0.0342 to 0.0563.
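A confidence interval for the slope has the form slope ± t(critical) × standard error. The answer does not report the standard error, so the sketch below backs it out as slope / t, which is an assumption, using the reported slope 0.0452, t = 8.183, and critical value 1.999. The function name is hypothetical.

```python
def slope_confidence_interval(slope, t_stat, t_crit):
    """Interval slope +/- t_crit * SE, with SE recovered as slope / t_stat."""
    std_error = slope / t_stat  # assumption: t_stat = slope / SE
    margin = t_crit * std_error
    return slope - margin, slope + margin

# Values reported in this answer for the assets/revenue regression.
low, high = slope_confidence_interval(0.0452, 8.183, 1.999)
print(round(low, 4), round(high, 4))
```

Because the slope and t statistic are rounded, the result agrees with the reported interval only to about three decimal places.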
Now, t = 8.183 and t² = 66.96 ≈ F. Thus F = t² for the slope.
As the predictor, X1 accounts for more than half of the variability of y. Additionally, the true slope of the line is not zero. From these two facts we conclude that X1 is a good predictor of y.
From the stepwise regression results we note that: 1) the addition of variables causes an increase in r², which is normal behaviour, and 2) the addition of infant mortality causes r² to decrease by a tiny amount. From these observations, we can conclude that: 1) the variables used in the analysis do not give the model sufficient explanatory power. A likely reason is that the variables in the analysis contain, to a significant extent, the same information.
2) The addition of the variables GDPcap and Literate causes LifeExp and InfMort to be no longer highly significant.
The results of the regression analysis describe a highly predictive regression model. This is because the independent variables in the analysis explain 81.1% of the variability of the dependent variable.
The graph is shown in Appendix A.
The United States was at war in the 1960s and 1970s, so the high number of aviation shipments during these periods may be a result of increased production of warplanes.
A fitted trend would not be helpful for the above data, as no trend, linear or otherwise, is apparent.
The graph is shown in Appendix B. Again, a fitted trend would not be helpful for the above data, as no trend, linear or otherwise, is apparent.
The best trend model for the data is shown in Appendix C, where the regression equation is y = 182.21x – 362294 and r² = 0.7257. The forecast for 2004 is obtained by substituting x = 2004 into the regression equation. Thus, the forecast is 182.21 × 2004 – 362294 = 2854.84. It is good to ignore the earlier years because this leaves a sub-dataset that can be properly analyzed using an appropriate regression model.
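The forecast arithmetic can be reproduced with a short Python sketch. The coefficients are those of the fitted trend reported above; the function name is an illustrative assumption.

```python
def trend_forecast(year, slope=182.21, intercept=-362294):
    """Linear trend forecast: y = slope * year + intercept."""
    return slope * year + intercept

forecast_2004 = trend_forecast(2004)
print(round(forecast_2004, 2))  # 2854.84
```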
Quiz question one.
The table in Appendix D gives the average starting salary (in thousands of dollars) for each gender in each of the MBA majors, as well as the overall average starting salary (in thousands of dollars) for each MBA major.
From the averages in this table, the dean can conclude that: 1) males are paid more than females in each of the MBA majors except Marketing, where the opposite is true; 2) Finance is the highest-paying MBA major, followed by Accounting and then Marketing, with Management the lowest-paying; and 3) both males and females should major in Finance if they wish to maximize their starting salaries.
Quiz question two.
Let p1 be the proportion of male doctors taking a 325 mg aspirin tablet who had heart attacks. Thus, p1 = 104/11037 = 0.00942. Let p2 be the proportion of male doctors taking the placebo who had heart attacks. Thus, p2 = 189/11034 = 0.01713.
The null hypothesis is H0: p1 – p2 ≥ 0, which means that the proportion having heart attacks is not significantly lower for male doctors who took the 325 mg aspirin tablet than for male doctors who took the placebo. The alternative hypothesis is H1: p1 – p2 < 0.
The test for these hypotheses is left-tailed. Let z be the test statistic; thus z = ((p1 – p2) – D0)/σp1-p2, where σp1-p2 = √(p(1 – p)(1/n1 + 1/n2)). From the question and the hypotheses set, n1 = 11037, n2 = 11034, D0 = 0, and the pooled proportion is p = 293/22071 = 0.0133. Therefore z = ((0.00942 – 0.01713) – 0)/√(0.0133(1 – 0.0133)(1/11037 + 1/11034)) = -5.00. The p-value for z is 0.000000285.
Since the p-value is less than 0.001, we have extremely strong evidence that H0 is false. We therefore reject H0, accept H1, and conclude that the proportion having heart attacks is significantly lower for male doctors who took the 325 mg aspirin tablet than for male doctors who took the placebo.
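The test can be reproduced with a short Python sketch using only the standard library. The sample sizes 11037 and 11034 used here are an assumption: they are the values that reproduce the reported proportions 0.00942 and 0.01713. The function name is hypothetical, and the p-value is computed for a left-tailed test.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and its left-tail p-value."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    # Standard normal CDF at z, written with the complementary error function.
    p_left = 0.5 * math.erfc(-z / math.sqrt(2))
    return z, p_left

# 104 heart attacks in the aspirin group, 189 in the placebo group.
# Sample sizes are assumed (see the note above).
z, p_value = two_proportion_z(104, 11037, 189, 11034)
print(round(z, 2), p_value)
```

Using `math.erfc` rather than 1 minus a CDF keeps precision in the far tail, where the p-value is on the order of 10⁻⁷.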
Doanne, D. P. and Seward, L. E. (2007). Applied statistics in business and economics (1st ed.). New York: McGraw-Hill/Irwin, p. 490.
Figure 1. Graph for data on U.S. Manufactured General Aviation Shipments, 1966–2003
Figure 2. Graph for data on U.S. Manufactured General Aviation Shipments, 1992–2003
Figure 3. Graph showing fitted trend model for data on U.S. Manufactured General Aviation Shipments, 1992–2003
Table 1. Averages for each gender and MBA major