Correlation analysis is appropriate for detecting whether
two continuous variables share a linear relationship. Correlation is quantified
by the correlation coefficient (ρ),
which measures how strongly one variable changes in relation to the other. The
relationship can be positive, negative, or absent (no
correlation at all). A second statistical
method, significance analysis, can then be carried out to provide a
significance value, e.g. a p
value, which can then be used to test the null hypothesis.
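As a minimal sketch of what the correlation coefficient measures, the block below computes Pearson's coefficient by hand and compares it with the base R function cor. The vectors x and y and the helper name pearson_by_hand are hypothetical and chosen only for illustration; they are not part of the analysis in this report.

```r
# Hypothetical example data (placeholder values, not from the report).
x <- c(2.1, 3.4, 4.0, 5.6, 6.3, 7.8)
y <- c(1.9, 3.0, 4.4, 5.1, 6.9, 7.2)

# Pearson's coefficient: sum of cross-products of the deviations from the
# means, divided by the square root of the product of the sums of squared
# deviations.
pearson_by_hand <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual  <- pearson_by_hand(x, y)
r_builtin <- cor(x, y)  # base R; Pearson is the default method

print(r_manual)
print(r_builtin)
```

The two values should agree up to floating-point rounding, and the result always lies between -1 and 1.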
The R function used for correlation analysis is cor.test.
This function requires two vectors of the same length, e.g.
cor.test(x, y). The expression cor.test(x, y)$estimate returns the
correlation coefficient (ρ). If
the correlation coefficient is positive, then this indicates that the two
variables have a positive correlation.
If the correlation coefficient is negative, then it indicates a negative
correlation between the two. The p value is returned by the expression
cor.test(x, y)$p.value. If this p
value is less than a critical p value
(0.05 is commonly used), then the null hypothesis can be rejected, as the two
variables have displayed a significant correlation. However, if the p value is greater than the critical p value, then no significant correlation has been detected between the two
variables and the null hypothesis cannot be rejected. To recall the p value and correlation coefficient, I
can assign the result of cor.test(x, y) to an object named ‘model’ and then enter
‘model’ into the console when required.
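The steps above can be sketched as follows. The vectors x and y hold hypothetical placeholder values, and the names r_hat, p_val, alpha and decision are introduced here only for illustration.

```r
# Hypothetical vectors of equal length (placeholder values).
x <- c(10, 12, 15, 17, 20, 22, 25, 28)
y <- c(8, 11, 13, 18, 19, 24, 23, 30)

# Store the test result so it can be recalled later.
model <- cor.test(x, y)   # Pearson's test is the default method

r_hat <- unname(model$estimate)  # correlation coefficient
p_val <- model$p.value           # significance value

# Compare the p value with the chosen critical value.
alpha <- 0.05
if (p_val < alpha) {
  decision <- "reject H0"
} else {
  decision <- "fail to reject H0"
}

print(model)  # same output as typing 'model' in the console
cat("r =", r_hat, " p =", p_val, " decision:", decision, "\n")
```

Saving the result once avoids rerunning cor.test each time the coefficient or the p value is needed.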
The null hypothesis of the correlation analysis (H0)
states that there is no correlation between the production of rice in the two
countries, Bangladesh and Pakistan. Therefore the null hypothesis is H0: ρ = 0, where ρ is the
correlation coefficient between the production of rice in the two countries, which equals
0 when there is no correlation.
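To illustrate how a null hypothesis of zero correlation can be tested, the sketch below assumes Pearson's method (the cor.test default), for which the test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The data are hypothetical placeholder values, and the names t_stat and p_manual are introduced only for this example.

```r
# Hypothetical example data (placeholder values, not the rice data).
x <- c(3, 5, 2, 8, 7, 6, 9, 4, 10, 1)
y <- c(4, 4, 3, 6, 9, 5, 7, 6, 8, 2)

n <- length(x)
r <- cor(x, y)  # sample correlation coefficient

# t statistic under H0 (zero correlation), with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p value from the t distribution.
p_manual <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# Compare with the values reported by cor.test.
model <- cor.test(x, y)
print(c(manual = p_manual, cor.test = model$p.value))
```

The manual p value should match the one from cor.test, which shows where the significance value comes from.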
The two sets of data, the rice production of
Bangladesh and of Pakistan, contain the same number of observations; therefore, I can use correlation
analysis to examine whether the two variables are correlated. I entered each data set
separately as a vector, using the function c. This then allowed me to pass these data
sets to the cor.test function, with the output saved as ‘model’. Subsequently,
I could generate a scatter diagram, using the function plot, to visualise
the two sets of data. I added a legend
to the plot to show the p value and correlation coefficient by using the function
legend. I used the position keyword “topleft” to place the legend in the desired place.
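A minimal sketch of this workflow is given below. The numbers are placeholder values only and are NOT the real rice production figures; the vector names bangladesh and pakistan, the axis labels and the rounding choices are likewise assumptions made for illustration.

```r
# Placeholder values only; NOT the real rice production data.
bangladesh <- c(25.1, 26.4, 27.0, 28.3, 29.9, 31.2, 33.0, 34.5)
pakistan   <- c(4.8, 5.1, 4.9, 5.6, 5.4, 6.2, 6.0, 6.9)

# Run the correlation test and keep the result for later use.
model <- cor.test(bangladesh, pakistan)

# Build the legend text from the stored result.
legend_text <- c(
  paste("r =", round(unname(model$estimate), 3)),
  paste("p =", signif(model$p.value, 3))
)

# Scatter diagram of the two data sets.
plot(bangladesh, pakistan,
     xlab = "Rice production, Bangladesh (placeholder units)",
     ylab = "Rice production, Pakistan (placeholder units)",
     pch = 19)

# "topleft" is a position keyword accepted by legend().
legend("topleft", legend = legend_text, bty = "n")
```

Building the legend text from ‘model’ keeps the values shown on the plot in step with the test output.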
The p value for
the two variables is 0.00158. As this is less than the critical value of
0.05, the null hypothesis can be rejected. This indicates that rice
production in Bangladesh and Pakistan is significantly correlated. The
correlation coefficient (ρ) is
0.6588979; as ρ is positive, there is a positive correlation between these two
variables. Rice production in Pakistan therefore tends to
increase as rice production in Bangladesh increases. The figure below
shows a scatter plot of these two variables. The scatter pattern is consistent
with the correlation analysis test as a positive correlation can be deduced
from the graph (diagonal distribution of points from bottom left corner to top