Univariate Data Survey

Christina Wagner
MBA 810 Introductory Statistics
M&M® in a bag
Concordia University
This experimental study is based on a population of 1.69-ounce bags of plain milk chocolate M&M's®. To approximate a random sample of the population, 30 bags were acquired at different locations.
Method:
The population is bags of M&M's®, and the data set comprises 30 bags. Using a cluster sampling method without replacement, the overall sample was counted and found to contain six different colors: blue, orange, green, yellow, red and brown. The sample was therefore divided into clusters to calculate the frequency of each color. The data are analyzed from two perspectives: color proportions and number of candies per bag. The numbers of observations are shown in Table 1. For the color proportions, the information used was the count of each color out of the total number of candies sampled. For the number of candies per bag, the information used was the data in the number-of-candies-in-the-bag column. Because each bag is assumed to be filled by weight on high-speed equipment, and not by count, an individual bag can have an unusual color distribution.
Table 1. Color counts and relative frequencies
Data Value   x     n      Frequency   Relative Frequency
Blue         351   1698   21%         0.206714
Orange       351   1698   21%         0.206714
Green        333   1698   20%         0.196113
Yellow       227   1698   13%         0.133687
Red          233   1698   14%         0.137220
Brown        203   1698   12%         0.119552
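The relative frequencies in Table 1 can be recomputed from the counts. The following is a minimal sketch that uses only the counts listed in the table; the variable names are illustrative and not part of the original study.

```python
# Color counts copied from Table 1.
counts = {
    "Blue": 351, "Orange": 351, "Green": 333,
    "Yellow": 227, "Red": 233, "Brown": 203,
}

total = sum(counts.values())  # total number of candies sampled (n)

# Relative frequency = count of one color / total count.
relative = {color: x / total for color, x in counts.items()}

for color, p in relative.items():
    print(f"{color:<8}{counts[color]:>5}{p:>10.6f}{p:>6.0%}")

# Percentages rounded to whole numbers need not add up to exactly 100.
rounded_sum = sum(round(100 * p) for p in relative.values())
print("n =", total, "| sum of rounded percentages =", rounded_sum)
```

The last line is a reminder that the whole-number percentage column is rounded, so it is safer to work from the relative frequency column.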
By the central limit theorem, the distribution of the sample means is approximately normal when the sample size is large; a sample size of 30 or more is the usual rule of thumb. The sample distribution is shown in Table 1, and the frequency distribution is graphed to determine whether or not the data follow a normal distribution. The relative frequency, that is, the fraction of times a color occurred in the sample, is shown in Table 1.
When a particular data set is approximately normally distributed, we would find that the mean is approximately equal to the median; the range is about 6 times the standard deviation; and approximately 19 of every 20 observations fall within ±2 standard deviations of the mean. According to the graph, the blue candies appear to be normally distributed. A histogram was constructed to check for normality because it makes it easy to see whether the data are skewed. The color of the sugar coating is a categorical variable. A histogram is used to display the distribution of the data.
Sorted by bin                          Sorted by frequency
Bin   Frequency   Cumulative %         Bin   Frequency   Cumulative %
2     1           0.56%                8     21          11.67%
3     4           2.78%                7     19          22.22%
4     10          8.33%                10    19          32.78%
5     12          15.00%               9     18          42.78%
6     11          21.11%               12    16          51.67%
7     19          31.67%               11    14          59.44%
8     21          43.33%               5     12          66.11%
9     18          53.33%               13    12          72.78%
10    19          63.89%               6     11          78.89%
11    14          71.67%               4     10          84.44%
12    16          80.56%               14    7           88.33%
13    12          87.22%               16    6           91.67%
14    7           91.11%               15    5           94.44%
15    5           93.89%               3     4           96.67%
16    6           97.22%               18    2           97.78%
17    1           97.78%               2     1           98.33%
18    2           98.89%               17    1           98.89%
19    0           98.89%               20    1           99.44%

This distribution has a positive (right) skew.
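The shape of this distribution can be checked against the rules of thumb for normality given earlier (mean close to median, range about 6 standard deviations, about 19 of 20 observations within 2 standard deviations). The sketch below expands the frequency table into individual observations. It assumes that each bin label is the observed value itself, and it uses only the rows listed in the table, which cover 179 observations; the cumulative column stops at 99.44%, so whatever falls beyond the last listed row is left out.

```python
import statistics

# (bin, frequency) rows as listed in the histogram table.
# Assumption: each bin label is the observed value itself.
freq = {2: 1, 3: 4, 4: 10, 5: 12, 6: 11, 7: 19, 8: 21, 9: 18, 10: 19,
        11: 14, 12: 16, 13: 12, 14: 7, 15: 5, 16: 6, 17: 1, 18: 2,
        19: 0, 20: 1}

# Expand the table into one value per observation.
data = [value for value, f in freq.items() for _ in range(f)]

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.stdev(data)  # sample standard deviation
data_range = max(data) - min(data)

# Share of observations within two standard deviations of the mean.
within_2sd = sum(abs(x - mean) <= 2 * sd for x in data) / len(data)

print(f"n = {len(data)}")
print(f"mean = {mean:.2f}, median = {median}")
print(f"range / sd = {data_range / sd:.2f} (about 6 for normal data)")
print(f"within 2 sd = {within_2sd:.1%} (about 95% for normal data)")
```

A mean above the median points to a right (positive) skew; how far apart the two are gives a rough sense of how strong the skew is.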

Find the following, display the data in columns, and indicate your findings in a chart:

x̄ =

s =

First quartile =

Median =

70th percentile =

Display frequency rounded to two decimal places
relative frequency
cumulative frequency

Construct a box plot and a histogram displaying your data.
What value is two standard deviations above the mean?
What value is 1.5 standard deviations below the mean?
Compute the P-value for the z test
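The quantities requested above can be computed with Python's standard library. This is a sketch under stated assumptions, not the method prescribed by the assignment: the function names `summarize` and `two_sided_p_value` are hypothetical, the quartiles and percentile use the "inclusive" interpolation method (other conventions give slightly different values), and the P-value helper assumes a two-sided test because the prompt does not state the hypothesis.

```python
import math
import statistics


def summarize(values):
    """Return the requested summary statistics for a list of numbers."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    q1, median, _ = statistics.quantiles(values, n=4, method="inclusive")
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return {
        "mean": mean,
        "s": s,
        "first_quartile": q1,
        "median": median,
        "percentile_70": deciles[6],  # 7th decile = 70th percentile
        "mean_plus_2s": mean + 2 * s,
        "mean_minus_1.5s": mean - 1.5 * s,
    }


def two_sided_p_value(z):
    """Two-sided P-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical values for illustration only; substitute the real observations.
example = list(range(1, 12))
for name, value in summarize(example).items():
    print(f"{name}: {value:.3f}")
print(f"P-value for z = 1.96: {two_sided_p_value(1.96):.4f}")
```

For a one-sided test, the two-sided value would be halved when the z statistic falls on the side named by the alternative hypothesis.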

Explain the distribution of the data:
whether there are any potential outliers and what their values are
whether the middle 50% of the data appears to be concentrated or spread apart.
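One common way to approach both points is the interquartile range (IQR): the IQR measures how spread out the middle 50% of the data is, and values more than 1.5 × IQR beyond the quartiles are flagged as potential outliers (the convention used in box plots). The sketch below assumes that convention and the "inclusive" quartile method; the function name `iqr_report` and the example values are hypothetical.

```python
import statistics


def iqr_report(values):
    """Return quartiles, IQR, fences, and potential outliers (1.5 x IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # spread of the middle 50% of the data
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = sorted(x for x in values if x < low_fence or x > high_fence)
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "outliers": outliers,
    }


# Hypothetical values for illustration only; substitute the real observations.
example = list(range(1, 12)) + [40]
report = iqr_report(example)
print(report)
```

A small IQR relative to the full range suggests that the middle half of the data is concentrated; a large one suggests it is spread apart.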

What is the highest percentage color of M&M in the sample?
Does any color make up more than half of a package?
What about more than a third of the package?
What is the predictability that a consumer picks one color over another? What color is it and why?
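The first three questions can be checked against the pooled counts in Table 1. This is a minimal sketch; note that it describes the combined sample of 30 bags, so an individual package may differ, and the variable names are illustrative.

```python
# Color counts copied from Table 1 (pooled over all 30 bags).
counts = {
    "Blue": 351, "Orange": 351, "Green": 333,
    "Yellow": 227, "Red": 233, "Brown": 203,
}

total = sum(counts.values())
shares = {color: x / total for color, x in counts.items()}

# Colors with the largest share; a list is used because ties are possible.
top_share = max(shares.values())
top_colors = [color for color, p in shares.items() if p == top_share]

# Colors that exceed one half or one third of the sample.
over_half = [color for color, p in shares.items() if p > 1 / 2]
over_third = [color for color, p in shares.items() if p > 1 / 3]

print("largest share:", top_colors, f"{top_share:.1%}")
print("more than half:", over_half)
print("more than a third:", over_third)
```

If a candy is drawn at random from the pooled sample, each color's share is also the estimated probability of drawing that color.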
