Statistics Project
The objective of this project is for you to show your ability to describe data using both graphical and numerical summaries. You will use data that you collect from Amazon.com.
Amazon.com is the leading web-based retailer of products. For more than a decade, Amazon has sold a wide variety of products to millions of customers worldwide. One reason for Amazon.com’s popularity is that potential customers can search, collect information, and compare and evaluate a product before making a purchase. To help in the evaluation process, Amazon provides two types of information on every product: product reviews and product ratings. Product reviews are textual feedback from prior customers who have purchased the product. Product ratings are given on a 5-point scale, from 1 star to 5 stars. Below is an example of a product, the Sony Bravia 52-inch HDTV, and its ratings. The screenshot in Figure 1 shows that 53 of the 70 owners who rated the TV gave it five stars, 5 gave it four stars, and so on.
Figure 1. Customer Ratings for Sony XBR9 52 inch HDTV
Data Collection
Amazon.com categorizes its products under various departments, such as Automotive, Baby, and Beauty. On the Amazon.com webpage, select one of these departments from the drop-down list in the search box area, as shown in the screenshot in Figure 2. Leave the text box empty, and click the orange “Go” button. Within the department you select, choose at least 50 products (maximum 100), each with at least 30 customer ratings, as your random sample. You may come up with your own method for selecting your sample in the chosen department. For example, if I choose the “Baby” department, I can look at all brands and select the second-highest-rated product of each brand to constitute my random sample. (Do not use the sampling scheme described here; come up with your own.) Download the Excel data collection template posted on the course website, and collect data for your sample, noting the product description, the numbers of 1-, 2-, 3-, 4-, and 5-star ratings, the total number of ratings, and the Amazon price. To make retrieving your randomly chosen items easier, you may want to create an Amazon Wish List specifically for this class.
Figure 2. Screenshot of Amazon.com webpage
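Before moving on to the analysis, it may help to check that your collected data meets the requirements above. The following is only a minimal sketch in Python, not part of the required workflow. The record layout (description, list of five star counts, total, price) and the function name check_sample are assumptions made for illustration; the actual Excel template may be organized differently.

```python
def check_sample(rows, min_products=50, max_products=100, min_ratings=30):
    """Return a list of problems found in the sample (empty if none).

    Each row is assumed (hypothetically) to be a tuple:
    (description, [n1, n2, n3, n4, n5], total_ratings, price)
    """
    problems = []

    # The sample must contain between 50 and 100 products.
    if not (min_products <= len(rows) <= max_products):
        problems.append(
            f"sample has {len(rows)} products; expected "
            f"{min_products} to {max_products}"
        )

    for description, star_counts, total, price in rows:
        # There must be exactly one count per star level (1 through 5).
        if len(star_counts) != 5:
            problems.append(f"{description}: expected 5 star counts")
            continue
        # The star counts should add up to the recorded total.
        if sum(star_counts) != total:
            problems.append(f"{description}: star counts do not sum to total")
        # Each product needs at least 30 ratings.
        if total < min_ratings:
            problems.append(f"{description}: fewer than {min_ratings} ratings")
        # A price should be a positive number.
        if price <= 0:
            problems.append(f"{description}: price is not positive")

    return problems
```

An empty list returned by check_sample suggests the sample satisfies the size and rating-count requirements; it says nothing about whether the sampling method itself is sound.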
Statistical Analysis
You have collected data in your data file representing two different sets of data: Customer Ratings and Price. Describe these data, including the following information at a minimum: data type (nominal, ordinal, interval, or ratio); frequency distribution (see Table 3.5 in the book for an example); histogram or bar chart (pick the appropriate one); and descriptive statistics, with a description of the shape of the data, the standard deviation and coefficient of variation where appropriate, and the mean, median, and mode where appropriate. Offer any additional insights if you wish.
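The numerical summaries requested above can be computed in Excel or in any statistics package. As one illustration, the sketch below uses Python's standard statistics module. The function names and the example numbers are made up for illustration and are not taken from any real product. The code computes every statistic regardless of data type; deciding which ones are appropriate for Customer Ratings and for Price remains part of your analysis.

```python
import statistics
from collections import Counter


def expand_ratings(star_counts):
    """Turn counts [n1, n2, n3, n4, n5] into a flat list of star ratings."""
    ratings = []
    for stars, count in enumerate(star_counts, start=1):
        ratings.extend([stars] * count)
    return ratings


def summarize(values):
    """Return common descriptive statistics for a list of numbers."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": stdev,
        # Coefficient of variation, expressed as a percentage of the mean.
        "cv_percent": 100 * stdev / mean,
    }


def frequency_table(values, class_width):
    """Count values in classes [lower, upper) of equal width.

    Returns a list of (lower, upper, count) tuples, including empty
    classes between the smallest and largest value.
    """
    counts = Counter(int(v // class_width) for v in values)
    first, last = min(counts), max(counts)
    return [
        (k * class_width, (k + 1) * class_width, counts[k])
        for k in range(first, last + 1)
    ]


# Made-up example data, for illustration only.
example_counts = [2, 1, 3, 4, 10]            # numbers of 1- to 5-star ratings
example_prices = [12.99, 15.49, 24.00, 27.50, 48.25]

rating_summary = summarize(expand_ratings(example_counts))
price_classes = frequency_table(example_prices, class_width=10)
```

Here rating_summary holds the statistics for one product's ratings, and price_classes holds a frequency distribution of prices in $10 classes. The class width of 10 is an arbitrary choice for the example; pick a width that suits your own price range.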
Deliverables (Both due October 4th)
There are two deliverables for this project. First, a copy of your dataset in Excel format should be uploaded to the course website. To do this, click the “Dropbox” tab on the course website, click on “Project 1 Data File”, click the “Add a File” button, and follow the instructions. When you’re done, click “Submit.” The name of the file should be your full official name followed by your recitation section number (for example, “JohnDoe_101.xlsx”). Failure to provide the dataset will be treated as failure to submit the project and will result in no points being awarded for the project. The second deliverable is a printed hard-copy report that describes the study, sample collection, statistical analysis, results, and conclusions. The report should be no longer than 7 pages, excluding the title page, and should contain an executive summary section at the beginning. Do not include a printout of your data. Use a common font for the body of the report, such as Times New Roman or Arial, in size 12. Double-spacing is not necessary. All figures and images must have descriptive labels.
Please include a signed honor code pledge at the end of the project report. This report is to be completed individually, with assistance only from your instructor or recitation leader. Collaboration with a classmate is considered a violation of the honor code. A 30% match between the samples of two students, for example, is unlikely to happen at random and will be considered evidence of cheating. Cheating will result in failure of the project and referral to the CU Honor Code Council.