# Introduction to Statistics

1 Due date
The assignment is due on 19 February 2020 at the beginning of class. Start early. If you run into trouble, ask me for help.

2 Objectives
The objectives of this assignment are:
– to test your understanding of basic statistical concepts discussed in assigned readings;
– to make you acquainted with some key data sources;
– to make you acquainted with Stata, and use it to generate graphical summaries and descriptive statistics for a dataset;
– to let you practice interpreting the results you obtained.


3 Details
You should do the homework assignment (preferably) in teams of two. Please note that both team members should contribute equally to the assignment and both team members are responsible for the quality of the entire assignment. If you have problems co-operating with your partner, let me know immediately. The whole assignment has to be submitted at once by one team member and the feedback will be sent back only to that team member. It is the responsibility of the team members to communicate and clarify any issues, including my feedback, among themselves.
The assignment consists of two parts: (A) solving a set of five exercises from the textbook, and (B) analyzing a real dataset that you download and that I approve. That is, you need to send the dataset to me before you start the analysis. If you don’t have the dataset approved by me in advance, you risk getting no credit for the work done. It is particularly important that both team members contribute to writing the Stata code in part (B). It is highly advisable to keep proof of your work in the form of a Stata log file, in case the team members disagree about the work done. In case of disagreement, a team member who has no written proof of the work done risks getting no credit.
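As an illustration only, a session log could be kept roughly as follows. The file name `team_session1.log` is a placeholder, and the example dataset `auto` (shipped with Stata) stands in for your own data.

```stata
* Minimal sketch: record a Stata session in a plain-text log file.
* "team_session1.log" is a hypothetical file name; choose your own.
capture log close              // close any log left open by an earlier run
log using "team_session1.log", replace text

sysuse auto, clear             // example dataset shipped with Stata
display "Session on `c(current_date)' by `c(username)'"
summarize price mpg

log close
```

Everything typed between `log using` and `log close` is written to the file, together with its output. Using one log file per work session, with the team member's name in the file name, is one possible way to document who did what.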

Part A
First read pp. 7-78 of your textbook. Then solve the five exercises that are assigned to your team below. Please note that to get full credit for your solutions, you have to show all intermediate steps.
Team 1: 1.4, 1.24, 1.36, 2.2, 2.28
Team 2: 1.2, 1.6, 1.42, 2.6, 2.30
Team 3: 1.16, 1.18, 1.28, 2.4, 2.16
Team 4: 1.8, 1.20, 1.30, 1.38, 2.20
Team 5: 1.12, 1.32, 1.40, 2.12, 2.32
Team 6: 1.14, 1.26, 1.42, 2.8, 2.14
Team 7: 1.10, 1.22, 1.34, 2.10, 2.18
Team 8: 1.6, 1.30, 2.4, 2.22, 2.34

Part B
Step 1
Download a dataset on a subject that interests you and email it to me for approval before you start the analysis. The dataset should have at least 50 observations and at least two numerical variables. Don’t limit your population of interest to 50 observations just because I require a dataset of at least 50 observations; e.g., if you use countries, use all countries for which the data exist. Make sure you understand the meaning of the variables that you use in your analysis. Here are some examples of data sources that you could consult:
The Dataverse project (https://dataverse.harvard.edu/). The open source research data repository contains thousands of datasets collected for research purposes. You can search the database by using terms of your interest: e.g. ‘United Nations’ or ‘firm’. Make sure that there are readily available data associated with your chosen dataset.
Journal data archives. Many journals are participating in the open data movement and are providing access to datasets of published papers. See, e.g., the Journal of Applied Econometrics data archive. You are encouraged to read and browse in the leading journals of your discipline and identify possible datasets of your interest.
Sources of macro-level data. These sources report how all countries of the world performed in a recent year on social, economic, or demographic measures (GDP, population, total imports, total exports, child mortality, unemployment rate, inflation rate, etc.). Make sure you understand the meaning of the variables: e.g., don’t choose gross domestic product (GDP) if you don’t know what gross domestic product means. Make sure the data are comparable. For example, don’t use GDP in national currencies for all countries of the world, because in that case Afghanistan’s GDP will be measured in Afghani, Albania’s GDP in Lek, etc., and the numbers will not be comparable; use GDP expressed in a common currency (like the US dollar) instead. The data should be cross-sectional (measured in a given period or at a point in time). Don’t use time series data, that is, data where the cases are successive fixed periods of time (such as annual GDP, 1950–2015). Some sources are:
– World Bank, World Development Indicators (http://databank.worldbank.org/data/reports.aspx?source=world-development-indicators)
– Gapminder (https://www.gapminder.org/data/)
– The Penn World Table (https://www.rug.nl/ggdc/productivity/pwt/)
– United Nations, Human Development Report (http://hdr.undp.org/en/)
– United Nations Statistics Division (http://unstats.un.org/unsd/default.htm)
– OECD Data (https://data.oecd.org/)
Other data sources. You may also search other sources for interesting datasets to analyze. Just remember to email the dataset to me for approval before you start the analysis. Some sources are:
– www.kaggle.com
– http://koaning.io/fun-datasets.html
– https://www.dataquest.io/blog/free-datasets-for-projects/

Ideally, your dataset should contain both numerical and categorical variables. If your dataset contains only numerical variables, you will need to transform some of them into categorical (ordinal) variables; for guidance, you could consult the Stata FAQ “How can I recode continuous variables into groups?”.
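One possible way to do such a recoding is sketched below. It is not taken from the FAQ itself. It uses the example dataset `auto` that ships with Stata, and the new variable names (`mpg_tercile`, `mpg_group`), the label name, and the cutoffs 20 and 25 are arbitrary choices made for illustration.

```stata
* Minimal sketch: turn a numerical variable into an ordinal one.
sysuse auto, clear

* Option 1: three groups of roughly equal size.
egen mpg_tercile = cut(mpg), group(3)

* Option 2: groups defined by hand-picked cutoffs (20 and 25 are arbitrary).
generate mpg_group = 1 if mpg < 20
replace  mpg_group = 2 if mpg >= 20 & mpg < 25
replace  mpg_group = 3 if mpg >= 25 & !missing(mpg)

* Attach readable labels to the hand-made groups.
label define mpg_lbl 1 "Low (<20)" 2 "Medium (20-24)" 3 "High (25+)"
label values mpg_group mpg_lbl

* Check the group sizes.
tabulate mpg_tercile
tabulate mpg_group
```

The `!missing()` condition matters because Stata treats missing values as larger than any number, so without it missing cases would end up in the highest group.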

Step 2
Write a paragraph about the chosen data source, chosen dataset, and chosen variables. Explain why you chose those variables, i.e., why the reader should be interested in your analysis.

Step 3
If your chosen dataset is not in Stata format, start Stata and import your data from the data file. Inspect your data to see if they were correctly imported.
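For a comma-separated file, the import and a first inspection might look roughly like this. The file name `mydata.csv` is a placeholder; so that the sketch runs on its own, the file is first created from the example dataset `auto` that ships with Stata.

```stata
* Minimal sketch: import a CSV file and check that it arrived intact.
* "mydata.csv" is a hypothetical file name; it is created here from an
* example dataset only so that the sketch is self-contained.
sysuse auto, clear
export delimited using "mydata.csv", replace

import delimited using "mydata.csv", clear
describe                 // variable names, storage types, number of observations
list in 1/5              // look at the first five rows
summarize                // string variables show 0 observations here
count                    // the assignment asks for at least 50 observations
```

Typical things to look for are numerical variables that were read as strings, missing values coded as text, and a row count that differs from the source file. Excel files can be read with `import excel` instead.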

Step 4
Use Stata to provide the statistical output listed below. After each exercise, you should interpret the results by focusing on the questions and guidelines provided. Try to come up with a “story” behind your results by answering the following questions: Are the results as expected based on your knowledge of the literature or based on your intuition? In case the results are not as expected, what could be the possible reasons?
a. Choose two numerical variables and generate a scatterplot. Interpret the graph by focusing on the following questions: Are the variables associated or independent? What type of relationship, if any, do you observe? Are there any unusual cases?
b. Choose a numerical variable and generate a histogram. Describe the distribution of the variable. The description should incorporate the center, variability, and shape of the distribution. A good description of the shape of a distribution should include modality and whether the distribution is symmetric or skewed. Also note any unusual cases.
c. For the numerical variable chosen in (b), generate a boxplot. Do the histogram and boxplot tell the same story about the distribution of the variable? What additional information can you see in the boxplot that you couldn’t see in the histogram?
d. For the numerical variable chosen in (b), generate descriptive statistics. At the minimum, you should calculate the mean, the median, and the standard deviation. Interpret the descriptive statistics by looking at, e.g., how the mean relates to the median and how the standard deviation relates to the mean.
e. Choose a categorical variable and generate a bar plot or a pie chart. Interpret the graph by comparing group sizes.
f. Choose one numerical and one categorical variable, and generate a side-by-side boxplot. Interpret the graph by comparing the numerical data across groups of the categorical variable.
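The commands for items (a) to (f) could look roughly as follows. This is a sketch under assumptions, not a template for your own analysis: it uses the example dataset `auto` that ships with Stata, with `price` and `weight` as the numerical variables and `foreign` as the categorical variable.

```stata
* Minimal sketch of outputs (a)-(f) on an example dataset shipped with Stata.
sysuse auto, clear

* (a) Scatterplot of two numerical variables (y-variable first, then x-variable)
scatter price weight

* (b) Histogram of one numerical variable, with counts on the vertical axis
histogram price, frequency

* (c) Boxplot of the same variable
graph box price

* (d) Descriptive statistics; the detail option adds the median and percentiles
summarize price, detail
display "mean = " r(mean) "   median = " r(p50) "   sd = " r(sd)

* (e) Group sizes of a categorical variable, as a table and as a bar chart
tabulate foreign
graph bar (count), over(foreign)
* Alternative to the bar chart: graph pie, over(foreign)

* (f) Side-by-side boxplots of the numerical variable by group
graph box price, over(foreign)
```

Each graph command replaces the previous graph window, so save or copy a graph before drawing the next one; `graph export` is one way to write the current graph to a file.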

Step 5
Email me the Stata data file, syntax, and statistical output, together with a Word file containing your interpretation of the statistical results. You should copy and paste into the Word file the parts of the statistical output that you are interpreting. Please do not submit PDF files. I will comment on your assignments using Review/Track Changes and Review/New Comment in Word, so please make sure that you have these tools enabled to review my feedback.
