Self-Assessment RAL R Courses

## About the Self-Assessment

The Research Academy Leipzig (RAL) offers three R classes: Introduction, Intermediate, and Advanced.

#### For people who have taken R classes at RAL before

If you have taken the previously offered “R Introduction” and understood all of its content, I’d advise that you start with the Intermediate course. If you have taken “R Extended” and understood all of its content, I’d advise that you start with either the Intermediate or the Advanced course. The first day of the Intermediate class repeats content from “R Extended”, but the second day covers new content.

#### For people who have not taken R classes at RAL before

If you do not have any R knowledge, start with the Introduction course.

If you already have some R knowledge and do not know where to start, this self-assessment is intended to indicate which R classes you should register for at the Research Academy Leipzig. No data is shared with your tutor or RAL, so please also consider how confident you felt when answering the questions. If you can answer all questions from the Introduction section, you should start with the Intermediate course. If you can answer all questions from the Intermediate section, you should start with the Advanced class.

If you have any questions, get in touch with RAL.

Quiz

## Intro 2: Graphs

A. Recreate this plot:

    mpg %>% ggplot(aes(hwy, displ)) + _____(method = lm, formula = 'y ~ x')

Solution:

    mpg %>% ggplot(aes(hwy, displ)) + geom_smooth(method = lm, formula = 'y ~ x')

B. Recreate this plot:

    diamonds %>% ____() + geom_bar(_____________)

Solution:

    diamonds %>% ggplot() + geom_bar(aes(color, fill = cut))

## Intro 4: Quick Data Insights

Calculate the mean price for the different cuts in the diamonds data set.

    diamonds %>%
      ____(cut) %>%
      ____(mean_price = ____(___))

Solution:

    diamonds %>%
      group_by(cut) %>%
      summarise(mean_price = mean(price))
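If you are unsure what group_by() and summarise() do together, it can help to try them on a table small enough to check by hand. The sketch below does that with a hypothetical data frame called toy_diamonds; the prices are made up for illustration and do not come from the diamonds data set.

```r
library(dplyr)

# Hypothetical toy data (made-up prices), small enough to verify by hand
toy_diamonds <- data.frame(
  cut   = c("Fair", "Fair", "Ideal", "Ideal", "Ideal"),
  price = c(100, 300, 200, 400, 600)
)

# Same pattern as the exercise: one group per cut, one mean per group
toy_means <- toy_diamonds %>%
  group_by(cut) %>%
  summarise(mean_price = mean(price))

print(toy_means)
```

The result has one row per cut: the mean of 100 and 300 for "Fair", and the mean of 200, 400 and 600 for "Ideal".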

## Intermediate 1: More Complex Transformations

world_bank_pop contains the World Bank’s population data from 2000 to 2018. It has the following columns:

Restructure the data so that: 1. it only contains data for Germany; 2. the years are recorded in a column called date (the name used in the starter code below); and 3. the indicators for the total urban population (SP.URB.TOTL) and the total population (SP.POP.TOTL) each have their own column. The resulting table should look like this:

Here is some code to get you started:

    library(tidyverse)
    library(lubridate)  # provides ymd()

    world_bank_pop %>%
      filter(country __ "DEU",
             indicator ____ c("SP.URB.TOTL", "SP.POP.TOTL")) %>%
      pivot_____(____("20"), names_to = "date") %>%
      pivot_____(names_from = indicator) %>%
      mutate(date = ymd(date, truncated = 2L))

Solution:

    world_bank_pop %>%
      filter(country == "DEU",
             indicator %in% c("SP.URB.TOTL", "SP.POP.TOTL")) %>%
      pivot_longer(contains("20"), names_to = "date") %>%
      pivot_wider(names_from = indicator) %>%
      mutate(date = ymd(date, truncated = 2L))
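To see what each pivot step does, you can run the same pipeline on a tiny table and look at the intermediate result. The sketch below assumes a hypothetical data frame, toy_pop, laid out like world_bank_pop (one column per year); the country codes "AAA" and "BBB" and all numbers are made up.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data in the same wide layout as world_bank_pop.
# check.names = FALSE keeps the year column names as "2000" and "2001".
toy_pop <- data.frame(
  country   = c("AAA", "AAA", "BBB", "BBB"),
  indicator = c("SP.URB.TOTL", "SP.POP.TOTL", "SP.URB.TOTL", "SP.POP.TOTL"),
  `2000`    = c(50, 100, 30, 120),
  `2001`    = c(55, 110, 33, 121),
  check.names = FALSE
)

# Step 1: the year columns become rows; the year lands in "date"
# and the numbers land in the default column "value"
toy_long <- toy_pop %>%
  filter(country == "AAA") %>%
  pivot_longer(contains("20"), names_to = "date")

# Step 2: each indicator gets its own column, filled from "value";
# ymd() with truncated = 2L turns a bare year into 1 January of that year
toy_wide <- toy_long %>%
  pivot_wider(names_from = indicator) %>%
  mutate(date = ymd(date, truncated = 2L))

print(toy_long)
print(toy_wide)
```

Here toy_long has one row per indicator and year, while toy_wide has one row per year with SP.URB.TOTL and SP.POP.TOTL side by side.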

## Intermediate 2: Functions

You want to generalise what you did in the previous exercise so that it works for any country. Write a function that does this, and then use it to generate a line plot of New Zealand’s (NZL) urban population percentage. Before you plot, please create a new variable called URB.PERC for this. The plot should look like this:

Code to get you started:

    pop_extract_func <- function(___) {
      world_bank_pop %>%
        filter(country == mycountry,
               indicator %in% c("SP.URB.TOTL", "SP.POP.TOTL")) %>%
        pivot_longer(contains("20"), names_to = "date") %>%
        pivot_wider(names_from = indicator) %>%
        mutate(date = ymd(date, truncated = 2L))
    }

    pop_extract_func("NZL") %>%
      mutate(URB.PERC = _____) %>%
      ggplot() +
      geom_line(aes(____, ____))

Solution:

    pop_extract_func <- function(mycountry) {
      world_bank_pop %>%
        filter(country == mycountry,
               indicator %in% c("SP.URB.TOTL", "SP.POP.TOTL")) %>%
        pivot_longer(contains("20"), names_to = "date") %>%
        pivot_wider(names_from = indicator) %>%
        mutate(date = ymd(date, truncated = 2L))
    }

    pop_extract_func("NZL") %>%
      mutate(URB.PERC = (SP.URB.TOTL / SP.POP.TOTL) * 100) %>%
      ggplot() +
      geom_line(aes(date, URB.PERC))
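Before plotting, it can be worth checking the percentage calculation on numbers you can verify by hand. The sketch below is one possible way to do that: urban_share() is a hypothetical helper (not part of the exercise) that takes the data as an argument as well, so it can be run on a made-up table, toy_wide, that is already in the restructured layout.

```r
library(dplyr)

# Hypothetical helper: the function arguments replace the hard-coded
# data set and country code, so the same steps work on any input
urban_share <- function(data, mycountry) {
  data %>%
    filter(country == mycountry) %>%
    mutate(URB.PERC = (SP.URB.TOTL / SP.POP.TOTL) * 100)
}

# Made-up data in the restructured layout (one row per country and year)
toy_wide <- data.frame(
  country     = c("AAA", "AAA", "BBB"),
  date        = as.Date(c("2000-01-01", "2001-01-01", "2000-01-01")),
  SP.URB.TOTL = c(50, 55, 30),
  SP.POP.TOTL = c(100, 110, 120)
)

share_aaa <- urban_share(toy_wide, "AAA")
share_bbb <- urban_share(toy_wide, "BBB")

print(share_aaa)
print(share_bbb)
```

With these numbers, half of the population of "AAA" is urban in both years, and a quarter of the population of "BBB" is urban, which makes a mistake in the formula easy to spot.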