10 changes: 5 additions & 5 deletions inf-model-mlr.qmd
@@ -29,7 +29,7 @@ Recall the `loans` data from [Chapter -@sec-model-mlr].

::: {.data data-latex=""}
The [`loans_full_schema`](http://openintrostat.github.io/openintro/reference/loans_full_schema.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
-Based on the data in this dataset we have created two new variables: `credit_util` which is calculated as the total credit utilized divided by the total credit limit and `bankruptcy` which turns the number of bankruptcies to an indicator variable (0 for no bankruptcies and 1 for at least 1 bankruptcies).
+Based on the data in this dataset we have created two new variables: `credit_util` which is calculated as the total credit utilized divided by the total credit limit and `bankruptcy` which turns the number of bankruptcies to an indicator variable (0 for no bankruptcies and 1 for at least 1 bankruptcy).
We will refer to this modified dataset as `loans`.
:::

@@ -159,7 +159,7 @@ We also note that the total `number_of_coins` and the `number_of_low_coins` are
#| fig-cap: |
#| Two plots describing the total amount of money (USD) as a function of the
#| total number of coins or low coins. As you might expect, the total amount
-#| of money is more highly postively correlated with the total number of coins
+#| of money is more highly positively correlated with the total number of coins
#| than with the number of low coins.
#| fig-subcap:
#| - Total number of coins on the x-axis.
@@ -417,7 +417,7 @@ terms_chp_25 <- c(terms_chp_25, "cross-validation", "prediction error")
```

::: {.data data-latex=""}
-The [`penguins`](https://allisonhorst.github.io/palmerpenguins/articles/intro.html) data can be found in the [**palmerpenguings**](https://github.com/allisonhorst/palmerpenguins) R package.
+The [`penguins`](https://allisonhorst.github.io/palmerpenguins/articles/intro.html) data can be found in the [**palmerpenguins**](https://github.com/allisonhorst/palmerpenguins) R package.
:::

Our goal in this section is to compare two different regression models which both seek to predict the mass of an individual penguin in grams.
@@ -444,9 +444,9 @@ Cross-validation is one way to get accurate independent predictions with which t

The question we will seek to answer is whether the predictions of `body_mass_g` are substantially better when `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `sex`, and `species` are used in the model, as compared with a model on `bill_length_mm` only.

-We refer to the model given with only `bill_lengh_mm` as the **smaller** model.
+We refer to the model given with only `bill_length_mm` as the **smaller** model.
It is seen in @tbl-peng-lm-bill with coefficient estimates of the parameters as well as standard errors and p-values.
-We refer to the model given with `bill_lengh_mm`, `bill_depth_mm`, `flipper_length_mm`, `sex`, and `species` as the **larger** model.
+We refer to the model given with `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `sex`, and `species` as the **larger** model.
It is seen in @tbl-peng-lm-all with coefficient estimates of the parameters as well as standard errors and p-values.
Given what we know about high correlations between body measurements, it is somewhat unsurprising that all of the variables have low p-values, suggesting that each variable is a statistically discernible predictor of `body_mass_g`, given all other variables in the model.
However, in this section, we will go beyond the use of p-values to consider independent predictions of `body_mass_g` as a way to compare the smaller and larger models.