class: logo-slide --- class: title-slide ## Modeling in the Tidyverse ### Applications of Data Science - Class 5 ### Giora Simchoni #### `gsimchoni@gmail.com and add #dsapps in subject` ### Stat. and OR Department, TAU ### 2023-03-26 --- layout: true <div class="my-footer"> <span> <a href="https://dsapps-2023.github.io/Class_Slides/" target="_blank">Applications of Data Science </a> </span> </div> --- class: section-slide # The Problem --- ### Inconsistency, Inextensibility ```r n <- 10000 x1 <- runif(n) x2 <- runif(n) t <- 1 + 2 * x1 + 3 * x2 y <- rbinom(n, 1, 1 / (1 + exp(-t))) ``` ```r glm(y ~ x1 + x2, family = "binomial") ``` ```r glmnet(as.matrix(cbind(x1, x2)), as.factor(y), family = "binomial") ``` ```r randomForest(as.factor(y) ~ x1 + x2) ``` ```r gbm(y ~ x1 + x2, data = data.frame(x1 = x1, x2 = x2, y = y)) ``` 😱 --- ### Compare this with `sklearn` ```python from sklearn.linear_model import LogisticRegression from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier LogisticRegression(penalty='none').fit(X, y) LogisticRegression(penalty='l2', C=0.001).fit(X, y) RandomForestClassifier(n_estimators=100).fit(X, y) GradientBoostingClassifier(n_estimators=100).fit(X, y) ``` --- class: section-slide # Detour: A Regression Problem --- ### Hungarian Blogs: Predicting Feedback - Dataset was published in the [UCI ML Repository](https://archive.ics.uci.edu/ml/datasets/BlogFeedback) - Comes from [Buza 2014](http://www.cs.bme.hu/~buza/pdfs/gfkl2012_blogs.pdf) - 280 heavily engineered numeric features on blogs and on posts published in the last 72 hours - Can we predict no. of comments in the next 24 hours? <img src="images/hungarian_blog.png" style="width: 70%" /> --- The raw data has over 50K rows: for each blog post it records features such as total comments until base time, weekday of publication, number of words, etc.
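If you want to run the code yourself, you first need the raw CSVs locally. A minimal sketch, assuming the UCI download URL below is still current and that you extract the zip into `~/BlogFeedback` (both are assumptions, adjust to your setup):

```r
# hypothetical download step: URL and destination folder are assumptions
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00304/BlogFeedback.zip"
zip_path <- tempfile(fileext = ".zip")
download.file(url, zip_path, mode = "wb")
unzip(zip_path, exdir = path.expand("~/BlogFeedback"))
```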
We will be predicting `log(fb)` based on all features, no missing values: ```r blogs_fb <- read_csv("~/BlogFeedback/blogData_train.csv", col_names = FALSE) blogs_fb <- blogs_fb %>% rename(fb = X281, blog_len = X62, sunday = X276) %>% mutate(fb = log(fb + 1)) dim(blogs_fb) ``` ``` ## [1] 52397 281 ``` --- ```r glimpse(blogs_fb) ``` ``` ## Rows: 52,397 ## Columns: 281 ## $ X1 <dbl> 40.30467, 40.30467, 40.30467, 40.30467, 40.30467, 40.30467, 4… ## $ X2 <dbl> 53.84566, 53.84566, 53.84566, 53.84566, 53.84566, 53.84566, 5… ## $ X3 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X4 <dbl> 401, 401, 401, 401, 401, 401, 401, 401, 401, 401, 401, 401, 4… ## $ X5 <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 1… ## $ X6 <dbl> 15.52416, 15.52416, 15.52416, 15.52416, 15.52416, 15.52416, 1… ## $ X7 <dbl> 32.44188, 32.44188, 32.44188, 32.44188, 32.44188, 32.44188, 3… ## $ X8 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X9 <dbl> 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 3… ## $ X10 <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3… ## $ X11 <dbl> 14.04423, 14.04423, 14.04423, 14.04423, 14.04423, 14.04423, 1… ## $ X12 <dbl> 32.61542, 32.61542, 32.61542, 32.61542, 32.61542, 32.61542, 3… ## $ X13 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X14 <dbl> 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 3… ## $ X15 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2… ## $ X16 <dbl> 34.56757, 34.56757, 34.56757, 34.56757, 34.56757, 34.56757, 3… ## $ X17 <dbl> 48.47518, 48.47518, 48.47518, 48.47518, 48.47518, 48.47518, 4… ## $ X18 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X19 <dbl> 378, 378, 378, 378, 378, 378, 378, 378, 378, 378, 378, 378, 3… ## $ X20 <dbl> 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 1… ## $ X21 <dbl> 1.479934, 1.479934, 1.479934, 1.479934, 1.479934, 1.479934, 1… ## $ X22 <dbl> 46.18691, 46.18691, 46.18691, 46.18691, 46.18691, 46.18691, 4… ## $ X23 <dbl> -356, -356, -356, -356, -356, -356, -356, -356, -356, -356, -… ## $ X24 <dbl> 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 377, 3… ## $ X25 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X26 <dbl> 1.076167, 1.076167, 1.076167, 1.076167, 1.076167, 1.076167, 1… ## $ X27 <dbl> 1.795416, 1.795416, 1.795416, 1.795416, 1.795416, 1.795416, 1… ## $ X28 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X29 <dbl> 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 1… ## $ X30 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X31 <dbl> 0.4004914, 0.4004914, 0.4004914, 0.4004914, 0.4004914, 0.4004… ## $ X32 <dbl> 1.078097, 1.078097, 1.078097, 1.078097, 1.078097, 1.078097, 1… ## $ X33 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X34 <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9… ## $ X35 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X36 <dbl> 0.3775594, 0.3775594, 0.3775594, 0.3775594, 0.3775594, 0.3775… ## $ X37 <dbl> 1.07421, 1.07421, 1.07421, 1.07421, 1.07421, 1.07421, 1.07421… ## $ X38 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X39 <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9… ## $ X40 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X41 <dbl> 0.972973, 0.972973, 0.972973, 0.972973, 0.972973, 0.972973, 
0… ## $ X42 <dbl> 1.704671, 1.704671, 1.704671, 1.704671, 1.704671, 1.704671, 1… ## $ X43 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X44 <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1… ## $ X45 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X46 <dbl> 0.02293202, 0.02293202, 0.02293202, 0.02293202, 0.02293202, 0… ## $ X47 <dbl> 1.521174, 1.521174, 1.521174, 1.521174, 1.521174, 1.521174, 1… ## $ X48 <dbl> -8, -8, -8, -8, -8, -8, -8, -8, -8, -8, -8, -8, -8, -8, -8, -… ## $ X49 <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9… ## $ X50 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X51 <dbl> 2, 6, 6, 2, 3, 6, 6, 3, 30, 30, 0, 0, 30, 0, 51, 10, 32, 10, … ## $ X52 <dbl> 2, 2, 2, 2, 1, 0, 0, 1, 27, 27, 0, 0, 30, 0, 51, 10, 2, 10, 5… ## $ X53 <dbl> 0, 4, 4, 0, 2, 2, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 30, 0, 0, 51,… ## $ X54 <dbl> 2, 5, 5, 2, 2, 5, 5, 2, 2, 2, 0, 0, 30, 0, 51, 10, 32, 10, 51… ## $ X55 <dbl> 2, -2, -2, 2, -1, -2, -2, -1, 26, 26, 0, 0, 30, 0, 51, 10, -2… ## $ X56 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 1, 2, 3, 0, 2, 0, 3, 4, 0… ## $ X57 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 3, 0, 1, 0, 3, 1, 0… ## $ X58 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 3, 0… ## $ X59 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 1, 2, 3, 0, 2, 0, 3, 4, 0… ## $ X60 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, -2, 1, 0, 3, 0, 0, 0, 3, -2,… ## $ X61 <dbl> 10, 35, 35, 10, 34, 59, 59, 34, 58, 58, 11, 35, 5, 59, 8, 14,… ## $ blog_len <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X63 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X64 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X65 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X66 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X67 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X68 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X69 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X70 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X71 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X72 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X73 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X74 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X75 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X76 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X77 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X78 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X79 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X80 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X81 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X82 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X83 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X84 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X85 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X86 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X87 <dbl> 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X88 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X89 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X90 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X91 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X92 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X93 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X94 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X95 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X96 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X97 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X98 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X99 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X100 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X101 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X102 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X103 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X104 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X105 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X106 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X107 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X108 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X109 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X110 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X111 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X112 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X113 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X114 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X115 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X116 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X117 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X118 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X119 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X120 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X121 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X122 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X123 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X124 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X125 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X126 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X127 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X128 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X129 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X130 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X131 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X132 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X133 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X134 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X135 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X136 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X137 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X138 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X139 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X140 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X141 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X142 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X143 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X144 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X145 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X146 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X147 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X148 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X149 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X150 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X151 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X152 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X153 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X154 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X155 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X156 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X157 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X158 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X159 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X160 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X161 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X162 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X163 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X164 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X165 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X166 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X167 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X168 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X169 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X170 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X171 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X172 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X173 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X174 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X175 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X176 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X177 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X178 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X179 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X180 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X181 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X182 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X183 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X184 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X185 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X186 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X187 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X188 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X189 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X190 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X191 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X192 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X193 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X194 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X195 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X196 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X197 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X198 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X199 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X200 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X201 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X202 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X203 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X204 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X205 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X206 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X207 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X208 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X209 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X210 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X211 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X212 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X213 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X214 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X215 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X216 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X217 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X218 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X219 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X220 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X221 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X222 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X223 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X224 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X225 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X226 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X227 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X228 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X229 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X230 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X231 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X232 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X233 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X234 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X235 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X236 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X237 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X238 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X239 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X240 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X241 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X242 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X243 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X244 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X245 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X246 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X247 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X248 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X249 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X250 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X251 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X252 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X253 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X254 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X255 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X256 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X257 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X258 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X259 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X260 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X261 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X262 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X263 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X264 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0… ## $ X265 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0… ## $ X266 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1… ## $ X267 <dbl> 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X268 <dbl> 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X269 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X270 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0… ## $ X271 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1… ## $ X272 <dbl> 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X273 <dbl> 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X274 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X275 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0… ## $ sunday <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X277 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X278 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X279 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ X280 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0… ## $ fb <dbl> 0.6931472, 0.0000000, 0.0000000, 0.6931472, 3.3322045, 0.0000… ``` --- See the dependent variable distribution: ```r ggplot(blogs_fb, aes(fb)) + geom_histogram(fill = "red", alpha = 0.5, binwidth = 0.5) + theme_bw() ``` <img src="images/Bench-Hist-1.png" width="100%" /> --- See it vs. say "length of post": ```r ggplot(blogs_fb, aes(log(blog_len), fb)) + geom_point(color = "red", alpha = 0.5) + theme_bw() ``` <img src="images/Bench-Age-Equipment-1.png" width="100%" /> --- class: section-slide # End of Detour --- # WARNING .warning[ 💡 What you're about to see is not a good modeling/prediction flow. This is just an intro to tidy modeling. Some of the issues with how things are done here will be raised, some will have to wait till later in the course. ] --- class: section-slide # The ~~Present~~ Past Solution: `caret` --- ### Split Data ```r library(caret) train_idx <- createDataPartition(blogs_fb$fb, p = 0.6, list = FALSE) blogs_tr <- blogs_fb[train_idx, ] blogs_te <- blogs_fb[-train_idx, ] library(glue) glue("train no. of rows: {nrow(blogs_tr)} test no. of rows: {nrow(blogs_te)}") ``` ``` ## train no. of rows: 31439 ## test no. of rows: 20958 ``` Here you might consider some preprocessing. `caret` has some nice documentation [here](https://topepo.github.io/caret/index.html). --- ### Tuning and Modeling Define general methodology, e.g. 
5-fold Cross-Validation: ```r fit_control <- trainControl(method = "cv", number = 5) ridge_grid <- expand.grid(alpha=0, lambda = 10^seq(-3, 1, length = 50)) lasso_grid <- expand.grid(alpha=1, lambda = 10^seq(-3, 1, length = 50)) rf_grid <- expand.grid(splitrule = "variance", min.node.size = seq(10, 30, 10), mtry = seq(10, 50, 20)) mod_ridge <- train(fb ~ ., data = blogs_tr, method = "glmnet", trControl = fit_control, tuneGrid = ridge_grid, metric = "RMSE") mod_lasso <- train(fb ~ ., data = blogs_tr, method = "glmnet", trControl = fit_control, tuneGrid = lasso_grid, metric = "RMSE") mod_rf <- train(fb ~ ., data = blogs_tr, method = "ranger", trControl = fit_control, tuneGrid = rf_grid, num.trees = 50, metric = "RMSE") ``` --- ### Evaluating Models ```r mod_ridge ``` ``` ## glmnet ## ## 31439 samples ## 280 predictor ## ## No pre-processing ## Resampling: Cross-Validated (5 fold) ## Summary of sample sizes: 25151, 25150, 25152, 25151, 25152 ## Resampling results across tuning parameters: ## ## lambda RMSE Rsquared MAE ## 0.001000000 0.8465427 0.4432868 0.5902339 ## 0.001206793 0.8465427 0.4432868 0.5902339 ## 0.001456348 0.8465427 0.4432868 0.5902339 ## 0.001757511 0.8465427 0.4432868 0.5902339 ## 0.002120951 0.8465427 0.4432868 0.5902339 ## 0.002559548 0.8465427 0.4432868 0.5902339 ## 0.003088844 0.8465427 0.4432868 0.5902339 ## 0.003727594 0.8465427 0.4432868 0.5902339 ## 0.004498433 0.8465427 0.4432868 0.5902339 ## 0.005428675 0.8465427 0.4432868 0.5902339 ## 0.006551286 0.8465427 0.4432868 0.5902339 ## 0.007906043 0.8465427 0.4432868 0.5902339 ## 0.009540955 0.8465427 0.4432868 0.5902339 ## 0.011513954 0.8465427 0.4432868 0.5902339 ## 0.013894955 0.8465427 0.4432868 0.5902339 ## 0.016768329 0.8465427 0.4432868 0.5902339 ## 0.020235896 0.8465427 0.4432868 0.5902339 ## 0.024420531 0.8465427 0.4432868 0.5902339 ## 0.029470517 0.8465427 0.4432868 0.5902339 ## 0.035564803 0.8465427 0.4432868 0.5902339 ## 0.042919343 0.8465427 0.4432868 0.5902339 ## 0.051794747 0.8465427 0.4432868 0.5902339 ## 0.062505519 0.8467117 0.4430782 0.5903658 ## 0.075431201 0.8479974 0.4414519 0.5913602 ## 0.091029818 0.8493544 0.4397435 0.5923965 ## 0.109854114 0.8507596 0.4379886 0.5934634 ## 0.132571137 0.8522240 0.4361799 0.5945637 ## 0.159985872 0.8537427 0.4343320 0.5956963 ## 0.193069773 0.8553299 0.4324376 0.5968782 ## 0.232995181 0.8570035 0.4304862 0.5981506 ## 0.281176870 0.8588006 0.4284420 0.5995849 ## 0.339322177 0.8607655 0.4262615 0.6012393 ## 0.409491506 0.8629367 0.4239091 0.6031958 ## 0.494171336 0.8653867 0.4212996 0.6055008 ## 0.596362332 0.8681789 0.4183521 0.6082562 ## 0.719685673 0.8713638 0.4150067 0.6115633 ## 0.868511374 0.8749957 0.4111848 0.6154977 ## 1.048113134 0.8791111 0.4068194 0.6201124 ## 1.264855217 0.8837361 0.4018633 0.6253678 ## 1.526417967 0.8888746 0.3962879 0.6311635 ## 1.842069969 0.8945103 0.3900955 0.6373592 ## 2.222996483 0.9006079 0.3833192 0.6438697 ## 2.682695795 0.9071214 0.3760195 0.6506244 ## 3.237457543 0.9139965 0.3682851 0.6575705 ## 3.906939937 0.9211772 0.3602281 0.6647188 ## 4.714866363 0.9286122 0.3519779 0.6719947 ## 5.689866029 0.9362606 0.3436749 0.6793779 ## 6.866488450 0.9440955 0.3354611 0.6868500 ## 8.286427729 0.9521037 0.3274654 0.6943627 ## 10.000000000 0.9602856 0.3197914 0.7019162 ## ## Tuning parameter 'alpha' was held constant at a value of 0 ## RMSE was used to select the optimal model using the smallest value. ## The final values used for the model were alpha = 0 and lambda = 0.05179475. 
``` --- ```r mod_lasso ``` ``` ## glmnet ## ## 31439 samples ## 280 predictor ## ## No pre-processing ## Resampling: Cross-Validated (5 fold) ## Summary of sample sizes: 25152, 25151, 25150, 25151, 25152 ## Resampling results across tuning parameters: ## ## lambda RMSE Rsquared MAE ## 0.001000000 0.8363520 0.4557473 0.5808941 ## 0.001206793 0.8367221 0.4552621 0.5812621 ## 0.001456348 0.8373006 0.4545144 0.5818254 ## 0.001757511 0.8378683 0.4537838 0.5823994 ## 0.002120951 0.8384855 0.4529970 0.5830077 ## 0.002559548 0.8393089 0.4519476 0.5837321 ## 0.003088844 0.8405302 0.4503812 0.5847438 ## 0.003727594 0.8422647 0.4481496 0.5861356 ## 0.004498433 0.8441916 0.4456619 0.5876737 ## 0.005428675 0.8461175 0.4431796 0.5890034 ## 0.006551286 0.8481192 0.4405938 0.5903564 ## 0.007906043 0.8502745 0.4378111 0.5918695 ## 0.009540955 0.8527639 0.4345716 0.5936910 ## 0.011513954 0.8548075 0.4319568 0.5952619 ## 0.013894955 0.8572912 0.4287521 0.5971848 ## 0.016768329 0.8595876 0.4258242 0.5989684 ## 0.020235896 0.8612785 0.4237738 0.6005154 ## 0.024420531 0.8624864 0.4224979 0.6017381 ## 0.029470517 0.8639204 0.4210409 0.6031626 ## 0.035564803 0.8657003 0.4192847 0.6051093 ## 0.042919343 0.8678262 0.4173131 0.6076622 ## 0.051794747 0.8703560 0.4151486 0.6109852 ## 0.062505519 0.8735505 0.4125224 0.6154668 ## 0.075431201 0.8773779 0.4097051 0.6210429 ## 0.091029818 0.8811636 0.4081723 0.6271661 ## 0.109854114 0.8859656 0.4069365 0.6348774 ## 0.132571137 0.8928400 0.4049931 0.6445753 ## 0.159985872 0.9027495 0.4016010 0.6567481 ## 0.193069773 0.9169912 0.3953602 0.6720417 ## 0.232995181 0.9373524 0.3830465 0.6913175 ## 0.281176870 0.9656148 0.3583346 0.7152676 ## 0.339322177 0.9925805 0.3498912 0.7358105 ## 0.409491506 1.0281981 0.3387926 0.7604651 ## 0.494171336 1.0755751 0.2972423 0.7912722 ## 0.596362332 1.1271288 0.2887711 0.8202405 ## 0.719685673 1.1336226 NaN 0.8238043 ## 0.868511374 1.1336226 NaN 0.8238043 ## 1.048113134 1.1336226 NaN 0.8238043 ## 1.264855217 1.1336226 NaN 0.8238043 ## 1.526417967 1.1336226 NaN 0.8238043 ## 1.842069969 1.1336226 NaN 0.8238043 ## 2.222996483 1.1336226 NaN 0.8238043 ## 2.682695795 1.1336226 NaN 0.8238043 ## 3.237457543 1.1336226 NaN 0.8238043 ## 3.906939937 1.1336226 NaN 0.8238043 ## 4.714866363 1.1336226 NaN 0.8238043 ## 5.689866029 1.1336226 NaN 0.8238043 ## 6.866488450 1.1336226 NaN 0.8238043 ## 8.286427729 1.1336226 NaN 0.8238043 ## 10.000000000 1.1336226 NaN 0.8238043 ## ## Tuning parameter 'alpha' was held constant at a value of 1 ## RMSE was used to select the optimal model using the smallest value. ## The final values used for the model were alpha = 1 and lambda = 0.001. ``` --- ```r mod_rf ``` ``` ## Random Forest ## ## 31439 samples ## 280 predictor ## ## No pre-processing ## Resampling: Cross-Validated (5 fold) ## Summary of sample sizes: 25151, 25151, 25151, 25151, 25152 ## Resampling results across tuning parameters: ## ## min.node.size mtry RMSE Rsquared MAE ## 10 10 0.6971449 0.6287011 0.4482854 ## 10 30 0.6548617 0.6675777 0.4092333 ## 10 50 0.6483229 0.6734382 0.4020972 ## 20 10 0.7028311 0.6237031 0.4534763 ## 20 30 0.6603183 0.6624895 0.4153412 ## 20 50 0.6513949 0.6706486 0.4069590 ## 30 10 0.7077821 0.6187225 0.4578572 ## 30 30 0.6643416 0.6589521 0.4199686 ## 30 50 0.6520327 0.6703291 0.4093365 ## ## Tuning parameter 'splitrule' was held constant at a value of variance ## RMSE was used to select the optimal model using the smallest value. ## The final values used for the model were mtry = 50, splitrule = variance ## and min.node.size = 10. 
``` --- ```r plot(mod_ridge) ``` <img src="images/Ridge-CV-1.png" width="80%" /> --- ```r plot(mod_lasso) ``` <img src="images/Lasso-CV-1.png" width="80%" /> --- ```r plot(mod_rf) ``` <img src="images/RF-CV-1.png" width="80%" /> --- ### Comparing Models ```r resamps <- resamples(list(Ridge = mod_ridge, Lasso = mod_lasso, RF = mod_rf)) summary(resamps) ``` ``` ## ## Call: ## summary.resamples(object = resamps) ## ## Models: Ridge, Lasso, RF ## Number of resamples: 5 ## ## MAE ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## Ridge 0.5750361 0.5893380 0.5917179 0.5902339 0.5968475 0.5982301 0 ## Lasso 0.5681938 0.5797615 0.5820657 0.5808941 0.5844702 0.5899792 0 ## RF 0.3943710 0.3977928 0.4030565 0.4020972 0.4055509 0.4097149 0 ## ## RMSE ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## Ridge 0.8332790 0.8425411 0.8447478 0.8465427 0.8465174 0.8656284 0 ## Lasso 0.8084833 0.8358103 0.8385322 0.8363520 0.8402470 0.8586870 0 ## RF 0.6364601 0.6371085 0.6489900 0.6483229 0.6562275 0.6628281 0 ## ## Rsquared ## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's ## Ridge 0.4221219 0.4237722 0.4483357 0.4432868 0.4501113 0.4720927 0 ## Lasso 0.4396015 0.4483588 0.4517731 0.4557473 0.4693071 0.4696961 0 ## RF 0.6579006 0.6708609 0.6718423 0.6734382 0.6809223 0.6856649 0 ``` --- ```r dotplot(resamps, metric = "RMSE") ``` <img src="images/Caret-RMSE-Comp-1.png" width="100%" /> --- ### Predicting ```r pred_ridge <- predict(mod_ridge, newdata = blogs_te) pred_lasso <- predict(mod_lasso, newdata = blogs_te) pred_rf <- predict(mod_rf, newdata = blogs_te) rmse_ridge <- RMSE(pred_ridge, blogs_te$fb) rmse_lasso <- RMSE(pred_lasso, blogs_te$fb) rmse_rf <- RMSE(pred_rf, blogs_te$fb) glue("Test RMSE Ridge: {format(rmse_ridge, digits = 3)} Test RMSE Lasso: {format(rmse_lasso, digits = 3)} Test RMSE RF: {format(rmse_rf, digits = 3)}") ``` ``` ## Test RMSE Ridge: 0.846 ## Test RMSE Lasso: 0.834 ## Test RMSE RF: 0.648 ``` --- ```r bind_rows( tibble(method = "Ridge", pred = pred_ridge, true = blogs_te$fb), tibble(method = "Lasso", pred = pred_lasso, true = blogs_te$fb), tibble(method = "RF", pred = pred_rf, true = blogs_te$fb)) %>% ggplot(aes(true, pred)) + geom_point(color = "red", alpha = 0.5) + geom_abline(slope = 1, intercept = 0) + facet_wrap(~ factor(method, levels = c("Ridge", "Lasso", "RF"))) + theme_bw() ``` <img src="images/Caret-Pred-vs-True-1.png" width="100%" /> --- class: section-slide # The ~~Future~~ Present Solution: `tidymodels` #### Inspired by [Julia Silge](https://juliasilge.com/blog/intro-tidymodels/) --- ### Packages under tidymodels - `parsnip`: tidy `caret` - `dials` and `tune`: specifying and tuning model parameters - `rsample`: sampling, data partitioning - `recipes`, `embed`, `themis`: preprocessing and creating model matrices - `infer`: tidy statistics - `yardstick`: measuring model performance - `broom`: converting model output into tidy tibbles And [more](https://www.tidymodels.org/). .warning[ ⚠️ All `tidymodels` packages are under development! ] --- ### Split Data The `initial_split()` function is from the `rsample` package: ```r library(tidymodels) blogs_split_obj <- blogs_fb %>% initial_split(prop = 0.6) print(blogs_split_obj) ``` ``` ## <Training/Testing/Total> ## <31438/20959/52397> ``` ```r blogs_tr <- training(blogs_split_obj) blogs_te <- testing(blogs_split_obj) glue("train no. of rows: {nrow(blogs_tr)} test no. of rows: {nrow(blogs_te)}") ``` ``` ## train no. of rows: 31438 ## test no. of rows: 20959 ``` --- ### Preprocess The `recipe()` function is from the `recipes` package.
It allows you to specify a preprocessing pipeline you can later apply to any dataset, including multiple steps: ```r blogs_rec <- recipe(fb ~ ., data = blogs_tr) blogs_rec ``` ``` ## ``` ``` ## ── Recipe ────────────────────────────────────────────────────────────────────── ``` ``` ## ``` ``` ## ── Inputs ``` ``` ## Number of variables by role ``` ``` ## outcome: 1 ## predictor: 280 ``` --- The `recipes` package contains more preprocessing [`step_`s](https://tidymodels.github.io/recipes/reference/index.html) than you imagine: ```r blogs_rec <- blogs_rec %>% step_zv(all_numeric_predictors()) %>% step_normalize(all_numeric_predictors()) ``` --- After you have your `recipe` you need to `prep()` materials... ```r blogs_rec <- blogs_rec %>% prep(blogs_tr) blogs_rec ``` ``` ## ``` ``` ## ── Recipe ────────────────────────────────────────────────────────────────────── ``` ``` ## ``` ``` ## ── Inputs ``` ``` ## Number of variables by role ``` ``` ## outcome: 1 ## predictor: 280 ``` ``` ## ``` ``` ## ── Training information ``` ``` ## Training data contained 31438 data points and no incomplete rows. ``` ``` ## ``` ``` ## ── Operations ``` ``` ## • Zero variance filter removed: X13, X33, X38, X278 | Trained ``` ``` ## • Centering and scaling for: X1, X2, X3, X4, X5, X6, X7, X8, X9, ... | Trained ``` --- At this point our `recipe` has all necessary `sd` and `mean`s for numeric variables. ```r blogs_rec$var_info ``` ``` ## # A tibble: 281 × 4 ## variable type role source ## <chr> <list> <chr> <chr> ## 1 X1 <chr [2]> predictor original ## 2 X2 <chr [2]> predictor original ## 3 X3 <chr [2]> predictor original ## 4 X4 <chr [2]> predictor original ## 5 X5 <chr [2]> predictor original ## 6 X6 <chr [2]> predictor original ## 7 X7 <chr [2]> predictor original ## 8 X8 <chr [2]> predictor original ## 9 X9 <chr [2]> predictor original ## 10 X10 <chr [2]> predictor original ## # … with 271 more rows ``` ```r blogs_rec$var_info$type[[1]] ``` ``` ## [1] "double" "numeric" ``` --- ```r blogs_rec$steps[[2]]$means |> head() ``` ``` ## X1 X2 X3 X4 X5 X6 ## 39.6974280 47.0002493 0.3098798 341.4913162 24.9267288 15.3097213 ``` ```r blogs_rec$steps[[2]]$sds |> head() ``` ``` ## X1 X2 X3 X4 X5 X6 ## 79.095448 62.569848 3.820566 443.854810 69.354476 32.256680 ``` --- And then we `bake()` (or [`juice()`](https://tidymodels.github.io/recipes/reference/juice.html)): ```r blogs_tr2 <- blogs_rec %>% bake(blogs_tr) blogs_te2 <- blogs_rec %>% bake(blogs_te) glue("mean of comments in orig training: {format(mean(blogs_tr$X51), digits = 3)}, sd: {format(sd(blogs_tr$X51), digits = 3)} mean of comments in baked training: {format(mean(blogs_tr2$X51), digits = 1)}, sd: {format(sd(blogs_tr2$X51), digits = 3)}") ``` ``` ## mean of comments in orig training: 39.8, sd: 112 ## mean of comments in baked training: 2e-17, sd: 1 ``` ```r glue("mean of comments in orig testing: {format(mean(blogs_te$X51), digits = 3)}, sd: {format(sd(blogs_te$X51), digits = 3)} mean of comments in baked testing: {format(mean(blogs_te2$X51), digits = 1)}, sd: {format(sd(blogs_te2$X51), digits = 3)}") ``` ``` ## mean of comments in orig testing: 38.9, sd: 110 ## mean of comments in baked testing: -0.008, sd: 0.977 ``` --- Or you can do it all in a single pipe: ```r blogs_rec <- recipe(fb ~ ., data = blogs_tr) %>% step_zv(all_numeric_predictors()) %>% step_normalize(all_numeric_predictors()) %>% prep(blogs_tr) blogs_tr2 <- blogs_rec %>% bake(blogs_tr) blogs_te2 <- blogs_rec %>% bake(blogs_te) glue("mean of comments in orig training: {format(mean(blogs_tr$X51), 
digits = 3)}, sd: {format(sd(blogs_tr$X51), digits = 3)} mean of comments in baked training: {format(mean(blogs_tr2$X51), digits = 1)}, sd: {format(sd(blogs_tr2$X51), digits = 3)}") ``` ``` ## mean of comments in orig training: 39.8, sd: 112 ## mean of comments in baked training: 2e-17, sd: 1 ``` ```r glue("mean of comments in orig testing: {format(mean(blogs_te$X51), digits = 3)}, sd: {format(sd(blogs_te$X51), digits = 3)} mean of comments in baked testing: {format(mean(blogs_te2$X51), digits = 1)}, sd: {format(sd(blogs_te2$X51), digits = 3)}") ``` ``` ## mean of comments in orig testing: 38.9, sd: 110 ## mean of comments in baked testing: -0.008, sd: 0.977 ``` --- Can also `tidy()` a recipe: ```r tidy(blogs_rec) ``` ``` ## # A tibble: 2 × 6 ## number operation type trained skip id ## <int> <chr> <chr> <lgl> <lgl> <chr> ## 1 1 step zv TRUE FALSE zv_BRxUF ## 2 2 step normalize TRUE FALSE normalize_aPFe9 ``` --- #### Fast Forward 10 weeks from now... ```r rec_int_topints <- recipe(pets ~ ., data = ok_tr_interaction2) %>% step_textfeature(essays, prefix = "t", extract_functions = my_text_funs) %>% update_role(essays, new_role = "discarded") %>% step_mutate_at(starts_with("t_"), fn = ~ifelse(is.na(.x), 0, .x)) %>% step_log(income, starts_with("len_"), starts_with("t_"), -t_essays_sent_bing, offset = 1) %>% step_impute_mean(income) %>% step_other(all_nominal_predictors(), -has_role("discarded"), other = "all_else") %>% step_novel(all_nominal_predictors(), -has_role("discarded")) %>% step_impute_mode(all_nominal_predictors(), -has_role("discarded")) %>% step_dummy(all_nominal_predictors(), -has_role("discarded"), one_hot = FALSE) %>% step_interact(topint_ints) %>% step_nzv(all_numeric_predictors(), freq_cut = 99/1) %>% step_upsample(pets, over_ratio = 1, seed = 42) ``` --- ### Modeling For now let us use the original `blogs_tr` data. 
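Not sure which engines are available for a given model type? A quick check (assuming a reasonably recent `parsnip`):

```r
# list the engines parsnip knows about for linear regression models
show_engines("linear_reg")
```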
Functions `linear_reg()` and `set_engine()` are from the `parsnip` package: ```r mod_ridge_spec <- linear_reg(mixture = 0, penalty = 0.001) %>% set_engine(engine = "glmnet") mod_ridge_spec ``` ``` ## Linear Regression Model Specification (regression) ## ## Main Arguments: ## penalty = 0.001 ## mixture = 0 ## ## Computational engine: glmnet ``` --- ```r mod_ridge <- mod_ridge_spec %>% fit(fb ~ ., data = blogs_tr) mod_ridge ``` ``` ## parsnip model object ## ## ## Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian", alpha = ~0) ## ## Df %Dev Lambda ## 1 276 0.00 614.20 ## 2 276 2.62 559.60 ## 3 276 2.86 509.90 ## 4 276 3.11 464.60 ## 5 276 3.38 423.30 ## 6 276 3.68 385.70 ## 7 276 4.00 351.40 ## 8 276 4.34 320.20 ## 9 276 4.70 291.80 ## 10 276 5.09 265.90 ## 11 276 5.51 242.20 ## 12 276 5.95 220.70 ## 13 276 6.42 201.10 ## 14 276 6.92 183.20 ## 15 276 7.44 167.00 ## 16 276 7.99 152.10 ## 17 276 8.58 138.60 ## 18 276 9.19 126.30 ## 19 276 9.82 115.10 ## 20 276 10.48 104.90 ## 21 276 11.16 95.54 ## 22 276 11.87 87.05 ## 23 276 12.59 79.32 ## 24 276 13.33 72.27 ## 25 276 14.09 65.85 ## 26 276 14.86 60.00 ## 27 276 15.63 54.67 ## 28 276 16.41 49.82 ## 29 276 17.20 45.39 ## 30 276 17.98 41.36 ## 31 276 18.76 37.68 ## 32 276 19.54 34.34 ## 33 276 20.31 31.29 ## 34 276 21.07 28.51 ## 35 276 21.82 25.97 ## 36 276 22.56 23.67 ## 37 276 23.28 21.56 ## 38 276 24.00 19.65 ## 39 276 24.69 17.90 ## 40 276 25.38 16.31 ## 41 276 26.05 14.86 ## 42 276 26.71 13.54 ## 43 276 27.36 12.34 ## 44 276 28.00 11.24 ## 45 276 28.63 10.24 ## 46 276 29.24 9.34 ## 47 276 29.85 8.51 ## 48 276 30.45 7.75 ## 49 276 31.04 7.06 ## 50 276 31.63 6.43 ## 51 276 32.20 5.86 ## 52 276 32.77 5.34 ## 53 276 33.33 4.87 ## 54 276 33.88 4.43 ## 55 276 34.42 4.04 ## 56 276 34.95 3.68 ## 57 276 35.47 3.36 ## 58 276 35.98 3.06 ## 59 276 36.48 2.79 ## 60 276 36.96 2.54 ## 61 276 37.44 2.31 ## 62 276 37.89 2.11 ## 63 276 38.34 1.92 ## 64 276 38.76 1.75 ## 65 276 39.17 1.59 ## 66 276 39.56 1.45 ## 67 276 39.94 1.32 ## 68 276 40.29 1.21 ## 69 276 40.64 1.10 ## 70 276 40.96 1.00 ## 71 276 41.26 0.91 ## 72 276 41.55 0.83 ## 73 276 41.82 0.76 ## 74 276 42.08 0.69 ## 75 276 42.32 0.63 ## 76 276 42.54 0.57 ## 77 276 42.76 0.52 ## 78 276 42.96 0.48 ## 79 276 43.14 0.43 ## 80 276 43.32 0.39 ## 81 276 43.49 0.36 ## 82 276 43.65 0.33 ## 83 276 43.80 0.30 ## 84 276 43.94 0.27 ## 85 276 44.08 0.25 ## 86 276 44.22 0.23 ## 87 276 44.34 0.21 ## 88 276 44.47 0.19 ## 89 276 44.59 0.17 ## 90 276 44.71 0.16 ## 91 276 44.82 0.14 ## 92 276 44.93 0.13 ## 93 276 45.04 0.12 ## 94 276 45.15 0.11 ## 95 276 45.26 0.10 ## 96 276 45.36 0.09 ## 97 276 45.46 0.08 ## 98 276 45.56 0.07 ## 99 276 45.65 0.07 ## 100 276 45.75 0.06 ``` --- In a single pipe: ```r mod_lasso <- linear_reg(mixture = 1, penalty = 0.001) %>% set_engine(engine = "glmnet") %>% fit(fb ~ ., data = blogs_tr) mod_lasso ``` ``` ## parsnip model object ## ## ## Call: glmnet::glmnet(x = maybe_matrix(x), y = y, family = "gaussian", alpha = ~1) ## ## Df %Dev Lambda ## 1 0 0.00 0.61420 ## 2 1 4.97 0.55960 ## 3 1 9.09 0.50990 ## 4 3 13.12 0.46460 ## 5 3 16.98 0.42330 ## 6 3 20.19 0.38570 ## 7 3 22.85 0.35140 ## 8 3 25.06 0.32020 ## 9 3 26.89 0.29180 ## 10 4 29.23 0.26590 ## 11 4 31.29 0.24220 ## 12 4 33.00 0.22070 ## 13 4 34.42 0.20110 ## 14 4 35.60 0.18320 ## 15 4 36.58 0.16700 ## 16 4 37.40 0.15210 ## 17 4 38.07 0.13860 ## 18 4 38.63 0.12630 ## 19 5 39.10 0.11510 ## 20 5 39.49 0.10490 ## 21 5 39.81 0.09554 ## 22 6 40.11 0.08705 ## 23 6 40.42 0.07932 ## 24 7 40.71 0.07227 ## 25 8 40.97 0.06585 ## 
26 8 41.21 0.06000 ## 27 8 41.41 0.05467 ## 28 11 41.58 0.04982 ## 29 12 41.76 0.04539 ## 30 11 41.91 0.04136 ## 31 13 42.06 0.03768 ## 32 16 42.19 0.03434 ## 33 18 42.31 0.03129 ## 34 20 42.45 0.02851 ## 35 20 42.57 0.02597 ## 36 22 42.68 0.02367 ## 37 25 42.79 0.02156 ## 38 29 42.88 0.01965 ## 39 32 42.98 0.01790 ## 40 39 43.12 0.01631 ## 41 42 43.31 0.01486 ## 42 47 43.51 0.01354 ## 43 55 43.70 0.01234 ## 44 65 43.92 0.01124 ## 45 69 44.10 0.01024 ## 46 76 44.34 0.00934 ## 47 85 44.55 0.00850 ## 48 89 44.73 0.00775 ## 49 97 44.90 0.00706 ## 50 103 45.09 0.00643 ## 51 111 45.25 0.00586 ## 52 116 45.45 0.00534 ## 53 125 45.59 0.00487 ## 54 133 45.74 0.00443 ## 55 144 45.95 0.00404 ## 56 149 46.14 0.00368 ## 57 160 46.32 0.00335 ## 58 158 46.47 0.00306 ## 59 164 46.60 0.00279 ## 60 171 46.70 0.00254 ## 61 172 46.79 0.00231 ## 62 174 46.86 0.00211 ## 63 183 46.94 0.00192 ## 64 185 47.03 0.00175 ## 65 190 47.11 0.00159 ## 66 196 47.18 0.00145 ## 67 204 47.24 0.00132 ## 68 202 47.30 0.00121 ## 69 200 47.35 0.00110 ## 70 206 47.39 0.00100 ## 71 215 47.43 0.00091 ## 72 220 47.46 0.00083 ## 73 222 47.49 0.00076 ## 74 224 47.52 0.00069 ## 75 229 47.55 0.00063 ## 76 239 47.58 0.00057 ## 77 244 47.61 0.00052 ## 78 242 47.65 0.00048 ## 79 241 47.69 0.00043 ## 80 240 47.73 0.00039 ## 81 241 47.77 0.00036 ## 82 243 47.80 0.00033 ## 83 243 47.83 0.00030 ## 84 243 47.86 0.00027 ## 85 246 47.88 0.00025 ## 86 251 47.89 0.00023 ## 87 252 47.91 0.00021 ## 88 252 47.92 0.00019 ## 89 253 47.93 0.00017 ## 90 256 47.94 0.00016 ## 91 258 47.95 0.00014 ## 92 258 47.96 0.00013 ## 93 260 47.97 0.00012 ## 94 259 47.98 0.00011 ## 95 261 47.98 0.00010 ## 96 265 47.99 0.00009 ## 97 260 47.99 0.00008 ## 98 260 47.99 0.00007 ## 99 264 47.99 0.00007 ``` --- Can also use `fit_xy()` a-la `sklearn`: ```r mod_rf <- rand_forest(mode = "regression", mtry = 50, trees = 50, min_n = 10) %>% set_engine("ranger") %>% fit_xy(x = blogs_tr[, -281], y = blogs_tr$fb) mod_rf ``` ``` ## parsnip model object ## ## Ranger result ## ## Call: ## ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~50, x), num.trees = ~50, min.node.size = min_rows(~10, x), num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1)) ## ## Type: Regression ## Number of trees: 50 ## Sample size: 31438 ## Number of independent variables: 280 ## Mtry: 50 ## Target node size: 10 ## Variable importance mode: none ## Splitrule: variance ## OOB prediction error (MSE): 0.4272363 ## R squared (OOB): 0.6687403 ``` --- Notice how easy it is to get the model's results in a tidy way using the `tidy()` function: ```r tidy(mod_ridge) ``` ``` ## # A tibble: 281 × 3 ## term estimate penalty ## <chr> <dbl> <dbl> ## 1 (Intercept) 0.686 0.001 ## 2 X1 0.00118 0.001 ## 3 X2 0.00135 0.001 ## 4 X3 0.000687 0.001 ## 5 X4 0.000333 0.001 ## 6 X5 0.0000880 0.001 ## 7 X6 0.00209 0.001 ## 8 X7 -0.00115 0.001 ## 9 X8 0.277 0.001 ## 10 X9 -0.000139 0.001 ## # … with 271 more rows ``` --- ### Predicting ```r results_test <- mod_ridge %>% predict(new_data = blogs_te, penalty = 0.001) %>% mutate( truth = blogs_te$fb, method = "Ridge" ) %>% bind_rows(mod_lasso %>% predict(new_data = blogs_te) %>% mutate( truth = blogs_te$fb, method = "Lasso" )) %>% bind_rows(mod_rf %>% predict(new_data = blogs_te) %>% mutate( truth = blogs_te$fb, method = "RF" )) dim(results_test) ``` ``` ## [1] 62877 3 ``` ```r head(results_test) ``` ``` ## # A tibble: 6 × 3 ## .pred truth method ## <dbl> <dbl> <chr> ## 1 0.186 0 Ridge ## 2 0.473 2.30 Ridge ## 3 0.663 1.39 Ridge ## 4 1.34 2.64 Ridge ## 5 0.322 0.693 
Ridge ## 6 0.698 0.693 Ridge ``` --- ### Comparing Models The package `yardstick` has tons of performance [metrics](https://tidymodels.github.io/yardstick/articles/metric-types.html): ```r results_test %>% group_by(method) %>% rmse(truth = truth, estimate = .pred) ``` ``` ## # A tibble: 3 × 4 ## method .metric .estimator .estimate ## <chr> <chr> <chr> <dbl> ## 1 Lasso rmse standard 0.836 ## 2 RF rmse standard 0.647 ## 3 Ridge rmse standard 0.852 ``` --- ```r results_test %>% ggplot(aes(truth, .pred)) + geom_point(color = "red", alpha = 0.5) + geom_abline(slope = 1, intercept = 0) + facet_wrap(~ factor(method, levels = c("Ridge", "Lasso", "RF"))) + theme_bw() ``` <img src="images/Tidymodels-Pred-vs-True-1.png" width="100%" /> --- ### Tuning Define your model spec, using `tune()` from the `tune` package for a parameter you wish to tune: ```r mod_rf_spec <- rand_forest(mode = "regression", mtry = tune(), min_n = tune(), trees = 100) %>% set_engine("ranger") ``` --- Define the `grid` on which you train your params, with the `dials` package: ```r rf_grid <- grid_regular(mtry(range(10, 70)), min_n(range(10, 30)), levels = c(4, 3)) rf_grid ``` ``` ## # A tibble: 12 × 2 ## mtry min_n ## <int> <int> ## 1 10 10 ## 2 30 10 ## 3 50 10 ## 4 70 10 ## 5 10 20 ## 6 30 20 ## 7 50 20 ## 8 70 20 ## 9 10 30 ## 10 30 30 ## 11 50 30 ## 12 70 30 ``` --- Split your data into a few folds for Cross Validation with `vfold_cv()` from the `rsample` package: ```r cv_splits <- vfold_cv(blogs_tr, v = 5) cv_splits ``` ``` ## # 5-fold cross-validation ## # A tibble: 5 × 2 ## splits id ## <list> <chr> ## 1 <split [25150/6288]> Fold1 ## 2 <split [25150/6288]> Fold2 ## 3 <split [25150/6288]> Fold3 ## 4 <split [25151/6287]> Fold4 ## 5 <split [25151/6287]> Fold5 ``` --- Now perform cross validation with `tune_grid()` from the `tune` package: ```r tune_res <- tune_grid(mod_rf_spec, recipe(fb ~ ., data = blogs_tr), resamples = cv_splits, grid = rf_grid, metrics = metric_set(rmse)) tune_res ``` ``` ## # Tuning results ## # 5-fold cross-validation ## # A tibble: 5 × 4 ## splits id .metrics .notes ## <list> <chr> <list> <list> ## 1 <split [25150/6288]> Fold1 <tibble [12 × 6]> <tibble [0 × 1]> ## 2 <split [25150/6288]> Fold2 <tibble [12 × 6]> <tibble [0 × 1]> ## 3 <split [25150/6288]> Fold3 <tibble [12 × 6]> <tibble [0 × 1]> ## 4 <split [25151/6287]> Fold4 <tibble [12 × 6]> <tibble [0 × 1]> ## 5 <split [25151/6287]> Fold5 <tibble [12 × 6]> <tibble [0 × 1]> ``` --- ```r tune_res$.metrics[[1]] ``` ``` ## # A tibble: 12 × 6 ## mtry min_n .metric .estimator .estimate .config ## <int> <int> <chr> <chr> <dbl> <chr> ## 1 10 10 rmse standard 0.687 Preprocessor1_Model01 ## 2 30 10 rmse standard 0.649 Preprocessor1_Model02 ## 3 50 10 rmse standard 0.644 Preprocessor1_Model03 ## 4 70 10 rmse standard 0.642 Preprocessor1_Model04 ## 5 10 20 rmse standard 0.697 Preprocessor1_Model05 ## 6 30 20 rmse standard 0.655 Preprocessor1_Model06 ## 7 50 20 rmse standard 0.646 Preprocessor1_Model07 ## 8 70 20 rmse standard 0.643 Preprocessor1_Model08 ## 9 10 30 rmse standard 0.701 Preprocessor1_Model09 ## 10 30 30 rmse standard 0.657 Preprocessor1_Model10 ## 11 50 30 rmse standard 0.651 Preprocessor1_Model11 ## 12 70 30 rmse standard 0.643 Preprocessor1_Model12 ``` --- Collect the mean metric across folds: ```r estimates <- collect_metrics(tune_res) estimates ``` ``` ## # A tibble: 12 × 8 ## mtry min_n .metric .estimator mean n std_err .config ## <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr> ## 1 10 10 rmse standard 0.698 5 0.00647 Preprocessor1_Model01 ## 
2 30 10 rmse standard 0.656 5 0.00673 Preprocessor1_Model02 ## 3 50 10 rmse standard 0.648 5 0.00569 Preprocessor1_Model03 ## 4 70 10 rmse standard 0.646 5 0.00572 Preprocessor1_Model04 ## 5 10 20 rmse standard 0.704 5 0.00668 Preprocessor1_Model05 ## 6 30 20 rmse standard 0.662 5 0.00638 Preprocessor1_Model06 ## 7 50 20 rmse standard 0.650 5 0.00527 Preprocessor1_Model07 ## 8 70 20 rmse standard 0.647 5 0.00576 Preprocessor1_Model08 ## 9 10 30 rmse standard 0.710 5 0.00740 Preprocessor1_Model09 ## 10 30 30 rmse standard 0.664 5 0.00590 Preprocessor1_Model10 ## 11 50 30 rmse standard 0.654 5 0.00526 Preprocessor1_Model11 ## 12 70 30 rmse standard 0.649 5 0.00567 Preprocessor1_Model12 ``` --- Choose best parameter: ```r estimates %>% mutate(min_n = factor(min_n)) %>% ggplot(aes(x = mtry, y = mean, color = min_n)) + geom_point() + geom_line() + labs(y = "Mean RMSE") + theme_bw() ``` <img src="images/Tidymodels-RMSE-Comp-1.png" width="100%" /> --- There are of course also methods for helping us choose best params and final model. ```r best_rmse <- tune_res %>% select_best(metric = "rmse") best_rmse ``` ``` ## # A tibble: 1 × 3 ## mtry min_n .config ## <int> <int> <chr> ## 1 70 10 Preprocessor1_Model04 ``` See also `?select_by_one_std_err`. ```r mod_rf_final <- finalize_model(mod_rf_spec, best_rmse) mod_rf_final ``` ``` ## Random Forest Model Specification (regression) ## ## Main Arguments: ## mtry = 70 ## trees = 100 ## min_n = 10 ## ## Computational engine: ranger ``` --- ```r mod_rf_final %>% fit(fb ~ ., data = blogs_tr) %>% predict(new_data = blogs_te) %>% mutate(truth = blogs_te$fb) %>% head(10) ``` ``` ## # A tibble: 10 × 2 ## .pred truth ## <dbl> <dbl> ## 1 0.584 0.693 ## 2 0.584 0.693 ## 3 0.238 0 ## 4 1.79 2.30 ## 5 1.79 2.30 ## 6 0.145 0 ## 7 3.33 1.10 ## 8 0.719 1.39 ## 9 2.40 3.09 ## 10 1.14 0.693 ``` --- ### Workflow As we shall see, this manual approach won't scale, is prone to bugs and will not play nicely with other modeling components: ```r results_test <- mod_ridge %>% predict(new_data = blogs_te, penalty = 0.001) %>% mutate( truth = blogs_te$fb, method = "Ridge" ) %>% bind_rows(mod_lasso %>% predict(new_data = blogs_te) %>% mutate( truth = blogs_te$fb, method = "Lasso" )) %>% bind_rows(mod_rf %>% predict(new_data = blogs_te) %>% mutate( truth = blogs_te$fb, method = "RF" )) ``` --- Similar to sklearn's `Pipeline()` class, we need a `workflow()`: to bundle together your pre-processing, modeling, and post-processing requests. 
```r mod_rf <- rand_forest(mode = "regression", mtry = 70, trees = 100, min_n = 10) %>% set_engine("ranger") blogs_rec <- recipe(fb ~ ., data = blogs_tr) %>% step_zv(all_numeric_predictors()) %>% step_normalize(all_numeric_predictors()) wflow_rf <- workflow() %>% add_recipe(blogs_rec) %>% add_model(mod_rf) ``` --- ```r wflow_rf ``` ``` ## ══ Workflow ════════════════════════════════════════════════════════════════════ ## Preprocessor: Recipe ## Model: rand_forest() ## ## ── Preprocessor ──────────────────────────────────────────────────────────────── ## 2 Recipe Steps ## ## • step_zv() ## • step_normalize() ## ## ── Model ─────────────────────────────────────────────────────────────────────── ## Random Forest Model Specification (regression) ## ## Main Arguments: ## mtry = 70 ## trees = 100 ## min_n = 10 ## ## Computational engine: ranger ``` --- Calling `fit()` will `prep()` the recipe **and** `fit()` the model: ```r fit_rf <- fit(wflow_rf, blogs_tr) ``` It is still a `workflow()` object: ```r fit_rf ``` ``` ## ══ Workflow [trained] ══════════════════════════════════════════════════════════ ## Preprocessor: Recipe ## Model: rand_forest() ## ## ── Preprocessor ──────────────────────────────────────────────────────────────── ## 2 Recipe Steps ## ## • step_zv() ## • step_normalize() ## ## ── Model ─────────────────────────────────────────────────────────────────────── ## Ranger result ## ## Call: ## ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~70, x), num.trees = ~100, min.node.size = min_rows(~10, x), num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1)) ## ## Type: Regression ## Number of trees: 100 ## Sample size: 31438 ## Number of independent variables: 276 ## Mtry: 70 ## Target node size: 10 ## Variable importance mode: none ## Splitrule: variance ## OOB prediction error (MSE): 0.4176772 ## R squared (OOB): 0.677417 ``` --- Can extract the prepped recipe: ```r fit_rf %>% extract_recipe() ``` ``` ## ``` ``` ## ── Recipe ────────────────────────────────────────────────────────────────────── ``` ``` ## ``` ``` ## ── Inputs ``` ``` ## Number of variables by role ``` ``` ## outcome: 1 ## predictor: 280 ``` ``` ## ``` ``` ## ── Training information ``` ``` ## Training data contained 31438 data points and no incomplete rows. ``` ``` ## ``` ``` ## ── Operations ``` ``` ## • Zero variance filter removed: X13, X33, X38, X278 | Trained ``` ``` ## • Centering and scaling for: X1, X2, X3, X4, X5, X6, X7, X8, X9, ... 
| Trained ``` --- Can extract the actual parsnip model: ```r fit_rf %>% extract_fit_parsnip() ``` ``` ## parsnip model object ## ## Ranger result ## ## Call: ## ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~70, x), num.trees = ~100, min.node.size = min_rows(~10, x), num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1)) ## ## Type: Regression ## Number of trees: 100 ## Sample size: 31438 ## Number of independent variables: 276 ## Mtry: 70 ## Target node size: 10 ## Variable importance mode: none ## Splitrule: variance ## OOB prediction error (MSE): 0.4176772 ## R squared (OOB): 0.677417 ``` --- Calling `predict()` with a new dataset will `bake()` the new dataset using the prepped recipe, and `predict()` the trained model: ```r res_rf <- predict(fit_rf, blogs_te) res_rf %>% head(10) ``` ``` ## # A tibble: 10 × 1 ## .pred ## <dbl> ## 1 0.0928 ## 2 2.16 ## 3 0.840 ## 4 2.48 ## 5 0.702 ## 6 1.09 ## 7 1.38 ## 8 1.38 ## 9 0.899 ## 10 0.466 ``` --- Calling `fit_resamples()` or `tune_grid()` works with a `vfold_cv()` splits object: ```r cv_splits <- vfold_cv(blogs_tr, v = 5) fit_rf_cv <- fit_resamples(wflow_rf, cv_splits, metrics = metric_set(rmse)) fit_rf_cv ``` ``` ## # Resampling results ## # 5-fold cross-validation ## # A tibble: 5 × 4 ## splits id .metrics .notes ## <list> <chr> <list> <list> ## 1 <split [25150/6288]> Fold1 <tibble [1 × 4]> <tibble [0 × 3]> ## 2 <split [25150/6288]> Fold2 <tibble [1 × 4]> <tibble [0 × 3]> ## 3 <split [25150/6288]> Fold3 <tibble [1 × 4]> <tibble [0 × 3]> ## 4 <split [25151/6287]> Fold4 <tibble [1 × 4]> <tibble [0 × 3]> ## 5 <split [25151/6287]> Fold5 <tibble [1 × 4]> <tibble [0 × 3]> ``` Can use `collect_metrics()` etc. --- But the real advantage of working with `workflow()`s is when comparing different recipes `\(\times\)` different models, via `workflow_set()`: ```r mod_ridge <- linear_reg(mixture = 0, penalty = 0.001) %>% set_engine(engine = "glmnet") mod_lasso <- linear_reg(mixture = 1, penalty = 0.001) %>% set_engine(engine = "glmnet") mod_list <- list(RF = mod_rf, Ridge = mod_ridge, Lasso = mod_lasso) rec_list <- list(basic = blogs_rec) # can add more.. wset <- workflow_set(rec_list, mod_list) # checkout the cross arg wset ``` ``` ## # A workflow set/tibble: 3 × 4 ## wflow_id info option result ## <chr> <list> <list> <list> ## 1 basic_RF <tibble [1 × 4]> <opts[0]> <list [0]> ## 2 basic_Ridge <tibble [1 × 4]> <opts[0]> <list [0]> ## 3 basic_Lasso <tibble [1 × 4]> <opts[0]> <list [0]> ``` --- Now use a `purrr`-like mapping function to fit all on all resamples: ```r wset <- wset %>% workflow_map("fit_resamples", resamples = cv_splits) ``` ``` ## # A workflow set/tibble: 3 × 4 ## wflow_id info option result ## <chr> <list> <list> <list> ## 1 basic_RF <tibble [1 × 4]> <opts[1]> <rsmp[+]> ## 2 basic_Ridge <tibble [1 × 4]> <opts[1]> <rsmp[+]> ## 3 basic_Lasso <tibble [1 × 4]> <opts[1]> <rsmp[+]> ``` And use e.g. `collect_metrics()` to select the best recipe/model combination. --- ### Book (WIP?) <img src="images/tmwr.png" style="width: 45%" /> https://www.tmwr.org/ --- class: section-slide # `infer`: Tidy Statistics --- ### Statistical Q1 Is there a relation between posts published on Sundays and blogger hand dominance, where hand dominance is totally made up? 
😬 ```r pub_vs_hand <- blogs_fb %>% select(sunday, hand) %>% table() pub_vs_hand ``` ``` ## hand ## sunday left right ## NS 4780 42958 ## S 493 4166 ``` ```r prop.table(pub_vs_hand, margin = 1) ``` ``` ## hand ## sunday left right ## NS 0.1001299 0.8998701 ## S 0.1058167 0.8941833 ``` --- ### Statistical Q2 Is there a difference in feedback between posts published on Sundays and posts published on another day? ```r blogs_fb %>% group_by(sunday) %>% summarise(avg = mean(fb), sd = sd(fb), n = n()) ``` ``` ## # A tibble: 2 × 4 ## sunday avg sd n ## <fct> <dbl> <dbl> <int> ## 1 NS 0.645 1.13 47738 ## 2 S 0.605 1.12 4659 ``` --- ### Same Problem! Varied interface, varied output. ```r prop.test(pub_vs_hand[,1], rowSums(pub_vs_hand)) ``` ``` ## ## 2-sample test for equality of proportions with continuity correction ## ## data: pub_vs_hand[, 1] out of rowSums(pub_vs_hand) ## X-squared = 1.4545, df = 1, p-value = 0.2278 ## alternative hypothesis: two.sided ## 95 percent confidence interval: ## -0.015038615 0.003664968 ## sample estimates: ## prop 1 prop 2 ## 0.1001299 0.1058167 ``` --- ```r t.test(fb ~ sunday, data = blogs_fb) ``` ``` ## ## Welch Two Sample t-test ## ## data: fb by sunday ## t = 2.3606, df = 5638.9, p-value = 0.01828 ## alternative hypothesis: true difference in means between group NS and group S is not equal to 0 ## 95 percent confidence interval: ## 0.006863705 0.074103874 ## sample estimates: ## mean in group NS mean in group S ## 0.6454439 0.6049601 ``` --- ### The `generics::tidy()` Approach (Also available when you load several other packages, like `broom` and `yardstick`) ```r tidy(prop.test(pub_vs_hand[,1], rowSums(pub_vs_hand))) ``` ``` ## # A tibble: 1 × 9 ## estimate1 estimate2 statistic p.value parameter conf.low conf.high method ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 0.100 0.106 1.45 0.228 1 -0.0150 0.00366 2-sample t… ## # … with 1 more variable: alternative <chr> ``` ```r tidy(t.test(fb ~ sunday, data = blogs_fb)) ``` ``` ## # A tibble: 1 × 10 ## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0.0405 0.645 0.605 2.36 0.0183 5639. 
0.00686 0.0741 ## # … with 2 more variables: method <chr>, alternative <chr> ``` --- ### The `infer` Approach > infer implements an expressive grammar to perform statistical inference that coheres with the tidyverse design framework Four main verbs for a typical flow: * `specify()` - dependent/independent variables, formula * `hypothesize()` - declare the null hypothesis * `generate()` - generate data reflecting the null hypothesis (the permutation/bootstrap approach) * `calculate()` - calculate a distribution of statistics from the generated data, from which you can draw conclusions (based on a p-value, for example) --- ### `infer` Diff in Proportions Test Get the observed statistic (manually here, so as not to confuse you; there *is* a way via `infer`): ```r pub_vs_hand ``` ``` ## hand ## sunday left right ## NS 4780 42958 ## S 493 4166 ``` ```r p_NS <- pub_vs_hand[1, 1] / (sum(pub_vs_hand[1, ])) p_S <- pub_vs_hand[2, 1] / (sum(pub_vs_hand[2, ])) obs_diff <- p_NS - p_S obs_diff ``` ``` ## [1] -0.005686823 ``` --- Get the distribution of the difference in proportions under the null hypothesis: ```r diff_null_perm <- blogs_fb %>% specify(hand ~ sunday, success = "left") %>% hypothesize(null = "independence") %>% generate(reps = 200, type = "permute") %>% calculate(stat = "diff in props", order = c("NS", "S")) diff_null_perm ``` ``` ## Response: hand (factor) ## Explanatory: sunday (factor) ## Null Hypothesis: independence ## # A tibble: 200 × 2 ## replicate stat ## <int> <dbl> ## 1 1 -0.00451 ## 2 2 -0.00569 ## 3 3 -0.00498 ## 4 4 0.00751 ## 5 5 0.00185 ## 6 6 -0.0000328 ## 7 7 0.00963 ## 8 8 -0.00427 ## 9 9 0.000203 ## 10 10 -0.00592 ## # … with 190 more rows ``` --- Visualize the permuted null distribution of the difference and the p-value: ```r visualize(diff_null_perm) + shade_p_value(obs_stat = obs_diff, direction = "two_sided") ``` <img src="images/Diff-in-Props-Null-1.png" width="50%" /> --- Get the actual p-value: ```r diff_null_perm %>% get_p_value(obs_stat = obs_diff, direction = "two_sided") ``` ``` ## # A tibble: 1 × 1 ## p_value ## <dbl> ## 1 0.3 ``` --- ### `infer` t Test (independent samples) Get the observed statistic (here via `infer`): ```r obs_t <- blogs_fb %>% specify(fb ~ sunday) %>% calculate(stat = "t", order = c("NS", "S")) obs_t ``` ``` ## Response: fb (numeric) ## Explanatory: sunday (factor) ## # A tibble: 1 × 1 ## stat ## <dbl> ## 1 2.36 ``` --- Get the distribution of the t statistic under the null hypothesis: ```r t_null_perm <- blogs_fb %>% specify(fb ~ sunday) %>% hypothesize(null = "independence") %>% generate(reps = 100, type = "permute") %>% calculate(stat = "t", order = c("NS", "S")) t_null_perm ``` ``` ## Response: fb (numeric) ## Explanatory: sunday (factor) ## Null Hypothesis: independence ## # A tibble: 100 × 2 ## replicate stat ## <int> <dbl> ## 1 1 1.25 ## 2 2 -0.838 ## 3 3 1.01 ## 4 4 -0.435 ## 5 5 0.895 ## 6 6 -0.293 ## 7 7 0.241 ## 8 8 -0.356 ## 9 9 1.50 ## 10 10 0.295 ## # … with 90 more rows ``` --- Visualize the permuted null distribution of the t statistic and the two-sided p-value: ```r visualize(t_null_perm) + shade_p_value(obs_stat = obs_t, direction = "two_sided") ``` <img src="images/T-Null-1.png" width="50%" /> --- Get the actual p-value: ```r t_null_perm %>% get_p_value(obs_stat = obs_t, direction = "two_sided") ``` <pre style="color: red;"><code>## Warning: Please be cautious in reporting a p-value of 0. This result is an ## approximation based on the number of `reps` chosen in the `generate()` step. ## See `?get_p_value()` for more information.
</code></pre> ``` ## # A tibble: 1 × 1 ## p_value ## <dbl> ## 1 0 ```
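---

The warning is telling us that with `reps = 100` permutations the finest non-zero p-value we can resolve is 1/100, so an observed statistic more extreme than every permuted statistic gets reported as 0. A minimal sketch of one remedy, not part of the original flow (the object name `t_null_perm_1000` and `reps = 1000` are arbitrary choices here), is simply to generate more permutations, and report e.g. p < 0.001 if the estimate is still 0:

```r
# Same permutation flow as above, only with more reps:
# more reps give finer p-value resolution, at the cost of runtime.
t_null_perm_1000 <- blogs_fb %>%
  specify(fb ~ sunday) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "t", order = c("NS", "S"))

t_null_perm_1000 %>%
  get_p_value(obs_stat = obs_t, direction = "two_sided")
```

`infer` can also compare the observed statistic against a theoretical null distribution rather than a permuted one; see `?get_p_value()` and the package documentation.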