If you’re modelling house prices using just size (a high-bias model), getting a larger sample won’t help much
More data can improve low-bias models
Is data size an issue? Can artificially restrict the training set size and evaluate the error at each size (a learning curve; sketched in code below)
Training:
+ near-zero error for small m (easy to fit a few points exactly)
+ error increases as m grows, since the degrees of freedom per example fall
CV:
+ error decreases as the training set grows, as theta becomes more accurate
The two curves converge towards each other for very large m
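A minimal numpy sketch of these curves (the helper names are mine, not from the lecture): train on the first m examples, then measure squared error on that subset and on the full CV set.

```python
import numpy as np

def fit_lstsq(X, y):
    # Ordinary least squares; stands in for whatever learner is being diagnosed
    return np.linalg.lstsq(X, y, rcond=None)[0]

def sq_error(theta, X, y):
    # Unregularised squared-error cost: (1/2m) * sum((h(x) - y)^2)
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

def learning_curve(X_train, y_train, X_cv, y_cv):
    train_err, cv_err = [], []
    for m in range(1, len(y_train) + 1):
        theta = fit_lstsq(X_train[:m], y_train[:m])   # artificially restrict to m examples
        train_err.append(sq_error(theta, X_train[:m], y_train[:m]))
        cv_err.append(sq_error(theta, X_cv, y_cv))    # always the full CV set
    return train_err, cv_err
```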
When are large datasets useful?
When the features x carry enough information to predict y:
Predicting house price using just size won’t benefit from more data...
Choosing the correct word in a sentence (to, too, two) will: the surrounding words carry the answer
Useful test: if a human expert could predict y from x, then more data is probably helpful
An expert realtor probably couldn’t do much with just size; the speaker could easily answer the word-choice question
So ask: could an expert do it?
Low-bias algorithms do well with more data
More data is good if there are a large number of parameters, or a lot of hidden units (a low-bias setting)
Role of lambda: a high lambda shrinks the parameters, weakening each feature’s influence => high bias
A low lambda leaves each feature’s influence strong => high variance
(A related trade-off in classification: move the decision cut-off, e.g. only predict positive if \(h(x) \geq 0.7\))
How to choose lambda? Tricky, since lambda sits inside the cost function!
Can do it similarly to choosing d:
Run for a range of lambda (e.g. 0, 0.01, 0.02, 0.04, 0.08, ..., 10), then pick the value with the lowest error on the cross-validation set
Low lambda always gives low cost on the training set, but not on the CV set
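A sketch of that selection loop, assuming some fit(X, y, lam) routine that minimises the regularised cost (gradient descent or the normal equation further down both work); note the CV error is measured without the lambda term:

```python
import numpy as np

def pick_lambda(X_train, y_train, X_cv, y_cv, fit):
    # Doubling grid of candidate values, as in the notes above
    lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
    cv_errs = []
    for lam in lambdas:
        theta = fit(X_train, y_train, lam)   # train WITH regularisation
        # ...but score WITHOUT the lambda term
        cv_errs.append(np.sum((X_cv @ theta - y_cv) ** 2) / (2 * len(y_cv)))
    best = int(np.argmin(cv_errs))
    return lambdas[best], cv_errs[best]
```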
Regularisation: add a term to the error that measures the size of the parameters, penalising large parameters
Without it, the model may not fit well outside the sample
High bias: e.g. house prices vs size; a linear fit underfits and does badly on out-of-sample data (underfitting)
High variance: a high-order polynomial passing through all the data points (overfitting)
Can reduce overfitting by reducing the number of features, either manually or with a model selection algorithm
OR regularisation: keep all the features, but reduce the magnitude of the theta values
Make the cost function include the size of the \(\theta_j^2\) values, e.g. to shrink \(\theta_3\) and \(\theta_4\):
\(\min_\theta \dfrac{1}{2m} \left[\sum_{i=1}^{m} \left(h_\theta(x^{(i)})-y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2\right]\)
or more broadly:
\(\min_\theta \dfrac{1}{2m}\left[\sum_{i=1}^{m} \left(h_\theta(x^{(i)})-y^{(i)}\right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2\right]\)
By convention \(\theta_0\) is not included in the sum, i.e. no regularisation on the intercept
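As a numpy sketch (the function name reg_cost is mine), with \(\theta_0\) excluded from the penalty:

```python
import numpy as np

def reg_cost(theta, X, y, lam):
    # J = (1/2m) * [ sum((h(x) - y)^2) + lambda * sum(theta_j^2 for j >= 1) ]
    err = X @ theta - y
    return (err @ err + lam * np.sum(theta[1:] ** 2)) / (2 * len(y))
```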
The gradient descent update for regularised linear regression is
\(\theta_j := \theta_j - \alpha\left[\dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)} + \dfrac{\lambda}{m}\theta_j\right]\)
\(\theta_j := \theta_j\left(1 - \alpha\dfrac{\lambda}{m}\right) - \alpha\dfrac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\)
This is the same update as before, except each \(\theta_j\) starts from a slightly shrunk value on every iteration, since \(1 - \alpha\lambda/m\) is just below 1.
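One such update step as a numpy sketch, zeroing the penalty on \(\theta_0\) per the convention above:

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    m = len(y)
    grad = X.T @ (X @ theta - y) / m     # (1/m) * sum((h(x) - y) * x_j)
    reg = (lam / m) * theta
    reg[0] = 0                           # theta_0 is not regularised
    return theta - alpha * (grad + reg)  # equivalently: shrink theta_j, then step
```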
Normal equation needs a change
\(\theta = (X'X)^{-1}X'y\)
now becomes
\(\theta = (X'X + \lambda L)^{-1}X'y\)
where \(L\) is the identity matrix with its first diagonal element set to 0, since \(\theta_0\) gets no regularisation
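A numpy sketch of the regularised normal equation, using np.linalg.solve rather than an explicit inverse:

```python
import numpy as np

def normal_equation_reg(X, y, lam):
    # theta = (X'X + lambda * L)^(-1) X'y, with L the identity
    # except L[0, 0] = 0 so theta_0 is left unpenalised
    L = np.eye(X.shape[1])
    L[0, 0] = 0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```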
REGULARISATION FOR LOGISTIC REGRESSION
add to the end of \(J(\theta)\):
\(\dfrac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2\)
update for \(\theta_j\), \(j>0\): same as for linear regression, but \(h_\theta(x)\) is a different function (the sigmoid)
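A sketch of the regularised logistic-regression cost and gradient together (function names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_grad(theta, X, y, lam):
    m = len(y)
    h = sigmoid(X @ theta)           # h(x) is now the sigmoid of X * theta
    reg_theta = theta.copy()
    reg_theta[0] = 0                 # theta_0 excluded from the penalty
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m \
        + (lam / (2 * m)) * np.sum(reg_theta ** 2)
    grad = X.T @ (h - y) / m + (lam / m) * reg_theta
    return J, grad
```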