Pattern Recognition and Machine Learning

Rating: 9.6

Author: Christopher Bishop
Publisher: Springer
Publication date: 2007-10-01
Pages: 738
List price: USD 94.95
Binding: Hardcover
ISBN: 9780387310732


Synopsis:

The dramatic growth in practical applications for machine learning over the last ten years has been accompanied by many important developments in the underlying algorithms and techniques. For example, Bayesian methods have grown from a specialist niche to become mainstream, while graphical models have emerged as a general framework for describing and applying probabilistic techniques. The practical applicability of Bayesian methods has been greatly enhanced by the development of a range of approximate inference algorithms such as variational Bayes and expectation propagation, while new models based on kernels have had a significant impact on both algorithms and applications.

This completely new textbook reflects these recent developments while providing a comprehensive introduction to the fields of pattern recognition and machine learning. It is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of pattern recognition or machine learning concepts is assumed. Familiarity with multivariate calculus and basic linear algebra is required, and some experience in the use of probabilities would be helpful though not essential as the book includes a self-contained introduction to basic probability theory.

The book is suitable for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. Extensive support is provided for course instructors, including more than 400 exercises, graded according to difficulty. Example solutions for a subset of the exercises are available from the book web site, while solutions for the remainder can be obtained by instructors from the publisher. The book is supported by a great deal of additional material, and the reader is encouraged to visit the book web site for the latest information.

Table of Contents:

1 Introduction 1
1.1 Example: Polynomial Curve Fitting 4
1.2 Probability Theory 12
1.2.1 Probability densities 17
1.2.2 Expectations and covariances 19
1.2.3 Bayesian probabilities 21
1.2.4 The Gaussian distribution 24
1.2.5 Curve fitting re-visited 28
1.2.6 Bayesian curve fitting 30
1.3 Model Selection 32
1.4 The Curse of Dimensionality 33
1.5 Decision Theory 38
1.5.1 Minimizing the misclassification rate 39
1.5.2 Minimizing the expected loss 41
1.5.3 The reject option 42
1.5.4 Inference and decision 42
1.5.5 Loss functions for regression 46
1.6 Information Theory 48
1.6.1 Relative entropy and mutual information 55
Exercises 58

2 Probability Distributions 67
2.1 Binary Variables 68
2.1.1 The beta distribution 71
2.2 Multinomial Variables 74
2.2.1 The Dirichlet distribution 76
2.3 The Gaussian Distribution 78
2.3.1 Conditional Gaussian distributions 85
2.3.2 Marginal Gaussian distributions 88
2.3.3 Bayes’ theorem for Gaussian variables 90
2.3.4 Maximum likelihood for the Gaussian 93
2.3.5 Sequential estimation 94
2.3.6 Bayesian inference for the Gaussian 97
2.3.7 Student’s t-distribution 102
2.3.8 Periodic variables 105
2.3.9 Mixtures of Gaussians 110
2.4 The Exponential Family 113
2.4.1 Maximum likelihood and sufficient statistics 116
2.4.2 Conjugate priors 117
2.4.3 Noninformative priors 117
2.5 Nonparametric Methods 120
2.5.1 Kernel density estimators 122
2.5.2 Nearest-neighbour methods 124
Exercises 127

3 Linear Models for Regression 137
3.1 Linear Basis Function Models 138
3.1.1 Maximum likelihood and least squares 140
3.1.2 Geometry of least squares 143
3.1.3 Sequential learning 143
3.1.4 Regularized least squares 144
3.1.5 Multiple outputs 146
3.2 The Bias-Variance Decomposition 147
3.3 Bayesian Linear Regression 152
3.3.1 Parameter distribution 153
3.3.2 Predictive distribution 156
3.3.3 Equivalent kernel 157
3.4 Bayesian Model Comparison 161
3.5 The Evidence Approximation 165
3.5.1 Evaluation of the evidence function 166
3.5.2 Maximizing the evidence function 168
3.5.3 Effective number of parameters 170
3.6 Limitations of Fixed Basis Functions 172
Exercises 173

4 Linear Models for Classification 179
4.1 Discriminant Functions 181
4.1.1 Two classes 181
4.1.2 Multiple classes 182
4.1.3 Least squares for classification 184
4.1.4 Fisher’s linear discriminant 186
4.1.5 Relation to least squares 189
4.1.6 Fisher’s discriminant for multiple classes 191
4.1.7 The perceptron algorithm 192
4.2 Probabilistic Generative Models 196
4.2.1 Continuous inputs 198
4.2.2 Maximum likelihood solution 200
4.2.3 Discrete features 202
4.2.4 Exponential family 202
4.3 Probabilistic Discriminative Models 203
4.3.1 Fixed basis functions 204
4.3.2 Logistic regression 205
4.3.3 Iterative reweighted least squares 207
4.3.4 Multiclass logistic regression 209
4.3.5 Probit regression 210
4.3.6 Canonical link functions 212
4.4 The Laplace Approximation 213
4.4.1 Model comparison and BIC 216
4.5 Bayesian Logistic Regression 217
4.5.1 Laplace approximation 217
4.5.2 Predictive distribution 218
Exercises 220

5 Neural Networks 225
5.1 Feed-forward Network Functions 227
5.1.1 Weight-space symmetries 231
5.2 Network Training 232
5.2.1 Parameter optimization 236
5.2.2 Local quadratic approximation 237
5.2.3 Use of gradient information 239
5.2.4 Gradient descent optimization 240
5.3 Error Backpropagation 241
5.3.1 Evaluation of error-function derivatives 242
5.3.2 A simple example 245
5.3.3 Efficiency of backpropagation 246
5.3.4 The Jacobian matrix 247
5.4 The Hessian Matrix 249
5.4.1 Diagonal approximation 250
5.4.2 Outer product approximation 251
5.4.3 Inverse Hessian 252
5.4.4 Finite differences 252
5.4.5 Exact evaluation of the Hessian 253
5.4.6 Fast multiplication by the Hessian 254
5.5 Regularization in Neural Networks 256
5.5.1 Consistent Gaussian priors 257
5.5.2 Early stopping 259
5.5.3 Invariances 261
5.5.4 Tangent propagation 263
5.5.5 Training with transformed data 265
5.5.6 Convolutional networks 267
5.5.7 Soft weight sharing 269
5.6 Mixture Density Networks 272
5.7 Bayesian Neural Networks 277
5.7.1 Posterior parameter distribution 278
5.7.2 Hyperparameter optimization 280
5.7.3 Bayesian neural networks for classification 281
Exercises 284

6 Kernel Methods 291
6.1 Dual Representations 293
6.2 Constructing Kernels 294
6.3 Radial Basis Function Networks 299
6.3.1 Nadaraya-Watson model 301
6.4 Gaussian Processes 303
6.4.1 Linear regression revisited 304
6.4.2 Gaussian processes for regression 306
6.4.3 Learning the hyperparameters 311
6.4.4 Automatic relevance determination 312
6.4.5 Gaussian processes for classification 313
6.4.6 Laplace approximation 315
6.4.7 Connection to neural networks 319
Exercises 320

7 Sparse Kernel Machines 325
7.1 Maximum Margin Classifiers 326
7.1.1 Overlapping class distributions 331
7.1.2 Relation to logistic regression 336
7.1.3 Multiclass SVMs 338
7.1.4 SVMs for regression 339
7.1.5 Computational learning theory 344
7.2 Relevance Vector Machines 345
7.2.1 RVM for regression 345
7.2.2 Analysis of sparsity 349
7.2.3 RVM for classification 353
Exercises 357

8 Graphical Models 359
8.1 Bayesian Networks 360
8.1.1 Example: Polynomial regression 362
8.1.2 Generative models 365
8.1.3 Discrete variables 366
8.1.4 Linear-Gaussian models 370
8.2 Conditional Independence 372
8.2.1 Three example graphs 373
8.2.2 D-separation 378
8.3 Markov Random Fields 383
8.3.1 Conditional independence properties 383
8.3.2 Factorization properties 384
8.3.3 Illustration: Image de-noising 387
8.3.4 Relation to directed graphs 390
8.4 Inference in Graphical Models 393
8.4.1 Inference on a chain 394
8.4.2 Trees 398
8.4.3 Factor graphs 399
8.4.4 The sum-product algorithm 402
8.4.5 The max-sum algorithm 411
8.4.6 Exact inference in general graphs 416
8.4.7 Loopy belief propagation 417
8.4.8 Learning the graph structure 418
Exercises 418

9 Mixture Models and EM 423
9.1 K-means Clustering 424
9.1.1 Image segmentation and compression 428
9.2 Mixtures of Gaussians 430
9.2.1 Maximum likelihood 432
9.2.2 EM for Gaussian mixtures 435
9.3 An Alternative View of EM 439
9.3.1 Gaussian mixtures revisited 441
9.3.2 Relation to K-means 443
9.3.3 Mixtures of Bernoulli distributions 444
9.3.4 EM for Bayesian linear regression 448
9.4 The EM Algorithm in General 450
Exercises 455

10 Approximate Inference 461
10.1 Variational Inference 462
10.1.1 Factorized distributions 464
10.1.2 Properties of factorized approximations 466
10.1.3 Example: The univariate Gaussian 470
10.1.4 Model comparison 473
10.2 Illustration: Variational Mixture of Gaussians 474
10.2.1 Variational distribution 475
10.2.2 Variational lower bound 481
10.2.3 Predictive density 482
10.2.4 Determining the number of components 483
10.2.5 Induced factorizations 485
10.3 Variational Linear Regression 486
10.3.1 Variational distribution 486
10.3.2 Predictive distribution 488
10.3.3 Lower bound 489
10.4 Exponential Family Distributions 490
10.4.1 Variational message passing 491
10.5 Local Variational Methods 493
10.6 Variational Logistic Regression 498
10.6.1 Variational posterior distribution 498
10.6.2 Optimizing the variational parameters 500
10.6.3 Inference of hyperparameters 502
10.7 Expectation Propagation 505
10.7.1 Example: The clutter problem 511
10.7.2 Expectation propagation on graphs 513
Exercises 517

11 Sampling Methods 523
11.1 Basic Sampling Algorithms 526
11.1.1 Standard distributions 526
11.1.2 Rejection sampling 528
11.1.3 Adaptive rejection sampling 530
11.1.4 Importance sampling 532
11.1.5 Sampling-importance-resampling 534
11.1.6 Sampling and the EM algorithm 536
11.2 Markov Chain Monte Carlo 537
11.2.1 Markov chains 539
11.2.2 The Metropolis-Hastings algorithm 541
11.3 Gibbs Sampling 542
11.4 Slice Sampling 546
11.5 The Hybrid Monte Carlo Algorithm 548
11.5.1 Dynamical systems 548
11.5.2 Hybrid Monte Carlo 552
11.6 Estimating the Partition Function 554
Exercises 556

12 Continuous Latent Variables 559
12.1 Principal Component Analysis 561
12.1.1 Maximum variance formulation 561
12.1.2 Minimum-error formulation 563
12.1.3 Applications of PCA 565
12.1.4 PCA for high-dimensional data 569
12.2 Probabilistic PCA 570
12.2.1 Maximum likelihood PCA 574
12.2.2 EM algorithm for PCA 577
12.2.3 Bayesian PCA 580
12.2.4 Factor analysis 583
12.3 Kernel PCA 586
12.4 Nonlinear Latent Variable Models 591
12.4.1 Independent component analysis 591
12.4.2 Autoassociative neural networks 592
12.4.3 Modelling nonlinear manifolds 595
Exercises 599

13 Sequential Data 605
13.1 Markov Models 607
13.2 Hidden Markov Models 610
13.2.1 Maximum likelihood for the HMM 615
13.2.2 The forward-backward algorithm 618
13.2.3 The sum-product algorithm for the HMM 625
13.2.4 Scaling factors 627
13.2.5 The Viterbi algorithm 629
13.2.6 Extensions of the hidden Markov model 631
13.3 Linear Dynamical Systems 635
13.3.1 Inference in LDS 638
13.3.2 Learning in LDS 642
13.3.3 Extensions of LDS 644
13.3.4 Particle filters 645
Exercises 646

14 Combining Models 653
14.1 Bayesian Model Averaging 654
14.2 Committees 655
14.3 Boosting 657
14.3.1 Minimizing exponential error 659
14.3.2 Error functions for boosting 661
14.4 Tree-based Models 663
14.5 Conditional Mixture Models 666
14.5.1 Mixtures of linear regression models 667
14.5.2 Mixtures of logistic models 670
14.5.3 Mixtures of experts 672
Exercises 674

Appendix A Data Sets 677
Appendix B Probability Distributions 685
Appendix C Properties of Matrices 695
Appendix D Calculus of Variations 703
Appendix E Lagrange Multipliers 707
References 711
