Deep Learning
Rating: 9.6

Author: Ian Goodfellow, Yoshua Bengio, Aaron Courville
Publisher: The MIT Press
Series: Adaptive Computation and Machine Learning
Publication date: 2016-11-11
Pages: 800
Price: USD 72.00
Binding: Hardcover
ISBN: 9780262035613


Description:

"Written by three experts in the field, Deep Learning is the only comprehensive book on the subject." -- Elon Musk, co-chair of OpenAI; co-founder and CEO of Tesla and SpaceX

Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning.
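To make the "hierarchy of concepts" concrete, here is a minimal sketch, not taken from the book, of a feedforward network in NumPy: each layer applies a simple affine map followed by a nonlinearity, so the representation at each layer is built out of the simpler one below it. The layer sizes, the ReLU nonlinearity, and the random (untrained) weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    """One layer: a simple affine map followed by a ReLU nonlinearity.
    Weights are random here; training would adjust them from experience."""
    w = rng.standard_normal((x.shape[-1], n_out)) * 0.1
    b = np.zeros(n_out)
    return np.maximum(0.0, x @ w + b)

x = rng.standard_normal((4, 8))  # a batch of 4 inputs with 8 raw features
h1 = layer(x, 16)                # simple features of the raw input
h2 = layer(h1, 16)               # more abstract features built from h1
y = layer(h2, 2)                 # top of the hierarchy
print(y.shape)                   # (4, 2)
```

Stacking calls to `layer` is the "many layers deep" graph the blurb describes: each level only has to express something simple in terms of the level beneath it.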

The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.
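As a small taste of that practical material, here is a minimal sketch, not taken from the book, of stochastic gradient descent (covered in Sections 5.9 and 8.3) fitting a linear model to synthetic data; the learning rate, step count, and data-generating coefficients are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = 3x + 1 plus a little noise.
x = rng.standard_normal(256)
y = 3.0 * x + 1.0 + 0.1 * rng.standard_normal(256)

w, b, lr = 0.0, 0.0, 0.1  # initial parameters and learning rate
for _ in range(500):
    i = rng.integers(len(x))        # pick one training example at random
    err = (w * x[i] + b) - y[i]     # prediction error on that example
    w -= lr * err * x[i]            # gradient of 0.5 * err**2 w.r.t. w
    b -= lr * err                   # gradient of 0.5 * err**2 w.r.t. b

print(round(w, 2), round(b, 2))     # close to 3.0 and 1.0
```

The same loop, with the scalar model replaced by a deep network and the hand-written gradient replaced by back-propagation, is the training procedure the book builds up in Part II.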

Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

About the authors:

Ian Goodfellow is Research Scientist at OpenAI. Yoshua Bengio is Professor of Computer Science at the Université de Montréal. Aaron Courville is Assistant Professor of Computer Science at the Université de Montréal.

Table of contents:

Acknowledgments xv

Notation xix

1 Introduction 1

1.1 Who Should Read This Book? 8

1.2 Historical Trends in Deep Learning 12

I Applied Math and Machine Learning Basics 27

2 Linear Algebra 29

2.1 Scalars, Vectors, Matrices and Tensors 29

2.2 Multiplying Matrices and Vectors 32

2.3 Identity and Inverse Matrices 34

2.4 Linear Dependence and Span 35

2.5 Norms 36

2.6 Special Kinds of Matrices and Vectors 38

2.7 Eigendecomposition 39

2.8 Singular Value Decomposition 42

2.9 The Moore-Penrose Pseudoinverse 43

2.10 The Trace Operator 44

2.11 The Determinant 45

2.12 Example: Principal Components Analysis 45

3 Probability and Information Theory 51

3.1 Why Probability? 52

3.2 Random Variables 54

3.3 Probability Distributions 54

3.4 Marginal Probability 56

3.5 Conditional Probability 57

3.6 The Chain Rule of Conditional Probabilities 57

3.7 Independence and Conditional Independence 58

3.8 Expectation, Variance and Covariance 58

3.9 Common Probability Distributions 60

3.10 Useful Properties of Common Functions 65

3.11 Bayes' Rule 68

3.12 Technical Details of Continuous Variables 68

3.13 Information Theory 70

3.14 Structured Probabilistic Models 74

4 Numerical Computation 77

4.1 Overflow and Underflow 77

4.2 Poor Conditioning 79

4.3 Gradient-Based Optimization 79

4.4 Constrained Optimization 89

4.5 Example: Linear Least Squares 92

5 Machine Learning Basics 95

5.1 Learning Algorithms 96

5.2 Capacity, Overfitting and Underfitting 107

5.3 Hyperparameters and Validation Sets 117

5.4 Estimators, Bias and Variance 119

5.5 Maximum Likelihood Estimation 128

5.6 Bayesian Statistics 132

5.7 Supervised Learning Algorithms 136

5.8 Unsupervised Learning Algorithms 142

5.9 Stochastic Gradient Descent 147

5.10 Building a Machine Learning Algorithm 149

5.11 Challenges Motivating Deep Learning 151

II Deep Networks: Modern Practices 161

6 Deep Feedforward Networks 163

6.1 Example: Learning XOR 166

6.2 Gradient-Based Learning 171

6.3 Hidden Units 185

6.4 Architecture Design 191

6.5 Back-Propagation and Other Differentiation Algorithms 197

6.6 Historical Notes 217

7 Regularization for Deep Learning 221

7.1 Parameter Norm Penalties 223

7.2 Norm Penalties as Constrained Optimization 230

7.3 Regularization and Under-Constrained Problems 232

7.4 Dataset Augmentation 233

7.5 Noise Robustness 235

7.6 Semi-Supervised Learning 236

7.7 Multitask Learning 237

7.8 Early Stopping 239

7.9 Parameter Tying and Parameter Sharing 246

7.10 Sparse Representations 247

7.11 Bagging and Other Ensemble Methods 249

7.12 Dropout 251

7.13 Adversarial Training 261

7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier 263

8 Optimization for Training Deep Models 267

8.1 How Learning Differs from Pure Optimization 268

8.2 Challenges in Neural Network Optimization 275

8.3 Basic Algorithms 286

8.4 Parameter Initialization Strategies 292

8.5 Algorithms with Adaptive Learning Rates 298

8.6 Approximate Second-Order Methods 302

8.7 Optimization Strategies and Meta-Algorithms 309

9 Convolutional Networks 321

9.1 The Convolution Operation 322

9.2 Motivation 324

9.3 Pooling 330

9.4 Convolution and Pooling as an Infinitely Strong Prior 334

9.5 Variants of the Basic Convolution Function 337

9.6 Structured Outputs 347

9.7 Data Types 348

9.8 Efficient Convolution Algorithms 350

9.9 Random or Unsupervised Features 351

9.10 The Neuroscientific Basis for Convolutional Networks 353

9.11 Convolutional Networks and the History of Deep Learning 359

10 Sequence Modeling: Recurrent and Recursive Nets 363

10.1 Unfolding Computational Graphs 365

10.2 Recurrent Neural Networks 368

10.3 Bidirectional RNNs 383

10.4 Encoder-Decoder Sequence-to-Sequence Architectures 385

10.5 Deep Recurrent Networks 387

10.6 Recursive Neural Networks 388

10.7 The Challenge of Long-Term Dependencies 390

10.8 Echo State Networks 392

10.9 Leaky Units and Other Strategies for Multiple Time Scales 395

10.10 The Long Short-Term Memory and Other Gated RNNs 397

10.11 Optimization for Long-Term Dependencies 401

10.12 Explicit Memory 405

11 Practical Methodology 409

11.1 Performance Metrics 410

11.2 Default Baseline Models 413

11.3 Determining Whether to Gather More Data 414

11.4 Selecting Hyperparameters 415

11.5 Debugging Strategies 424

11.6 Example: Multi-Digit Number Recognition 428

12 Applications 431

12.1 Large-Scale Deep Learning 431

12.2 Computer Vision 440

12.3 Speech Recognition 446

12.4 Natural Language Processing 448

12.5 Other Applications 465

III Deep Learning Research 475

13 Linear Factor Models 479

13.1 Probabilistic PCA and Factor Analysis 480

13.2 Independent Component Analysis (ICA) 481

13.3 Slow Feature Analysis 484

13.4 Sparse Coding 486

13.5 Manifold Interpretation of PCA 489

14 Autoencoders 493

14.1 Undercomplete Autoencoders 494

14.2 Regularized Autoencoders 495

14.3 Representational Power, Layer Size and Depth 499

14.4 Stochastic Encoders and Decoders 500

14.5 Denoising Autoencoders 501

14.6 Learning Manifolds with Autoencoders 506

14.7 Contractive Autoencoders 510

14.8 Predictive Sparse Decomposition 514

14.9 Applications of Autoencoders 515

15 Representation Learning 517

15.1 Greedy Layer-Wise Unsupervised Pretraining 519

15.2 Transfer Learning and Domain Adaptation 526

15.3 Semi-Supervised Disentangling of Causal Factors 532

15.4 Distributed Representation 536

15.5 Exponential Gains from Depth 543

15.6 Providing Clues to Discover Underlying Causes 544

16 Structured Probabilistic Models for Deep Learning 549

16.1 The Challenge of Unstructured Modeling 550

16.2 Using Graphs to Describe Model Structure 554

16.3 Sampling from Graphical Models 570

16.4 Advantages of Structured Modeling 572

16.5 Learning about Dependencies 572

16.6 Inference and Approximate Inference 573

16.7 The Deep Learning Approach to Structured Probabilistic Models 575

17 Monte Carlo Methods 581

17.1 Sampling and Monte Carlo Methods 581

17.2 Importance Sampling 583

17.3 Markov Chain Monte Carlo Methods 586

17.4 Gibbs Sampling 590

17.5 The Challenge of Mixing between Separated Modes 591

18 Confronting the Partition Function 597

18.1 The Log-Likelihood Gradient 598

18.2 Stochastic Maximum Likelihood and Contrastive Divergence 599

18.3 Pseudolikelihood 607

18.4 Score Matching and Ratio Matching 609

18.5 Denoising Score Matching 611

18.6 Noise-Contrastive Estimation 612

18.7 Estimating the Partition Function 614

19 Approximate Inference 623

19.1 Inference as Optimization 624

19.2 Expectation Maximization 626

19.3 MAP Inference and Sparse Coding 627

19.4 Variational Inference and Learning 629

19.5 Learned Approximate Inference 642

20 Deep Generative Models 645

20.1 Boltzmann Machines 645

20.2 Restricted Boltzmann Machines 647

20.3 Deep Belief Networks 651

20.4 Deep Boltzmann Machines 654

20.5 Boltzmann Machines for Real-Valued Data 667

20.6 Convolutional Boltzmann Machines 673

20.7 Boltzmann Machines for Structured or Sequential Outputs 675

20.8 Other Boltzmann Machines 677

20.9 Back-Propagation through Random Operations 678

20.10 Directed Generative Nets 682

20.11 Drawing Samples from Autoencoders 701

20.12 Generative Stochastic Networks 704

20.13 Other Generation Schemes 706

20.14 Evaluating Generative Models 707

20.15 Conclusion 710

Bibliography 711

Index 767
