Data Science at the Command Line

Rating: 7.5

Author: Jeroen Janssens (Netherlands)
Publisher: O'Reilly Media
Subtitle: Facing the Future with Time-Tested Tools
Publication date: 2014-10-20
Pages: 212
Price: USD 39.99
Binding: Paperback
ISBN: 9781491947852



Description:

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.

To get you started—whether you’re on Windows, OS X, or Linux—author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.

Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

● Obtain data from websites, APIs, databases, and spreadsheets
● Perform scrub operations on plain text, CSV, HTML/XML, and JSON
● Explore data, compute descriptive statistics, and create visualizations
● Manage your data science workflow using Drake
● Create reusable tools from one-liners and existing Python or R code
● Parallelize and distribute data-intensive pipelines using GNU Parallel
● Model data with dimensionality reduction, clustering, regression, and classification algorithms

About the author:

Jeroen is a Senior Data Scientist at YPlan in New York City. He has an M.Sc. in Artificial Intelligence and a Ph.D. in Machine Learning. He has authored a book titled Data Science at the Command Line, which has just been published by O'Reilly. Jeroen enjoys biking the Brooklyn Bridge, building tools, and eating stroopwafels.

Table of contents:

Chapter 1 Introduction
  Overview
  Data Science Is OSEMN
  Intermezzo Chapters
  What Is the Command Line?
  Why Data Science at the Command Line?
  A Real-World Use Case
  Further Reading

Chapter 2 Getting Started
  Overview
  Setting Up Your Data Science Toolbox
  Essential Concepts and Tools
  Further Reading

Chapter 3 Obtaining Data
  Overview
  Copying Local Files to the Data Science Toolbox
  Decompressing Files
  Converting Microsoft Excel Spreadsheets
  Querying Relational Databases
  Downloading from the Internet
  Calling Web APIs
  Further Reading

Chapter 4 Creating Reusable Command-Line Tools
  Overview
  Converting One-Liners into Shell Scripts
  Creating Command-Line Tools with Python and R
  Further Reading

Chapter 5 Scrubbing Data
  Overview
  Common Scrub Operations for Plain Text
  Working with CSV
  Working with HTML/XML and JSON
  Common Scrub Operations for CSV
  Further Reading

Chapter 6 Managing Your Data Workflow
  Overview
  Introducing Drake
  Installing Drake
  Obtain Top Ebooks from Project Gutenberg
  Every Workflow Starts with a Single Step
  Well, That Depends
  Rebuilding Specific Targets
  Discussion
  Further Reading

Chapter 7 Exploring Data
  Overview
  Inspecting Data and Its Properties
  Computing Descriptive Statistics
  Creating Visualizations
  Further Reading

Chapter 8 Parallel Pipelines
  Overview
  Serial Processing
  Parallel Processing
  Distributed Processing
  Discussion
  Further Reading

Chapter 9 Modeling Data
  Overview
  More Wine, Please!
  Dimensionality Reduction with Tapkee
  Clustering with Weka
  Regression with SciKit-Learn Laboratory
  Classification with BigML
  Further Reading

Chapter 10 Conclusion
  Let’s Recap
  Three Pieces of Advice
  Where to Go from Here?
  Getting in Touch
