100 Best Data Mining Books of All Time

We've researched and ranked the best data mining books in the world, based on recommendations from world experts, sales data, and millions of reader ratings.

Featuring recommendations from Nassim Nicholas Taleb, Bill Gates, Richard Branson, and 52 other experts.
1
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll...
Recommended by Kirk Borne and 1 other.

Kirk Borne: Great book for Business Analytics and for building #AnalyticThinking >> “#DataScience for Business: What You Need to Know about #DataMining and Data-Analytic Thinking”: https://t.co/e9rAFnVYYQ #BigData #MachineLearning #DataStrategy #AnalyticsStrategy #Algorithms https://t.co/yEblfU2MZd

2
WARNING! To avoid buying counterfeit on Amazon, click on "See All Buying Options" and choose "Amazon.com" and not a third-party seller.

Concise and to the point — the book can be read during a week. During that week, you will learn almost everything modern machine learning has to offer. The author and other practitioners have spent years learning these concepts.

Companion wiki — the book has a continuously updated wiki that extends some book chapters with additional information: Q&A, code snippets, further reading, tools, and other relevant resources.
Recommended by Kirk Borne and 1 other.

Kirk Borne: Recent top-selling books in #AI & #MachineLearning: https://t.co/Ij9I7SzR4d #BigData #DataScience #DataMining #Algorithms #PredictiveAnalytics #Python ...in the TOP 10: 1) The Hundred-Page ML Book: https://t.co/dQ7nP6gwP0 2) Hands-on ML with...: https://t.co/Y0Iz3GbtGP https://t.co/72rAFN1FwW

3

An Introduction to Statistical Learning

With Applications in R

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and...
Recommended by Roger D. Peng and 1 other.

Roger D. Peng: This book is written by a powerhouse of authors in the machine learning community, true authorities in the field. But beyond that, they’re also great writers.

4
The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references. The highlights for the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; plus much more.

5
One of Wall Street Journal's Best Ten Works of Nonfiction in 2012

New York Times Bestseller

"Not so different in spirit from the way public intellectuals like John Kenneth Galbraith once shaped discussions of economic policy and public figures like Walter Cronkite helped sway opinion on the Vietnam War…could turn out to be one of the more momentous books of the decade."
-New York Times Book Review

"Nate Silver's The Signal and the Noise is The Soul of a New Machine for the 21st century."
-Rachel Maddow, author of Drift

"A serious...
Recommended by Bill Gates and 1 other.

Bill Gates: Anyone interested in politics may be attracted to Nate Silver’s The Signal and the Noise: Why So Many Predictions Fail--but Some Don't. Silver is the New York Times columnist who got a lot of attention last fall for predicting (accurately, as it turned out) the results of the U.S. presidential election. This book actually came out before the election, though, and it’s about predictions in many...

6
Data Science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions.

But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope.

Data science is little more than using straight-forward steps to process raw data into...

7
"The authors' clear visual style provides a comprehensive look at what's currently possible with artificial neural networks as well as a glimpse of the magic that's to come."
--Tim Urban, author of Wait But Why

Fully Practical, Insightful Guide to Modern Deep Learning

Deep learning is transforming software, facilitating powerful new artificial intelligence capabilities, and driving unprecedented algorithm performance. Deep Learning Illustrated is uniquely intuitive and offers a complete introduction to the discipline's techniques. Packed with...
Recommended by Kirk Borne and 1 other.

Kirk Borne: 🌟📘📊📈 Awesome new book >> “#DeepLearning Illustrated: A Visual, Interactive Guide to Artificial Intelligence” https://t.co/xIW48MskrR by @JonKrohnLearns #BigData #Analytics #DataScience #AI #MachineLearning #Algorithms #NeuralNetworks https://t.co/JKSrVRLpS0

8
Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests. Based on practical experiences at companies that each run more than 20,000 controlled experiments a year, the authors share examples, pitfalls, and advice for students and industry professionals getting started with experiments, plus deeper dives into advanced topics for practitioners who want to improve the way they make data-driven...

9
Want to tap the tremendous amount of valuable social data in Facebook, Twitter, LinkedIn, and Google+? This refreshed edition helps you discover who’s making connections with social media, what they’re talking about, and where they’re located. You’ll learn how to combine social web data, analysis techniques, and visualization to find what you’ve been looking for in the social haystack—as well as useful information you didn’t know existed.

Each standalone chapter introduces techniques for mining data in different areas of the social Web, including blogs and email. All you need to...

10

Applied Predictive Modeling

This text is intended for a broad audience both as an introduction to predictive models and as a guide to applying them. Non-mathematical readers will appreciate the intuitive explanations of the techniques, while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics. Dr. Kuhn is a...
Recommended by Kirk Borne and 1 other.

Kirk Borne: Find more than 40 useful #PredictiveModeling articles here at @DataScienceCtrl https://t.co/KdcvLRffRk #abdsc #BigData #DataScience #AI #MachineLearning #Forecasting #Statistics #PredictiveAnalytics +This is the best book on the subject: https://t.co/SmsepmniHi https://t.co/amBJHCJSHN

Don't have time to read the top Data Mining books of all time? Read Shortform summaries.

Shortform summaries help you learn 10x faster by:

  • Being comprehensive: you learn the most important points in the book
  • Cutting out the fluff: you focus your time on what's important to know
  • Interactive exercises: apply the book's ideas to your own life with our educators' guidance.
11
Learn the skills necessary to design, build, and deploy applications powered by machine learning. Through the course of this hands-on book, you'll build an example ML-driven application from initial idea to deployed product. Data scientists, software engineers, and product managers with little or no ML experience will learn the tools, best practices, and challenges involved in building a real-world ML application step-by-step.

Author Emmanuel Ameisen, who worked as a data scientist at Zipcar and led Insight Data Science's AI program, demonstrates key ML concepts with code snippets,...

13
SQL Made Easy: The Ultimate Step by Step Guide to Success
Do you want to learn SQL programming without the complicated explanations?
Do you want to understand how to manage databases without all the confusion?
Well then, this is your go-to guide to help you master SQL programming in no time!
This book breaks down the fundamental elements that are essential to make you proficient in SQL programming and database management.
By the end of this book you will be confident enough to take on any problems that encompass SQL.
SQL software can be complex,...

14
Neural networks are getting smaller. Much smaller. The OK Google team, for example, has run machine learning models that are just 14 kilobytes in size--small enough to work on the digital signal processor in an Android phone. With this practical book, you'll learn about TensorFlow Lite for Microcontrollers, a minuscule machine learning library that allows you to run machine learning algorithms on tiny hardware.

Authors Pete Warden and Daniel Situnayake explain how you can train models that are small enough to fit into any environment, including small embedded devices that can run...

15

Pattern Recognition and Machine Learning

Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they have undergone substantial development over the past ten years. In particular, Bayesian methods have grown from a specialist niche to become mainstream, while graphical models have emerged as a general framework for describing and applying probabilistic models. Also, the practical applicability of Bayesian methods has been greatly enhanced through the development of a range of approximate inference...

17
Looking for complete instructions on manipulating, processing, cleaning, and crunching structured data in Python? The second edition of this hands-on guide--updated for Python 3.5 and Pandas 1.0--is packed with practical case studies that show you how to effectively solve a broad set of data analysis problems, using Python libraries such as NumPy, pandas, matplotlib, and IPython.

Written by Wes McKinney, the main author of the pandas library, Python for Data Analysis also serves as a practical, modern introduction to scientific computing in Python for data-intensive...

18
Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.

Programming Collective Intelligence takes you into the world of machine learning...

19
What if your cell phone could detect cancer cells circulating in your blood or warn you of an imminent heart attack? Mobile wireless digital devices, including smartphones and tablets with seemingly limitless functionality, have brought about radical changes in our lives, providing hyper-connectivity to social networks and cloud computing. But the digital world has hardly pierced the medical cocoon.

Until now. Beyond reading email and surfing the Web, we will soon be checking our vital signs on our phone. We can already continuously monitor our heart rhythm, blood glucose...
Recommended by Vinod Khosla and 2 others.

20

Pattern Classification

The first edition, published in 1973, has become a classic reference in the field. Now with the second edition, readers will find information on key new topics such as neural networks and statistical pattern recognition, the theory of machine learning, and the theory of invariances. Also included are worked examples, comparisons between different methods, extensive graphics, expanded exercises and computer project topics.

An Instructor's Manual presenting detailed solutions to all the problems in the book is available from the Wiley editorial department.

Recommended by Eric Weinstein and 1 other.

Eric Weinstein: [Eric Weinstein recommended this book on Twitter.]

21
Learn how to build recommender systems from one of Amazon's pioneers in the field. Frank Kane spent over nine years at Amazon, where he managed and led the development of many of Amazon's personalized product recommendation technologies. You've seen automated recommendations everywhere - on Netflix's home page, on YouTube, and on Amazon as these machine learning algorithms learn about your unique interests, and show the best products or content for you as an individual. These technologies have become central to the largest, most prestigious tech employers out there, and by understanding how...

22

Mining of Massive Datasets

The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related...

23
When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.

Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one...

24
Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you'll examine how to analyze data at scale to derive insights from large datasets efficiently.

Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the...

25
Why would a casino try and stop you from losing? How can a mathematical formula find your future spouse? Would you know if a statistical analysis blackballed you from a job you wanted? Today, number crunching affects your life in ways you might never imagine. In this lively and groundbreaking new book, economist Ian Ayres shows how today's best and brightest organizations are analyzing massive databases at lightning speed to provide greater insights into human behavior. They are the Super Crunchers. From internet sites like Google and Amazon that know your tastes better than you do, to a...

26

R in Action

Summary

R in Action is the first book to present both the R system and the use cases that make it such a compelling package for business developers. The book begins by introducing the R language, including the development environment. Focusing on practical solutions, the book also offers a crash course in practical statistics and covers elegant methods for dealing with messy and incomplete data using features of R.
About the Technology
R is a powerful language for statistical computing and graphics that can handle virtually any data-crunching task. It...

27
Thorough in its coverage from basic to advanced topics, this book presents the key algorithms and techniques used in data mining. An emphasis is placed on the use of data mining concepts in real world applications with large database components. Includes unique chapters on Web mining, spatial mining, temporal mining, and prototypes and DM products. Separate case studies section highlights real world applications. An excellent reference book for computer database professionals and researchers.

28
Services like social networks, web analytics, and intelligent e-commerce often need to manage data at a scale too big for a traditional database. As scale and demand increase, so does complexity. Fortunately, scalability and simplicity are not mutually exclusive; rather than using some trendy technology, a different approach is needed. Big data systems use many machines working in parallel to store and process data, which introduces fundamental challenges unfamiliar to most developers.

Big Data shows how to build these systems using an architecture that takes advantage of...
Recommended by Vicki Boykis and 1 other.

Vicki Boykis: This book remains a great read if you want to understand how modern data architecture works, and especially distributed data systems.

29
An accessible primer on how to create effective graphics from data

This book provides students and researchers a hands-on introduction to the principles and practice of data visualization. It explains what makes some graphs succeed while others fail, how to make high-quality figures from data using powerful and reproducible methods, and how to think about data visualization in an honest and effective way.

Data Visualization builds the reader's expertise in ggplot2, a versatile visualization library for the R programming language. Through a series of worked...

31
Written as a tutorial to explore and understand the power of R for machine learning, this practical guide covers all of the need-to-know topics in a very systematic way. For each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results. These steps will build the knowledge you need to apply them to your own data science tasks. Intended for those who want to learn how to use R's machine learning capabilities and gain insight from your data. Perhaps you already know a bit about machine learning, but have never used R; or...

33

The Data Science Design Manual

This book serves as an introduction to data science, focusing on the skills and principles needed to build systems for collecting, analyzing, and interpreting data. As a discipline, data science sits at the intersection of statistics, computer science, and machine learning, but it is building a distinct heft and character of its own.

In particular, the book stresses the following basic principles as fundamental to becoming a good data scientist: "Valuing Doing the Simple Things Right," laying the groundwork of what really matters in analyzing data; "Developing Mathematical Intuition,"...

35

The Elements of Statistical Learning

Data Mining, Inference, and Prediction

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the...

Recommended by Nassim Nicholas Taleb and 1 other.

Nassim Nicholas Taleb: Very comprehensive, sufficiently technical to get most of the plumbing behind machine learning. Very useful as a reference book (actually, there is no other complete reference book). The authors are the real thing (Tibshirani is the one behind the LASSO regularization technique). Uses some mathematical statistics without the burdens of measure theory and avoids the obvious but complicated...

37
The leading introductory book on data mining, fully updated and revised! When Berry and Linoff wrote the first edition of Data Mining Techniques in the late 1990s, data mining was just starting to move out of the lab and into the office; it has since grown to become an indispensable tool of modern business. This new edition--more than 50% new and revised--is a significant update from the previous one, and shows you how to harness the newest data mining methods and techniques to solve common business problems. The duo of unparalleled authors share invaluable advice for improving...
Recommended by Kirk Borne and 1 other.

Kirk Borne: If you are just starting your #MachineLearning learning journey, I recommend this as a great beginner’s book: “#DataMining Techniques for Marketing, Sales and Customer Relationship Management” (Third Edition) https://t.co/gSLkCSwLDF #BigData #DataScience #AI #DataScientist #CX https://t.co/2SyjLCqObv

38
Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language), the standard programming language for defining, organizing, and exploring data in relational databases. The book focuses on using SQL to find the story your data tells, with the popular open-source database PostgreSQL and the pgAdmin interface as its primary tools.

You'll first cover the fundamentals of databases and the SQL language, then build skills by analyzing data from the U.S. Census and other federal and state government agencies. With exercises and real-world examples in each...

39
With a focus on the hands-on, end-to-end process for data mining, this book guides the reader through various capabilities of the easy-to-use, free and open source Rattle Data Mining Software built on the sophisticated R Statistical Software.

40
The Complete Beginner's Guide to Understanding and Building Machine Learning Systems with Python

Machine Learning with Python for Everyone will help you master the processes, patterns, and strategies you need to build effective learning systems, even if you're an absolute beginner. If you can write some Python code, this book is for you, no matter how little college-level math you know. Principal instructor Mark E. Fenner relies on plain-English stories, pictures, and Python examples to communicate the ideas of machine learning.

Mark begins by...

41
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You'll explore the basic operations and common functions of Spark's structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of...

42
Power BI is a self-service (and enterprise) Business Intelligence (BI) tool that facilitates data acquisition, modeling, and visualization, and the skills needed to succeed with Power BI are fully transferable to Microsoft Excel. There are three learning areas required to master everything Power BI Desktop has to offer: the M language, the DAX language, and analysis. Super Charge Power BI clearly explains the necessary concepts while at the same time giving hands-on practice to engage the reader and help new knowledge stick.

43

Introduction to Data Mining

'Introduction to Data Mining' presents fundamental concepts and algorithms for those learning data mining. Each concept is explored thoroughly and supported with numerous examples. The text requires only a modest background in mathematics. Each major topic is organized into two chapters, beginning with basic concepts that provide necessary background for understanding each data mining technique, followed by more advanced concepts and algorithms.
Recommended by Kirk Borne and 1 other.

Kirk Borne: This awesome book’s 2nd edition is now available! >> “Introduction to #DataMining” https://t.co/ZTna3ZQIGv #BigData #DataScience #MachineLearning https://t.co/9NuK5RUxv8

46
Our ability to generate and collect data has been increasing rapidly. Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and...

47

Text Mining with R

A Tidy Approach

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson developed using the tidy principles behind R packages like ggraph and dplyr. You'll learn how tidytext and other tidy tools in R can make text analysis easier and more effective.

The authors demonstrate how treating text as data frames enables you to manipulate, summarize, and...

48
Master business modeling and analysis techniques with Microsoft Excel 2016, and transform data into bottom-line results. Written by award-winning educator Wayne Winston, this hands-on, scenario-focused guide helps you use Excel's newest tools to ask the right questions and get accurate, actionable answers. This edition adds 150+ new problems with solutions, plus a chapter of basic spreadsheet models to make sure you're fully up to speed.
Solve real business problems with Excel--and build your competitive advantage
Quickly transition from Excel basics to sophisticated analytics...

49

Neural Networks and Deep Learning

Neural Networks and Deep Learning is a free online book. The book will teach you about:
* Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
* Deep learning, a powerful set of techniques for learning in neural networks

Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you the core concepts...

51

Big Data

A Revolution That Will Transform How We Live, Work, and Think

A revelatory exploration of the hottest trend in technology and the dramatic impact it will have on the economy, science, and society at large.

Which paint color is most likely to tell you that a used car is in good shape? How can officials identify the most dangerous New York City manholes before they explode? And how did Google searches predict the spread of the H1N1 flu outbreak?

The key to answering these questions, and many more, is big data. “Big data” refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw...

52
"Mesmerizing & fascinating..." -- The Seattle Post-Intelligencer

"The Freakonomics of big data." --Stein Kretsinger, founding executive of Advertising.com

Award-winning - Used by over 30 universities - Translated into 9 languages

An introduction for everyone. In this rich, fascinating -- surprisingly accessible -- introduction, leading expert Eric Siegel reveals how predictive analytics (aka machine learning) works, and how it affects everyone every day. Rather than a "how to" for hands-on...

53
How anyone can become a data ninja

From the stock market to genomics laboratories, census figures to marketing email blasts, we are awash with data. But as anyone who has ever opened up a spreadsheet packed with seemingly infinite lines of data knows, numbers aren't enough: we need to know how to make those numbers talk. In The Model Thinker, social scientist Scott E. Page shows us the mathematical, statistical, and computational models--from linear regression to random walks and far beyond--that can turn anyone into a genius. At the core of the book is Page's...

54
In this volume, Matthew L. Jockers introduces readers to large-scale literary computing and the revolutionary potential of macroanalysis--a new approach to the study of the literary record designed for probing the digital-textual world as it exists today, in digital form and in large quantities. Using computational analysis to retrieve key words, phrases, and linguistic patterns across thousands of texts in digital libraries, researchers can draw conclusions based on quantifiable evidence regarding how literary trends are employed over time, across periods, within regions, or within...

55
Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.

Along the way, you'll experiment...

56
Discovering knowledge from big multivariate data, recorded every day, requires specialized machine learning techniques.

This book presents an easy-to-use practical guide in R to compute the most popular machine learning methods for exploring real-world data sets, as well as for building predictive models.

The main parts of the book include: A) Unsupervised learning methods, to explore and discover knowledge from a large multivariate data set using clustering and principal component methods. You will learn hierarchical clustering, k-means, principal component...

57
Acquire and analyze data from all corners of the social web with Python

About This Book

* Make sense of highly unstructured social media data with the help of the insightful use cases provided in this guide
* Use this easy-to-follow, step-by-step guide to apply analytics to complicated and messy social data
* This is your one-stop solution to fetching, storing, analyzing, and visualizing social media data

Who This Book Is For

This book is for intermediate Python developers who want to engage with the use of public APIs to collect data from social media platforms and perform statistical analysis in...

58

Networks have permeated everyday life through everyday realities like the Internet, social networks, and viral marketing. As such, network analysis is an important growth area in the quantitative sciences, with roots in social network analysis going back to the 1930s and graph theory going back centuries. Measurement and analysis are integral components of network research. As a result, statistical methods play a critical role in network analysis. This book is the first of its kind in network research. It can be used as a stand-alone resource in which multiple R packages are used to...


59

Cassandra High Availability

If you are a developer or DevOps engineer who understands the basics of Cassandra and are ready to take your knowledge to the next level, then this book is for you. Perhaps you've just got your hands dirty working on a small application, but now you have a use case that demands greater scale and zero downtime. An understanding of the essentials of Cassandra is needed, including knowing how to install and configure Cassandra, create tables, and read and write data.

60

"This unique and essential guide to human visual perception and related cognitive principles will enrich courses on information visualization and empower designers to see their way forward. Ware's updated review of empirical research and interface design examples will do much to accelerate innovation and adoption of information visualization."
—Ben Shneiderman, University of Maryland

"Colin Ware is the perfect person to write this book, with a long history of prominent contributions to the visual interaction with machines and to information visualization directly. It goes a...


61
Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. This has led to an increasing demand for powerful software tools to help people analyze and manage vast amounts of text data effectively and efficiently. Unlike data generated by a computer system or sensors, text data are usually generated directly by humans, and are accompanied by semantically rich content. As such, text data are especially valuable...

62
Artificial intelligence touches nearly every part of your day. While you may initially assume that technology such as smart speakers and digital assistants are the extent of it, AI has in fact rapidly become a general-purpose technology, reverberating across industries including transportation, healthcare, financial services, and many more. In our modern era, an understanding of AI and its possibilities for your organization is essential for growth and success.

Artificial Intelligence Basics has arrived to equip you with a fundamental, timely grasp of AI and its...

64
Get up to speed on the game-changing developments in SQL Server 2019. No longer your grandparent's database engine, SQL Server 2019 is cutting edge with support for artificial intelligence (AI), machine learning (ML), big data performance and analysis, Java, and connections to Azure IoT Edge. This is not a book on traditional database administration and SQL. It focuses on all that is new and cutting edge in one of the best database systems available. This is a book for SQL Server professionals who already know how to create tables and want to up their game by building their skills in some of...

65
Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn.
This revision is fully updated with new content on social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation.
Author Fabio...

66

Doing Data Science

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re...

67
Decision trees have become one of the most powerful and popular approaches in knowledge discovery and data mining, the science of exploring large and complex bodies of data in order to discover useful patterns. Decision tree learning continues to evolve over time. Existing methods are constantly being improved and new methods introduced. This 2nd Edition is dedicated entirely to the field of decision trees in data mining and covers all aspects of this important technique, as well as improved or new methods and techniques developed after the publication of our first edition. In this new...

69
Remarkable advances in computation and data storage and the ready availability of huge data sets have been the keys to the growth of the new disciplines of data mining and machine learning, while the enormous success of the Human Genome Project has opened up the field of bioinformatics.

These exciting developments, which led to the introduction of many innovative statistical tools for high-dimensional data analysis, are described here in detail. The author takes a broad perspective; for the first time in a book on multivariate analysis, nonlinear methods are discussed in detail as...

71
Learn how to perform data analysis with the R language and software environment, even if you have little or no programming experience. With the tutorials in this hands-on guide, you'll learn how to use the essential R tools you need to know to analyze data, including data types and programming concepts.

The second half of Learning R shows you real data analysis in action by covering everything from importing data to publishing your results. Each chapter in the book includes a quiz on what you've learned, and concludes with exercises, most of which involve writing R code.

72
The versatile capabilities and large set of add-on packages make R an excellent alternative to many existing and often expensive data mining tools. Exploring this area from the perspective of a practitioner, Data Mining with R: Learning with Case Studies uses practical examples to illustrate the power of R and data mining.

Assuming no prior knowledge of R or data mining/statistical techniques, the book covers a diverse set of problems that pose different challenges in terms of size, type of data, goals of analysis, and analytical tools. To present the...

74

Python Machine Learning by Example

Take tiny steps to enter the big world of data science through this interesting guide

About This Book

* Learn the fundamentals of machine learning and build your own intelligent applications
* Master the art of building your own machine learning systems with this example-based practical guide
* Work with important classification and regression algorithms and other machine learning techniques

Who This Book Is For

This book is for anyone interested in entering the data science stream with machine learning. Basic familiarity with Python is assumed.

What You Will Learn

* Exploit the power of...

75
Between tweets, likes, comments, blogs, videos and images, today’s customer is estimated to generate 2.5 quintillion bytes of data per day. How can marketers utilize the ever-increasing amount of data to better understand and interact with their customers?
 
This book offers advice on how to interpret and incorporate data into an organization’s overall marketing strategy. It is designed to help marketers improve customer relationships, enhance the targeting of their marketing efforts, align marketing activities with ultimate goals and objectives, and gain insight into the...

76
Practical Data Science with R, Second Edition is a task-based tutorial that leads readers through dozens of useful data analysis practices using the R language. By concentrating on the most important tasks you'll face on the job, this friendly guide is comfortable both for business analysts and data scientists. Because data is only useful if it can be understood, you'll also find fantastic tips for organizing and presenting data in tables, as well as snappy visualizations.

77
Praise for the First Edition

"...full of vivid and thought-provoking anecdotes... needs to be read by anyone with a serious interest in research and marketing." --Research magazine

"Shmueli et al. have done a wonderful job in presenting the field of data mining... a welcome addition to the literature." --computingreviews.com

Incorporating a new focus on data visualization and time series forecasting, Data Mining for Business Intelligence, Second Edition continues to supply insightful, detailed guidance on fundamental data mining techniques. This new edition guides readers...

78
Probability and Statistics for Data Science: Math + R + Data covers "math stat"--distributions, expected value, estimation etc.--but takes the phrase "Data Science" in the title quite seriously:

* Real datasets are used extensively.
* All data analysis is supported by R coding.
* Includes many Data Science applications, such as PCA, mixture distributions, random graph models, Hidden Markov models, linear and logistic regression, and neural networks.
* Leads the student to think...

79
Acclaimed data scientist DJ Patil details a new approach to solving problems in Data Jujitsu.

Learn how to use a problem's "weight" against itself to:

* Break down seemingly complex data problems into simplified parts
* Use alternative data analysis techniques to examine them
* Use human input, such as Mechanical Turk, and design tricks that enlist the help of your users to take short cuts around tough problems
* Learn more about the problems before starting on the solutions—and use the findings to solve them, or determine whether the problems are worth solving at...

81
The Hands-On, Example-Rich Introduction to Pandas Data Analysis in Python

Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets. Pandas for Everyone brings together practical knowledge and insight for solving real problems...

82
Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.

Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

83
The ancient art of the Numerati provides insight into basic datamining techniques for beginners wishing to immerse themselves in the field using practical examples.

84

Unti Nonfiction

In this explosive memoir, a political consultant and technology whistleblower reveals the disturbing truth about the multi-billion-dollar data industry, revealing to the public how companies are getting richer using our personal information and exposing how Cambridge Analytica exploited weaknesses in privacy laws to help elect Donald Trump--and how this could easily happen again in the 2020 presidential election.

When Brittany Kaiser joined Cambridge Analytica--the UK-based political consulting firm funded by conservative billionaire and Donald Trump patron Robert...

86
Master Powerful Off-the-Shelf Business Solutions for AI and Machine Learning
Pragmatic AI will help you solve real-world problems with contemporary machine learning, artificial intelligence, and cloud computing tools. Noah Gift demystifies all the concepts and tools you need to get results--even if you don't have a strong background in math or data science. Gift illuminates powerful off-the-shelf cloud offerings from Amazon, Google, and Microsoft, and demonstrates proven techniques using the Python data science ecosystem. His workflows and examples help you...

87

The Practitioner's Guide to Graph Data

How do you apply graph thinking to solve complex problems? With this practical guide, data scientists will learn how to think about data as a graph and determine if graph technology is right for your company. You'll learn techniques for building scalable, real-time, and multimodel architectures that solve complex problems with graph data.

Authors Denise Koessler Gosnell and Matthias Broecheler show you how companies today are successfully applying graph thinking in distributed production environments. You'll also learn the Graph Schema Language, a set of terminology and visual...

88

Data Mining Techniques in CRM

Inside Customer Segmentation

This is an applied handbook for the application of data mining techniques in the CRM framework. It combines a technical and a business perspective to cover the needs of business users who are looking for a practical guide on data mining. It focuses on Customer Segmentation and presents guidelines for the development of actionable segmentation schemes. By using non-technical language it guides readers through all the phases of the data mining process.

89

Real-World Machine Learning

Summary

Real-World Machine Learning is a practical guide designed to teach working developers the art of ML project execution. Without overdosing you on academic theory and complex mathematics, it introduces the day-to-day practice of machine learning, preparing you to successfully build and deploy powerful ML systems.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

Machine learning systems help you find valuable insights and patterns in data,...

90

Text Mining

A Guidebook for the Social Sciences

Online communities generate massive volumes of natural language data and the social sciences continue to learn how to best make use of this new information and the technology available for analyzing it. Text Mining: A Guidebook for the Social Sciences brings together a broad range of contemporary qualitative and quantitative methods to provide strategic and practical guidance on analyzing large text collections. This accessible book, written by sociologist Gabe Ignatow and computer scientist Rada Mihalcea, surveys the fast-changing landscape of data sources, programming...

91
This book reflects decades of important research on the mathematical foundations of speech recognition. It focuses on underlying statistical techniques such as hidden Markov models, decision trees, the expectation-maximization algorithm, information theoretic goodness criteria, maximum entropy probability estimation, parameter and data clustering, and smoothing of probability distributions. The author's goal is to present these principles clearly in the simplest setting, to show the advantages of self-organization from real data, and to enable the reader to apply the techniques.

92

The Numerati

An urgent look at how a global math elite is predicting and altering our behavior -- at work, at the mall, and in bed

Every day we produce loads of data about ourselves simply by living in the modern world: we click web pages, flip channels, drive through automatic toll booths, shop with credit cards, and make cell phone calls. Now, in one of the greatest undertakings of the twenty-first century, a savvy group of mathematicians and computer scientists is beginning to sift through this data to dissect us and map out our next steps. Their goal? To manipulate our behavior -- what we...

93
Time series data analysis is increasingly important due to the massive production of such data through the internet of things, the digitalization of healthcare, and the rise of smart cities. As continuous monitoring and data collection become more common, the need for competent time series analysis with both statistical and machine learning techniques will increase.

Covering innovations in time series data analysis and use cases from the real world, this practical guide will help you solve the most common data engineering and analysis challenges in time series, using both...

94
The knowledge discovery process is as old as Homo sapiens. Until some time ago this process was solely based on the 'natural personal' computer provided by Mother Nature. Fortunately, in recent decades the problem has begun to be solved based on the development of data mining technology, aided by the huge computational power of the 'artificial' computers. Digging intelligently in different large databases, data mining aims to extract implicit, previously unknown and potentially useful information from data, since "knowledge is power." The goal of this book is to provide, in a friendly...

95
The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook, to governments and traditional corporate enterprises. You'll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book.

Alex Gorelik, CTO and founder of Waterline...

96

Interpretable Machine Learning

This book is about making machine learning models and their decisions interpretable.

After exploring the concepts of interpretability, you will learn about simple, interpretable models such as decision trees, decision rules and linear regression. Later chapters focus on general model-agnostic methods for interpreting black box models like feature importance and accumulated local effects and explaining individual predictions with Shapley values and LIME.

All interpretation methods are explained in depth and discussed critically. How do they work under the hood? What are...

97
Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.

Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs...

98
Detect fraud earlier to mitigate loss and prevent cascading damage

Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques is an authoritative guidebook for setting up a comprehensive fraud detection analytics solution. Early detection is a key factor in mitigating fraud damage, but it involves more specialized techniques than detecting fraud at the more advanced stages. This invaluable guide details both the theory and technical aspects of these techniques, and provides expert insight into streamlining implementation. Coverage includes data gathering,...

99
Natural Language Processing (NLP) provides boundless opportunities for solving problems in artificial intelligence, making products such as Amazon Alexa and Google Translate possible. If you're a developer or data scientist new to NLP and deep learning, this practical guide shows you how to apply these methods using PyTorch, a Python-based deep learning library.

Authors Delip Rao and Brian McMahon provide you with a solid grounding in NLP and deep learning algorithms and demonstrate how to use PyTorch to build applications involving rich representations of text specific to the...

100
Today, successful firms win by understanding their data more deeply than competitors do. They compete based on analytics. In Modeling Techniques in Predictive Analytics, the Python edition, the leader of Northwestern University's prestigious analytics program brings together all the up-to-date concepts, techniques, and Python code you need to excel in analytics. Thomas W. Miller's balanced approach combines business context and quantitative tools, appealing to managers, analysts, programmers, and students alike. This important reference...
