Data science for business : [what you need to know about data mining and data-analytic thinking] / Foster Provost and Tom Fawcett.
- Provost, Foster, 1964- (author)
- Fawcett, Tom (author)
- ISBN 9781449361327
- 1st ed.
- Published: Sebastopol, Calif. : O'Reilly, 2013
- English; xviii, 384 pp.
Table of contents
- Machine generated contents note:
1. Introduction: Data-Analytic Thinking -- The Ubiquity of Data Opportunities -- Example: Hurricane Frances -- Example: Predicting Customer Churn -- Data Science, Engineering, and Data-Driven Decision Making -- Data Processing and "Big Data" -- From Big Data 1.0 to Big Data 2.0 -- Data and Data Science Capability as a Strategic Asset -- Data-Analytic Thinking -- This Book -- Data Mining and Data Science, Revisited -- Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist -- Summary
2. Business Problems and Data Science Solutions -- Fundamental concepts: A set of canonical data mining tasks; The data mining process; Supervised versus unsupervised data mining -- From Business Problems to Data Mining Tasks -- Supervised Versus Unsupervised Methods -- Data Mining and Its Results -- The Data Mining Process -- Business Understanding -- Data Understanding -- Data Preparation -- Modeling -- Evaluation -- Deployment -- Implications for Managing the Data Science Team -- Other Analytics Techniques and Technologies -- Statistics -- Database Querying -- Data Warehousing -- Regression Analysis -- Machine Learning and Data Mining -- Answering Business Questions with These Techniques -- Summary
3. Introduction to Predictive Modeling: From Correlation to Supervised Segmentation -- Fundamental concepts: Identifying informative attributes; Segmenting data by progressive attribute selection -- Exemplary techniques: Finding correlations; Attribute/variable selection; Tree induction -- Models, Induction, and Prediction -- Supervised Segmentation -- Selecting Informative Attributes -- Example: Attribute Selection with Information Gain -- Supervised Segmentation with Tree-Structured Models -- Visualizing Segmentations -- Trees as Sets of Rules -- Probability Estimation -- Example: Addressing the Churn Problem with Tree Induction -- Summary
4. Fitting a Model to Data -- Fundamental concepts: Finding "optimal" model parameters based on data; Choosing the goal for data mining; Objective functions; Loss functions -- Exemplary techniques: Linear regression; Logistic regression; Support-vector machines -- Classification via Mathematical Functions -- Linear Discriminant Functions -- Optimizing an Objective Function -- An Example of Mining a Linear Discriminant from Data -- Linear Discriminant Functions for Scoring and Ranking Instances -- Support Vector Machines, Briefly -- Regression via Mathematical Functions -- Class Probability Estimation and Logistic "Regression" -- Logistic Regression: Some Technical Details -- Example: Logistic Regression versus Tree Induction -- Nonlinear Functions, Support Vector Machines, and Neural Networks -- Summary
5. Overfitting and Its Avoidance -- Fundamental concepts: Generalization; Fitting and overfitting; Complexity control -- Exemplary techniques: Cross-validation; Attribute selection; Tree pruning; Regularization -- Generalization -- Overfitting -- Overfitting Examined -- Holdout Data and Fitting Graphs -- Overfitting in Tree Induction -- Overfitting in Mathematical Functions -- Example: Overfitting Linear Functions -- Example: Why Is Overfitting Bad? -- From Holdout Evaluation to Cross-Validation -- The Churn Dataset Revisited -- Learning Curves -- Overfitting Avoidance and Complexity Control -- Avoiding Overfitting with Tree Induction -- A General Method for Avoiding Overfitting -- Avoiding Overfitting for Parameter Optimization -- Summary
6. Similarity, Neighbors, and Clusters -- Fundamental concepts: Calculating similarity of objects described by data; Using similarity for prediction; Clustering as similarity-based segmentation -- Exemplary techniques: Searching for similar entities; Nearest neighbor methods; Clustering methods; Distance metrics for calculating similarity -- Similarity and Distance -- Nearest-Neighbor Reasoning -- Example: Whiskey Analytics -- Nearest Neighbors for Predictive Modeling -- How Many Neighbors and How Much Influence? -- Geometric Interpretation, Overfitting, and Complexity Control -- Issues with Nearest-Neighbor Methods -- Some Important Technical Details Relating to Similarities and Neighbors -- Heterogeneous Attributes -- Other Distance Functions -- Combining Functions: Calculating Scores from Neighbors -- Clustering -- Example: Whiskey Analytics Revisited -- Hierarchical Clustering -- Nearest Neighbors Revisited: Clustering Around Centroids -- Example: Clustering Business News Stories -- Understanding the Results of Clustering -- Using Supervised Learning to Generate Cluster Descriptions -- Stepping Back: Solving a Business Problem Versus Data Exploration -- Summary
7. Decision Analytic Thinking I: What Is a Good Model? -- Fundamental concepts: Careful consideration of what is desired from data science results; Expected value as a key evaluation framework; Consideration of appropriate comparative baselines -- Exemplary techniques: Various evaluation metrics; Estimating costs and benefits; Calculating expected profit; Creating baseline methods for comparison -- Evaluating Classifiers -- Plain Accuracy and Its Problems -- The Confusion Matrix -- Problems with Unbalanced Classes -- Problems with Unequal Costs and Benefits -- Generalizing Beyond Classification -- A Key Analytical Framework: Expected Value -- Using Expected Value to Frame Classifier Use -- Using Expected Value to Frame Classifier Evaluation -- Evaluation, Baseline Performance, and Implications for Investments in Data -- Summary
8. Visualizing Model Performance -- Fundamental concepts: Visualization of model performance under various kinds of uncertainty; Further consideration of what is desired from data mining results -- Exemplary techniques: Profit curves; Cumulative response curves; Lift curves; ROC curves -- Ranking Instead of Classifying -- Profit Curves -- ROC Graphs and Curves -- The Area Under the ROC Curve (AUC) -- Cumulative Response and Lift Curves -- Example: Performance Analytics for Churn Modeling -- Summary
9. Evidence and Probabilities -- Fundamental concepts: Explicit evidence combination with Bayes' Rule; Probabilistic reasoning via assumptions of conditional independence -- Exemplary techniques: Naive Bayes classification; Evidence lift -- Example: Targeting Online Consumers With Advertisements -- Combining Evidence Probabilistically -- Joint Probability and Independence -- Bayes' Rule -- Applying Bayes' Rule to Data Science -- Conditional Independence and Naive Bayes -- Advantages and Disadvantages of Naive Bayes -- A Model of Evidence "Lift" -- Example: Evidence Lifts from Facebook "Likes" -- Evidence in Action: Targeting Consumers with Ads -- Summary
10. Representing and Mining Text -- Fundamental concepts: The importance of constructing mining-friendly data representations; Representation of text for data mining -- Exemplary techniques: Bag of words representation; TFIDF calculation; N-grams; Stemming; Named entity extraction; Topic models -- Why Text Is Important -- Why Text Is Difficult -- Representation -- Bag of Words -- Term Frequency -- Measuring Sparseness: Inverse Document Frequency -- Combining Them: TFIDF -- Example: Jazz Musicians -- The Relationship of IDF to Entropy -- Beyond Bag of Words -- N-gram Sequences -- Named Entity Extraction -- Topic Models -- Example: Mining News Stories to Predict Stock Price Movement -- The Task -- The Data -- Data Preprocessing -- Results -- Summary
11. Decision Analytic Thinking II: Toward Analytical Engineering -- Fundamental concept: Solving business problems with data science starts with analytical engineering: designing an analytical solution, based on the data, tools, and techniques available -- Exemplary technique: Expected value as a framework for data science solution design -- Targeting the Best Prospects for a Charity Mailing -- The Expected Value Framework: Decomposing the Business Problem and Recomposing the Solution Pieces -- A Brief Digression on Selection Bias -- Our Churn Example Revisited with Even More Sophistication -- The Expected Value Framework: Structuring a More Complicated Business Problem -- Assessing the Influence of the Incentive -- From an Expected Value Decomposition to a Data Science Solution -- Summary
12. Other Data Science Tasks and Techniques -- Fundamental concepts: Our fundamental concepts as the basis of many common data science techniques; The importance of familiarity with the building blocks of data science -- Exemplary techniques: Association and co-occurrences; Behavior profiling; Link prediction; Data reduction; Latent information mining; Movie recommendation; Bias-variance decomposition of error; Ensembles of models; Causal reasoning from data -- Co-occurrences and Associations: Finding Items That Go Together -- Measuring Surprise: Lift and Leverage -- Example: Beer and Lottery Tickets -- Associations Among Facebook Likes -- Profiling: Finding Typical Behavior -- Link Prediction and Social Recommendation -- Data Reduction, Latent Information, and Movie Recommendation -- Bias, Variance, and Ensemble Methods -- Data-Driven Causal Explanation and a Viral Marketing Example -- Summary
13. Data Science and Business Strategy -- Fundamental concepts: Our principles as the basis of success for a data-driven business; Acquiring and sustaining competitive advantage via data science; The importance of careful curation of data science capability -- Thinking Data-Analytically, Redux -- Achieving Competitive Advantage with Data Science -- Sustaining Competitive Advantage with Data Science -- Formidable Historical Advantage -- Unique Intellectual Property -- Unique Intangible Collateral Assets -- Superior Data Scientists -- Superior Data Science Management -- Attracting and Nurturing Data Scientists and Their Teams -- Examine Data Science Case Studies -- Be Ready to Accept Creative Ideas from Any Source -- Be Ready to Evaluate Proposals for Data Science Projects -- Example Data Mining Proposal.
Subject headings
- Data mining (sao)
- Big data (LCSH)
- Business -- Data processing (LCSH)
- Data mining (LCSH)
Classification
- 006.312 (DDC)
- Pud (kssb/8 (machine generated))
The title is held by 10 libraries.
- Örebro universitetsbibliotek (O)
- Shelf location: COURSE 006.3 Provost
- Shelf location: COURSE_SH KURSREF Provost