DSC10 Textbook
  • UC San Diego JupyterHub
    • 0. Introduction
    • 1. Data Science
      • 1.1 Introduction
        • 1.1.1 Computational Tools
        • 1.1.2 Statistical Techniques
      • 1.2 Why Data Science?
      • 1.3 Plotting the Classics
        • 1.3.1 Literary Characters
        • 1.3.2 Another Kind of Character
    • 2. Causality and Experiments
      • 2.1 John Snow and the Broad Street Pump
      • 2.2 Snow’s “Grand Experiment”
      • 2.3 Establishing Causality
      • 2.4 Randomization
      • 2.5 Endnote
    • 3. Programming in Python
      • 3.1 Expressions
      • 3.2 Names
        • 3.2.1 Example: Growth Rates
      • 3.3 Call Expressions
      • 3.4 Introduction to Tables
    • 4. Data Types
      • 4.1 Numbers
      • 4.2 Strings
        • 4.2.1 String Methods
      • 4.3 Comparisons
    • 5. Sequences
      • 5.1 Arrays
      • 5.2 Ranges
      • 5.3 More on Arrays
    • 6. Tables
      • 6.1 Sorting Rows
      • 6.2 Selecting Rows
      • 6.3 Example: Population Trends
      • 6.4 Example: Trends in Gender
    • 7. Visualization
      • 7.1 Categorical Distributions
      • 7.2 Numerical Distributions
      • 7.3 Overlaid Graphs
    • 8. Functions and Tables
      • 8.1 Applying Functions to Columns
      • 8.2 Classifying by One Variable
      • 8.3 Cross-Classifying
      • 8.4 Joining Tables by Columns
      • 8.5 Bike Sharing in the Bay Area
    • 9. Randomness
      • 9.1 Conditional Statements
      • 9.2 Iteration
      • 9.3 Simulation
      • 9.4 The Monty Hall Problem
      • 9.5 Finding Probabilities
    • 10. Sampling and Empirical Distributions
      • 10.1 Empirical Distributions
      • 10.2 Sampling from a Population
      • 10.3 Empirical Distribution of a Statistic
    • 11. Testing Hypotheses
      • 11.1 Assessing Models
      • 11.2 Multiple Categories
      • 11.3 Decisions and Uncertainty
    • 12. Comparing Two Samples
      • 12.1 A/B Testing
      • 12.2 Deflategate
      • 12.3 Causality
    • 13. Estimation
      • 13.1 Percentiles
      • 13.2 The Bootstrap
      • 13.3 Confidence Intervals
      • 13.4 Using Confidence Intervals
    • 14. Why the Mean Matters
      • 14.1 Properties of the Mean
      • 14.2 Variability
      • 14.3 The SD and the Normal Curve
      • 14.4 The Central Limit Theorem
      • 14.5 The Variability of the Sample Mean
      • 14.6 Choosing a Sample Size
    • 15. Prediction
      • 15.1 Correlation
      • 15.2 The Regression Line
      • 15.3 The Method of Least Squares
      • 15.4 Least Squares Regression
      • 15.5 Visual Diagnostics
      • 15.6 Numerical Diagnostics
    • 16. Inference for Regression
      • 16.1 A Regression Model
      • 16.2 Inference for the True Slope
      • 16.3 Prediction Intervals
    • 17. Classification
      • 17.1 Nearest Neighbors
      • 17.2 Training and Testing
      • 17.3 Rows of Tables
      • 17.4 Implementing the Classifier
      • 17.5 The Accuracy of the Classifier
      • 17.6 Multiple Regression
    • 18. Updating Predictions
      • 18.1 A “More Likely Than Not” Binary Classifier
      • 18.2 Making Decisions