ملاحظات
الفصل الأول: ما علمُ البيانات؟
                  											(1)
                  										
               
               Quote taken from the
                  								call for participation sent out
                  								for the KDD workshop in
                  								1989.
               
            
                  											(2)
                  										
               
               Some practitioners do
                  							distinguish between data mining and
                  							KDD by viewing data mining as a
                  							subfield of KDD or a particular
                  							approach to
                  						KDD.
               
            
                  											(3)
                  										
               
               For a recent review of
                  							this debate, see Battle of the Data
                     								Science Venn Diagrams
                  							(Taylor 2016).
               
            
                  											(4)
                  										
               
               For more on the Cancer
                  							Moonshot Initiative, see
                  						https://www.cancer.gov/research/key-initiatives.
               
            
                  											(5)
                  										
               
               For more on the All of
                  							Us program in the Precision Medicine
                  							Initiative, see
                  						https://allofus.nih.gov.
               
            
                  											(6)
                  										
               
               For more on the Police
                  							Data Initiative, see
                  						https://www.policedatainitiative.org.
               
            
                  											(7)
                  										
               
               For more on AlphaGo, see
                  						https://deepmind.com/research/alphago.
               
            الفصل الثاني: ما المقصود بالبيانات وما المقصود بمجموعة البيانات؟
                  											(1)
                  										
               
               Although many data sets can
                  						be described as a flat n * m
                  						matrix, in some scenarios the data set is
                  						more complex: for example, if a data set
                  						describes the evolution of multiple
                  						attributes through time, then each time
                  						point in the data set will be represented
                  						by a two-dimensional flat n * m
                  						matrix, listing the state of the
                  						attributes at that point in time, but the
                  						overall data set will be three
                  						dimensional, where time is used to link
                  						the two-dimensional snapshots. In these
                  						contexts, the term tensor is
                  						sometimes used to generalize the
                  							matrix concept to higher
                  						dimensions.
               
            
                  											(2)
                  										
               
               This example is inspired by
                  						an example in Han, Kamber, and Pei
                  						2011.
               
            الفصل الثالث: النظام البيئي لعلم البيانات
                  											(1)
                  										
               
               See Storm website, at
                  								http://storm.apache.org.
               
            الفصل الرابع: أساسيات تعلُّم الآلة
                  											(1)
                  										
               
               This subheading,
                  							Correlations Are Not Causations, but
                  							Some Are Useful, is inspired by
                  							George E. P. Box’s (1979)
                  							observation, “Essentially, all models
                  							are wrong, but some are
                  							useful.”
               
            
                  											(2)
                  										
               
               For a numeric target,
                  							the average is the most common
                  							measure of central tendency, and for
                  							nominal or ordinal data the mode (or
                  							most frequently occurring value is
                  							the most common measure of central
                  							tendency).
               
            
                  											(3)
                  										
               
               We are using a more
                  							complex notation here involving
                  								ω0 and
                  								ω1 because
                  							a few paragraphs later we expand this
                  							function to include more than one
                  							input attribute, so the subscripted
                  							variables are useful notations when
                  							dealing with multiple
                  							inputs.
               
            
                  											(4)
                  										
               
               A note of caution: the
                  							numeric values reported here should
                  							be taken as illustrative only and not
                  							interpreted as definitive estimates
                  							of the relationship between BMI and
                  							likelihood of
                  							diabetes.
               
            
                  											(5)
                  										
               
               In general, neural
                  							networks work best when the inputs
                  							have similar ranges. If there are
                  							large differences in the ranges of
                  							input attributes, the attributes with
                  							the much larger values tend to
                  							dominate the processing of the
                  							network. To avoid this, it is best to
                  							normalize the input attributes so
                  							that they all have similar
                  							ranges.
               
            
                  											(6)
                  										
               
               For the sake of
                  							simplicity, we have not included the
                  							weights on the connections in figures
                  							14 and 15.
               
            
                  											(7)
                  										
               
               Technically, the
                  							backpropagation algorithm uses the
                  							chain rule from calculus to calculate
                  							the derivative of the error of the
                  							network with respect to each weight
                  							for each neuron in the network, but
                  							for this discussion we will pass over
                  							this distinction between the error
                  							and the derivative of the error for
                  							the sake of clarity in explaining the
                  							essential idea behind the
                  							backpropagation
                  							algorithm.
               
            
                  											(8)
                  										
               
               No agreed minimum number
                  							of hidden layers is required for a
                  							network to be considered “deep,” but
                  							some people would argue that even two
                  							layers are enough to be deep. Many
                  							deep networks have tens of layers,
                  							but some networks can have hundreds
                  							or even thousands of
                  							layers.
               
            
                  											(9)
                  										
               
               For an accessible
                  							introduction to RNNs and their
                  							natural-language processing, see
                  							Kelleher 2016.
               
            
                  											(10)
                  										
               
               Technically, the
                  							decrease in error estimates is known
                  							as the vanishing-gradient
                     								problem because the
                  							gradient over the error surface
                  							disappears as the algorithm works
                  							back through the
                  							network.
               
            
                  											(11)
                  										
               
               The algorithm also
                  							terminates on two corner cases: a
                  							branch ends up with no instances
                  							after the data set is split up, or
                  							all the input attributes have already
                  							been used at nodes between the root
                  							node and the branch. In both cases, a
                  							terminating node is added and is
                  							labeled with the majority value of
                  							the target attribute at the parent
                  							node of the
                  						branch.
               
            
                  											(12)
                  										
               
               For an introduction to
                  							entropy and its use in decision-tree
                  							algorithms, see Kelleher, Mac Namee,
                  							and D’Arcy 2015 on information-based
                  							learning.
               
            
                  											(13)
                  										
               
               See Burt 2017 for an
                  							introduction to the debate on the
                  							“right to
                  							explanation.”
               
            الفصل الخامس: مهام علم البيانات القياسية
                  											(1)
                  										
               
               A customer-churn case
                  							study in Kelleher, Mac Namee, and
                  							D’Arcy 2015 provides a longer
                  							discussion of the design of
                  							attributes in propensity
                  							models.
               
            الفصل السادس: الخصوصية والأخلاقيات
                  											(1)
                  										
               
               Behavioral targeting
                  							uses data from users’ online
                  							activities—sites visited, clicks
                  							made, time spent on a site, and so
                  							on—and predictive modeling to select
                  							the ads shown to the
                  							user.
               
            
                  											(2)
                  										
               
               The EU Privacy and
                  							Electronic Communications Directive
                  							(2002/58/EC).
               
            
                  											(3)
                  										
               
               For example, some
                  							expectant women explicitly tell
                  							retailers that they are pregnant by
                  							registering for promotional
                  							new-mother programs at the
                  							stores.
               
            
                  											(4)
                  										
               
               For more on PredPol, see
                  								http://www.predpol.com.
               
            
                  											(5)
                  										
               
               A Panopticon is an
                  							eighteenth-century design by Jeremy
                  							Bentham for institutional buildings,
                  							such as prisons and psychiatric
                  							hospitals. The defining
                  							characteristic of a Panopticon was
                  							that the staff could observe the
                  							inmates without the inmates’
                  							knowledge. The underlying idea of
                  							this design was that the inmates were
                  							forced to act as though they were
                  							being watched at all
                  							times.
               
            
                  											(6)
                  										
               
               As distinct from digital
                  							footprint.
               
            
                  											(7)
                  										
               
               Civil Rights Act of
                  							1964, Pub. L. 88-352, 78 Stat. 241,
                  							at
                  								https://www.gpo.gov/fdsys/pkg/STATUTE-78/pdf/STATUTE-78-Pg241.pdf.
               
            
                  											(8)
                  										
               
               Americans with
                  							Disabilities Act of 1990, Pub. L.
                  							101-336, 104 Stat. 327, at
                  								https://www.gpo.gov/fdsys/pkg/STATUTE-104/pdf/STATUTE-104-Pg327.pdf.
               
            
                  											(9)
                  										
               
               The Fair Information
                  							Practice Principles are available at
                  								https://www.dhs.gov/publication/fair-information-practice-principles-fipps.
               
            
                  											(10)
                  										
               
               Senate of California,
                  							SB-568 Privacy: Internet: Minors,
                  							Business and Professions Code,
                  							Relating to the Internet, vol.
                  							division 8, chap. 22.1 (commencing
                  							with sec. 22580) (2013), at
                  								https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=201320140SB568.
               
            الفصل السابع: التأثير المستقبلي لعلم البيانات ومبادئ النجاح
                  											(1)
                  										
               
               For more on the
                  							SmartSantander project in Spain, see
                  								http://smartsantander.eu.
               
            
                  											(2)
                  										
               
               For more on the TEPC’s
                  							projects, see
                  								http://www.tepco.co.jp/en/press/corp-com/release/2015/1254972_6844.html.
               
            
                  											(3)
                  										
               
               Leo Tolstoy’s book
                  								Anna
                     								Karenina (1877)
                  							begins: “All happy families are
                  							alike; each unhappy family is unhappy
                  							in its own way.” Tolstoy’s idea is
                  							that to be happy, a family must be
                  							successful in a range of areas (love,
                  							finance, health, in-laws), but
                  							failure in any of these areas will
                  							result in unhappiness. So all happy
                  							families are the same because they
                  							are successful in all areas, but
                  							unhappy families can be unhappy for
                  							many different combinations of
                  							reasons.