One of the more difficult aspects of data science to teach is communicating with other professionals, especially non-technical ones. Learning to recognize cognitive biases and to respond to them appropriately is a crucial skill.
Our first bias is so ubiquitous it gets its own section. Confirmation bias -- the tendency to favor information that confirms one's preexisting beliefs -- is rampant throughout data science and consulting. The bias can occur very early in the life of a contract or product: often one is hired or contracted specifically to provide an outside or new confirmation of the beliefs of one or more stakeholders. It can also occur once the final analysis is done, if the stakeholder does not want to believe your results when they contradict expectations.
You've almost certainly heard this idea before: people hear what they want to hear: "people in general are twice as likely to select information that supported their own point of view as to consider an opposing idea". Similarly, "you cannot reason someone out of something he or she was not reasoned into". For many, preexisting beliefs trump evidence.
This is a challenging bias to overcome. Many people do not like to challenge their own beliefs and may react negatively when someone else does. In fact, some become more certain of their beliefs when presented with contradictory evidence! (This is called the backfire effect, and it is closely related to belief revision.)
It is possible to avoid stakeholders who are primarily interested in confirmation of their beliefs. Early in negotiations, ask many questions about the nature of the project: What are the main questions to be addressed? Are there prevailing beliefs in the company or organization? Is there any evidence that supports the existing hypotheses? Ask what will happen should you prove or disprove the hypothesis. Often people will simply tell you that they are looking for a confirmatory result.
This is also something that product managers and sales staff should look out for, especially if there is a large implementation cost for bringing on a new client. Sometimes customers are just looking to prove a point internally and have no intention of long-term use.
Closely related is Status quo bias: the tendency to prefer that things stay relatively the same -- a possible source of confirmation bias. This is particularly irksome when it originates from a position of power. It's often said that "science advances one funeral at a time": Max Planck famously remarked that "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it." There is evidence of this phenomenon.
Watch out that you (as a data scientist) do not fall prey to expectation bias: the tendency for experimenters to believe, certify, and publish data that agree with their expectations for the outcome of an experiment, and to disbelieve, discard, or downgrade the corresponding weightings for data that appear to conflict with those expectations. As a data scientist you need to avoid confirmation bias in your workflows and thoughts. Insights should be data-driven. Of course it's fine to use domain-specific knowledge as a guide, but ultimately the data has to support the conclusion convincingly.
There are also "group-level" biases. As a data scientist or consultant you may experience two seemingly contradictory effects:
In both cases there is a flavor of confirmation bias at play -- either your outside/expert opinion "objectively" confirms an existing belief or your opinion is rejected because it does not conform to the group belief.
Often as the resident data scientist you will have a much better grasp of statistics than your primary stakeholders. Decisions are frequently based on very small amounts of data, and it's useful to be able to recognize statistical biases as they appear in the decision making process.
Insensitivity to Sample Size: People without training in statistics will often not understand the need to collect enough data for statistical significance. I've personally heard many statements along the lines of "it's only three data points, but hopefully it's representative" (note the expectation bias as well). That's not how any of this works. The damage compounds with sampling bias, and it all goes back to a poor understanding of statistical experiment design.
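A quick simulation makes the problem with three data points concrete. The numbers below (a measurement with true mean 100 and standard deviation 15) are made up for illustration; the point is how wildly a three-point average swings from experiment to experiment compared to a larger sample.

```python
import random
import statistics

random.seed(0)

def spread_of_sample_means(n, trials=2000, mu=100, sigma=15):
    """Standard deviation of the sample mean across repeated experiments of size n."""
    means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# With n=3 the estimated mean swings by roughly sigma/sqrt(3) ~ 8.7 units
# from one experiment to the next; with n=300 it is about ten times tighter.
print(spread_of_sample_means(3))
print(spread_of_sample_means(300))
```

Any single three-point experiment can easily land far from the truth, which is exactly why "hopefully it is representative" is not a substitute for an adequate sample.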
On the other hand, you may encounter the Illusion of validity: the belief that additionally acquired information generates additional relevant data for predictions, even when it evidently does not. More data isn't always useful, and many statistical models yield marginal returns once a large amount of data is already in hand. Sometimes clients and stakeholders want to collect more data rather than accept an undesirable conclusion.
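The diminishing returns are easy to quantify in the simplest case: when estimating a proportion, the margin of error shrinks only as the square root of the sample size. A minimal sketch:

```python
import math

def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error for estimating a proportion near p with n samples."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  +/- {margin_of_error(n):.4f}")
```

Each 100x increase in data buys only 10x more precision: going from 10,000 to 1,000,000 samples tightens the estimate from about +/-0.0098 to +/-0.0010. Collecting vastly more data often cannot change a conclusion that is already clear.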
Survivorship Bias: focusing on success stories or exceptional cases rather than the distribution of outcomes. When combined with insensitivity to sample size this can be devastating. In a production process the focus should be the average case and the variability, not the random outlier that lasts twice as long or has an extreme value of some property.
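A small sketch (with made-up exponential lifetimes) shows how badly studying only the survivors distorts the picture: units still running at 2,000 hours look roughly three times as durable as the typical unit.

```python
import random
import statistics

random.seed(2)

# Hypothetical component lifetimes: exponential with a true mean of 1000 hours.
lifetimes = [random.expovariate(1 / 1000) for _ in range(10_000)]

typical = statistics.fmean(lifetimes)

# Survivorship-biased view: only units still running at 2000 hours get studied.
survivors = [t for t in lifetimes if t > 2000]
biased = statistics.fmean(survivors)

print(round(typical))  # near the true mean of 1000 hours
print(round(biased))   # near 3000 hours -- the survivors are not representative
```

Conditioning on survival silently discards every early failure, which is precisely the information a production process needs most.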
Close to my home there are many street signs advertising a local college, touting its accolades:
and many more. Presumably we're supposed to conclude that this is a good school because it ranks highly in many categories. But many schools could likely produce a similar list of accolades by random chance if thousands of such comparisons were carried out. Of course, the advertisements do not say how many categories the school dredged through, or how this set of outcomes compares to other institutions.
As a data scientist you have a lot of influence over the decision making process by providing crucial information to stakeholders. You also have many chances to observe collective decision making and the many biases that occur. Here's a sample.
Another bias I've often encountered, especially at startups, is Optimism bias: the tendency to be over-optimistic, overestimating favorable and pleasing outcomes (see also wishful thinking). On the one hand it's important to stay positive at a startup, since there are so many unknowns, but when the evidence says something isn't working, organizations need to adapt.
How should one handle any of these biases when encountered in the wild? There are a few useful techniques:
Sometimes there is simply nothing you can do, and things won't get better until the source of the problem (a misguided stakeholder, for example) leaves the organization.