
Monday, April 26, 2010

Chapter 003. Decision-Making in Clinical Medicine

A Vietnamese translation of this article can be found via Google.
Source: Harrison's Internal Medicine, Chapter 3. Decision-Making in Clinical Medicine
Decision-Making in Clinical Medicine: Introduction

To the medical student who requires 2 h to collect a patient's history and perform a physical examination, and several additional hours to organize them into a coherent presentation, the experienced clinician's ability to reach a diagnosis and decide on a management plan in a fraction of the time seems extraordinary. While medical knowledge and experience play a significant role in the senior clinician's ability to arrive at a differential diagnosis and plan quickly, much of the process involves skill in clinical decision-making. The first goal of this chapter is to provide an introduction to the study of clinical reasoning.

Equally bewildering to the student are the proper use of diagnostic tests and the integration of the results into the clinical assessment. The novice medical practitioner typically uses a "shotgun" approach to testing, hoping to hit a target without knowing exactly what that target is. The expert, on the other hand, usually has a specific target in mind and efficiently adjusts the testing strategy to it. The second goal of this chapter is to review briefly some of the crucial basic statistical concepts that govern the proper interpretation and use of diagnostic tests. Quantitative tools available to assist in clinical decision-making will also be discussed.

Evidence-based medicine is the term used to describe the integration of the best available research evidence with clinical judgment and experience in the care of patients. The third goal of this chapter is to provide a brief overview of some of the tools of evidence-based medicine.

Clinical Decision-Making

Clinical Reasoning

The most important clinical actions are not procedures or prescriptions but the judgments from which all other aspects of clinical medicine flow. In the modern era of large randomized trials and evidence-based medicine, it is easy to overlook the importance of this elusive mental activity and focus instead on the algorithmic practice guidelines constructed to improve care. One reason for this apparent neglect is that much more research has been done on how doctors should make decisions (e.g., using a Bayesian model, discussed below) than on how they actually do. Thus, much of what we know about clinical reasoning comes from empirical studies of nonmedical problem-solving behavior.

Despite the great technological advances of medicine over the last century, uncertainty still plays a pivotal role in all aspects of medical decision-making. We may know that a patient does not have long to live, but we cannot be certain how long. We may prescribe a potent new receptor blocker to reverse the course of a patient's illness, but we cannot be certain that the therapy will achieve the desired result and that result alone. Uncertainty in medical outcomes creates the need for probabilities and other mathematical/statistical tools to help guide decision-making. (These tools are reviewed later in the chapter.)

Uncertainty is compounded by the information overload that characterizes modern medicine. Today's experienced clinician needs close to 2 million pieces of information to practice medicine. Doctors subscribe to an average of seven journals, representing over 2500 new articles each year. Computers offer the obvious solution both for management of information and for better quantitation and management of the daily uncertainties of medical care.
While the technology to computerize medical practice is available, many practical problems remain to be solved before patient information can be standardized and integrated with medical evidence on a single electronic platform.

The following three examples introduce the subject of clinical reasoning:
  • A 46-year-old man presents to his internist with a chief complaint of hemoptysis. The physician knows that the differential diagnosis of hemoptysis includes over 100 different conditions, including cancer and tuberculosis. The examination begins with some general background questions, and the patient is asked to describe his symptoms and their chronology. By the time the examination is completed, and even before any tests are run, the physician has formulated a working diagnostic hypothesis and planned a series of steps to test it. In an otherwise healthy and nonsmoking patient recovering from a viral bronchitis, the doctor's hypothesis would be that the acute bronchitis is responsible for the small amount of blood-streaked sputum the patient observed. In this case, a chest x-ray may provide sufficient reassurance that a more serious disorder is not present.
  • A second 46-year-old patient with the same chief complaint who has a 100-pack-year smoking history, a productive morning cough, and episodes of blood-streaked sputum may generate the principal diagnostic hypothesis of carcinoma of the lung. Consequently, along with the chest x-ray, the physician obtains a sputum cytology examination and refers this patient for fiberoptic bronchoscopy.
  • A third 46-year-old patient with hemoptysis who is from a developing country is evaluated with an echocardiogram as well, because the physician thinks she hears a soft diastolic rumbling murmur at the apex on cardiac auscultation, suggesting rheumatic mitral stenosis.
These three simple vignettes illustrate two aspects of expert clinical reasoning: (1) the use of cognitive shortcuts as a way to organize the complex unstructured material that is collected in the clinical evaluation, and (2) the use of diagnostic hypotheses to consolidate the information and indicate appropriate management steps.The Use of Cognitive ShortcutsCognitive shortcuts or rules of thumb, sometimes referred to as heuristics, can help solve complex problems, of the sort encountered daily in clinical medicine, with great efficiency. Clinicians rely on three basic types of heuristics. When assessing a particular patient, clinicians often weigh the probability that this patient's clinical features match those of the class of patients with the leading diagnostic hypotheses being considered. In other words, the clinician is searching for the diagnosis for which the patient appears to be a representative example; this cognitive shortcut is called the representativeness heuristic.It may take only a few characteristics from the history for an expert clinician using the representativeness heuristic to arrive at a sound diagnostic hypothesis. For example, an elderly patient with new-onset fever, cough productive of copious sputum, unilateral pleuritic chest pain, and dyspnea is readily identified as fitting the pattern for acute pneumonia, probably of bacterial origin. Evidence of focal pulmonary consolidation on the physical examination will increase the clinician's confidence in the diagnosis because it fits the expected pattern of acute bacterial pneumonia. Knowing this allows the experienced clinician to conduct an efficient, directed, and therapeutically productive patient evaluation since there may be little else in the history or physical examination of direct relevance. The inexperienced medical student or resident, who has not yet learned the patterns most prevalent in clinical medicine, must work much harder to achieve the same result and is often at risk of missing the important clinical problem in a sea of compulsively collected but unhelpful data.However, physicians using the representativeness heuristic can reach erroneous conclusions if they fail to consider the underlying prevalence of two competing diagnoses (i.e., the prior, or pretest, probabilities). Consider a patient with pleuritic chest pain, dyspnea, and a low-grade fever. A clinician might consider acute pneumonia and acute pulmonary embolism to be the two leading diagnostic alternatives. Using the representativeness heuristic, the clinician might judge both diagnostic candidates to be equally likely, although to do so would be wrong if pneumonia was much more prevalent in the underlying population. Mistakes may also result from a failure to consider that a pattern based on a small number of prior observations will likely be less reliable than one based on larger samples.A second commonly used cognitive shortcut, the availability heuristic, involves judgments made on the basis of how easily prior similar cases or outcomes can be brought to mind. For example, the experienced clinician may recall 20 elderly patients seen over the past few years who presented with painless dyspnea of acute onset and were found to have acute myocardial infarction. The novice clinician may spend valuable time seeking a pulmonary cause for the symptoms before considering and then confirming the cardiac diagnosis. 
In this situation, the patient's clinical pattern does not fit the expected pattern of acute myocardial infarction, but experience with this atypical presentation, and the ability to recall it, can help direct the physician to the diagnosis.Errors with the availability heuristic can come from several sources of recall bias. For example, rare catastrophes are likely to be remembered with a clarity and force out of proportion to their value, and recent experience is, of course, easier to recall and therefore more influential on clinical judgments.The third commonly used cognitive shortcut, the anchoring heuristic, involves estimating a probability by starting from a familiar point (the anchor) and adjusting to the new case from there. Anchoring can be a powerful tool for diagnosis but is often used incorrectly. For example, a clinician may judge the probability of coronary artery disease (CAD) to be very high after a positive exercise thallium test, because the prediction has been anchored to the test result ("positive test = high probability of CAD"). Yet, as discussed below, this prediction would be inaccurate if the clinical (pretest) picture of the patient being tested indicates a low probability of disease (e.g., a 30-year-old woman with no risk factors). As illustrated in this example, anchors are not necessarily the same as the pretest probability (see "Measures of Disease Probability and Bayes' Theorem," below).Diagnostic Hypothesis GenerationCognitive scientists studying the thought processes of expert clinicians have observed that clinicians group data into packets, or "chunks," which are stored in their memories and manipulated to generate diagnostic hypotheses. Because short-term memory can typically hold only 7–10 items at a time, the number of packets that can be actively integrated into hypothesis-generating activities is similarly limited. The cognitive shortcuts discussed above play a key role in the generation of diagnostic hypotheses, many of which are discarded as rapidly as they are formed.A diagnostic hypothesis sets a context for diagnostic steps to follow and provides testable predictions. For example, if the enlarged and quite tender liver felt on physical examination is due to acute hepatitis (the hypothesis), certain specific liver function tests should be markedly elevated (the prediction). If the tests come back normal, the hypothesis may need to be discarded or substantially modified.One of the factors that make teaching diagnostic reasoning difficult is that expert clinicians do not follow a fixed pattern in patient examinations. From the outset, they are generating, refining, and discarding diagnostic hypotheses. The questions they ask in the history are driven by the hypotheses they are working with at the moment. Even the physical examination is driven by specific questions rather than a preordained checklist. While the student is palpating the abdomen of the alcoholic patient, waiting for a finding to strike him, the expert clinician is on a focused search mission. Is the spleen enlarged? How big is the liver? Is it tender? Are there any palpable masses or nodules? Each question focuses the attention of the examiner to the exclusion of all other inputs until answered, allowing the examiner to move on to the next specific question.Negative findings are often as important as positive ones in establishing and refining diagnostic hypotheses. 
Chest discomfort that is not provoked or worsened by exertion in an active patient reduces the likelihood that chronic ischemic heart disease is the underlying cause. The absence of a resting tachycardia and thyroid gland enlargement reduces the likelihood of hyperthyroidism in a patient with paroxysmal atrial fibrillation.The acuity of a patient's illness can play an important role in overriding considerations of prevalence and other issues described above. For example, clinicians are taught to consider aortic dissection routinely as a possible cause of acute severe chest discomfort along with myocardial infarction, even though the typical history of dissection is different from myocardial infarction and dissection is far less prevalent (Chap. 242). This recommendation is based on the recognition that a relatively rare but catastrophic diagnosis like aortic dissection is very difficult to make unless it is explicitly considered. If the clinician fails to elicit any of the characteristic features of dissection by history and finds equivalent blood pressures in both arms and no pulse deficits, he or she may feel comfortable in discarding the aortic dissection hypothesis. If, however, the chest x-ray shows a widened mediastinum, the hypothesis may be reinstated and a diagnostic test ordered [e.g., thoracic computed tomography (CT) scan, transesophageal echocardiogram] to evaluate it more fully. In nonacute situations, the prevalence of potential alternative diagnoses should play a much more prominent role in diagnostic hypothesis generation.Generation of Diagnostic HypothesesBecause the generation and evaluation of appropriate diagnostic hypotheses is a skill that not all clinicians possess to an equal degree, errors in this process can occur; in the patient with serious acute illness, these may lead to tragic consequences. Consider the following hypothetical example. A 45-year-old male patient with a 3-week history of a "flulike" upper respiratory infection (URI) presented to his physician with symptoms of dyspnea and a productive cough. Based on the presenting complaint, the clinician pulled out a "URI Assessment Form" to improve quality and efficiency of care. The physician quickly completed the examination components outlined on this structured form, noting in particular the absence of fever and a clear chest examination. He then prescribed an antibiotic for presumed bronchitis, showed the patient how to breathe into a paper bag to relieve his "hyperventilation," and sent him home with the reassurance that his illness was not serious. After a sleepless night with significant dyspnea unrelieved by breathing into a bag, the patient developed nausea and vomiting and collapsed. He was brought into the Emergency Department in cardiac arrest and could not be resuscitated. Autopsy showed a posterior wall myocardial infarction and a fresh thrombus in an atherosclerotic right coronary artery. What went wrong? The clinician decided, even before starting the history, that the patient's complaints were not serious. He therefore felt confident that he could perform an abbreviated and focused examination using the URI assessment protocol rather than considering the full range of possibilities and performing appropriate tests to confirm or refute his initial hypotheses. 
In particular, by concentrating on the "URI," the clinician failed to elicit the full dyspnea history, which would have suggested a far more serious disorder, and neglected to search for other symptoms that could have directed him to the correct diagnosis.This example illustrates how patients can diverge from textbook symptoms and the potential consequences of being unable to adapt the diagnostic process to real-world challenges. The expert, while recognizing that common things occur commonly, approaches each evaluation on high alert for clues that the initial diagnosis may be wrong. Patients often provide information that "does not fit" with any of the leading diagnostic hypotheses being considered. Distinguishing real clues from false trails can only be achieved by practice and experience. A less-experienced clinician who tries to be too efficient (as in the above example) can make serious errors. Use of a rapid systematic clinical survey of symptoms and organ systems can help prevent the clinician from overlooking important but inapparent clues.Major Influences on Clinical Decision-MakingMore than a decade of research on variations in clinician practice patterns has shed much light on forces that shape clinical decisions. The use of heuristic "shortcuts," as detailed above, provides a partial explanation, but several other key factors play an important role in shaping diagnostic hypotheses and management decisions. These factors can be grouped conceptually into three overlapping categories: (1) factors related to physicians' personal characteristics and practice style, (2) factors related to the practice setting, and (3) factors related to economic incentives.Factors Related to Practice StyleOne of the key roles of the physician in medical care is to serve as the patient's agent to ensure that necessary care is provided at a high level of quality. Factors that influence this role include the physician's knowledge, training, and experience. It is obvious that physicians cannot practice evidence-based medicine (EBM; described later in the chapter) if they are unfamiliar with the evidence. As would be expected, specialists generally know the evidence in their field better than do generalists. Surgeons may be more enthusiastic about recommending surgery than medical doctors because their belief in the beneficial effects of surgery is stronger. For the same reason, invasive cardiologists are much more likely to refer chest pain patients for diagnostic catheterization than are noninvasive cardiologists or generalists. The physician beliefs that drive these different practice styles are based on personal experience, recollection, and interpretation of the available medical evidence. For example, heart failure specialists are much more likely than generalists to achieve target angiotensin-converting enzyme (ACE) inhibitor therapy in their heart failure patients because they are more familiar with what the targets are (as defined by large clinical trials), have more familiarity with the specific drugs (including dosages and side effects), and are less likely to overreact to foreseeable problems in therapy such as a rise in creatinine levels or symptomatic hypotension. Other intriguing research has shown a wide distribution of acceptance times of antibiotic therapy for peptic ulcer disease following widespread dissemination of the "evidence" in the medical literature. 
Some gastroenterologists accepted this new therapy before the evidence was clear (reflecting, perhaps, an aggressive practice style), and some gastroenterologists lagged behind (a conservative practice style, associated in this case with older physicians). As a group, internists lagged several years behind gastroenterologists.The opinion of influential leaders can also have an important effect on practice patterns. Such influence can occur at both the national level (e.g., expert physicians teaching at national meetings) and the local level (e.g., local educational programs, "curbside consultations"). Opinion leaders do not have to be physicians. When conducting rounds with clinical pharmacists, physicians are less likely to make medication errors and more likely to use target levels of evidence-based therapies.The patient's welfare is not the only concern that drives clinical decisions. The physician's perception about the risk of a malpractice suit resulting from either an erroneous decision or a bad outcome creates a style of practice referred to as defensive medicine. This practice involves using tests and therapies with very small marginal returns to preclude future criticism in the event of an adverse outcome. For example, a 40-year-old woman who presents with a long-standing history of intermittent headache and a new severe headache along with a normal neurologic examination has a very low likelihood of structural intracranial pathology. Performance of a head CT or magnetic resonance imaging (MRI) scan in this situation would constitute defensive medicine. On the other hand, the results of the test could provide reassurance to an anxious patient.Practice Setting FactorsFactors in this category relate to the physical resources available to the physician's practice and the practice environment. Physician-induced demand is a term that refers to the repeated observation that physicians have a remarkable ability to accommodate to and employ the medical facilities available to them. One of the foundational studies in outcomes research showed that physicians in Boston had an almost 50% higher hospital admission rate than did physicians in New Haven, despite there being no obvious differences in the health of the cities' inhabitants. The physicians in New Haven were not aware of using fewer hospital beds for their patients, nor were the Boston physicians aware of using less stringent criteria to admit patients. In both cities, physicians unconsciously adopted their practice styles to the available level of hospital beds.Other environmental factors that can influence decision-making include the local availability of specialists for consultations and procedures, "high tech" facilities such as angiography suites, a heart surgery program, and MRI machines.Economic IncentivesEconomic incentives are closely related to the other two categories of practice-modifying factors. Financial issues can exert both stimulatory and inhibitory influences on clinical practice. In general, physicians are paid on a fee-for-service, capitation, or salary basis. In fee-for-service, the more the physician does, the more the physician gets paid. The economic incentive in this case is to do more. When fees are reduced (discounted fee-for-service), doctors tend to increase the number of services billed for. Capitation, in contrast, provides a fixed payment per patient per year, encouraging physicians to take on more patients but to provide each patient with fewer services. 
Expensive services are more likely to be affected by this type of incentive than inexpensive preventive services. Salary compensation plans pay physicians the same regardless of the amount of clinical work performed. The incentive here is to see fewer patients.In summary, expert clinical decision-making can be appreciated as a complex interplay between cognitive devices used to simplify large amounts of complex information interacting with physician biases reflecting education, training, and experience, all of which are shaped by powerful, sometimes perverse, external forces. In the next section, a set of statistical tools and concepts that can assist in making clinical decisions in the presence of uncertainty are reviewed.Quantitative Methods to Aid Clinical Decision-MakingThe process of medical decision-making can be divided into two parts: (1) defining the available courses of action and estimating the likely outcomes with each, and (2) assessing the desirability of the outcomes. The former task involves integrating key information about the patient along with relevant evidence from the medical literature to create the structure of a decision. The remainder of this chapter will review some quantitative tools available to assist the clinician in these activities.Quantitative Medical PredictionsDiagnostic Testing: Measures of Test AccuracyThe purpose of performing a test on a patient is to reduce uncertainty about the patient's diagnosis or prognosis and to aid the clinician in making management decisions. Although diagnostic tests are commonly thought of as laboratory tests (e.g., measurement of serum amylase level) or procedures (e.g., colonoscopy or bronchoscopy), any technology that changes our understanding of the patient's problem qualifies as a diagnostic test. Thus, even the history and physical examination can be considered a form of diagnostic test. In clinical medicine, it is common to reduce the results of a test to a dichotomous outcome, such as positive or negative, normal or abnormal. In many cases, this simplification results in the waste of useful information. However, such simplification makes it easier to demonstrate some of the quantitative ways in which test data can be used.The accuracy of diagnostic tests is defined in relation to an accepted "gold standard," which is presumed to reflect the true state of the patient (Table 3-1). To define the diagnostic performance of a new test, an appropriate population must be identified (ideally patients in whom the new test would be used) and both the new and the gold standard tests are applied to all subjects. The results of the two tests are then compared. The sensitivity or true-positive rate of the new test is the proportion of patients with disease (defined by the gold standard) who have a positive (new) test. This measure reflects how well the test identifies patients with disease. The proportion of patients with disease who have a negative test is the false-negative rate and is calculated as 1 – sensitivity. The proportion of patients without disease who have a negative test is the specificity or true-negative rate. This measure reflects how well the test correctly identifies patients without disease. The proportion of patients without disease who have a positive test is the false-positive rate, calculated as 1 – specificity. A perfect test would have a sensitivity of 100% and a specificity of 100% and would completely separate patients with disease from those without it.
Table 3-1 Measures of Diagnostic Test Accuracy

Test Result | Disease Present | Disease Absent
Positive | True-positive (TP) | False-positive (FP)
Negative | False-negative (FN) | True-negative (TN)

Identification of Patients with Disease
True-positive rate (sensitivity) = TP/(TP + FN)
False-negative rate = FN/(TP + FN)
True-positive rate = 1 – false-negative rate

Identification of Patients without Disease
True-negative rate (specificity) = TN/(TN + FP)
False-positive rate = FP/(TN + FP)
True-negative rate = 1 – false-positive rate
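To make the definitions in Table 3-1 concrete, here is a minimal Python sketch that computes these rates from a 2 × 2 table; the counts are invented for illustration.

```python
# Hypothetical 2 x 2 table comparing a new test against the gold standard.
TP = 90   # diseased patients with a positive test
FN = 10   # diseased patients with a negative test
FP = 15   # non-diseased patients with a positive test
TN = 85   # non-diseased patients with a negative test

sensitivity = TP / (TP + FN)          # true-positive rate
false_negative_rate = FN / (TP + FN)  # = 1 - sensitivity
specificity = TN / (TN + FP)          # true-negative rate
false_positive_rate = FP / (TN + FP)  # = 1 - specificity

print(f"Sensitivity (true-positive rate): {sensitivity:.2f}")
print(f"False-negative rate:              {false_negative_rate:.2f}")
print(f"Specificity (true-negative rate): {specificity:.2f}")
print(f"False-positive rate:              {false_positive_rate:.2f}")
```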
Calculating sensitivity and specificity requires selection of a decision value for the test to define the threshold value at or above which the test is considered "positive." For any given test, as this cut point is moved to improve sensitivity, specificity typically falls and vice versa. This dynamic tradeoff between more accurate identification of subjects with disease versus those without disease is often displayed graphically as a receiver operating characteristic (ROC) curve (Fig. 3-1). An ROC curve plots sensitivity (y-axis) versus 1 – specificity (x-axis). Each point on the curve represents a potential cut point with an associated sensitivity and specificity value. The area under the ROC curve is often used as a quantitative measure of the information content of a test. Values range from 0.5 (no diagnostic information at all, test is equivalent to flipping a coin) to 1.0 (perfect test).In the testing literature, ROC areas are often used to compare alternative tests that can be used for a particular diagnostic problem (Fig. 3-1). The test with the highest area (i.e., closest to 1.0) is presumed to be the most accurate. However, ROC curves are not a panacea for evaluation of diagnostic test utility. Like Bayes' theorem (discussed below), they are typically focused on only one possible test parameter (e.g., ST-segment response in a treadmill exercise test) to the exclusion of other potentially relevant data. In addition, ROC area comparisons do not simulate the way test information is actually used in clinical practice. Finally, biases in the underlying population used to generate the ROC curves (e.g., related to an unrepresentative test sample) can bias the ROC area and the validity of a comparison among tests.Measures of Disease Probability and Bayes' TheoremUnfortunately, there are no perfect tests; after every test is completed, the true disease state of the patient remains uncertain. Quantitating this residual uncertainty can be done with Bayes' theorem. This theorem provides a simple mathematical way to calculate the posttest probability of disease from three parameters: the pretest probability of disease, the test sensitivity, and the test specificity (Table 3-2). The pretest probability is a quantitative expression of the confidence in a diagnosis before the test is performed. In the absence of more relevant information, it is usually estimated from the prevalence of the disease in the underlying population. For some common conditions, such as coronary artery disease (CAD), nomograms and statistical models have been created to generate better estimates of pretest probability from elements of the history and physical examination. The posttest probability, then, is a revised statement of the confidence in the diagnosis, taking into account what was known both before and after the test.
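Returning to the ROC curves described above, the following sketch traces a curve from raw test values by sweeping the cut point and computes the area under it by the trapezoidal rule; the test values are hypothetical and the code is only a minimal illustration of the idea.

```python
# Hypothetical continuous test results for diseased and non-diseased patients.
diseased = [2.4, 1.9, 3.1, 0.8, 2.7, 1.5]
healthy  = [0.6, 1.2, 0.4, 1.8, 0.9, 0.7]

def roc_points(diseased, healthy):
    """Sensitivity and 1 - specificity at every candidate cut point."""
    cuts = sorted(set(diseased + healthy), reverse=True)
    points = [(0.0, 0.0)]  # an infinitely strict cut point: nothing is called positive
    for cut in cuts:
        sens = sum(x >= cut for x in diseased) / len(diseased)
        one_minus_spec = sum(x >= cut for x in healthy) / len(healthy)
        points.append((one_minus_spec, sens))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

print(f"ROC area = {auc(roc_points(diseased, healthy)):.2f}")  # 1.0 = perfect, 0.5 = coin flip
```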
Table 3-2 Measures of Disease Probability
Pretest probability of disease = probability of disease before test is done.
  May use population prevalence of disease or more patient-specific data to generate this probability estimate.
Posttest probability of disease = probability of disease accounting for both pretest probability and test results. Also called predictive value of the test.
Bayes' theorem (computational version):

Posttest probability = (pretest probability × test sensitivity) / [(pretest probability × test sensitivity) + (1 – pretest probability) × (1 – test specificity)]

Example [with a pretest probability of 0.50 and a "positive" diagnostic test result (test sensitivity = 0.90, test specificity = 0.90)]:

Posttest probability = (0.50 × 0.90) / [(0.50 × 0.90) + (0.50 × 0.10)] = 0.45/0.50 = 0.90
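A short Python version of this computation, covering both positive and negative results, reproduces the worked example; it is a minimal sketch of the formula above rather than anything prescribed by the chapter.

```python
def posttest_probability(pretest, sensitivity, specificity, positive_result=True):
    """Bayes' theorem, computational version, for a dichotomous test result."""
    if positive_result:
        true_pos = pretest * sensitivity              # diseased and test positive
        false_pos = (1 - pretest) * (1 - specificity) # non-diseased and test positive
        return true_pos / (true_pos + false_pos)
    false_neg = pretest * (1 - sensitivity)           # diseased and test negative
    true_neg = (1 - pretest) * specificity            # non-diseased and test negative
    return false_neg / (false_neg + true_neg)

# Worked example from Table 3-2: pretest probability 0.50, sensitivity 0.90, specificity 0.90.
print(posttest_probability(0.50, 0.90, 0.90))                         # 0.90 after a positive test
print(posttest_probability(0.50, 0.90, 0.90, positive_result=False))  # 0.10 after a negative test
```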
The term predictive value is often used as a synonym for the posttest probability. Unfortunately, clinicians commonly misinterpret reported predictive values as intrinsic measures of test accuracy. Studies of diagnostic tests compound the confusion by calculating predictive values on the same sample used to measure sensitivity and specificity. Since all posttest probabilities are a function of the prevalence of disease in the tested population, such calculations are clinically irrelevant unless the test is subsequently applied to populations with the same disease prevalence. For these reasons, the term predictive value is best avoided in favor of the more informative posttest probability.To understand conceptually how Bayes' theorem estimates the posttest probability of disease, it is useful to examine a nomogram version of Bayes' theorem (Fig. 3-2). In this nomogram, the accuracy of the diagnostic test in question is summarized by the likelihood ratio , which is defined as the ratio of the probability of a given test result (e.g., "positive" or "negative") in a patient with disease to the probability of that result in a patient without disease.For a positive test, the likelihood ratio is calculated as the ratio of the true-positive rate to the false-positive rate [or sensitivity/(1 – specificity)]. For example, a test with a sensitivity of 0.90 and a specificity of 0.90 has a likelihood ratio of 0.90/(1 – 0.90), or 9. Thus, for this hypothetical test, a "positive" result is 9 times more likely in a patient with the disease than in a patient without it. Most tests in medicine have likelihood ratios for a positive result between 1.5 and 20. Higher values are associated with tests that are more accurate at identifying patients with disease, with values of 10 or greater of particular note. If sensitivity is excellent but specificity is less so, the likelihood ratio will be substantially reduced (e.g., with a 90% sensitivity but a 60% specificity, the likelihood ratio is 2.25). For a negative test, the corresponding likelihood ratio is the ratio of the false negative rate to the true negative rate [or (1 – sensitivity)/specificity]. The smaller the likelihood ratio (i.e., closer to 0) the better the test performs at ruling out disease. The hypothetical test we considered above with a sensitivity of 0.9 and a specificity of 0.9 would have a likelihood ratio for a negative test result of (1 – 0.9)/0.9 of 0.11, meaning that a negative result is almost 10 times more likely if the patient is disease-free than if he has disease.Applications to Diagnostic Testing in CADConsider two tests commonly used in the diagnosis of CAD, an exercise treadmill and an exercise single photon emission CT (SPECT) myocardial perfusion imaging test (Chap. 222). Meta-analysis has shown a positive treadmill ST-segment response to have an average sensitivity of 66% and an average specificity of 84%, yielding a likelihood ratio of 4.1 [0.66/(1 – 0.84)]. If we use this test on a patient with a pretest probability of CAD of 10%, the posttest probability of disease following a positive result rises to only about 30%. If a patient with a pretest probability of CAD of 80% has a positive test result, the posttest probability of disease is about 95%.The exercise SPECT myocardial perfusion test is a more accurate test for the diagnosis of CAD. 
For our purposes, assume that the finding of a reversible exercise-induced perfusion defect has both a sensitivity and specificity of 90%, yielding a likelihood ratio for a positive test of 9.0 [0.90/(1 – 0.90)]. If we again test our low pretest probability patient and he has a positive test, using Fig. 3-2 we can demonstrate that the posttest probability of CAD rises from 10 to 50%. However, from a decision-making point of view, the more accurate test has not been able to improve diagnostic confidence enough to change management. In fact, the test has moved us from being fairly certain that the patient did not have CAD to being completely undecided (a 50:50 chance of disease). In a patient with a pretest probability of 80%, using the more accurate exercise SPECT test raises the posttest probability to 97% (compared with 95% for the exercise treadmill). Again, the more accurate test does not provide enough improvement in posttest confidence to alter management, and neither test has improved much upon what was known from clinical data alone.If the pretest probability is low (e.g., 20%), even a positive result on a very accurate test will not move the posttest probability to a range high enough to rule in disease (e.g., 80%). Conversely, with a high pretest probability, a negative test will not adequately rule out disease. Thus, the largest gain in diagnostic confidence from a test occurs when the clinician is most uncertain before performing it (e.g., pretest probability between 30 and 70%). For example, if a patient has a pretest probability for CAD of 50%, a positive exercise treadmill test will move the posttest probability to 80% and a positive exercise SPECT perfusion test will move it to 90% (Fig. 3-2).Bayes' theorem, as presented above, employs a number of important simplifications that should be considered. First, few tests have only two useful outcomes, positive or negative, and many tests provide numerous pieces of data about the patient. Even if these can be integrated into a summary result, multiple levels of useful information may be present (e.g., strongly positive, positive, indeterminate, negative, strongly negative). While Bayes' theorem can be adapted to this more detailed test result format, it is computationally complex to do so. Finally, it has long been asserted that sensitivity and specificity are prevalence-independent parameters of test accuracy, and many texts still make this statement. This statistically useful assumption, however, is clinically simplistic. A treadmill exercise test, for example, has a sensitivity in a population of patients with one-vessel CAD of around 30%, whereas its sensitivity in severe three-vessel CAD approaches 80%. Thus, the best estimate of sensitivity to use in a particular decision will often vary, depending on the distribution of disease stages present in the tested population. A hospitalized population typically has a higher prevalence of disease and in particular a higher prevalence of more advanced disease than an outpatient population. As a consequence, test sensitivity will tend to be higher in hospitalized patients, whereas test specificity will be higher in outpatients.Statistical Prediction ModelsBayes' theorem, as presented above, deals with a clinical prediction problem that is unrealistically simple relative to most problems a clinician faces. Prediction models, based on multivariable statistical models, can handle much more complex problems and substantially enhance predictive accuracy for specific situations. 
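The posttest probabilities quoted in the CAD examples above can be reproduced with the odds form of Bayes' theorem (posttest odds = pretest odds × likelihood ratio). The sketch below uses the sensitivities and specificities given in the text, so small rounding differences from the chapter's figures are expected.

```python
def posttest_from_lr(pretest_prob, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

lr_treadmill = 0.66 / (1 - 0.84)  # sensitivity 66%, specificity 84% -> roughly 4.1
lr_spect = 0.90 / (1 - 0.90)      # sensitivity 90%, specificity 90% -> 9.0

for pretest in (0.10, 0.50, 0.80):
    print(f"pretest {pretest:.0%}: "
          f"treadmill -> {posttest_from_lr(pretest, lr_treadmill):.0%}, "
          f"SPECT -> {posttest_from_lr(pretest, lr_spect):.0%}")
```

As the chapter goes on to note, multivariable prediction models extend this two-parameter calculation to many predictors considered simultaneously.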
Their particular advantage is the ability to take into account many overlapping pieces of information and assign a relative weight to each based on its unique contribution to the prediction in question. For example, a logistic regression model to predict the probability of CAD takes into account all of the relevant independent factors from the clinical examination and diagnostic testing instead of the small handful of data that clinicians can manage in their heads or with Bayes' theorem. However, despite this strength, the models are too complex computationally to use without a calculator or computer (although this limit may be overcome once medicine is practiced from a fully computerized platform).To date, only a handful of prediction models have been properly validated. The importance of independent validation in a population separate from the one used to develop the model cannot be overstated. An unvalidated prediction model should be viewed with the same skepticism appropriate for a new drug or medical device that has not been through rigorous clinical trial testing.When statistical models have been compared directly with expert clinicians, they have been found to be more consistent, as would be expected, but not significantly more accurate. Their biggest promise, then, would seem to be to make less-experienced clinicians more accurate predictors of outcome.Decision Support ToolsDecision Support SystemsOver the past 35 years, many attempts have been made to develop computer systems to help clinicians make decisions and manage patients. Conceptually, computers offer a very attractive way to handle the vast information load that today's physicians face. The computer can help by making accurate predictions of outcome, simulating the whole decision process, or providing algorithmic guidance. Computer-based predictions using Bayesian or statistical regression models inform a clinical decision but do not actually reach a "conclusion" or "recommendation." Artificial intelligence systems attempt to simulate or replace human reasoning with a computer-based analogue. To date, such approaches have achieved only limited success. Reminder or protocol-directed systems do not make predictions but use existing algorithms, such as practice guidelines, to guide clinical practice. In general, however, decision support systems have shown little impact on practice. Reminder systems, although not yet in widespread use, have shown the most promise, particularly in correcting drug dosing and in promoting adherence to guidelines. The full impact of these approaches will only be evaluable when computers are fully integrated into medical practice.Decision AnalysisCompared with the methods discussed above, decision analysis represents a completely different approach to decision support. Its principal application is in decision problems that are complex and involve a substantial risk, a high degree of uncertainty in some key area, or an idiosyncratic feature that does not "fit" the available evidence. Five general steps are involved. First, the decision problem must be clearly defined. Second, the elements of the decision must be made explicit. This involves specifying the alternatives being considered, their relevant outcomes, the probabilities attached to each outcome, and the relative desirability (called "utility") of each outcome. Cost can also be assigned to each branch of the decision tree, allowing calculation of cost effectiveness. 
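As a toy illustration of how the probabilities, utilities, and costs attached to each branch combine into an expected value for each strategy, consider the following sketch; every number in it is hypothetical and the two-branch structure is deliberately simpler than any real decision model.

```python
# Each strategy is a list of branches: (probability, utility in QALYs, cost in dollars).
# Probabilities within a strategy sum to 1. All values are hypothetical.
strategies = {
    "treat":    [(0.90, 9.5, 12_000),   # treated, does well
                 (0.10, 5.0, 30_000)],  # treated, suffers a complication
    "no treat": [(0.70, 9.0, 1_000),    # disease never progresses
                 (0.30, 4.0, 20_000)],  # disease progresses
}

def expected(branches):
    """Probability-weighted average utility and cost across the branches."""
    exp_utility = sum(p * u for p, u, _ in branches)
    exp_cost = sum(p * c for p, _, c in branches)
    return exp_utility, exp_cost

u_treat, c_treat = expected(strategies["treat"])
u_none, c_none = expected(strategies["no treat"])

# Incremental cost-effectiveness ratio: extra dollars spent per extra QALY gained.
icer = (c_treat - c_none) / (u_treat - u_none)
print(f"treat:    {u_treat:.2f} QALYs, ${c_treat:,.0f}")
print(f"no treat: {u_none:.2f} QALYs, ${c_none:,.0f}")
print(f"ICER = ${icer:,.0f} per QALY gained")
```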
Typically, the data to populate a decision model are derived from the literature, from unpublished sources, from expert opinion and from other secondary sources. Third, the decision model must be "evaluated" to determine the net long-term health benefits and costs of each strategy being considered. Fourth, the incremental health benefits and costs of the more effective strategies must be calculated. Finally, extensive sensitivity analyses must be used to examine the effects on the results of varying the starting assumptions through plausible alternative values.An example decision tree created to evaluate strategies for screening for human immunodeficiency virus (HIV) infection is shown in Fig. 3-3. Up to 20,000 new cases of HIV infection are believed to be caused each year in the United States by infected individuals who are unaware of their illness. In addition, about 40% of HIV-positive patients progress to AIDS within a year of their diagnosis. Early identification offers the opportunity both to prevent progression to AIDS through use of serial CD4 counts and measurements of viral load linked to selective use of combination antiretroviral therapy and to encourage reduction of risky sexual behavior.The Centers for Disease Control and Prevention (CDC) proposed in 2003 that routine HIV testing should be a part of standard medical care. In a decision-model exploration of this proposed strategy compared with usual care, assuming a 1% prevalence of unidentified HIV infection in the population, routine screening of a cohort of 43-year-old men and women increased life expectancy by 5.5 days and cost $194 per subject screened. The cost-effectiveness ratio for screening relative to usual care was $15,078 per quality-adjusted life year. Results were sensitive to assumptions about the effectiveness of behavior modification on subsequent sexual behavior, the benefits of early therapy of HIV infection and the prevalence and incidence of HIV infection in the population targeted. This model, which required over 75 separate data points, provides novel insights into a clinical management problem that has not been subjected to a randomized clinical trial.The process of building and evaluating decision models is generally too complex for use in real-time clinical management. The potential for this tool therefore lies in the development of a set of published models addressing a particular decision or policy area that can serve to highlight key pressure points in the problem. Although many published models tend to focus excessively on providing an "answer," their better role is to enhance understanding of the most important questions that deserve particular attention in clinical decision-makingEvidence-Based MedicineThe "art of medicine" is traditionally defined as a practice combining medical knowledge (including scientific evidence), intuition, and judgment in the care of patients (Chap. 1). Evidence-based medicine (EBM) updates this construct by placing a much-greater emphasis on the processes by which the clinician gains knowledge of the most up-to-date and relevant clinical research. The key processes of EBM can be summarized in four steps:
  1. Formulating the management question to be answered
  2. Searching the literature and on-line databases for applicable research data
  3. Appraising the evidence gathered with regard to its validity and relevance
  4. Integrating this appraisal with knowledge about the unique aspects of the patient (including preferences)
Steps 2 and 3 are the heart of EBM as it is currently used in practice. The process of searching the world's research literature and appraising the quality and relevance of studies thus identified can be quite time-consuming and requires skills and training that most clinicians do not possess. Thus, the best starting point for most EBM searches is the identification of recent systematic overviews of the problem in question (Table 3-3).
Table 3-3 Selected Tools for Finding the Evidence in Evidence-Based Medicine

Evidence-Based Medicine Reviews: Comprehensive electronic database that combines and integrates (1) the Cochrane Database of Systematic Reviews, (2) ACP Journal Club, and (3) the Database of Abstracts of Reviews of Effectiveness. Web address: http://www.ovid.com. Availability: subscription required; available through medical center libraries and other institutions.

Cochrane Library: Collection of EBM databases, including the Cochrane Database of Systematic Reviews (full-text articles reviewing specific health care topics). Web address: http://www.cochrane.org. Availability: subscription required; abstracts of systematic reviews available free online; some countries have funding to provide free access to all residents.

ACP Journal Club: Collection of summaries of original studies and systematic reviews; published bimonthly; all data since 1991 available on the Web site, updated yearly. Web address: http://www.acpjc.org. Availability: subscription required.

Clinical Evidence: Monthly updated directory of concise overviews of common clinical interventions. Web address: http://www.clinicalevidence.com. Availability: subscription required; free access for UK and for developing countries.

MEDLINE: National Library of Medicine database with citations back to 1966. Web address: http://www.nlm.nih.gov. Availability: free via Internet.

Note: ACP, American College of Physicians; EBM, evidence-based medicine.
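As a small illustration of querying MEDLINE programmatically (the last resource in Table 3-3), the sketch below calls the public NCBI E-utilities search endpoint for PubMed; the search term is arbitrary, and the endpoint and parameter names come from the E-utilities interface rather than from this chapter.

```python
# Minimal sketch of searching MEDLINE/PubMed via the NCBI E-utilities esearch endpoint.
import json
import urllib.parse
import urllib.request

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {
    "db": "pubmed",
    "term": "heart failure AND ACE inhibitor AND randomized controlled trial[pt]",
    "retmax": 5,
    "retmode": "json",
}

with urllib.request.urlopen(BASE + "?" + urllib.parse.urlencode(params)) as resp:
    result = json.load(resp)["esearchresult"]

print("Total matching citations:", result["count"])
print("First PubMed IDs:", ", ".join(result["idlist"]))
```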
Generally, the EBM tools listed in Table 3-3 provide access to research information in one of two forms. The first, primary research reports, is the original peer-reviewed research work that is published in medical journals. Initial access to this information in an EBM search may be gained through MEDLINE, which provides access to a huge amount of data in abstract form. However, it is often difficult, using MEDLINE, to locate reports that are on point in a sea of irrelevant or unhelpful information and being reasonably certain that important reports have not been overlooked. The second form, systematic reviews, comprehensively summarizes the available evidence on a particular topic up to a certain date and provides the interpretation of the reviewer. Explicit criteria are used to find all the relevant scientific research and grade its quality. The prototype for this kind of resource is the Cochrane Database of Systematic Reviews. One of the key components of a systematic review is a meta-analysis. In the next two sections, we will review some of the major types of clinical research reports available in the literature and the process of aggregating those data into meta-analyses.Sources of Evidence: Clinical Trials and RegistriesThe notion of learning from observation of patients is as old as medicine itself. Over the last 50 years, our understanding of how best to turn raw observation into useful evidence has evolved considerably. Case reports, personal anecdotal experience, and small single-center case series are now recognized as having severe limitations to validity and have no role in formulating modern standards of practice. The major tools used to develop reliable evidence consist of the randomized clinical trial and the large observational registry. A registry or database is typically focused on a disease or syndrome (e.g., cancer, CAD, heart failure), a clinical procedure (e.g., bone marrow transplantation, coronary revascularization), or an administrative process (e.g., claims data used for billing and reimbursement).By definition, in observational data the care of the patient is not controlled by the investigator. Carefully collected prospective observational data can achieve a level of quality approaching that of major clinical trial data. At the other end of the spectrum, data collected retrospectively (e.g., chart review) are limited in form and content to what previous observers thought was important to record, which may not serve the research question under study particularly well. Data not specifically collected for research (e.g., claims data) will often have important limitations that cannot be overcome in the analysis phase of the research. Advantages to observational data include the ability to capture a broader population than is typically represented in clinical trials. In addition, observational data are the primary source of evidence for questions where a randomized trial cannot or will not be performed. For example, it may be difficult to randomize patients to test diagnostic or therapeutic strategies that are unproven but widely accepted in practice. In addition, we cannot randomize patients to a gender, racial/ethnic group, socioeconomic status, or country of residence. 
We are also not willing to randomize patients to a potentially harmful intervention, such as smoking or overeating to develop obesity.The major difference between a well-done clinical trial and a well-done prospective observational study of a particular management strategy is the lack of protection from treatment selection bias in the latter. The underlying concept in the use of observational data to compare diagnostic or therapeutic strategies is that there is enough uncertainty in practice that similar patients will be managed differently by different physicians. In short, the assumption is that there is an element of randomness (in the sense of disorder rather than in the formal statistical sense) to clinical management. In such cases, statistical models can be used to adjust for important imbalances and "level the playing field" so that a fair comparison among treatment options can be made. When management is clearly not random (e.g., all eligible left main coronary artery disease patients are referred for coronary bypass surgery), the problem may be too confounded for statistical correction, and observational data may not provide reliable evidence.In general, use of concurrent controls is vastly preferable to historical controls. For example, comparison of current surgical management of left main coronary artery disease with left main patients treated medically during the 1970s (the last time these patients were routinely treated with medicine alone) would be extremely misleading since the quality of "medical therapy" has made huge improvements in the interval.Randomized controlled clinical trials include the careful prospective design features of the best observational data studies but also include the use of random allocation of treatment. This design provides the best protection against confounding due to treatment selection bias (a major aspect of internal validity). However, the randomized trial may not have good external validity if the process of recruitment into the trial resulted in the exclusion of many potentially eligible subjects.Consumers of medical evidence need to be aware that randomized trials vary widely in their quality and applicability to practice. The process of designing such a trial often involves a great many compromises. For example, trials designed to gain FDA approval for an investigational drug or device will need to address certain regulatory requirements that may result in a different trial design from what practicing clinicians would find useful.Meta-AnalysisThe Greek prefix meta signifies something at a later or higher stage of development. Meta-analysis is research done on research data for the purpose of combining and summarizing the available evidence quantitatively. Although it can be used to combine nonrandomized studies, meta-analysis is most valuable when used to summarize all of the randomized trials on a particular therapeutic problem. Ideally, unpublished trials should be identified and included to avoid publication bias (i.e., "positive" trials are more likely to be published). Furthermore, some of the best meta-analyses obtain and analyze the raw patient-level data from the individual trials rather than working only with what is available in the published reports of each trial. Not all published meta-analyses are reliable sources of evidence on a particular problem. Their methodology must be carefully scrutinized to ensure proper study design and analysis. 
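To show the kind of quantitative combination a meta-analysis performs, here is a minimal fixed-effect (inverse-variance) pooling of odds ratios from three hypothetical trials; real meta-analyses add heterogeneity assessment, random-effects models, and publication-bias diagnostics on top of this arithmetic.

```python
import math

# Hypothetical 2 x 2 results from three randomized trials:
# (events_treated, total_treated, events_control, total_control)
trials = [(15, 200, 25, 200),
          (30, 500, 45, 500),
          (8, 120, 14, 118)]

weights, weighted_log_ors = [], []
for a, n1, c, n0 in trials:
    b, d = n1 - a, n0 - c                   # non-events in each arm
    log_or = math.log((a * d) / (b * c))    # log odds ratio for this trial
    variance = 1/a + 1/b + 1/c + 1/d        # Woolf's variance estimate
    weights.append(1 / variance)            # inverse-variance weight
    weighted_log_ors.append(log_or / variance)

pooled_log_or = sum(weighted_log_ors) / sum(weights)
se = math.sqrt(1 / sum(weights))
lo, hi = math.exp(pooled_log_or - 1.96 * se), math.exp(pooled_log_or + 1.96 * se)
print(f"Pooled odds ratio {math.exp(pooled_log_or):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```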
The results of a well-done meta-analysis are likely to be most persuasive if it includes at least several large-scale, properly performed randomized trials. In cases where the available trials are small or poorly done, meta-analysis should not be viewed as a remedy for the deficiency in primary trial data.Meta-analyses typically focus on summary measures of relative treatment benefit, such as odds ratios or relative risks. Clinicians should also examine what absolute risk reduction (ARR) can be expected from the therapy. A useful summary metric of absolute treatment benefit is the number needed to treat (NNT) to prevent one adverse outcome event (e.g., death, stroke). NNT is simply 1/ARR. For example, if a hypothetical therapy reduced mortality over a 5-year follow-up by 33% (the relative treatment benefit) from 12% (control arm) to 8% (treatment arm), the absolute risk reduction would be 12% – 8% = 4% and the NNT = 1/4 or 25. Thus, we would need to treat 25 patients for 5 years to prevent 1 death. If we applied our hypothetical treatment to a lower-risk population, say with a 6% 5-year mortality, the 33% relative treatment benefit would reduce absolute mortality by 2% (from 6 to 4%) and the NNT for the same therapy in this different group of patients would be 50. Although not always made explicit, comparisons of NNT estimates from different studies need to take account of the duration of follow-up used to create each estimate.Clinical Practice GuidelinesAccording to the 1990 Institute of Medicine definition, clinical practice guidelines are "systematically developed statements to assist practitioner and patient decisions about appropriate health care for specific clinical circumstances." This definition provides emphasis to several crucial features of modern guideline development. First, guidelines are created using the tools of EBM. In particular, the core of the development process is a systematic literature search followed by a review of the relevant peer-reviewed literature. Second, guidelines are usually focused around a clinical disorder (e.g., adult diabetes, stable angina pectoris) or a health care intervention (e.g., cancer screening). Third, guidelines are intended to "assist" decision-making, not to define explicitly what decisions should be made in a particular situation. The primary objective is to improve the quality of medical care by identifying areas where care should be standardized, based on compelling evidence.Guidelines are narrative documents constructed by an expert panel whose composition is often chosen by interested professional organizations. These panels vary in the degree to which they represent all relevant stakeholders. The guideline documents consist of a series of specific management recommendations, a summary indication of the quantity and quality of evidence supporting each recommendation, and a narrative discussion of the recommendations. Many recommendations have little or no supporting evidence and, thus, reflect the expert consensus of the guideline panel. In part to protect against errors by individual panels, the final step in guideline construction is peer review, followed by a final revision in response to the critiques provided. Guidelines are closely tied to the process of quality improvement in medicine through their identification of evidence-based best practices. Such practices can be used as quality indicators. 
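Returning to the absolute risk reduction and NNT arithmetic in the meta-analysis discussion above, a short helper makes the calculation explicit; the event rates are the hypothetical ones used in that example.

```python
def nnt(control_event_rate, treated_event_rate):
    """Number needed to treat = 1 / absolute risk reduction."""
    arr = control_event_rate - treated_event_rate
    return 1 / arr

print(round(nnt(0.12, 0.08)))  # higher-risk population: ARR 4%, NNT = 25
print(round(nnt(0.06, 0.04)))  # lower-risk population:  ARR 2%, NNT = 50
```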
Examples of such indicators include the proportion of acute MI patients who receive aspirin upon admission to a hospital and the proportion of heart-failure patients with depressed ejection fraction who are on an ACE inhibitor. Routine measurement and reporting of such quality indicators can produce selective improvements in quality, since many physicians prefer not to be outliers.

Conclusions

In this era of EBM, it is tempting to think that all the difficult decisions practitioners face have been or soon will be solved and digested into practice guidelines and computerized reminders. However, EBM provides practitioners with an ideal rather than a finished set of tools with which to manage patients. The significant contribution of EBM has been to promote the development of more powerful and user-friendly EBM tools that can be accessed by busy practitioners. This is an enormously important contribution that is slowly changing the way medicine is practiced. One of the repeated admonitions of EBM pioneers has been to replace reliance on the local "gray-haired expert" (who may often be wrong but is rarely in doubt) with a systematic search for and evaluation of the evidence. But EBM has not eliminated the need for subjective judgments. Each systematic review or clinical practice guideline presents the interpretation of "experts" whose biases remain largely invisible to the review's consumers. In addition, meta-analyses cannot generate evidence where there are no adequate randomized trials, and most of what clinicians confront in practice will never be thoroughly tested in a randomized trial. For the foreseeable future, excellent clinical reasoning skills and experience supplemented by well-designed quantitative tools and a keen appreciation for individual patient preferences will continue to be of paramount importance in the professional life of medical practitioners.

Further Readings

Balk EM et al: Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 287:2973, 2002 [PMID: 12052127]
Del Mar C et al: Clinical Thinking: Evidence, Communication and Decision Making. Malden, Mass., Blackwell, 2006
Grimes DA et al: Refining clinical diagnosis with likelihood ratios. Lancet 365:1500, 2005 [PMID: 15850636]
Haynes RB et al: Clinical Epidemiology: How to Do Clinical Practice Research. Philadelphia, Lippincott Williams & Wilkins, 2006
Peterson ED et al: Association between hospital process performance and outcomes among patients with acute coronary syndromes. JAMA 295:1912, 2006 [PMID: 16639050]
Reilly BM et al: Translating clinical research into clinical practice: Impact of using prediction rules to make decisions. Ann Intern Med 144:201, 2006 [PMID: 16461965]
Sanders GD et al: Cost-effectiveness of screening for HIV in the era of highly active antiretroviral therapy. N Engl J Med 352:570, 2005 [PMID: 15703422]
