FROM:
Spine (Phila Pa 1976) 2007 (Sep 1); 32 (19 Suppl): S66–72
Charles G. Fisher, MD, MHSc(Epi), FRCSC; Kirkham B. Wood, MD
Study design: Literature review.
Objective: To outline the components and application of evidence-based medicine (EBM) with an emphasis on the critical components of conduct and appraisal of clinical research.
Summary of background data: "Evidence-based medicine" is now a commonplace phrase representing the hallmark of excellence in clinical practice. EBM integrates a focused question and a thoughtful, comprehensive evaluation of the pertinent literature with clinical experience and patient preference to reach optimal patient care decisions. These decisions must then be evaluated with objective outcome measures to ensure effectiveness. Misconceptions persist around the application of EBM, notably that it is synonymous with randomized controlled trials (RCTs) or based purely on levels of evidence.
Methods: Narrative and review of literature.
Conclusion: Clinicians must understand the importance of the research question, study design, and outcomes in order to apply the best available research to patient care. Treatment recommendations evolving from critical appraisal are based not only on levels of evidence but also on the risk-benefit ratio and cost. The true philosophy of EBM, however, is not for research to supplant individual clinical experience and the patient's informed preference, but to integrate them with the best available research. Healthcare professionals and administrators must grasp that EBM is not an RCT. They must realize that the question being asked and the research circumstances dictate the study design. Furthermore, they must not diminish the role of clinical expertise and informed patient preference in EBM.
Keywords: evidence-based medicine, question, critical appraisal, implementation, outcomes, clinical expertise, patient preference.
From the FULL TEXT Article:
The concept of evidence-based medicine (EBM) has become a commonplace phrase, representing the hallmark of excellence in clinical practice. A Medline search of “evidence-based medicine” in 1993 revealed 6 citations; in 2007 there are 24,692. [1] Even the lay press has embraced the term, with “evidence-based medicine” named one of “the ideas of 2001” by the New York Times Magazine. [2] But does this dramatic growth and awareness represent better research and clinical practice, or the indiscriminate use of a critically important process that policy makers demand and health professionals flaunt, but for the most part do not understand or practice? The recent satirical publication by Smith and Pell [3] would suggest the latter; the answer probably lies somewhere between these 2 extremes.
The concept of EBM evolved from clinical epidemiologists almost 2 decades ago and is defined as “the explicit, judicious, and conscientious use of current best evidence from health care research in decisions about the care of individuals and populations.” [4–6] EBM suggests that the traditional approach of understanding the disease mechanism coupled with experience, intuition, and rationale may not be the most effective means for clinical decision-making. It does not ignore clinical experience but adds the systematic evaluation of research to aid in the best clinical decision-making. There are 5 steps in the practice of EBM: defining the question or problem, searching for the evidence, critically appraising the literature, applying the results, and auditing the outcome. In practical terms, this means “translating” a clinical issue regarding a patient into an “answerable question” and then reviewing the relevant research, assessing its quality, to provide the best care for the patient. How spine surgeons can best adopt the principles of EBM in their routine practice is important and is the subject of this focus issue.
Background
Orthopedics, including spine surgery, has a strong tradition of empiric, experience-based science. It was not until landmark papers published in the 1980s [7–9] demonstrated how tenuous the evidence was on which we based our treatment decisions that a movement toward a more evidence-based approach to treatment began. Studies were plagued with a multitude of problems, including vague clinical outcomes as opposed to psychometrically sound health-related quality of life (HRQOL) instruments, retrospective reviews carried out by individuals obliged to the principal investigator, and analytical methods lacking clinical and biostatistical integration. This is most evident when one examines the history of spine surgery and the myriad techniques and instrumentation strategies that have been introduced, evolved, and ultimately replaced. Innovation and advancement in spine care has in most cases come not from randomized controlled trials (RCTs), but from clinicians developing new techniques or implants and reporting their results in case series. [10] Here the issue is that the developer is part of the treatment program and is therefore intimately invested in the treatment’s success or failure, a troublesome bias to overcome. But is the RCT the panacea of comparative clinical research, and should clinicians, payers, and policy makers demand an RCT for every comparative study? The answer is a resounding “no,” because the study design depends on the question being asked. Hence the first step of EBM: to define the question or problem.
The Question
For clinicians interpreting and/or performing clinical research, a grounding in study design and methodology is essential; however, the first and most essential step in the development or evaluation of any study is to determine the primary question or purpose. In a study, this should be clearly stated at the end of the introduction and usually identifies the intervention, the cohort being studied, and other circumstantial factors. It is the well-defined research question that dictates the study design; not every study should be an RCT simply because the RCT is the gold standard. For example, whether disc arthroplasty is better than fusion for the treatment of low back pain is a question appropriate for an RCT design for a number of reasons: surgical techniques are standardized and generalizable with a short learning curve, there is equipoise among surgeons and patients, selection and observer bias is minimized, and blinding is even possible. Determining the 5-year HRQOL in adult deformity patients treated operatively versus nonoperatively is, by contrast, a question appropriate for a prospective cohort design: multiple surgical options, poor generalizability, technological advancement, patient complexity and variability, and preference and selection bias make an RCT all but impossible. Once a researcher considers all these issues and has a clear question, the study design (inclusion criteria, power calculation, follow-up schedule, outcome measures, etc.) becomes a relatively effortless process.
Similarly, for the clinician practicing EBM, the first step is defining the question around a patient or clinical problem. Seemingly simple, defining the question is often difficult, yet critical, as it directs step 2, the search for evidence. An ill-defined question will turn up countless references, while too specific a question will yield none. For the busy clinician, whose practice should be dictated in part by the most current literature, the concept of “the question” must be understood before issues around the literature search and evaluation are addressed.
Search for Evidence
Once the specific question is defined, it is critical to look at all the literature; some even recommend examining unpublished data to overcome publication bias. Because there is a tendency for clinicians to champion articles that comply with their treatment biases, a formal systematic review uses at least 2 reviewers who apply a priori study inclusion criteria and a predetermined dispute-resolution process. This ensures objectivity and transparency in the review process and distinguishes the systematic review from the subjective narrative review or literature search.
Critical Appraisal
Table 1
This third step of EBM is as challenging as the first 2 and is made more difficult without a background in study design and methodology. In essence, critical appraisal distils down to levels of evidence and grades of recommendation, the 2 critical components that allow the clinician to proceed to step 4 of EBM, applying the results to clinical practice. The levels of evidence involve analyzing and ranking studies based on design and methodology, with RCTs or meta-analyses ranking highest and expert opinion lowest. [11, 12] When appraising clinical research, one should determine the question being asked and whether the study design is appropriate and executed properly; ultimately, is there bias, or are the results of the study valid? [1] The spine surgery literature contains generic study questions that include the effects of treatment, treatment successes or failures, prognostic variables associated with outcome, and the utility and effectiveness of diagnostic approaches. [13] These generic questions can be evaluated by 1 of 2 broad study designs: those that analyze primary data and those that analyze secondary data. Studies that collect and analyze primary data include case reports and series, case-control studies, cross-sectional studies, prospective and retrospective cohorts, and the RCT (Table 1). Analysis of secondary data occurs in systematic reviews or meta-analyses, which pool or synthesize data to answer a question that is perhaps not practical or answerable with an individual study. [14]
Although a detailed description of the various study designs is beyond the scope of this article, a few important concepts should be discussed to better understand the critical appraisal process.
Study Design
Case series allow for an assessment of a clinical course or response to an intervention. Few conclusions can be drawn because of selection bias, subjective assessment, an often ill-defined number of subjects (n), and the lack of a comparison group. Case series are improved by using validated outcome measures and clearly defined inclusion criteria. For unusual conditions, the case series may be the best available evidence a clinician will achieve.
Case-control studies are retrospective and compare patients who already have a definite outcome with a control group without that outcome. Their utility is principally in identifying prognostic factors for an outcome; risk factors for lumbar disc recurrence would be an example of a question best answered by this design.
Cohort studies can be retrospective or prospective, with the latter providing better scientific evidence. Prospective cohort studies are tightly controlled, often using a control group. They require a time zero, strict inclusion/exclusion criteria, standardized follow-up at regular time intervals, and efforts to optimize follow-up and reduce dropouts. Cohort studies are expensive and time-consuming and carry selection bias, but they usually provide greater generalizability than the RCT. They are at risk of overestimating the true effect, although that overestimation is unlikely to explain the entire observed benefit. Cohort designs are ideal for identifying risk factors for disease, determining the outcome of an intervention, and examining the natural history of a disease. [14]
RCTs are justifiably recognized as the gold standard in clinical research. Despite this, they have well-recognized disadvantages, including high costs, administrative complexity, prolonged time to completion, and recruitment difficulty. Furthermore, RCTs in surgery are complicated by difficulties in blinding, randomization, technique standardization, and generalizability or external validity. Nevertheless, the ability to control for known and unknown bias outweighs these disadvantages. Randomization is unrivalled in ensuring the balancing of the experimental and control groups for unknown confounders. [15]
Historically, spine surgeons have struggled with RCTs, especially those that compare surgery with nonoperative care. Recruitment, patient compliance, and crossovers or dropouts are frequently encountered problems. [16] Inadequate power (Type 2 error) is also an issue: only 17% of the RCTs in spine surgery had adequate power to detect an appropriate difference, and only 27% had identified the all-important primary question. [16]
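The power problem described above is concrete: the sample size needed to avoid a Type 2 error can be estimated before a trial begins. The following is a rough normal-approximation sketch for comparing two success proportions, with made-up rates for illustration; it is not a substitute for a biostatistician's formal power calculation.

```python
import math
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect a difference between two
    success proportions (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical: detecting an improvement in success rate from 60% to 75%
print(n_per_arm(0.60, 0.75))   # → 150 per arm, before allowing for dropouts
```

Note how quickly the requirement grows as the expected difference shrinks, which is why small single-center surgical trials are so often underpowered.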
The SPORT study is an excellent example of the challenges of conducting an RCT in surgery. [17] Despite knowledgeable, experienced investigators, adequate funding, and probably the simplest and most prevalent operation in spine surgery, recruitment was one third of those eligible, and among those who did participate, an unacceptably high crossover rate occurred, making any conclusions difficult even with a Level I design. [18]
Despite these challenges, clinical scientists should not be deterred from pursuing an RCT if it is the best design for the question being asked. For questions around rare conditions, or where complex issues of multiple prognostic variables, time frame, cost, and recruitment intervene, the RCT is not appropriate, and the best available evidence will therefore be a case series or prospective cohort study.
A systematic review provides a rational synopsis of the available literature. By summarizing all relevant literature on a particular topic, the systematic review is a tremendous asset to the busy clinician. It attempts to overcome the bias associated with the majority of “traditional” reviews, more appropriately termed “narrative” reviews. Through the application of rigorous methodology, potential bias is minimized. A properly conducted systematic review ensures that all published and unpublished literature is considered, that each study is evaluated for relevance and quality through independent assessment, and that the remaining studies are synthesized in a fair and unbiased manner. [4] A good systematic review is also transparent: transparency implies openness by the authors, so that readers can determine the validity of the conclusions for themselves.
Component studies of a systematic review may be combined qualitatively or in the case of RCTs quantitatively. When a quantitative synthesis is carried out, it is termed a meta-analysis. Meta-analysis refers to the statistical technique used to combine independent studies. A meta-analysis is particularly useful when combining several small studies whose results may be inconclusive due to low power.
With the limited number of RCTs in spine surgery, the majority of systematic reviews will be qualitative in nature. This, however, is a dramatic step forward from the inherent bias of the narrative reviews of the past by ensuring a comprehensive, objective, transparent review of the best available literature to answer an a priori question. Indeed, systematic reviews and clinical practice guidelines allow the clinician to practice EBM without a strong background in clinical epidemiology.
Grades of Recommendation
Table 2
Through evaluation of study design and methodology, studies can be graded, with the grade determining the strength of recommendation for a particular intervention. Numerous systems have been developed to categorize levels of evidence, and most provide grades of recommendation. [19] In January 2003, the editorial board of the American edition of the Journal of Bone and Joint Surgery adopted a level of evidence rating system [20, 21] based only on study design (Table 2). Grades of recommendation were then introduced in 2005 and revised in 2006 based on levels of evidence and consistency. [22] “Grade-A recommendations are based on consistent Level-I studies. Grade-B recommendations are based on consistent Level-II or III evidence. Grade-C recommendations represent either conflicting evidence or are based on Level-IV or V evidence. A grade of I indicates that there is insufficient evidence to make a treatment recommendation.” The specific grades guide the surgeon in changing practice, with Grade A representing a definite change and Grade B a probable one. [22]
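The quoted JBJS scheme is essentially a lookup from the supporting evidence to a grade. A minimal sketch follows; treating the best (lowest-numbered) level of evidence as decisive is a simplification for illustration, not the published criteria verbatim.

```python
def grade_of_recommendation(levels, consistent=True):
    """Map the levels of evidence (1-5) behind an intervention to a
    JBJS-style grade of recommendation.
    levels: the level of evidence of each supporting study.
    consistent: whether the studies' results agree."""
    if not levels:
        return "I"      # insufficient evidence for a recommendation
    if not consistent:
        return "C"      # conflicting evidence
    best = min(levels)
    if best == 1:
        return "A"      # consistent Level-I studies
    if best in (2, 3):
        return "B"      # consistent Level-II or III evidence
    return "C"          # Level-IV or V evidence only
```

For example, two consistent Level-I trials yield Grade A, while a pair of Level-II studies with conflicting results drops to Grade C.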
Although this initiative should be applauded, its validity is debatable if it is rigidly applied to every study without consideration of the study question, the nature and circumstances of the research and intervention, and values and preferences. It must be remembered that current concepts around levels of evidence and critical appraisal are based on a medical model, and surgical studies have unique issues that may be better dealt with in an evidence-based surgical model. [23] The nature of a surgical intervention introduces challenges around selection and observer bias, blinding, standardization of the technique, learning curve, generalizability, prevalence of the problem, and patient and surgeon equipoise. [23] Surgical research is complex, so the research question, framed by the above surgical considerations, determines the best study design. In other words, strong recommendations can be based on lower levels of evidence if the question and circumstances dictate it. [24] The suggestion that RCTs are the only acceptable design is a shortsighted misconception and lacks insight into the relative strengths of the different methods. [25] The proper use of evidence-based information is not strict adherence to RCTs alone but, more accurately, the informed and effective use of all types of evidence.
Table 3
Grades of recommendation have evolved over the years, and probably the most compelling and innovative approach is that of Guyatt et al, shown in Table 3. [26, 27] The strength of their recommendation depends not only on the methodology but on the trade-off between benefits and risks plus costs, as judged by expert opinion and the literature. If the risk/benefit ratio is clear, a strong recommendation is made (Grade 1); if the magnitude of the ratio is less certain, a weaker Grade 2 recommendation is made. Grade 2 recommendations accommodate variation in patient and/or clinician values that will result in different treatment choices, even for average patients, an approach often seen in surgical care. The importance of the benefit/risk and cost ratio in the recommendation is manifested by its being placed first in the grading scheme, e.g., 1A.
Grades A through C represent the typical levels of evidence hierarchy. The magnitude of bias within the study increases as the methodologic quality goes down (A to C). The uniqueness of this model is that despite only Grade C evidence, strong or intermediate strength recommendations can still be made based on the risk/benefit ratio and whether the question being asked is appropriate for the study design. This systematic and practical approach to grading management recommendations is continuing to evolve and is an essential step in the broad acceptance and implementation of EBM. [28]
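The Guyatt-style grade described above combines two independent judgments: a numeric strength from the clarity of the risk/benefit (and cost) trade-off, prefixed to a letter for methodologic quality. A hypothetical sketch of the Table 3 scheme, not the published criteria verbatim:

```python
def guyatt_grade(risk_benefit_clear, methodologic_quality):
    """Combine the two axes of the Guyatt scheme into a grade like '1A':
    '1' = strong recommendation (clear risk/benefit ratio),
    '2' = weak recommendation (less certain trade-off),
    followed by the methodologic quality letter A-C."""
    if methodologic_quality not in ("A", "B", "C"):
        raise ValueError("methodologic quality must be A, B, or C")
    strength = "1" if risk_benefit_clear else "2"
    return strength + methodologic_quality

# A clear risk/benefit ratio supports a strong recommendation even
# from observational (Grade C quality) evidence:
print(guyatt_grade(True, "C"))    # → 1C
```

This captures the model's key property: the strength of the recommendation is not hostage to the level of evidence alone.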
The reality of a busy clinician properly applying the first 3 steps of EBM is low. Fortunately, there are now many information resources available, like this focus issue, in which clinical research has been evaluated for validity by experts in clinical epidemiology and for clinical suitability by experienced physicians. This modifies the EBM process for busy clinicians by locating the prereviewed evidence that applies to the initial question so that a treatment decision can be made.
Application of Results and Auditing Outcomes
The intellectually challenging first 3 steps of EBM arrive at a recommendation; the next 2 steps are to implement and evaluate it. Strict application of a recommendation is difficult, as the question of whether it is appropriate for all patients and situations lingers in the background. Therefore, the clinical expertise and patient preference components of EBM enter the intervention decision. [25, 29] Whether the intervention was appropriate and resulted in a good outcome for a certain patient, in a particular clinician's hands, will only be answered by careful prospective outcome research. Outcomes research and other epidemiologic issues have historically been of little interest to the clinician, but as pressure builds for therapeutic accountability, clinicians find themselves in the unfamiliar territory of HRQOL, cost-effectiveness, and other patient-based outcomes. Use of these patient-based measures will continue to grow as patients, administrators, policy makers, and our professional organizations demand EBM.
There are 3 general psychometric criteria that should be established for HRQOL measures before they are endorsed. Reliability is the ability of the tool to be reproducible and internally consistent over time. Validity ensures that the instrument accurately measures what it is supposed to measure. Finally, responsiveness is the ability of the questionnaire to detect clinically relevant change or differences. Exemplary work by our professional societies has provided guidance and education and established generic and disease-specific outcome instruments for answering specific research questions. The Scoliosis Research Society (SRS), for example, has developed the disease-specific SRS-22 [30] to detect responsiveness of the intervention to germane patient values in spine deformity surgery. Most studies address their primary question with a disease-specific questionnaire but also include a generic outcome tool, such as the Short Form-36 (SF-36), to allow broad comparisons across disease states and an assessment of the impact of healthcare programs.
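Of the 3 criteria, internal consistency is the most mechanical to compute: Cronbach's alpha compares the variance of the individual items with the variance of the total score. A self-contained sketch with toy data follows; real instruments such as the SRS-22 and SF-36 rest on published psychometric evaluations, not ad hoc calculations like this one.

```python
def cronbach_alpha(item_scores):
    """Internal-consistency reliability of a multi-item questionnaire:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score),
    where k is the number of items.
    item_scores: one list of respondent scores per item."""
    k = len(item_scores)
    n = len(item_scores[0])

    def sample_var(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    item_variance_sum = sum(sample_var(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return k / (k - 1) * (1 - item_variance_sum / sample_var(totals))

# Two items that agree perfectly across 4 respondents give alpha = 1.0;
# items that partly disagree give a lower alpha.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
```

Published thresholds for acceptable alpha vary by use, but the direction is what matters here: items that move together across respondents drive alpha toward 1.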
Rising costs will make health economic evaluation more prominent in the literature. The unsustainable economics of ever-increasing technology and patient expectations will make economic evaluation critical in assisting health planners to evaluate interventions using “cost-benefit” frameworks. [31] Preference-based measures, such as the EQ-5D [32] and the Health Utilities Index, [33] are being incorporated into clinical trials to allow for cost-utility analysis. There is also a trend toward applying economic methods to health measures such as the SF-36, and toward developing new preference-based measures, such as the SF-6D. [34] This has the advantage of yielding quality-adjusted life-years from existing or prospective SF-36 data. [34]
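The quality-adjusted life-year arithmetic behind cost-utility analysis is simple in principle: each period of life is weighted by a preference-based utility between 0 (dead) and 1 (full health). A sketch with entirely hypothetical numbers (no discounting; the utilities and cost below are invented for illustration):

```python
def qalys(periods):
    """Quality-adjusted life-years: sum over periods of
    (years lived) x (preference-based utility, 0 = dead, 1 = full health).
    Utilities would come from an instrument such as the EQ-5D, Health
    Utilities Index, or SF-6D. Discounting is omitted for simplicity."""
    return sum(years * utility for years, utility in periods)

# Hypothetical: an intervention raises utility from 0.6 to 0.8 for 2 years
gain = qalys([(2, 0.8)]) - qalys([(2, 0.6)])   # 0.4 QALYs gained
cost_per_qaly = 20_000 / gain                  # at a $20,000 incremental cost
```

Here the hypothetical intervention buys 0.4 QALYs for $20,000, i.e., $50,000 per QALY; cost-utility analysis compares such ratios across competing interventions.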
Solid, high-integrity clinical research in spine surgery is difficult and expensive. Even with a clear a priori question, appropriate study design, and outcome instruments, the challenges of relatively rare conditions, patient preference, and ongoing advancements in molecular, biomechanical, and biomaterial technology are difficult to overcome. There has been recognition that questions around new technology or uncommon conditions cannot be definitively answered by 1 center or institution, but only through collaboration at a national and often international level between experts in the treatment of spine deformity and in study design. This approach allows the pooling of patient data in sophisticated yet realistic databases, using common, reliable terminology, to address the heretofore chronically underpowered spine literature. [16] Although only a minority of this evolving work will be RCTs, large prospective cohort studies in a surgical setting are often considered on par with RCTs and provide superior generalizability. [35, 36]
Perhaps the most compelling and growing component of EBM is the patient’s empowerment in the treatment decision-making process. In the past, the clinician informed the patient of the risks and benefits and often judged the best approach on the patient’s behalf. Now, patients are thought to be the best judge of values, and decisions are shared between the patient and clinician. [37] Evidence-based decision aids are being developed and evaluated to help patients make more informed and personalized choices about treatment options. [37]
Conclusion
Capable clinicians must understand the importance of the research question, study design, and outcomes to apply the best available evidence to patient care. Treatment recommendations evolving from critical appraisal, however, are no longer based on levels of evidence alone, but also on the risk-benefit ratio and cost. The true philosophy of EBM is not for research to supplant individual clinical experience and the patient's informed preference, but to integrate them with the best available research. Health policy makers, editorial boards, granting agencies, and payers must learn that EBM is not always an RCT. They must realize that the question being asked and the research circumstances dictate the study design. Finally, they must not diminish the role of clinical expertise and informed patient preference in EBM, as these provide the generalizability so often lacking in controlled experimental research.
Key Points
Evidence-based medicine is not a randomized controlled trial.
The primary question and research circumstances dictate the study design for clinical research.
There are 2 broad study types: those that analyze primary data and have a hierarchy of specific study designs and those that analyze secondary data.
Health-related quality of life outcomes are essential in clinical research.
Clinical experience and patient preference are integral components of evidence-based medicine.
References:
Schünemann HJ, Bone L.
Evidence-based orthopaedics.
Clin Orthop 2003;413:117–32.
Hitt J.
The Year in Ideas: A to Z: Evidence-Based Medicine.
New York Times Magazine, December 9, 2001.
Smith GCS, Pell JP.
Parachute use to prevent death and major trauma related to gravitational challenge:
systematic review of randomized controlled trials.
BMJ 2003;327:1459–61.
Petrie A.
Statistics in orthopaedic papers.
J Bone Joint Surg Br 2006;88:1121–36.
Straus S, Richardson WS, Glasziou P, et al.
Evidence-Based Medicine: How to Practice and Teach EBM, 3rd ed.
London: Churchill Livingstone; 2005.
Sackett DL, Straus S, Richardson SR, et al.
Evidence-Based Medicine: How to Practice and Teach EBM.
London: Churchill Livingstone; 2000.
Sledge CB.
Crisis, challenge, and credibility.
J Bone Joint Surg Am 1985;67:658–62.
Gartland JJ.
Orthopaedic clinical research: deficiencies in
experimental design and determinations of outcome.
J Bone Joint Surg Am 1988;70:1357–64.
Howe J, Frymoyer J.
Effects of questionnaire design on determination of
end results in lumbar spine surgeries.
Spine 1985;10:804–5.
Carr AJ.
Evidence-based orthopaedic surgery.
J Bone Joint Surg Br 2005;87:1593–4.
Bhandari M, Tornetta P.
Issues in the design, analysis, and critical appraisal
of orthopaedic clinical research.
Clin Orthop 2003;413:9–10.
Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS.
Evidence based medicine: what it is and what it isn't.
BMJ 1996;312:71–2.
Sackett DL, Haynes RB.
Evidence base of clinical diagnosis:
the architecture of diagnostic research.
BMJ 2002;324:539–41.
Fisher C, Dvorak M.
Orthopaedic research: what an orthopaedic surgeon needs to know.
In: Vaccaro A, ed. Orthopaedic Knowledge Update 8.
Rosemont, IL: American Academy of Orthopaedic Surgeons; 2005:3–13.
Hanson B, Kopjar B.
Clinical studies in spinal surgery.
Eur Spine J 2005;14:721–5.
Bailey CF, Fisher CG, Dvorak MF.
Type II error in the spine surgical literature.
Spine 2004;29:1723–30.
Weinstein J, Tosteson T, Lurie J, et al.
Surgical vs nonoperative treatment for lumbar disk herniation.
The Spine Patient Outcomes Research Trial (SPORT):
a randomized trial.
JAMA 2006;296:2441–50.
Fisher C, Bishop P.
SPORT and the Canadian Experience.
Spine J 2007;7:263–5.
Atkins D, Eccles M, Flottorp S, et al.
Systems for grading the quality of evidence and the strength of recommendations:
1. Critical appraisal of existing approaches: the GRADE Working Group.
BMC Health Serv Res 2004;4:38.
Wright JG, Swiontkowski MF, Heckman JD.
Introducing levels of evidence to the journal.
J Bone Joint Surg Am 2003;85:1–3.
Obremskey WT, Pappas N, Attallah-Wasif E, et al.
Level of evidence in orthopaedic journals.
J Bone Joint Surg Am 2005;87:2632–8.
Wright J, Einhorn T, Heckman J.
Grades of recommendation.
J Bone Joint Surg Am 2005;87:1909–10.
Meakins JL.
Innovation in surgery: the rules of evidence.
Am J Surg 2002;183:399–405.
Stein PD, Collins JJ, Kantrowitz A.
Antithrombotic therapy in mechanical and biological
prosthetic heart valves and saphenous vein bypass grafts.
Chest 1986;89:465–535.
McNeill T.
Evidence-based practice in an age of relativism:
toward a model for practice.
Soc Work 2006;51:147–56.
Canadian Task Force on the Periodic Health Examination.
The periodic health exam.
Can Med Assoc J 1979;121:1193–254.
Guyatt G, Schuneman HG, Cook D, et al.
Grades of recommendation for antithrombotic agents.
Chest 2001;119(suppl):3–7.
Schünemann HJ, Jaeschke R, Cook DJ, et al.
An official ATS statement: grading the quality of evidence and strength
of recommendations in ATS guidelines and recommendations.
Am J Respir Crit Care Med 2006;174:605–14.
Goldhahn S, Audigé L, Helfet DL, et al.
Pathways to evidence-based knowledge in orthopaedic surgery:
an international survey of AO course participants.
Int Orthop 2005;29:59–64.
Bridwell KH, Cats-Baril W, Harrast J, et al.
The Validity of the SRS-22 instrument in an adult spinal deformity
population compared with the Oswestry and SF-12: a study of response
distribution, concurrent validity, internal consistency,
and reliability.
Spine 2005;30:455–61.
Robinson R.
Economic evaluation and health care: what does it mean?
BMJ 1993;307:670–3.
Brooks R.
EuroQol: the current state of play.
Health Policy 1996;37:53–72.
Torrance GW, Furlong W, Feeny D, et al.
Multi-attribute preference functions: Health Utilities Index.
Pharmacoeconomics 1995;7:503–20.
Brazier J, Roberts J, Deverill M.
The estimation of a preference-based measure of health from the SF-36.
J Health Econ 2002;21:271–92.
Concato J, Shah N, Horwitz R.
Randomized, controlled trials, observational studies, and
the hierarchy of research designs.
N Engl J Med 2000;342:1887–92.
Benson K, Hartz A.
A comparison of observational studies and randomized, controlled trials.
N Engl J Med 2000;342:1878–86.
O’Connor A.
Using patient decision aids to promote evidence-based decision making.
EBM Notebook 2001;6:100–2.