A.L.B. Rutten, C.F. Stolper, R.F.G. Lugten, R.W.J.M. Barthels
Commissie Methode en Validering VHAN (Dutch Association of Homeopathic Physicians), The Netherlands
A pilot study was performed to investigate the possibilities and restrictions of likelihood ratio (LR) investigation using three symptoms. Qualitative vagueness and expectation bias are inherent in our method but are, in part, avoidable. It appears that experienced observers assess common homeopathic symptoms quite similarly. Clinical judgement is an essential part of our work and should be preserved during assessment of LR. The investigation does not influence clinical practice and can be maintained for a long period, provided the appropriate software is used. A limited range of symptoms seems most suitable for LR investigation.
Homeopathy (2003) 4, 213-216
In the accompanying paper(1) we investigated the theoretical problems that could arise in assessing vague clinical and homeopathic symptoms. The likelihood ratio (LR) method allows some latitude in the quantitative interpretation of our symptoms, but we must be aware of expectation bias and qualitative misinterpretation of symptoms. In the editorial comment to our earlier article(2) Peter Fisher stated that LR investigation requires data collection ".. on a daunting scale, particularly for rarely used medicines and infrequent symptoms".(3) In order to get more clarity about these problems and to prepare a protocol for the research of LR we performed a pilot study.
We performed a prospective investigation of the prevalence of some symptoms in four homeopathic practices as a pilot study, to trace methodological problems in researching the LR of homeopathic symptoms. We assessed three symptoms with different kinds or degrees of vagueness: 'desire for coffee', 'fear of snakes' and 'loquacity'. We expected 'loquacity' to be the vaguest symptom. This symptom had already been assessed in our materia medica validation.(4) For this symptom we recruited one extra observer.
We deliberately did not restrict participants in scoring the symptoms, in order to evaluate the differences in interpretation of each symptom based on the experience of each observer. One of the things we wanted to know was the inter-rater variability in the prevalence of the symptoms as observed by relatively unprepared (but experienced) observers: do they have the same notion about these symptoms? In Table 1 this is expressed by the standard error in the mean prevalence of the symptom. The participants were not aware of each other's data. We also tested three different ways to collect data. A spreadsheet was provided to two participants, one participant used his own spreadsheet, one used his own practice administration database program and one used a paper form.
The pilot study was carried out from July till December 2002. All new patients more than 2 years old were included. The experience of the investigators was noted in years.
There is no clear relation between the prescription of remedies and the investigated symptoms; e.g. about 20 different remedies were prescribed for the 49 loquacious patients. Lachesis was prescribed 10 times; three of these patients were loquacious. Follow-up was, of course, not yet sufficient to assess the results of treatment.
After data collection the participants exchanged their personal assessment criteria for each symptom. Some participants (LR and ES) asked every patient whether he or she was talkative; the others merely noted their own impression during the consultation. One observer (SJ) noted 'loquacity' if he was somewhat annoyed by the loquacity of the patient, or hampered by it in his usual history-taking.
Every practitioner asked about coffee intake. Two raters (LR and RB) then used clinical judgement to relate the intake to cultural and personal circumstances. This approach revealed some unexpected hits, such as a four-year-old child stealing the leftovers from the coffee cups of the grown-ups. One rater (PF) scored an intake of more than 5 cups a day as desire for coffee, and one rater (ES) scored the symptom as positive if coffee was among the five most wanted food items.
Three observers enquired directly about fear of snakes. If the fear was confirmed, they also asked whether it was present when the snake was behind glass or in a picture (television). One observer (PF) asked about fear of animals.
Table 1 Results of pilot study
| | LR | ES | PF | RB | SJ | totals | mean prevalence / standard error |
|---|---|---|---|---|---|---|---|
| experience (years) | 23 | 17 | 14 | 14 | 21 | - | |
| number patients | 114 | 104 | 65 | 77 | 149 | 509 | |
| desire coffee | 9 | 11 | 9 | 4 | - | 33 | |
| prevalence desire coffee | 0.079 | 0.106 | 0.138 | 0.052 | - | - | 0.065 |
| standard error desire coffee | | | | | | | 0.02494 |
| loquacity | 11 | 13 | 8 | 8 | 9 | 49 | |
| prevalence loquacity | 0.096 | 0.125 | 0.123 | 0.104 | 0.060 | - | 0.096 |
| standard error loquacity | | | | | | | 0.01202 |
| fear snakes | 6 | 4 | 1 | 2 | - | 13 | |
| prevalence fear snakes | 0.053 | 0.038 | 0.015 | 0.026 | - | - | 0.026 |
| standard error fear snakes | | | | | | | 0.00915 |
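The prevalence figures in Table 1 can be checked with a few lines of Python. This is only a sketch: the counts are transcribed from the table, and the function names are ours. The last column of the table appears to equal the total count divided by all 509 patients, including those of the observer who did not record the symptom; that reading is an inference from the numbers, not something the text states. The standard errors are not recomputed, because the formula used for them is not given.

```python
# Counts transcribed from Table 1; None marks a symptom an observer did not record.
patients = {"LR": 114, "ES": 104, "PF": 65, "RB": 77, "SJ": 149}
counts = {
    "desire coffee": {"LR": 9, "ES": 11, "PF": 9, "RB": 4, "SJ": None},
    "loquacity": {"LR": 11, "ES": 13, "PF": 8, "RB": 8, "SJ": 9},
    "fear snakes": {"LR": 6, "ES": 4, "PF": 1, "RB": 2, "SJ": None},
}


def observer_prevalence(symptom):
    """Prevalence per observer, skipping observers with no recorded count."""
    return {
        obs: n / patients[obs]
        for obs, n in counts[symptom].items()
        if n is not None
    }


def pooled_prevalence(symptom):
    """Total positive count divided by all patients in the study (assumed reading)."""
    total = sum(n for n in counts[symptom].values() if n is not None)
    return total / sum(patients.values())


for symptom in counts:
    per_observer = {o: round(p, 3) for o, p in observer_prevalence(symptom).items()}
    print(symptom, per_observer, round(pooled_prevalence(symptom), 3))
```

Rounded to three decimals, the pooled values come out as 0.065, 0.096 and 0.026, matching the last column of the table.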
Time to register the presence of the symptoms with each new patient varied from 2 seconds when a database program was used to 30 seconds if it was noted on paper. It took 10 minutes to one hour to prepare the data for transmission to the co-ordinator. It takes some hours to evaluate results of treatment, but these data were not yet assessed, because follow-up was too short.
To assess the validity of our procedure we can use the ten questions for the assessment of diagnostic testing proposed by Greenhalgh:(5)
This study revealed the prevalence of 'loquacity' in the population seeking homeopathic help. There were 10 Lachesis prescriptions in this population of 509 patients, so we think that the prevalence in the whole population is a tolerable approximation of the prevalence in the non-Lachesis population. From our materia medica validation meetings we also have retrospective data on loquacity and Lachesis: 16 cases were assessed with a score of +3 or +4 on the GHHOS scale, and seven of these cases were marked as loquacious. We expect no under-reporting, because the symptom loquacity is well known for Lachesis and all participants had the opportunity to complete their data while discussing the cases. In our validating procedure 3 to 6 colleagues assess the quality of the case. In both the retrospective and the prospective data 'loquacity' was ill defined. Based on a combination of the prospective and the retrospective data we estimate the LR+ of loquacity for Lachesis to be 4.9. The 95% confidence interval (Simel(6)) is 3.3-6.5.
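The formula behind this estimate is not spelled out in the text. As a reminder, the conventional definition of LR+ from a 2x2 diagram is given below. The cell labels are an assumption following the usual layout: a and c are the cases in which the remedy worked, with and without the symptom, and b and d are the remaining population, with and without the symptom. How exactly the retrospective and prospective data were combined to arrive at 4.9 is not stated, so the expression should be read as the general definition only.

```latex
\[
\mathrm{LR}^{+}
  = \frac{P(\text{symptom} \mid \text{remedy works})}
         {P(\text{symptom} \mid \text{remedy does not work})}
  = \frac{a/(a+c)}{b/(b+d)}
\]
```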

Figure 1 Loquacity and Lachesis; LR+ = 4.9
Based on this LR+ we drew the graph shown in Figure 1. On this graph, with LR+ = 4.9, we can see that a prior chance of 1% goes to 4.7% if loquacity is present. A prior chance of 4.7% goes to 19%; 19% goes to 54%; 54% goes to 85%. These figures suggest that we need four symptoms of this strength to make a nearly certain prescription, if we assume that the prior chance at the beginning of the consultation is 1%. This is of course still very hypothetical and subject to several conditions, such as the symptoms being mutually independent. The estimate of the LR of 'loquacity' for Lachesis is better than our current bold typeface in the repertory, but it is still only halfway towards our final goal: data based on prospective research.
Three main points emerge from this pilot study: (1) the handling of vagueness, (2) the feasibility of LR research and (3) the validity of our calculation of the LR of loquacity for Lachesis.
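The step from prior to posterior chance can be retraced with the odds form of Bayes' theorem: convert the prior chance to odds, multiply by LR+, and convert back. The sketch below assumes, as the text does, that each further symptom has the same LR+ of 4.9 and that the symptoms are mutually independent; the function names are ours.

```python
def update_probability(prior, lr):
    """Convert a prior probability to a posterior probability via odds."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1.0 + posterior_odds)


def chain(prior, lr, n_symptoms):
    """Apply the same LR repeatedly, assuming mutually independent symptoms."""
    probabilities = [prior]
    for _ in range(n_symptoms):
        probabilities.append(update_probability(probabilities[-1], lr))
    return probabilities


# Start from a prior chance of 1% and add four symptoms with LR+ = 4.9.
for step, p in enumerate(chain(0.01, 4.9, 4)):
    print(f"after {step} symptom(s): {p:.1%}")
```

Starting from 1%, this gives roughly 4.7%, 19.5%, 54% and 85%, in line with the figures read from the graph; the small difference at the second step comes from rounding the intermediate value.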
In researching our instruments we will detect several weaknesses. Among them are insufficient inquiry about symptoms during consultation and bias in the interpretation of symptoms. Many symptoms in the materia medica are not clearly defined. Our pilot study indicated that experienced practitioners have a kind of intuition about symptoms that overrides descriptions like the amount of coffee or words spoken. Maybe bias is unavoidable, but we are insufficiently aware of it at the moment. We must make it clear that we research 'Loquacity, changing the subject frequently' as observed during consultation and not 'Loquacity' in a broader sense. If we research loquacity in the broader sense we must reach consensus about what we understand by loquacity and about the way we investigate it, e.g. about the questions we use. The use of LR allows quantitative vagueness only if we can avoid expectation bias. This kind of bias can be detected by analysing the data on variation from randomness related to different observers.
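One possible way to analyse variation between observers is a chi-square test of homogeneity on the counts per observer. The text does not name a specific test, so the sketch below is an illustration of the idea rather than the procedure actually used. It takes the loquacity counts from Table 1 and compares the statistic with the standard 5% critical value for 4 degrees of freedom; the function name is ours.

```python
# Loquacity counts and patient numbers per observer, transcribed from Table 1.
positives = [11, 13, 8, 8, 9]
patients = [114, 104, 65, 77, 149]


def chi_square_homogeneity(positives, patients):
    """Pearson chi-square statistic for equal prevalence across observers."""
    pooled = sum(positives) / sum(patients)
    statistic = 0.0
    for pos, n in zip(positives, patients):
        expected_pos = n * pooled
        expected_neg = n * (1.0 - pooled)
        statistic += (pos - expected_pos) ** 2 / expected_pos
        statistic += ((n - pos) - expected_neg) ** 2 / expected_neg
    return statistic, len(positives) - 1


# Chi-square critical value for 4 degrees of freedom at alpha = 0.05.
CRITICAL_5PCT_DF4 = 9.488

statistic, dof = chi_square_homogeneity(positives, patients)
print(f"chi-square = {statistic:.2f} on {dof} degrees of freedom")
if statistic > CRITICAL_5PCT_DF4:
    print("observers differ more than chance would suggest")
else:
    print("no evidence of heterogeneity between observers at the 5% level")
```

For the loquacity counts the statistic is about 3.8, below the critical value, which fits the impression that experienced observers score this symptom quite similarly. A test of this kind only flags differences between observers; it cannot by itself show whether a difference stems from expectation bias or from differing definitions of the symptom.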
We must develop means to detect misclassification and to handle it, including:
We think that the research for LR is most suited for symptoms with a prevalence between 2 and 15% and a reputation as keynotes. There are only a few hundred symptoms that fit these conditions.
This pilot study indicates that the prevalence of symptoms in homeopathic practice can be researched, even for vague symptoms. The next step is to investigate our 'gold standard', constituting the 'a and c groups' of the 2x2 diagram. We have no hard 'gold standard' like the inflamed appendix, excised from the abdomen of the patient. But we must be practical: the aim is to improve our method by learning from our best cases. If we regard our best cases as 'gold standard' we can investigate the symptoms that led us to these. We are now developing criteria to assess the best cases.
Adequate description of the assessment is imperative, including inter-rater variance. Using the combination of retrospective research for the prevalence of the symptom in the remedy-group and prospective research for the prevalence in the rest of the population might give an indication of LR. The validity of this procedure seems questionable (but still better than the present representation in the repertory).
Furthermore, our pilot study indicates that collecting the data necessary to investigate LR is not time-consuming, so it can be maintained for years, especially with the help of proper software.
The question is not only if the test is valid, but also if the investigator did what the average clinician does (or should do). After the test our concern is if the test (symptom) can be applied by every clinician, in this way limiting variance between clinicians.
We think that a few hundred rubrics of the repertory will benefit greatly from assessment of their LR. This will not only improve the efficacy of the repertory but it will enable us to investigate our methodology further. Questions like 'What does LR mean in our practice?' and 'How many symptoms do we need to be sure of our prescription?' can be formalised once the strength of our symptoms is better described. In a following paper we will describe the possible changes of the repertory when LR is included and some methodological consequences.
We thank the other participants in the pilot-study, Paul Fruijtier and Stan Jesmiatka, for gathering the data and active participation. We thank Dick Bezemer for his comments.
Lex Rutten, MD
Aard 10 - 4813 NN Breda, Netherlands