Research Paper - Journal of Applied Mathematics and Statistical Applications (2018) Volume 2, Issue 1

## Reliability and therapeutic decision through generalizability theory: An application in prostate cancer treatment

**\*Corresponding Author:** Carolina Lagares-Franco

Department of Statistics and Operational Research, University of Cadiz, Spain

**Tel:** +34 696 212 036

**E-mail:** [email protected]

**Accepted date:** December 11, 2018

**Citation:** Lagares-Franco C, Salas-Buzón MC, Gutiérrez-Bayard L, et al. Reliability and therapeutic decision through generalizability theory: An application in prostate cancer treatment. J Appl Math Statist Appl. 2018;2(1):42-46.


### Abstract

Introduction: Hospitals now offer several therapeutic techniques, and multiple factors can alter the measurements on which treatments rely. Health professionals need objective information, expressed in terms of reliability, to support therapeutic decisions. In our field there are different methods for treating prostate cancer with external radiotherapy, and for each of them some factors can affect the data collected to deliver the treatment. The aim of this study is to use an advanced statistical technique, Generalizability Theory, to evaluate the reliability of three image-guided radiotherapy methods used to treat prostate cancer with external radiotherapy: Electronic Portal Imaging (EPI), Cone Beam by Fiducial Markers (CBFM) and Cone Beam by Soft Tissues (CBST).

Methods: Forty patients with prostate cancer were enrolled in a prospective study. Before each daily session, EPI, CBFM and CBST images were acquired sequentially on eleven days along three axes: lateral, vertical and longitudinal. Generalizability Theory was used to analyse reliability and to estimate it for other radiotherapy scenarios.

Results: Generalizability Theory shows high reliability for each method, both individually and in pairs. Reliability is also high for each axis taken alone, but not for pairs of axes. With a single method, a reliability of 0.9 or more is reached from fifteen sessions onwards.

Discussion: Generalizability Theory is a powerful statistical methodology that yields reliability coefficients for many different situations and so helps health professionals with therapeutic decisions.

### Keywords

Reliability, Generalizability theory, Image-guided radiotherapy, Prostate cancer, Therapeutic decisions.

### Introduction

For the majority of diagnostic and therapeutic decisions, systematic data collection is necessary to support the clinical judgment and the consequent patient treatment. All diagnostic tools should guarantee, in terms of reliability, the quality of the collected data (images, numbers, and any other type of data). Collected data can be affected by several factors, such as the diagnostic tool itself, the measurement instrument, the time when the data are collected, etc. Knowing which factors contribute most to error in the data collection process gives health professionals the objective information they need for their work.

To evaluate reliability, the medical literature reports Cronbach's alpha [1] and the Intraclass Correlation Coefficient (ICC) [2] as the most widely used statistical tools. Nevertheless, these coefficients can be used only for simple data collection designs. To take into account all the factors that could generate error in the data collection process, statistical tools more powerful than the classical coefficients are needed.

Against this background, this study proposes analysing reliability through Generalizability Theory (GT) [3-6]. GT and Classical Theory (CT) share several elements, but GT treats reliability and its different aspects in more detail [7-10]. For instance, GT yields not only general reliability coefficients but also, using Analysis of Variance (ANOVA) methods, the percentage of error attributable to each factor (patients, methods, medical visits, ...) involved in the data collection process. Moreover, it gives information about hypothetical data collection situations, which provides specialist medical staff with objective information for clinical decision making.

GT is mostly used in educational [11,12] or psychological and psychiatric [13] contexts, but not in clinical research settings. In a previous work we evaluated the reliability among imaging methods using the Intraclass Correlation Coefficient (ICC) in a simple context [14].

This study proposes the use of GT in the field of Image Guided Radiation Therapy (IGRT) for the treatment of prostate cancer with external radiotherapy. GT makes it possible to quantify the amount of error due to the different sources that can influence the data collection process, and it provides useful information for treatment.

### Materials and Methods

This study is based on a cohort of 40 male patients with early prostate cancer, all included in the protocol of radical external radiotherapy and image-guided radiotherapy of the University Hospital “Puerta del Mar” in Cadiz, Spain. Over a total of 38 days, these 40 patients received a daily session of external image-guided radiotherapy on an ONCOR linear accelerator (Siemens) with 6 MV photons. All had four intraprostatic gold fiducial markers implanted. The radiation beam was corrected before each treatment session using daily orthogonal electronic portal images. On days 1, 2, 3, 4, 5, 10, 15, 20, 25, 30 and 35 of treatment, two volumetric images (Cone Beam Computed Tomography, CBCT) were also acquired before each session: one corrected on the fiducial markers (CBFM) and the other corrected on the soft tissues (CBST). On these 11 days, the position of the isocenter of the treatment beams was recorded along the three spatial axes (lateral, longitudinal and antero-posterior or vertical) for each imaging method (EPI, CBFM and CBST).

Before each treatment session, two orthogonal portal images were acquired for each patient: one antero-posterior and one lateral radiograph of the prostatic region. The intraprostatic gold fiducial markers were used as reference. An automatic electronic system then localized them by superimposing each fiducial marker on the corresponding marker of the digitally reconstructed radiograph, which allowed the position of the radiation beam (corrected isocenter) to be moved to the real position of the target (2D). For the 3D methods, CBFM and CBST, an X-ray computed tomography (CT) of the prostate region (volumetric image) was acquired before each treatment session. The health professional then used a manual electronic correction system to match the position of the prostate to the reference image of the planning CT in one of two ways: using the gold fiducial markers as reference (CBFM), or using the soft tissues such as the prostate, rectum and bladder (CBST).

To apply G Theory, it is necessary to determine from the very beginning of the study what will be measured and which factors could affect the data collection. In this case the *measurement object* is the 40 patients, denoted by *p*. The factors likely to affect the data collection (facets) are the imaging method (*m*), the spatial axis (*e*), and the occasion on which the patient is seen by the radiotherapy oncologist (*o*). The *method* facet has three fixed levels (the imaging methods), the *axis* facet also has three fixed levels (lateral, longitudinal and vertical), and the *occasion* facet has 11 random levels (there could be more or fewer radiotherapy sessions).

Under these assumptions, the design is a crossed design *p×m×e×o*, in which every measurement object is observed at every combination of the levels of all facets. In this design, any observation can be decomposed as the sum of the effects of the different factors, either alone or in combination with each other. In other words, the observation made on a patient (*p*), with a method (*m*), along an axis (*e*) and on one occasion (*o*) can be written as follows:

$$
\begin{aligned}
X_{pmeo} = \mu &+ \vartheta_{p} + \vartheta_{m} + \vartheta_{e} + \vartheta_{o} \\
&+ \vartheta_{pm} + \vartheta_{pe} + \vartheta_{po} + \vartheta_{me} + \vartheta_{mo} + \vartheta_{eo} \\
&+ \vartheta_{pme} + \vartheta_{pmo} + \vartheta_{poe} + \vartheta_{moe} + \vartheta_{pemo}
\end{aligned}
$$

where μ is the global mean score over the patient population and the different ϑ_{α} are the effects of the facets, the measurement object and the interactions among them (α denotes each effect of the design), i.e., all the interactions which can occur in the data collection among patients, methods, axes and occasions.

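The theory shows that the variance of an observation is the sum of the variances of the effects which compose it [6,15]:

$$
\begin{aligned}
\sigma^{2}(X_{pmeo}) = {} &\sigma^{2}(p) + \sigma^{2}(m) + \sigma^{2}(e) + \sigma^{2}(o) \\
&+ \sigma^{2}(pm) + \sigma^{2}(pe) + \sigma^{2}(po) + \sigma^{2}(me) + \sigma^{2}(mo) + \sigma^{2}(eo) \\
&+ \sigma^{2}(pme) + \sigma^{2}(pmo) + \sigma^{2}(poe) + \sigma^{2}(moe) + \sigma^{2}(pemo)
\end{aligned}
$$

With these variances, the researcher can choose new strategies to minimize the variance of the scores. Each new study plan is called a D-study, in which reliable measurement strategies are established without the need to collect information again.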

In the D-study, the reliability coefficient is built from the universe score [6] (which in turn depends on the facets fixed beforehand), the absolute error, and their variances. The reliability coefficient *ϕ* is thus constructed from the variance of the universe score, σ^{2}(τ), and the variance of the absolute error, σ^{2}(Δ):

$$
\phi = \frac{\sigma^{2}(\tau)}{\sigma^{2}(\tau) + \sigma^{2}(\Delta)}
$$

For a crossed D-study *p×M×E×O*, with M and E fixed, these variances are expressed by:

, and

.

For a nested D-study *p×E×(M:O)*, with M and E fixed, these variances become:

and

In general, when the variance of the absolute error is small relative to the variance of the universe score, the reliability coefficient is close to 1. In that case, the uncontrolled factors (random facets) and their interactions with the fixed facets or with the measurement object contribute little to the error in the measurement process.

This coefficient takes values between 0 and 1. Values near 1 denote high reliability, whereas values near 0 indicate its absence. The data used in this study were analysed with the freely available software EduG 6.0 [16]. A total of 3960 images were acquired from the 40 patients (*p*), during 11 radiotherapy sessions (*o*), in 3 spatial axes (*e*), with 3 imaging methods (*m*). The use of these data was approved by the Ethical Committee of the Hospital.
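The size of this design can be checked with a short sketch. The Python below is an illustration only and is not taken from the study or from EduG. It assumes one observation per cell of the fully crossed design and uses the standard ANOVA rule that the degrees of freedom of an effect are the product of (levels − 1) over the facets it involves. The names `levels`, `effects` and `df` are hypothetical.

```python
from itertools import combinations
from math import prod

# Number of levels per facet, as reported in the text.
levels = {"p": 40, "m": 3, "e": 3, "o": 11}

# One observation per cell of the fully crossed design (assumption).
n_obs = prod(levels.values())

# Every non-empty subset of {p, m, e, o} is a main effect or an interaction.
effects = [
    "".join(combo)
    for size in range(1, len(levels) + 1)
    for combo in combinations(levels, size)
]

# Degrees of freedom of an effect: product of (levels - 1) over its facets.
df = {effect: prod(levels[f] - 1 for f in effect) for effect in effects}

print(n_obs)             # 3960
print(len(effects))      # 15
print(sum(df.values()))  # 3959
```

The sketch labels interactions in facet order (`peo`, `meo`, `pmeo`), which correspond to the effects printed as poe, moe and pemo in Table 1.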

### Results

Following the hypotheses described above, the designed model was applied to the collected data. The ANOVA results are reported in **Table 1**. The first column lists the factors included in the design and all their possible interactions. The second, third and fourth columns report the standard ANOVA information. The fifth column contains the variance estimates obtained for each factor or interaction with G Theory, and the sixth column the percentage of the variability they represent. Negative variance estimates are a known consequence of the estimation method and are described in the literature [15].

Source of Error | Sums of squares | Degrees of freedom | Mean squares | Variances | % |
---|---|---|---|---|---|
p | 150.48274 | 39 | 3.85853 | -0.01037 | 10.6 |
m | 1.0152 | 2 | 0.5076 | -0.00115 | 0.1 |
e | 80.9092 | 2 | 40.4546 | 0.02552 | 5.2 |
o | 1.70953 | 10 | 0.17095 | 0.00004 | 0 |
pm | 6.54925 | 78 | 0.08396 | -0.00017 | 0.4 |
pe | 380.45707 | 78 | 4.87765 | 0.13956 | 41 |
po | 89.15593 | 390 | 0.2286 | 0.0012 | 7.4 |
me | 8.15552 | 4 | 2.03888 | 0.00443 | 0.6 |
mo | 0.36891 | 20 | 0.01845 | -0.00013 | 0 |
eo | 3.19791 | 20 | 0.1599 | -0.00048 | 0 |
pme | 13.68004 | 156 | 0.08769 | 0.00513 | 1.5 |
pmo | 25.89775 | 780 | 0.0332 | 0.00063 | 3.2 |
poe | 168.44027 | 780 | 0.21595 | 0.06155 | 20.9 |
moe | 1.29037 | 40 | 0.03226 | 0.00002 | 0 |
pemo | 48.84296 | 1560 | 0.03131 | 0.03131 | 9.1 |
Total | 980.15264 | 3959 | | | 100 |

**Table 1:** *Analysis of variance of the p×m×e×o design.*
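As an illustration of how variance estimates of this kind can be obtained, the following Python sketch applies the usual expected-mean-squares solution for a fully crossed random-effects design with one observation per cell. This is an assumption about the estimation step: the article does not state the formulas, and EduG may proceed differently. The helper `variance_component` and the dictionary names are hypothetical. The inputs are the values in the fourth column of Table 1 (sum of squares divided by degrees of freedom).

```python
# Levels per facet and the mean squares from the fourth column of Table 1.
levels = {"p": 40, "m": 3, "e": 3, "o": 11}
ms_table = {
    "p": 3.85853, "m": 0.5076, "e": 40.4546, "o": 0.17095,
    "pm": 0.08396, "pe": 4.87765, "po": 0.2286,
    "me": 2.03888, "mo": 0.01845, "eo": 0.1599,
    "pme": 0.08769, "pmo": 0.0332, "poe": 0.21595, "moe": 0.03226,
    "pemo": 0.03131,
}
# Key each effect by its set of facets so that letter order does not matter.
ms = {frozenset(name): value for name, value in ms_table.items()}


def variance_component(effect):
    """Random-model estimate for one effect (assumed estimator).

    Alternating sum of the mean squares of every effect that contains
    `effect`, divided by the product of the levels of the other facets.
    """
    alpha = frozenset(effect)
    total = 0.0
    for beta, value in ms.items():
        if alpha <= beta:
            total += (-1) ** (len(beta) - len(alpha)) * value
    divisor = 1
    for facet, n in levels.items():
        if facet not in alpha:
            divisor *= n
    return total / divisor


components = {name: variance_component(name) for name in ms_table}
for name, value in components.items():
    print(f"{name:5s} {value: .5f}")
```

Under these assumptions the computed components agree with the variance column of Table 1 to within rounding, including the negative estimates (for example about 0.1396 for pe and about −0.0104 for p).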

The percentage of variability attributed to each factor or interaction can be interpreted as the percentage of error that it generates in the data collection. Among the isolated factors, the largest share of variability is due to patients (10.6%), followed by spatial axes (5.2%). This is consistent with the fact that patients differ from one another along the spatial axes, and accordingly the largest variability in the table is due to the interaction between patients and spatial axes (41%). By contrast, almost no variability is observed for the occasions (0%) or the imaging methods (0.1%), which means that neither the imaging method nor the day of image acquisition generates appreciable error in the data collection. High variability (20.9%) is reported for the triple interaction among patients, spatial axes and measurement occasions. Overall, the percentages show that the main source of error is the measurement object itself, and this variability cannot be controlled by the diagnostic-therapeutic staff. The controllable factors, such as the spatial axes, the occasions when the images are acquired or the imaging methods themselves, are in general not an important source of error for the data collection. This means that the variance of the absolute error described above is negligible compared with the variance of the universe score. The variance estimates in **Table 1** give a global reliability coefficient of 0.94075, which indicates high reliability among the three methods, in the three spatial axes, over the eleven visits with the radiotherapy oncologist.

**Results for crossed D-study** *p×M×E×O*

**Table 2** presents an in-depth study of the reliability of the imaging methods. Each of the three methods is highly reliable when used alone, the two-dimensional method (EPI) being the most reliable (ϕ=0.94). Likewise, when the methods are analysed in pairs, agreement between them is high, regardless of whether they are two- or three-dimensional or whether they use fiducial markers or soft tissues as reference.

Methods | ϕ |
---|---|
EPI | 0.94 |
CBFM | 0.93 |
CBST | 0.92 |
CBFM vs. CBST | 0.92 |
EPI vs. CBFM | 0.93 |
EPI vs. CBST | 0.92 |

**Table 2:** *Reliability of the imaging methods.*

The detailed study of image acquisition in the three spatial axes (**Table 3**) indicates high reliability for each axis taken alone, but low reliability when only two of the three axes are used together, the lowest being for the combination of the lateral and vertical axes (*ϕ*=0.21).

Spatial Axes | ϕ |
---|---|
Lateral | 0.95 |
Longitudinal | 0.96 |
Vertical | 0.95 |
Longitudinal vs. Vertical | 0.51 |
Lateral vs. Vertical | 0.21 |
Lateral vs. Longitudinal | 0.67 |

**Table 3:** *Reliability of the spatial axes.*

**Table 4** shows the reliability for different numbers of occasions on which the patient sees the radiotherapy oncologist, using the three imaging methods. Even with data from only 11 visits, G Theory makes it possible to estimate the reliability for other numbers of visits without collecting further information. The results show a reliability coefficient of 0.9 or more from the sixth visit onwards, and of 0.95 or more from the twelfth visit onwards.

Sessions | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
ϕ | 0.59 | 0.74 | 0.81 | 0.85 | 0.88 | 0.90 | 0.91 | 0.92 |
**Sessions** | **9** | **10** | **11** | **12** | **13** | **14** | **15** | **16** |
ϕ | 0.93 | 0.94 | 0.94 | 0.95 | 0.95 | 0.95 | 0.96 | 0.96 |

**Table 4:** *Global reliability of different radiotherapy sessions.*
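A projection of this kind can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions, not the computation performed by EduG. It assumes that, with the method and axis facets fixed, every component of the absolute error involves the occasion facet, so that σ^{2}(Δ) is inversely proportional to the number of sessions while σ^{2}(τ) is unchanged. The only input is the global coefficient reported for 11 sessions (0.94075). The function name `project_phi` is hypothetical.

```python
def project_phi(phi_ref, n_ref, n_new):
    """Project a phi coefficient from n_ref sessions to n_new sessions.

    Assumes the absolute error variance scales as 1 / (number of sessions)
    and the universe score variance does not change.
    """
    # Ratio sigma^2(Delta) / sigma^2(tau) for a single session.
    ratio_one_session = n_ref * (1.0 / phi_ref - 1.0)
    return 1.0 / (1.0 + ratio_one_session / n_new)


PHI_11_SESSIONS = 0.94075  # global coefficient reported for the 11 sessions

projected = {
    n: round(project_phi(PHI_11_SESSIONS, 11, n), 2) for n in range(1, 17)
}
for n, phi in projected.items():
    print(n, phi)
```

Rounded to two decimals, the values projected under these assumptions coincide with those of Table 4 for 1 to 16 sessions, which suggests, without proving it, that the table follows the same scaling.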

Finally, **Figure 1** reports the reliability that could be obtained for different numbers of visits with the radiotherapy oncologist when only one of the three imaging methods is used. The results show that at least 15 visits are necessary to reach a reliability above 0.9 with a single imaging method.

Overall, high values of reliability are reported in the majority of the described situations, both in isolated and in combined imaging methods, as well as in the spatial axes or in the visits with the radiotherapy oncologist.

**Results for nested D-study** *p×E×(M:O)*

A clinician may use one of the three imaging methods for each patient, but not necessarily the same method in every session. Under this hypothesis, a nested D-study *p×E×(M:O)* can be used. Using the variance estimates from **Table 1**, we obtain σ^{2}(τ) = 0.04652 and σ^{2}(Δ) = 0.0033756, so the reliability coefficient in this situation is *ϕ*=0.9323. This result shows that the clinician obtains high reliability using any one of the three imaging methods in each radiotherapy session, and that it is not necessary always to use the same method.
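The arithmetic behind this coefficient can be reproduced with a minimal Python sketch. It takes the two variances exactly as reported in the text and applies the definition of *ϕ* as the universe score variance divided by the sum of the universe score variance and the absolute error variance. The function name `phi_coefficient` is hypothetical.

```python
def phi_coefficient(var_universe, var_abs_error):
    """phi = sigma^2(tau) / (sigma^2(tau) + sigma^2(Delta))."""
    if var_universe < 0 or var_abs_error < 0:
        raise ValueError("variances must be non-negative")
    return var_universe / (var_universe + var_abs_error)


# Values reported in the text for the nested D-study.
phi_nested = phi_coefficient(0.04652, 0.0033756)
print(round(phi_nested, 4))  # 0.9323
```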

### Discussion

Treating a cancer patient with radiotherapy requires accuracy, safety, reliability and reproducibility [17]. This study describes a specific situation where the error in the data collection can come from different sources, such as the patients themselves, the imaging method used, the spatial axis considered or the occasions when the images are acquired (radiotherapy sessions).

Through G Theory we obtained not only a global reliability coefficient among the three imaging methods but also the reliability coefficients of the individual methods and of their pairwise combinations. Such results cannot be obtained with classical methods [8]. They allow health professionals to understand the reliability of the acquired images when they use only one method, two methods in combination, or all three methods over the eleven radiotherapy sessions, with the information from the three spatial axes (lateral, longitudinal and vertical). Moreover, G Theory makes it possible to calculate the reliability for the different spatial axes, both in isolation and in combination. With these data, health professionals can understand which axis or axes provide data they can use for the radiotherapy treatment.

Health specialists interpret the information obtained from the data to make clinical decisions [18,19]. In this respect, it is important to underline that G Theory can estimate reliability coefficients for hypothetical data collection situations without collecting data again. Accordingly, reliability was calculated for the three imaging methods, both in isolation and in combination, for different numbers of visits with the radiotherapy oncologist. Through this reliability analysis, health professionals can see whether, from a given visit onwards, the average of the measurements obtained in the previous visits can be used to direct the radiation without decreasing the reliability of the process.

### Conclusion

In conclusion, Generalizability Theory has proved to be an effective and powerful statistical methodology in this field, where it had not been explored so far. G Theory is essential in situations such as the one presented here, where multiple factors must be taken into account in the data collection and each can become a source of error, so that this error needs to be known and quantified. Once these quantities are available, health professionals can understand which factors should be controlled during data collection to obtain highly reliable observations. G Theory can thus provide specialist medical staff with objective information useful for clinical decision making.

### References

1. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16(3):297-334.
2. Bartko J, Carpenter W. On the methods and theory of reliability. J Nerv Ment Dis. 1976;163(5):307-17.
3. Cronbach LJ. Essentials of psychological testing. (2nd ed), Harper & Row, New York, USA, 1960.
4. Cronbach LJ, Rajaratnam N, Gleser GC. Theory of generalizability: A liberalization of reliability theory. Br J Math Stat Psychol. 1963;16:137-63.
5. Rajaratnam N, Cronbach LJ, Gleser GC. Generalizability of stratified parallel tests. Psychometrika. 1965;30:39-56.
6. Brennan RL. Generalizability theory. Springer-Verlag, New York, USA, 2001.
7. Algina J. Elements of classical reliability theory and generalizability theory. Advances in Social Science Methodology. 1989;1:137-69.
8. Brennan RL. Generalizability theory and classical test theory. Applied Measurement in Education. 2011;24(1):1-21.
9. Suen HK, Lei P. Classical versus generalizability theory of measurement. J Educ Meas. 2007;4.
10. Shavelson RJ, Webb NM, Rowley GL. Generalizability theory. Am Psychol. 1989;44(6):922-32.
11. Cardinet J, Tourneur Y, Allal L. The symmetry of generalizability theory: Application to educational measurement. J Educ Meas. 1976;13(2):119-35.
12. Cardinet J, Tourneur Y, Allal L. Extension of generalizability theory and its applications in educational measurement. J Educ Meas. 1981;18(4):183-204.
13. Salvador-Carulla L, González-Caballero JL, Ruiz M, et al. For the eDESDELTC group. Usability of the eDESDE-LTC instrument: Feasibility, consistency, reliability and validity. 2011.
14. Salas-Buzon MC, Gutierrez-Bayard L, Lagares-Franco C, et al. Image-guided radiotherapy using MV for prostate cancer: A correlation analysis between electronic portal imaging with fiducial markers and cone beam CT. J Adv Radiol Med Image. 2015;1(1):101-11.
15. Shavelson RJ, Webb NM. Generalizability theory. Sage, Newbury Park, CA, USA, 1991.
16. EduG version 6.0. http://www.irdp.ch/edumetrie/index.htm.
17. Goyal S, Kataria T. Image guidance in radiation therapy: Techniques and applications. Radiol Res Pract. 2014.
18. Charter RA, Feldt LS. Meaning of reliability in terms of correct and incorrect clinical decisions: The art of decision making is still alive. J Clin Exp Neuropsychol. 2001:530-7.
19. Southam-Gerow MA, Bonifay W, McLeod BD, et al. Generalizability and decision studies of a treatment adherence instrument. Sage Journals. 2018.