Development of a Greek Oral health literacy measurement instrument: GROHL

Background Oral health literacy is an important construct for both clinical and public health outcomes research. The need to quantify and test OHL has led to the development of measurement instruments and has generated a substantial body of recent literature. A commonly used OHL instrument is REALD-30, a word recognition scale that has been adapted for use in several languages. The objective of this study was the development and testing of the Greek language oral health literacy measurement instrument (GROHL). Methods Data from 282 adult patients of two private dental clinics in Athens, Greece were collected via in-person interviews. Forty-four words were initially considered and tested for inclusion. Item response theory analysis (IRT) and 2-parameter logistic models assessing difficulty and discriminatory ability were used to identify an optimal scale composition. Internal consistency was examined using Cronbach’s alpha and test-retest reliability was measured using intraclass correlation coefficient (ICC) in a subset of 20 participants over a two-week period. Convergent validity was tested against functional health literacy screening (HLS) items, dental knowledge (DK), oral health behaviors (OHBs), oral health-related quality of life (OHRQoL; OHIP-14 index), as well as self-reported oral and general health status. Results From an initial item pool of 44 items that were carried forward to IRT, 12 were excluded due to no or little variance, 10 were excluded due to low item-test correlation, and 2 due to insignificant contribution to the scale, i.e., difficulty parameter estimate with p > 0.05. The twenty remaining items composed the final index which showed favorable internal consistency (alpha = 0.80) and test-retest reliability (ICC = 0.95). The summary score distribution did not depart from normality (p = 0.32; mean = 11.5; median = 12; range = 1–20). 
GROHL scores were positively correlated with favorable oral hygiene behaviors and dental attendance, as well as HLS, DK and education level. Conclusion The GROHL demonstrated good psychometric properties and can be used for outcomes research in clinical and public health settings.

The need to quantify and test OHL has led to the development of measurement instruments and has generated a substantial body of recent literature. Several instruments and scales have been introduced to measure OHL, including word recognition (e.g., REALD-99 and REALD-30) [13,14], comprehension [15,16] and functional skills-based tools (e.g., TOFHLiD) [17]. These instruments have been used internationally in a variety of settings and among diverse populations. The resulting body of literature has verified the postulated associations between OHL and a host of behaviors and outcomes, among adults [6-9] and their children [10,18-20].
A recent systematic review of OHL measurement tools by Parthasarathy and colleagues [21] discusses the strengths and weaknesses of available measurement instruments. The authors suggest that additional work is needed to refine and validate most tools, and that further research is warranted to measure broader conceptualizations of OHL and to address cross-cultural and language adaptation issues. It was also noted that 23 of the 29 reviewed OHL studies were conducted in the United States. European studies investigating OHL as an oral health determinant are generally scarce, even though almost half of Europeans are known to have inadequate or problematic health literacy skills [22]. The most commonly used OHL instrument is REALD-30, a word recognition scale that has been adapted for use in several languages other than English [23-27], but not Greek. In fact, no Greek-language OHL measurement instrument currently exists. To address this gap, this study sought to develop and validate a Greek-language OHL instrument (GROHL).

Study population
Previous studies reporting the development of OHL instruments have used sample sizes in the range of 100-200 participants [14,23-27]. Specifically, previous reports of the development of REALD-99 [13] and REALD-30 [14] were based upon sample sizes of 102 and 202 adults, respectively. Based on this information, we did not conduct a formal sample size estimation or power calculation and sought to enroll a sample of over 200 individuals. Thus, we recruited a consecutive convenience sample of 300 adults who were seeking care at two private practice dental clinics in Athens, Greece. The choice of two private dental clinics as recruitment venues was motivated by an effort to capture individuals who were seeking mainstream dental care rather than specialized care at an academic center. The inclusion criterion was self-reported ability to speak, understand and write in the Greek language. Exclusion criteria were inability to read, understand and speak Greek due to vision/hearing problems or any other reason, and working in areas associated with oral health. A second, independent sample of 20 adults was recruited, using the same inclusion and exclusion criteria, for the purposes of test-retest reliability evaluation of the index over a two-week period. Five of the 300 individuals who were sequentially screened for eligibility were determined to be ineligible. Of the remaining 295, 282 agreed to participate in the study, a response rate of 96%. The study received ethics approval from the institutional review board of the Athens University School of Dentistry and all participants provided signed, informed consent.

Data collection
To reduce biases associated with low literacy affecting data collection completeness and quality, data collection was done using an interview format. A structured questionnaire was used by a single investigator/interviewer (first author) to conduct all participant interviews. Each interview lasted approximately forty minutes. The collected data domains included socio-demographics, self-reported health status and behaviors, oral health knowledge, health/oral health literacy, and oral health-related quality of life (OHRQoL). Beyond this information and as part of the GROHL development, the initial pool of words tested for reading and recognition comprised 44 words. The administration of the index under development was not timed separately and its duration was not recorded; nevertheless, it did not exceed 6 min.
Information on health status and health behaviors was obtained from questionnaire items regarding overall health and oral health, dental visits and oral hygiene practices. More specifically, the participants were asked 2 self-rated health questions (possible answers: excellent, very good, good, fair, poor, very poor, I don't know, prefer not to answer); when their last dental visit occurred and why; and what type of dental treatment they had received over the years. They were also asked how often they brush their teeth and use dental floss, interdental brushes and mouthwash. Oral health knowledge was measured using an array of 16 true or false statements; each participant was asked to state their agreement or disagreement with each statement (a 'don't know' option was also possible). Correct answers were scored as 1 and incorrect or 'don't know' answers were scored as 0. These items were employed in two recent studies investigating the association between oral health literacy and oral health-related knowledge [28]. Three additional items were used to evaluate health/oral health literacy. OHRQoL was measured with a Greek-language version of the Oral Health Impact Profile (OHIP-14) [29].
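The scoring rule for the dental knowledge items described above can be sketched as follows. This is a minimal illustration rather than the study's actual scoring code: the function name `score_knowledge`, the item identifiers, and the use of `None` to encode a 'don't know' answer are all assumptions made for the example.

```python
from typing import Mapping, Optional


def score_knowledge(
    responses: Mapping[str, Optional[bool]],
    answer_key: Mapping[str, bool],
) -> int:
    """Sum score over true/false knowledge items.

    responses: item id -> True (agree), False (disagree) or None ('don't know').
    answer_key: item id -> True if the statement is in fact correct.
    Both encodings are hypothetical; the source only states that correct
    answers score 1 and incorrect or 'don't know' answers score 0.
    """
    total = 0
    for item, correct_answer in answer_key.items():
        answer = responses.get(item)  # an unanswered item is treated as 'don't know'
        if answer is not None and answer == correct_answer:
            total += 1
    return total


# Illustrative three-item key (the study used 16 items).
example_key = {"dk1": True, "dk2": False, "dk3": True}
example_responses = {"dk1": True, "dk2": None, "dk3": False}
example_total = score_knowledge(example_responses, example_key)
```

With 16 items, a composite computed this way ranges between 0 and 16, which matches the dental knowledge score range reported in the analytical approach.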

Development and administration of the GROHL word inventory
The development of GROHL departed from an initial pool of 44 candidate words, comprising words from the published and validated English version of the REALD-30 instrument [14] with their explanatory words (OHLA-E), together with an additional 14 words chosen from the longer version of the instrument, the similarly validated REALD-99 [13]. In our selection of these additional 14 words, we excluded those that are used in daily routine without specific relevance to dentistry (e.g., diet, habits, snacking, approval), those having a stronger association with medicine or general health care than dentistry (e.g., cancer, diabetes), as well as very common words in the Greek language, even if they were related to dentistry (e.g., tongue, dentist). In sum, we sought to be maximally inclusive of initial pool items that could serve the scale's purpose, while excluding items that we determined upfront would not perform well. All these words were strongly associated with oral health and were agreed upon by consensus of two dental professionals/investigators, thus demonstrating face validity. Content validity was not explicitly tested in this Greek-language adaptation of the instrument; however, its English-language counterparts have been the most extensively used OHL instruments in the literature. Criterion validity was determined by examining GROHL's correlation with oral health behaviors, oral health knowledge, health literacy screeners, and OHRQoL. Reliability was determined via Cronbach's alpha.
Translation of the initial 44 words in the item pool was done using English-Greek dictionaries, supplemented by professional knowledge/expertise when needed. The final translated version was further reviewed by two dental academicians and investigators who were proficient in English and produced a back-translation of the initial Greek instrument into English. This final version was also screened and verified in terms of language by an independent native speaker and translator. All REALD-99 and REALD-30 items that were used as the initial pool of words to be tested, their recognition-test accompanying words, and their Greek-language counterparts, are presented in the Supplemental Table (Additional file 1: Table S1). Briefly, reasons for exclusion in the construction of the GROHL-20 included: no or little (< 5%) variance in responses (n = 12), item-test correlation of less than 0.40 (n = 10) and test difficulty parameter estimate with p > 0.05 (n = 2). Each participant was given a laminated copy of the list of 44 oral health-related Greek words that comprised the initial GROHL pool of candidates. Participants were asked to read aloud each word and state whether they knew what the word meant; they were instructed not to guess. For the words that were positively identified, a follow-up comprehension quiz was given: participants were asked to pick one of two words that most closely resembled the index word. For instance, "sugar": sweet or sour. Finally, they were asked to explain the meaning of the main word and the investigator assessed whether the participant understood the meaning of the word, based on a definition checklist created from a reference dictionary. Pronunciation and recognition were scored for each word. From a methodological standpoint, if a participant hesitated or read the word slowly, s/he would be reminded that one should only read the words associated with dentistry that s/he knows the meaning of and not guess. If a participant was positive that s/he knew the word, this would be considered as 'correct' and we would proceed to the comprehension quiz. No such ambiguous events took place during our study, likely owing to the phonetic nature of the Greek language (i.e., there is a direct correspondence between the spelling and the sound of each word). Of note, the scoring of both pronunciation and recognition more closely resembles the Spanish OHL instrument development [15] than the English version [14], which is based only upon pronunciation.
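The double-scoring procedure described above can be illustrated with a short sketch. The source states that pronunciation and recognition were both scored for each word but does not spell out how the components were combined; the rule used here (an item earns 1 point only when the word is read aloud, the correct associated word is picked, and the explanation is judged adequate) is an assumption, as are the names `WordResponse`, `item_score` and `grohl_score`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class WordResponse:
    """One participant's response to one candidate word (hypothetical layout)."""
    read_correctly: bool        # read the word aloud and stated that they know it
    picked_correct_match: bool  # chose the right one of the two associated words
    explained_meaning: bool     # investigator judged the explanation adequate


def item_score(response: WordResponse) -> int:
    """Assumed all-or-nothing rule: credit only if every step is passed."""
    passed_all = (
        response.read_correctly
        and response.picked_correct_match
        and response.explained_meaning
    )
    return 1 if passed_all else 0


def grohl_score(responses: Iterable[WordResponse]) -> int:
    """Summary score: number of items credited under the assumed rule."""
    return sum(item_score(r) for r in responses)


example = [
    WordResponse(True, True, True),    # fully credited
    WordResponse(True, False, False),  # pronounced but not recognized
    WordResponse(False, False, False), # not attempted
]
example_score = grohl_score(example)
```

Under this assumed rule a 20-item index yields scores between 0 and 20. Partial-credit or weighted schemes would replace `item_score` only.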

Analytical approach
Initial data analysis relied upon descriptive statistics (e.g., frequencies, proportions, means, standard deviations, medians, ranges), and bivariate analyses (e.g., Student's t test, ANOVA), reported using tabular and visual means. Item response theory analysis and 2-parameter logistic models assessing variance, difficulty, discriminatory ability and item-test correlations were used to identify an optimal scale composition. The scale's test-retest reliability among twenty individuals over a two-week period was measured using the intraclass correlation coefficient (ICC). Internal consistency was measured using Cronbach's alpha. Convergent validity was tested against functional health literacy screening (HLS) items, dental knowledge (DK), oral health behaviors (OHBs), OHRQoL (OHIP-14 index), as well as self-reported oral and general health status. Composite scores were computed and used for health literacy screeners (3 items; score ranging between 3 and 12; alpha = 0.65) and dental knowledge (16 items; score ranging between 0 and 16; alpha = 0.59). Spearman (rho) rank correlations between GROHL scores and other constructs or variables of interest were obtained. P-values are presented rounded to one significant digit [30]. P-values were not corrected for multiple testing and values less than 0.05 were considered significant. All analyses were done using Stata 16.0 (StataCorp LP, College Station, TX).
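For readers less familiar with the two quantities named above, the following sketch shows the standard 2-parameter logistic item response function and the usual formula for Cronbach's alpha. It is written in Python for illustration only; the analyses in this study were run in Stata, and the function names and data layout below are assumptions rather than the authors' code.

```python
import math
from statistics import variance
from typing import Sequence


def two_pl_probability(theta: float, discrimination: float, difficulty: float) -> float:
    """Probability of a correct response under the 2-parameter logistic model.

    theta is the latent literacy level; the item's discrimination and
    difficulty parameters are the two quantities estimated per item.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants-by-items score table.

    item_scores: one row per participant, one column per item
    (hypothetical layout). Uses sample variances throughout.
    """
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [
        variance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)
```

An item's difficulty is the value of theta at which the probability of a correct response is 0.5, and a larger discrimination parameter gives a steeper curve around that point.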

Results
The final list of items included in GROHL-20 is presented in Table 1 and the demographic information of the 282 participating individuals is presented in Table 2.
Briefly, the majority of participants were women, married, and had technical or university education; the mean age was 39 years. Using these individuals' responses and departing from the initial 44 words, we first excluded 12 words due to no or insufficient variance. Ten additional words were removed due to low (< 0.40) item-test correlation and two more for non-significant contribution to the scale (i.e., difficulty parameter estimate with p > 0.05) (Additional file 1: Table S1). The remaining 20 words comprised the GROHL index. The GROHL's summary score distribution (Fig. 1) did not depart from normality (D'Agostino skewness and kurtosis test: p = 0.32; mean = 11.5; standard deviation = 4.0; median = 12; range = 1-20). The scale showed good internal consistency (alpha = 0.80) and excellent test-retest reliability (average ICC = 0.95; p < 0.0005). Overall, the GROHL score showed a favorable distribution, with most information and discriminatory potential demonstrated around the center and towards the low end of the literacy construct (represented as theta; Figs. 2 and 3). Higher GROHL scores were associated with higher educational attainment (Table 2), recent and more frequent routine dental visits (Table 3), as well as use of mouthwash (Table 4). Smaller differences were noted with regard to tooth brushing frequency and use of dental floss. Higher GROHL scores were associated with better health literacy screening item responses and their composite score (Fig. 4); for instance, needing help to read health information material in a hospital and understanding written oral health information (Table 5). We did not note any important association between GROHL scores and OHRQoL measures (i.e., prevalence, "extent" and "severity" of impacts; Table 6). However, we found a significant positive association (rho = 0.30; p < 0.0005; Table 7) of GROHL with dental knowledge scores (Fig. 5).
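The item reduction reported above (44 candidate words, minus 12, 10 and 2 exclusions, leaving 20) can be sketched as a sequence of filters. This is a simplified single pass over precomputed item statistics, not the iterative IRT procedure run in Stata. The class `ItemStats`, the function `screen_items`, and the reading of "< 5% variance" as a minority response share below 0.05 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class ItemStats:
    """Precomputed statistics for one candidate word (hypothetical layout)."""
    word: str
    minority_share: float  # share of participants giving the less common response
    item_test_corr: float  # item-test correlation
    difficulty_p: float    # p-value of the 2PL difficulty parameter estimate


def screen_items(
    items: Sequence[ItemStats],
    min_share: float = 0.05,
    min_corr: float = 0.40,
    alpha: float = 0.05,
) -> Tuple[List[str], Dict[str, str]]:
    """Apply the three exclusion criteria in the order reported in the text."""
    retained: List[str] = []
    excluded: Dict[str, str] = {}
    for item in items:
        if item.minority_share < min_share:
            excluded[item.word] = "no or little variance"
        elif item.item_test_corr < min_corr:
            excluded[item.word] = "low item-test correlation"
        elif item.difficulty_p > alpha:
            excluded[item.word] = "non-significant difficulty estimate"
        else:
            retained.append(item.word)
    return retained, excluded


# Illustrative, made-up statistics for four placeholder words.
example_items = [
    ItemStats("word_a", 0.02, 0.55, 0.01),
    ItemStats("word_b", 0.30, 0.25, 0.01),
    ItemStats("word_c", 0.30, 0.50, 0.20),
    ItemStats("word_d", 0.40, 0.60, 0.001),
]
example_retained, example_excluded = screen_items(example_items)

# Item counts reported in the study: 44 candidates, 12 + 10 + 2 exclusions.
remaining_items = 44 - 12 - 10 - 2
```

The last line reproduces the reported arithmetic: the three exclusion steps leave 20 of the 44 candidate words.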

Discussion
Here we report the development and psychometric properties of a Greek-language OHL measurement instrument (GROHL). The scale development and testing were done among nearly 300 adults and followed common procedures and practices. Overall, we found that the GROHL demonstrated favorable psychometric properties and we recommend its further application and evaluation in additional populations, in clinical and public health settings. Additional properties of interest that can be studied in the future include, among others, its responsiveness to change (e.g., after educational interventions), its association with oral health care-seeking (our sample was limited to dental care-seeking individuals), and its potential for further item reduction (i.e., the development of a short-form GROHL version). The scale's development departed from its well-established English-language counterparts (REALD-30 and REALD-99) and it includes an additional comprehension component that enables a double-scoring method, based upon both word pronunciation and recognition. GROHL scores were normally distributed and were positively correlated with a wide array of variables and constructs, including education, dental knowledge and oral hygiene behaviors. Although comparisons with literacy estimates from other studies and populations cannot be directly made, the mean score of 12/20 in this dental clinic-recruited population is within the low end of the theorized range, in comparison to scores of 16/30 in a community setting [14], 21/30 in a University clinic [28] and 24/30 in a private dental clinic setting in the U.S. [31]. Based upon the distribution of index scores and information in this sample, a preliminary expectation would be that GROHL scores below 9 may indicate 'insufficient' OHL; this arbitrary threshold will certainly need to be empirically verified and interpreted separately in each application context. Of note, the index also demonstrated good internal consistency, similar to other non-English language adaptations of REALD-30.
The development of GROHL was done among a comparatively large sample of almost 300 dental care-seeking adults using rigorous item response theory-based criteria and subsequent psychometric evaluation. In fact, we used elements from both IRT and classical test theory (CTT), in a complementary manner [32]. For instance, we used IRT to determine the optimal set of items that we could carry forward and retain in our index based on their individual performance in terms of contributing information to the overall test score. Moreover, we conducted more 'classical' tests of scale reliability (i.e., test-retest, internal consistency, etc.) and overall performance. These are routine in the psychometric evaluation of new scales and add to our understanding of the performance of GROHL. As expected, several words used in the REALD family of indices were not included in the GROHL-20 based on the IRT analysis; these 24 words did not add to the information content of the index for various reasons, most likely due to differences in meaning and pronunciation between the English and Greek languages. In other words, our iterative IRT approach resulted in a high-information content set of items that contribute to the differentiation of test takers across a wide spectrum of OHL. The double-scoring method employed, similar to the Spanish version of REALD-30 (OHLA) [15], accounts for both pronunciation and recognition, which is, in our opinion, essential for the Greek language context. Of note, alternative scoring possibilities for GROHL exist (e.g., giving partial credit for pronunciation if recognition or comprehension criteria are not met, or constructing composite or weighted scores). These schemes were outside the scope of the work presented here, but are promising future research directions for possible refinement of the index.
We found that GROHL scores were significantly positively correlated with overall educational attainment, dental-specific knowledge, oral health behaviors and attendance, as well as health literacy screening items. These findings should be interpreted with caution because dental-specific knowledge items and health literacy screeners, although used in earlier reports [4,10,29], were not validated or adapted in the context of our study. Nevertheless, these findings are aligned with earlier reports [6,7,13,14] and corroborate the validity of the index. The fact that we did not find an association with OHRQoL does not imply that one does not exist. It is conceivable that a "U-shaped" association exists, wherein individuals at the lowest and the highest ends of health literacy both over-report quality of life impacts, each for different reasons: the former due to poor oral health and the latter due to heightened awareness, or elevated standards and expectations. Some evidence exists suggesting under-reporting of OHRQoL impacts (by caregivers for their children) associated with low oral health literacy [18].
As is the case for all developmental investigations of this nature, some limitations exist. First, the sampled population was actively oral health care-seeking and in fact at a dental clinic. This limits the potential of the sample to represent the community-dwelling population, as arguably those who are already at a dental office may be systematically different (in many ways, including in terms of health literacy) compared to their non-care-seeking counterparts. Further, word recognition and pronunciation are used here as proxies of functional health literacy, which pertains to actual skills (e.g., interpreting a prescription, recognizing signs of dental disease, following instructions to use a dental product or device, or performing oral hygiene tasks). These were not directly tested in this investigation. Arguably, tests of functional literacy are in principle superior because they assess actions or tasks of relevance to the domain of interest. For instance, the Hong Kong Oral Health Literacy Assessment Task for Pediatric Dentistry (HKOHLAT-P) [33] and the Test of Functional Health Literacy in Dentistry (TOFHLiD) [17] are based upon assessments of applied, oral health-related tasks and abilities, while word recognition and comprehension tests serve as proxies of these task-performing abilities. The development of a functional (oral) health literacy instrument in the Greek language would be a welcome and likely necessary addition. In spite of this limitation, we maintain that the development and introduction of GROHL is a step in the right direction: GROHL has favorable psychometric properties and was found to be significantly and positively associated with important health literacy screening questions that are demonstrative of daily, applied, healthcare-related skills. In sum, we foresee that the index will enable valid measurements of OHL, for the first time, in the Greek language.

Conclusions
The introduction of this new Greek language oral health literacy index fills a gap in the toolbox available for oral health outcomes research. This is especially important in the domain of oral health literacy, as the construct is rapidly gaining traction and relevance in health care worldwide. Based on the results of this study, we conclude that the GROHL has good psychometric properties and can be used for outcomes research in clinical and public health settings. Further testing among non-clinical (i.e., not actively care-seeking) populations, as well as those living in rural areas, will illuminate the performance of the index among diverse populations and settings.