Patient-reported outcome measures for masticatory function in adults: a systematic review

Objective: The aim of this systematic review was to critically evaluate the Patient-Reported Outcome Measures (PROMs) for masticatory function in adults.
Methods: Five electronic databases (Medline, Embase, Web of Science Core Collection, CINAHL Plus and APA PsycINFO) were searched up to March 2021. Studies reporting development or validation of PROMs for masticatory function in adults were identified. Methodological quality of the included studies was evaluated using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) risk of bias checklist. Psychometric properties of the PROM in each included study were rated against the criteria for good measurement properties based on the COSMIN guideline.
Results: Twenty-three studies investigating 19 PROMs were included. Methodological qualities of these studies were diverse. Four types of PROMs were identified: questions using food items to assess masticatory function (13 PROMs), questions on chewing problems (3 PROMs), questions using both food items and chewing problems (2 PROMs) and a global question (1 PROM). Only a few of these PROMs, namely chewing function questionnaire-Chinese, Croatian or Albanian, food intake questionnaire-Japanese, new food intake questionnaire-Japanese, screening for masticatory disorders in older adults and perceived difficulty of chewing-Tanzania, demonstrated high or moderate level of evidence in several psychometric properties.
Conclusions: Currently, there is no PROM for masticatory function in adults with high-level evidence for all psychometric properties. There are variations in the psychometric properties among the different reported PROMs.
Trial Registration: PROSPERO (CRD42020171591).
Supplementary Information: The online version contains supplementary material available at 10.1186/s12903-021-01949-7.


Open Access
*Correspondence: edward-lo@hku.hk, Faculty of Dentistry, The University of Hong Kong, 1/F, Prince Philip Dental Hospital, 34 Hospital Road, Sai Ying Pun, Hong Kong, China

Background
Masticatory difficulty or masticatory problem is prevalent in older adults worldwide [1][2][3]. Masticatory function has been found to be associated with physical activity level, disability, comorbidities and cognitive status [4]. A recent consensus report classified the methodologies for assessment of masticatory function into three types, namely direct objective assessment, indirect assessment and subjective assessment [5]. In direct objective assessment, masticatory function is evaluated by assessing a test material either after a predetermined number of chewing strokes (masticatory performance) or at the moment when the study participant feels the urge to swallow (swallowing threshold). In indirect objective assessment, masticatory function is evaluated by jaw kinematics, muscle activity, tongue or lip function, and saliva secretion. In subjective assessment, self-assessment of masticatory function is evaluated using questionnaires and interviews. A recent systematic review reported that none of the established objective assessments of masticatory function had strong evidence for all measurement properties and these assessments required sieves or digital image software [6].
Self-assessment of masticatory function uses patient-reported outcome measures (PROMs), mainly questionnaires. This has the advantage of assessing masticatory function from the person's perspective, taking into account adaptational and psychological factors. Some studies found there were correlations between self-assessment and objective assessment of masticatory function [7][8][9]. However, other researchers reported a lack of agreement between the subjective and the objective assessments of masticatory function [10,11]. It should be noted that not all PROMs are created equal, and well-designed PROMs are needed to reveal the true masticatory function. Quality of the information collected and strength of the conclusion made depend on the properties of the instrument used in the study [12]. The methodological quality and psychometric properties of PROMs, such as content validity, structural validity, reliability, internal consistency and construct validity, are important aspects for the development or selection of a reliable and valid measurement tool [13,14]. It is important to use PROMs which have undergone rigorous psychometric testing to ensure the results obtained are valid and reliable. The COSMIN guideline was developed to enhance the quality of systematic review of PROMs [15][16][17][18].
In the past decade, a systematic review of all generic PROMs for adult dental patients [19] and a few systematic reviews of PROMs used in implant dentistry [20][21][22] were published. However, to the best of our knowledge, there is no systematic review of PROMs for assessing masticatory function. Therefore, based on the COSMIN guideline, this review aimed to identify PROMs that have been used in adults (population) for subjective assessment (type of instruments) of masticatory function (construct), and to evaluate the methodological qualities and psychometric properties (measurement properties of interest) of these PROMs.

Methods
This systematic review was registered in PROSPERO (Registration Number: CRD42020171591), and was reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 Checklist [23].

Search strategy
Medline (PubMed), Embase (Ovid), Web of Science Core Collection, CINAHL Plus (EBSCOhost) and APA PsycINFO (ProQuest) were searched from their inception to March 2021. The search strategy consisted of three parts: (1) "chewing function/ability" or "masticatory function/ability" or mastication; (2) questionnaire* or subjective* or evaluation* or assessment*; and (3) validation or validity or reliability or psychometric*. The detailed search strategies can be found in Additional file 1: Part 1. As a supplement, a manual search of the reference lists of published reviews and the included articles, and of Google Scholar, was performed.

Eligibility criteria
Based on the COSMIN guideline, the study inclusion criteria in the present review were: (1) studies investigating the development or validation (measurement properties of interest) of subjective assessment (type of instruments) of masticatory function (construct), regardless of study design; (2) studies on adults (population); and (3) studies published in English with full text available.
The study exclusion criteria were: (1) studies which only included objective assessment of masticatory function; (2) studies that used subjective assessment of masticatory function as an outcome measure only; and (3) case studies, expert opinion, animal studies and reviews.

Study selection and data extraction
Articles retrieved from the electronic search were imported into the EndNote reference program (Ver. 9.3.1). After removing duplicates, two reviewers (YPF and XS) independently screened the titles and the abstracts of all identified records, and evaluated the full texts of all potentially eligible articles. The following data were extracted from the included articles: first author, year of publication, study participants, study setting, study design, study location, and the characteristics and psychometric properties of PROMs. Any disagreements between the two reviewers were resolved by discussion with an expert researcher (ECML).

Evaluation of the methodological quality of each study
Methodological quality of the included studies was evaluated using the COSMIN risk of bias checklist [15]. Following the COSMIN manual for systematic reviews of PROMs and the COSMIN methodology for evaluating content validity [16,18], all procedures were conducted by two reviewers (YPF and XS) independently. The COSMIN risk of bias checklist included 10 aspects: PROM development, content validity, structural validity, internal consistency, cross-cultural validity/measurement invariance, reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness. The methodological quality of each aspect was assessed and rated on a 4-point scale: "very good" (V), "adequate" (A), "doubtful" (D), and "inadequate" (I).
The ratings were determined based on "the worst score counts" principle, i.e. the lowest rating for any item was the rating for the study [24].
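The "worst score counts" principle can be illustrated with a minimal sketch. The function name and the example item ratings below are hypothetical and are not taken from the COSMIN checklist itself; only the four rating labels come from the text above.

```python
# Ratings ordered from best to worst, matching the 4-point scale above.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(item_ratings):
    """Return the lowest (worst) rating among the items of one checklist aspect."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # The rating with the highest index in RATING_ORDER is the worst one.
    return max(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for the reliability aspect of a single study.
example_items = ["very good", "adequate", "doubtful", "very good"]
print(worst_score_counts(example_items))
```

A single "doubtful" item is therefore enough to make the whole aspect "doubtful", however well the remaining items were rated.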

Evaluation of the quality of psychometric properties
Psychometric properties of the PROM in each included study were rated against the criteria for good measurement properties [14,25]. Each property was rated as sufficient (+), insufficient (−) or indeterminate (?).
After assessing the quality of the psychometric properties of the PROM in the different studies, the quality of each psychometric property of the PROM was rated as sufficient (+), insufficient (−), inconsistent (±) or indeterminate (?). Finally, the level of evidence of each psychometric property of that particular PROM was graded as "high", "moderate", "low" or "very low", using a modified Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach recommended in the COSMIN guideline [16]. According to the COSMIN guideline, publication bias was not considered when using the modified GRADE to evaluate the measurement properties of PROMs, and only the following four factors were evaluated: risk of bias, inconsistency, imprecision and indirectness.
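The grading step can be sketched as follows. This is only an illustration under stated assumptions: it assumes that grading starts at "high" and that each of the four factors is expressed as a number of levels to downgrade (0 for no concern, 1 for serious, 2 for very serious). That numeric encoding and the function name are hypothetical; the COSMIN manual [16] should be consulted for the actual downgrading rules.

```python
LEVELS = ["high", "moderate", "low", "very low"]


def grade_evidence(risk_of_bias=0, inconsistency=0, imprecision=0, indirectness=0):
    """Start at 'high' and move down one level per downgrade step.

    Each argument is an assumed number of levels to downgrade for that factor.
    Publication bias is deliberately not a parameter, as stated in the text.
    """
    factors = (risk_of_bias, inconsistency, imprecision, indirectness)
    if any(step < 0 for step in factors):
        raise ValueError("downgrade steps cannot be negative")
    total_steps = sum(factors)
    # The grade cannot fall below 'very low'.
    return LEVELS[min(total_steps, len(LEVELS) - 1)]


print(grade_evidence())                               # no concerns
print(grade_evidence(risk_of_bias=1))                 # one serious concern
print(grade_evidence(risk_of_bias=2, imprecision=2))  # several concerns
```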

Results
Study selection
After removal of duplicates, 1850 records were identified (Fig. 1). The titles and abstracts of these records were screened, and 1816 records were excluded. Full texts of 34 articles were assessed and 22 articles were excluded with reasons. Eleven articles were included through the supplementary search (Additional file 1: Part 2). Finally, 23 articles met the inclusion and exclusion criteria.
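The record counts reported above can be checked with simple arithmetic. The variable names are illustrative; the numbers are those given in the text.

```python
records_after_deduplication = 1850
excluded_on_title_abstract = 1816
excluded_on_full_text = 22
added_from_supplementary_search = 11

# Articles that went forward to full-text assessment.
full_text_assessed = records_after_deduplication - excluded_on_title_abstract

# Articles finally included in the review.
included = full_text_assessed - excluded_on_full_text + added_from_supplementary_search

print(full_text_assessed, included)
```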

Characteristics of the included studies and PROMs
A summary of the 23 included articles, reporting on 23 studies and 19 PROMs, is presented in Table 1. Four types of PROMs were identified: (1) questions related to chewing specific food items (13 PROMs), (2) questions related to chewing problems (three PROMs) [39][40][41], (3) questions related to chewing specific food items and questions related to chewing problems (three studies, two PROMs) [42][43][44], and (4) a global question (two studies, one PROM) [45,46]. The number of questions in each PROM ranged from seven to 35. Most of the PROMs were unidimensional, except the Persian version of the quality of masticatory function questionnaire (QMFQ-Persian), which contained five domains [42]. Variations were found in the response options of the included PROMs. For the three PROMs with questions about chewing problems, one adopted a five-point Likert scale ("always", "often", "occasionally", "rarely" and "never") [40], one provided three choices ("No", "Yes-sometimes" and "Yes-always") [41], while the other accepted different responses for different questions [39]. For the PROMs containing questions about chewing specific food items and questions about chewing problems, all adopted a five-point Likert scale response option [42][43][44]. For the PROM containing only one global question, the response choices were "Yes, I can bite tightly on both sides", "Yes, but only on one side" and "No, I cannot bite on either side" [45,46]. For the PROMs with questions about chewing specific food items, the response options were based on the level of difficulty, with slight variations.

Methodological quality of each study
An overview of the methodological quality assessment of the included studies is presented in Table 2. Nearly all (21 out of 23) studies had conducted hypothesis testing for construct validity and their methodological qualities were rated as adequate or very good. Most of the studies that evaluated internal consistency were rated as doubtful because information on structural validity or unidimensionality of PROMs was not presented [26-28, 32, 34, 35, 38, 40]. For the studies that evaluated structural validity, most studies were rated as adequate or very good, except for the study reporting on QMFQ-Persian, which was rated as inadequate due to insufficient sample size [42]. Of the three studies that evaluated criterion validity, two studies were rated as very good [8,27], and one was rated as inadequate [37]. Only three studies evaluated responsiveness and their methodological qualities were rated as very good [29,43,44]. Regarding cross-cultural validity, two studies were rated as doubtful [42,43]. Among the six studies which had evaluated content validity, five were rated as doubtful [35,41-44] while one study was rated as inadequate [30]. Regarding PROM development, six studies described the development process and the methodological qualities of all these studies were rated as doubtful [30,31,35,38,41,44].

Quality of psychometric properties
Psychometric properties of the PROMs in the individual studies are presented in Table 3. The details can be found in Additional file 1: Part 3. Internal consistency was evaluated in 14 studies, and nine of them were rated as indeterminate because these studies did not meet the criterion "at least low evidence for sufficient structural validity". Eight studies evaluated test-retest reliability and seven of them were rated as sufficient. Content validity was evaluated in six studies and four of them were rated as sufficient. Eight studies evaluated structural validity and seven of them were rated as sufficient. Only three studies reported criterion validity and two of them were rated as sufficient. Twelve of the 21 studies that had conducted hypothesis testing for construct validity were rated as sufficient. All three studies that evaluated responsiveness were rated as sufficient, because the standardized effect size was higher than expected and the results were in accordance with the hypothesis. Two studies performed cross-cultural translation, and both were rated as indeterminate.
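The way the structural validity criterion drives the internal consistency rating can be sketched as below. The criterion "at least low evidence for sufficient structural validity" comes from the text; the Cronbach's alpha threshold of 0.70 is an assumption taken from the COSMIN criteria for good measurement properties rather than from this article, and the function name is hypothetical.

```python
def rate_internal_consistency(cronbach_alpha, structural_validity_ok, alpha_threshold=0.70):
    """Rate internal consistency as '+', '-' or '?'.

    structural_validity_ok: True when there is at least low evidence for
    sufficient structural validity (the criterion quoted in the text).
    alpha_threshold: assumed cut-off, not stated in this article.
    """
    if not structural_validity_ok:
        # Without evidence of unidimensionality, alpha cannot be interpreted.
        return "?"
    return "+" if cronbach_alpha >= alpha_threshold else "-"


# Hypothetical examples.
print(rate_internal_consistency(0.90, structural_validity_ok=False))
print(rate_internal_consistency(0.90, structural_validity_ok=True))
print(rate_internal_consistency(0.60, structural_validity_ok=True))
```

Under this rule a study reporting a high alpha is still rated indeterminate when structural validity is not supported, which matches the nine indeterminate ratings described above.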

Evidence synthesis
Summarized evidence of the included PROMs is presented in Table 4. The levels of evidence differed amongst the various psychometric properties of the PROMs. Chewing Function Questionnaires-Croatian or Albanian (CFQ-Croatian or Albanian) had a moderate or high level of evidence for internal consistency, test-retest reliability, structural validity, hypothesis testing for construct validity and responsiveness, and these psychometric properties were all rated as sufficient [43,44]. Food Intake Questionnaire-Japanese (FIQ-Japanese) had moderate level of evidence for sufficient hypothesis testing for construct validity [7,33]. New Food Intake Questionnaire-Japanese (New-FIQ-Japanese) had a moderate or high level of evidence for structural validity, criterion validity and hypothesis testing for construct validity, and these psychometric properties were all rated as sufficient [8]. Perceived Difficulty of Chewing-Tanzania (PDC-Tanzania) had moderate level of evidence for sufficient hypothesis testing for construct validity and low level of evidence for indeterminate internal consistency [34]. Chewing Function Questionnaire-Chinese (CFQ-Chinese) had high level of evidence for internal consistency, and moderate level of evidence for test-retest reliability, structural validity and hypothesis testing for construct validity, and these psychometric properties were all rated as sufficient [30]. Screening for Masticatory Disorders in Older Adults (SMDOA) had high level of evidence for structural validity and hypothesis testing for construct validity, and low level of evidence for content validity, and these psychometric properties were all rated as sufficient [41].

Discussion
This review yielded two major findings: (1) although 19 PROMs for masticatory function were identified, none of them had high-level evidence of sufficiency for all psychometric properties; and (2) CFQ (Croatian or Albanian), FIQ-Japanese, new-FIQ-Japanese, CFQ-Chinese, SMDOA and PDC-Tanzania have better psychometric properties than the other PROMs.

Comparison with previous reviews
A recent consensus report on the assessment of masticatory function [5] mentioned five PROMs for masticatory function, four of which were included in the present review. The PROM that was not included in the present review was an instrument containing three questions based on the international classification of functioning, disability and health (ICF) model for oral function [47]. This PROM was not included because its development or validation was not reported in the literature.

Table 3 Psychometric properties of the included PROMs
Table note: For convergent validity, the hypothesis was rated as sufficient if the correlation between the PROM under study and a comparator instrument measuring a similar construct was ≥ 0.50 [60]. The hypotheses for evaluating discriminant validity and responsiveness were in accordance with those in the individual studies.

In a recent systematic review of PROMs for adult dental patients [19], only two out of the 20 questionnaires were on masticatory function and they were included in the present review. There were three other questionnaires included in that systematic review but they were not included in the present review because they focused on jaw function and not masticatory function.
Since the methodological quality and psychometric properties of PROMs for masticatory function have not been reported in previous systematic reviews, no comparison regarding the findings from the present review and those of earlier reviews can be made.

Recommendations on methodology and psychometric property for future research
In the present review, only six studies described the PROM development process and this was only briefly presented [30,31,35,38,41,44]. It is hard to tell whether the PROM development process had not been properly carried out or was just not reported. Detailed information about the PROM development process should be described in future research.
None of the 23 studies included in the present review tested measurement error. Measurement error is defined as "the systematic and random error of a study participant's score that is not attributed to true changes in the construct to be measured" [48]. Measurement error is rated as sufficient if the minimal important change (MIC) is larger than the smallest detectable change (SDC), or if the MIC lies outside the limits of agreement (LOA). A PROM can be used to compare masticatory function of different people or of the same person at different time points. The difference between two scores may originate from measurement error or from a real difference or change. Without an estimate of the measurement error, it is not possible to judge which of the two explains an observed difference. Thus, the measurement error of PROMs should be evaluated in future studies in order to obtain accurate results and to draw valid conclusions.
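The comparison between SDC and MIC can be sketched as follows. The formulas used here, SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM, are commonly used conventions and are an assumption on our part, as this article does not state how SDC should be computed. The function names and the example values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the score standard deviation and a reliability coefficient (ICC)."""
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem):
    """SDC at the individual level, assuming a 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem


def rate_measurement_error(sdc, mic):
    """'+' when the minimal important change exceeds the SDC, otherwise '-'."""
    return "+" if mic > sdc else "-"


# Hypothetical PROM: score SD of 10 points and test-retest ICC of 0.91.
sem = standard_error_of_measurement(sd=10.0, icc=0.91)
sdc = smallest_detectable_change(sem)
print(round(sem, 2), round(sdc, 2), rate_measurement_error(sdc, mic=10.0))
```

In this hypothetical case a change smaller than the SDC could not be distinguished from measurement error, so a MIC below the SDC would lead to an insufficient rating.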
The present review found that only two studies evaluated the cross-cultural validity of the PROMs, and they only conducted forward-backward translation [42,43]. Merely performing forward-backward translation, or conducting a pilot study on a sample from a different culture without proper statistical analysis, is not sufficient to evaluate cross-cultural validity. To assess cross-cultural validity of PROMs in future studies, regression analyses or confirmatory factor analysis (CFA) using classical test theory (CTT) methods, and differential item functioning (DIF) analyses using item response theory (IRT) methods are recommended [49][50][51][52].
Responsiveness is defined as "the ability to detect clinically important change" or as "the ability to detect a change in the construct to be measured" [48]. In the present review, there were only three longitudinal studies, two on CFQ (Croatian or Albanian) and one on the Index of Chewing Ability (ICA, 2020), which evaluated the responsiveness of the PROMs through the change scores collected before and after prosthodontic treatment [29,43,44]. The responsiveness of the other 17 PROMs was not studied, so it is unknown whether they can detect changes in masticatory function. Therefore, further studies should be conducted to evaluate the responsiveness of PROMs. In addition, the time span needed to capture the score difference in different populations, e.g. young adults and older adults, should be taken into consideration when designing such studies. The effect size (mean change score / SD of baseline scores) [53], the standardized response mean (mean change score / SD of change scores) [54], Norman's responsiveness coefficient ($\sigma^2_{\text{change}} / (\sigma^2_{\text{change}} + \sigma^2_{\text{error}})$) [55], and the relative efficiency statistic ($(t_1 / t_2)^2$, where $t_1$ and $t_2$ are the t-statistics of the two instruments being compared) [56] are appropriate statistical methods to evaluate responsiveness. In contrast, use of the paired t-test is not appropriate for this purpose [57].
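Two of the statistics named above, the effect size and the standardized response mean, can be computed as in the following sketch. The function names and the before/after scores are hypothetical and serve only to show the arithmetic; the sample standard deviation is used, which is an assumption.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-participant change: follow-up score minus baseline score."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must have the same length")
    return [post - pre for pre, post in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change score divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change score divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical PROM scores before and after prosthodontic treatment.
before = [10, 12, 14, 16, 18]
after = [14, 15, 19, 20, 25]
print(effect_size(before, after))
print(standardized_response_mean(before, after))
```

Both statistics scale the mean change by a measure of variability, which is why they are preferred over a significance test whose result depends on sample size.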
Content validity was evaluated in only five of the 19 PROMs included in the present review, and none of these PROMs had high-level evidence for content validity. It is worth emphasizing that content validity, defined as the degree to which the content of a PROM is an adequate reflection of the construct to be measured [48], is widely regarded as the most important type of validity for PROMs [58]. Asking study participants about the relevance, comprehensiveness and comprehensibility of

Recommendations on the selection of appropriate PROMs for future research
The PROMs included in this review were put into three categories based on the COSMIN manual [16]. Category A includes PROMs with evidence for sufficient content validity (any level) and at least low quality evidence for sufficient internal consistency. Category C includes PROMs with high quality evidence for an insufficient measurement property. PROMs which cannot be categorized as either A or C are put into category B. The PROMs categorized as "A" are recommended for use while those categorized as "C" are not recommended. PROMs categorized as "B" have the potential to be recommended, but further studies are needed to assess their quality [16].
Results of the present review show that CFQ (Croatian or Albanian) [43,44] meets the inclusion criteria of category A, while Subset of the Oral Health Impact Profile (Subset-OHIP) [40] and QMFQ [42] are in category C. The other PROMs are categorized into B. Thus, only CFQ (Croatian or Albanian) is recommended for use. The PROMs in category B can be further divided into two sub-categories according to the rating of hypothesis testing for construct validity. Hypothesis testing for construct validity refers to the extent to which the subjective assessment relates to other measures in a way that is consistent with the theoretical measurement construct [25,59]. If the hypothesis testing of a PROM is rated as sufficient, it can be re-classified into category B1 and has the potential to be recommended for use. Otherwise, if the hypothesis testing of a PROM is rated as insufficient or indeterminate, it will be categorized into B2 and will need further research to assess its quality. Results of the present review show that FIQ-Japanese [7,33], new-FIQ-Japanese [8], PDC-Tanzania [34], CFQ-Chinese [30], and SMDOA [41] can be classified into category B1, while ICA-1990 [32], PDC-Sudan [26], Index of Eating Difficulty (IED) [35], CFQ-Japanese [36], Food Intake Ability Questionnaire (FIAQ) [37], Masticatory Ability assessment for the Community-dwelling Elderly (MACE) [27] and Masticatory Problem Index (MPI) [39] are in category B2. Although Food Intake Ability Questionnaire-key food version (FIAQ-key food version) [37] and Self-reported Masticatory Function (SMF) [45,46] may be classified as B1, these two PROMs need more research to fully assess their quality. It is difficult to classify FIQ-Chinese [28,38] into B1 or B2 because its convergent validity was rated as insufficient although its discriminative validity was rated as sufficient. Further studies on this PROM are needed.
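The categorization rules described in this section can be summarized in a short sketch. The function and parameter names are hypothetical, and checking category C before category A is an assumption, since the text does not say which takes precedence if a PROM meets both sets of criteria.

```python
def categorize_prom(content_validity_sufficient,
                    internal_consistency_sufficient,
                    high_quality_insufficient_property,
                    construct_validity_rating):
    """Return 'A', 'B1', 'B2' or 'C' following the rules described in the text.

    content_validity_sufficient: evidence (any level) for sufficient content validity.
    internal_consistency_sufficient: at least low quality evidence for
        sufficient internal consistency.
    high_quality_insufficient_property: high quality evidence that some
        measurement property is insufficient.
    construct_validity_rating: '+', '-' or '?' for hypothesis testing.
    """
    if high_quality_insufficient_property:
        return "C"  # assumed to take precedence over A
    if content_validity_sufficient and internal_consistency_sufficient:
        return "A"
    # Everything else is category B, split by the construct validity rating.
    return "B1" if construct_validity_rating == "+" else "B2"


# Hypothetical inputs illustrating each outcome.
print(categorize_prom(True, True, False, "+"))
print(categorize_prom(False, True, False, "+"))
print(categorize_prom(False, True, False, "?"))
print(categorize_prom(False, False, True, "+"))
```

PROMs with mixed construct validity findings, such as the FIQ-Chinese case described above, do not fit a single rating and would need to be handled outside such a simple rule.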
In addition to the above-mentioned measurement properties, feasibility and interpretability of PROMs should also be considered when making recommendations for use [17]. Feasibility refers to the ease of PROM application, such as completion time and cost, while interpretability refers to the relationship between PROM scores and clinical meaning [48]. Considering all the evaluated properties of the PROMs included in the present review, CFQ (Croatian or Albanian) is recommended for use, and FIQ-Japanese, new-FIQ-Japanese, PDC-Tanzania, CFQ-Chinese and SMDOA have the potential to be recommended for use.
Based on the results of this systematic review, none of the included PROMs can be considered as the "gold standard". Nevertheless, some PROMs have better psychometric properties than others, and may be suitable for certain populations. Specifically, CFQ (Croatian or Albanian) is recommended for assessing masticatory function of general prosthodontic patients. FIQ-Japanese and New-FIQ-Japanese may be recommended to assess masticatory function of complete denture wearers. SMDOA, CFQ-Chinese and PDC-Tanzania may be recommended to assess masticatory function of community-dwelling adults in epidemiological screening.
To the best of our knowledge, this is the first systematic review on PROMs for masticatory function based on the COSMIN guideline. In addition, this review provides recommendations for the selection of appropriate PROMs for masticatory function. Moreover, this review points out the commonly neglected methodological aspects among the included studies and provides suggestions for future research. Regarding the limitations of the present review, only articles published in English were included and this may result in omission of potentially excellent PROMs reported in articles published in non-English languages. Besides, the PROM development or validation processes may have been rigorously implemented in some studies but were not reported in detail, which may lead to a downgrade of their methodological quality ratings. It is strongly recommended that future studies refer to the COSMIN guideline when developing or validating PROMs.

Conclusions
Currently, there is no PROM for masticatory function in adults with high-level evidence on all the psychometric properties. There are variations in the psychometric