Indonesian Version of WHO-5 Well-being Index Amidst COVID-19 Pandemic Settings: Scale Validation and Standardisation

The 5-item World Health Organization Well-Being Index (WHO-5) is a frequently used brief standard measure in large-scale clinical studies. However, no research specifically on the validity test of WHO-5 was found in Indonesia before the pandemic. This study aims to test the validity of the Indonesian version of the WHO-5 in the COVID-19 pandemic setting. An online survey was used to collect data from February 2021 to September 2021. A total of 1,084 Indonesians who were directly or indirectly affected by COVID-19 completed the survey. The scale was validated with Confirmatory Factor Analysis (CFA) and Item Response Theory (IRT). Both analyses supported the validity of the WHO-5. Thus, the WHO-5 is a short questionnaire of five non-invasive questions that is adequate for measuring the psychological well-being of the Indonesian sample, especially during the COVID-19 pandemic, and could be used in other pandemics or crises.


Introduction
The increasing number of infections and death rates due to COVID-19 can cause collective anxiety about COVID-19 infection (Bento et al., 2020). Increased infection rates and their association with deteriorating mood and emotions result in decreased psychological well-being (Bathina et al., 2021). The World Health Organization (WHO) considers psychological well-being one of the essential aspects of health and an essential dimension of perceived quality of life (McDowell, 2010). Thus, it is also essential to understand how to measure psychological well-being in the population using a valid and reliable instrument in order to plan an effective intervention. One of the most widely used instruments to measure psychological well-being is the 5-item WHO Well-Being Index (WHO-5).
The WHO-5 was first introduced at the WHO meeting in Stockholm in February 1998 for the project on well-being measurement, and the WHO Regional Office initiated the translation into several languages (Staehr, 1998). The WHO-5 is derived from the WHO-10, which was itself derived from a 28-item rating scale used in a WHO study in eight European countries (Warr et al., 1985; Staehr, 1989). The 28 items in the original scale are adapted from the Zung scales (for depression, distress, and anxiety), the General Health Questionnaire, and the Psychological General Well-Being Scale (Paisley, 2000). Thus, both the 28-item scale and the WHO-10 consist of items that measure distress in both favourable and unfavourable forms. In contrast, since the WHO now considers well-being in a positive light, the WHO-5 only includes positively phrased items (Bech, 1999).
The WHO-5 well-being index has been translated into at least 30 languages, for example, Arabic, Chinese, and Filipino (Topp et al., 2015). Despite these translations, not all versions have been validated, including the Indonesian version with a large Indonesian sample. This study aims to test the validity of the Indonesian-language WHO-5 and to report its psychometric properties.

Psychological Well-being Amidst COVID-19
The devastation caused by the COVID-19 pandemic is reflected not only in rising mortality, infection rates, and other health hazards but also in the decline of psychological well-being among people all around the world, especially Indonesian people (Bathina et al., 2021). Psychological well-being is related to physical, mental, and socio-cultural aspects and spirituality (Bożek et al., 2020; Kumar, 2020). The pandemic might not have a significant impact if every individual were both physically and mentally resilient. Therefore, it is vital to create conditions in which society can achieve good psychological well-being and be resilient in the face of a pandemic.
Psychological well-being emphasizes how and why a person lives life in positive ways, including cognitive judgments and affective reactions. It includes studies that have used various aspects such as happiness, satisfaction, morale, and positive influence (Diener, 2009). The main goals of the state, society, and people are to understand and accept that human well-being is a fundamental, foundational, basic, and indispensable condition of a healthy society and its successful development and prosperity. Ryff and Keyes (1995) define psychological well-being as an encouragement to explore the individual's potential as a whole. This can lead a person either to become resigned to a situation, which decreases psychological well-being, or to try to improve living conditions, which increases it. Individuals with high psychological well-being are satisfied with their lives, are in positive emotional states, can get through bad experiences that cause negative emotions, have positive relationships with others, can determine their destiny without depending on others, control environmental conditions, have a clear purpose in life, and can develop themselves (Ryff, 1989). The decline of psychological well-being, driven mainly by the COVID-19 pandemic, has become a priority issue that needs solving. The first step is to establish the level of psychological well-being accurately by measuring it with suitable psychological tools. One of the most accessible screening tools for measuring psychological well-being using simple, non-invasive items is the 5-item World Health Organization Well-Being Index, usually known as WHO-5.

WHO-5, Adaptation Indonesian Version
Among various assessments of psychological well-being, the 5-item WHO-5 is one of the most widely employed measuring instruments across the world. This unidimensional instrument allows respondents to complete the questionnaire in under one minute. It measures the psychological well-being of an individual over the past 14 days (two weeks).
The WHO-5 is generally used to assess clinical outcomes in clinical trials and adequately measures responsiveness/sensitivity to treatment (Topp et al., 2015). The WHO-5 was initially developed to measure well-being. Still, several studies also use it as a screening tool for depression because the items represent aspects closely related to depression, such as lack of positive emotions, interests, and energy (Krieger et al., 2014). The WHO-5 is also widely used in multinational studies in various fields, such as health research related to diabetes (Nicolucci et al., 2013) and quality of life during the coronavirus crisis (Ahrendt et al., 2020).

Prior Validity and Reliability Test of WHO-5
The WHO-5 was generally used in public health studies, such as research on diabetes, heart diseases, and depression, before the COVID-19 pandemic (Hindoro et al., 2018; Larasati & Kristina, 2020; Soewondo et al., 2010). However, no research specifically on the validity test of WHO-5 was found in Indonesia before the pandemic. Indonesian researchers conducted studies using the WHO-5 as one of several scales, measuring well-being alongside other psychological and non-psychological variables. Most prior studies using WHO-5 as a measuring instrument only examine the instrument's reliability by calculating Cronbach's alpha and its one-dimensionality using exploratory factor analysis or principal component analysis. Others also test the external criterion. However, the methods mentioned above are not sufficient to test the validity and reliability of WHO-5. Validity is achieved when the instrument truly measures the intended construct (Comrey, 1973; Jöreskog, 1969). One of the most informative construct validity tests is Confirmatory Factor Analysis (CFA); thus, in this study, CFA is used as a construct validity test of WHO-5. Furthermore, Item Response Theory is used to calibrate and evaluate items in the questionnaire.

Confirmatory Factor Analysis (CFA)
Confirmatory Factor Analysis represents the relationship between observed variables and latent variables (Brown, 2015). According to Brown (2015), CFA can be used to analyse the construct validity of a variable. It provides evidence of the validity of theoretical constructs.

Item Response Theory Model
Item Response Theory (IRT) is an essential psychometric framework for item analysis (Ostini et al., 2015). The WHO-5 employs scoring from 0 (absence of well-being) to 25 (maximal well-being) (WHO, 1998). Summed scoring of this kind assumes an equal discrimination parameter across items (a one-parameter IRT model). However, the total score may not be enough to show the extent of one's psychological well-being. Thus, an IRT model may be more adequate.

Standard Setting with Rasch Model
Standard setting uses a systematic approach that identifies a common agreement among experts or raters on cut score(s) or threshold(s) for a certain level of proficiency and/or trait (MacCann & Gordon, 2019). The author uses item mapping to decide the cut scores according to the item difficulties or item locations (Wang, 2003; MacCann & Gordon, 2019). Next, the author uses a convenient method (World Health Organization, 2009; Bonacchi et al., 2021) to classify people into two categories: a logit value below 0.00 indicates an absence of well-being (< 0.00 logit, "absence of well-being"), and a logit value greater than or equal to 0.00 indicates high well-being (≥ 0.00 logit, "high well-being").

Survey Design and Participants
This study was part of a more extensive study on screening instrument development, for which ethics approval was provided by the University of Macau (Reference numbers: SSHRE20-APP020-FSS, EA210291). The online survey was distributed using the SurveyMonkey platform from February to September 2021, when the COVID-19 pandemic was at its peak. The completion time for the WHO-5 part was approximately one minute. A non-purposive sampling method was used: the survey link was distributed to members of the Indonesian population who were at least 18 years old and affected directly or indirectly by COVID-19. The survey assessed and quantified the psychological well-being of participants who were (1) exposed to the negative impact of COVID-19 directly or (2) exposed to the negative impact of COVID-19 indirectly through family, friends, and the surrounding environment. A total of 1,084 participants completed the WHO-5 scale; no data were excluded for being incomplete or missing. These 1,084 respondents were predominantly female (72.1%), with ages ranging from 18 to 64 years (M = 24.1, SD = 8.4), and were located in 26 of 34 provinces in Indonesia (see Table 2 for further detail on sample characteristics).
The demographic information presented in this study is gender and age. Gender is coded into three codes: 0 = self-described, 1 = female, and 2 = male. Age is coded into three codes: 1 = early adulthood (18-34), 2 = mid-adulthood (35-44), and 3 = late adulthood (>45). The characteristics of the sample are shown in Table 2 below. The majority of respondents are female (72.1%), and the minority chose to self-describe (1.9%), while males made up 25.9% of the sample. The majority of respondents are in early adulthood (88.5%), and a minority are in late adulthood (3.5%), while people in mid-adulthood made up 8% of the sample.
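The coding scheme described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function name `code_age_group` and the dictionary `GENDER_CODES` are hypothetical, and because the text leaves age 45 itself unassigned (the bands are 35-44 and >45), the sketch assumes that 45 belongs to late adulthood.

```python
# Gender codes as listed in the text.
GENDER_CODES = {"self-described": 0, "female": 1, "male": 2}


def code_age_group(age: int) -> int:
    """Map an age in years to the three-level age code described in the text."""
    if age < 18:
        # The survey was aimed at respondents aged 18 or older.
        raise ValueError("age must be at least 18")
    if age <= 34:
        return 1  # early adulthood (18-34)
    if age <= 44:
        return 2  # mid-adulthood (35-44)
    # Assumption: age 45 is grouped with late adulthood, since the
    # text gives the band only as ">45".
    return 3


if __name__ == "__main__":
    for example_age in (18, 34, 35, 44, 45, 64):
        print(example_age, code_age_group(example_age))
```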

Instruments
The English version of WHO-5 was translated into the Indonesian language (Appendix A) using forward and backward translation techniques to ensure the clarity and consistency of each item's meaning (World Health Organization, 2009). The participants were asked to rate how often each statement applied to them over the last 14 days by choosing a point on a Likert scale: 1 (at no time), 2 (some of the time), 3 (less than half the time), 4 (more than half the time), 5 (most of the time), and 6 (all of the time).
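As a rough illustration of how the response options relate to the 0 to 25 total score mentioned elsewhere in the paper, the sketch below sums five responses. It rests on an assumption that the paper does not state explicitly: that the 1 to 6 response codes are shifted to 0 to 5 before summing. The function names are hypothetical, and the multiplication by 4 to obtain a 0 to 100 percentage score is the common WHO-5 convention rather than something reported in this study.

```python
def who5_raw_score(responses):
    """Sum five WHO-5 responses coded 1..6 into a raw score of 0..25.

    Assumption: each response is shifted by -1 so that "at no time"
    contributes 0 and "all of the time" contributes 5.
    """
    if len(responses) != 5:
        raise ValueError("the WHO-5 has exactly five items")
    if any(r not in range(1, 7) for r in responses):
        raise ValueError("each response must be an integer from 1 to 6")
    return sum(r - 1 for r in responses)


def who5_percentage_score(responses):
    """Convert the raw score to a 0..100 percentage score (raw score x 4)."""
    return who5_raw_score(responses) * 4


if __name__ == "__main__":
    answers = [3, 4, 2, 5, 1]  # hypothetical respondent
    print(who5_raw_score(answers), who5_percentage_score(answers))
```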

Procedure
The researchers translated the WHO-5 into the Indonesian language, as this study is aimed at an Indonesian sample. The translation was conducted through forward translation and backward translation to ensure the accuracy of the translated items. First, the English version was translated into Indonesian by an expert in psychology with sufficient English proficiency. Then, the Indonesian draft scale was translated back into English by an expert meeting the same criteria. The back-translated draft was discussed item by item for equivalence of meaning with foreign experts in psychology who use English as a daily language. Finally, the Indonesian draft was checked again for appropriateness of meaning.

Data Analysis
The collected data were analysed with Confirmatory Factor Analysis (CFA) and Item Response Theory (IRT). A CFA was conducted to examine the uni-dimensionality of the WHO-5 using LISREL software. The CFA was conducted through several steps (Comrey, 1973; Jöreskog, 1969):
1. Determining the operational definition of the measured construct. To measure the construct, an item (stimulus) is needed as the indicator.
2. Formulating a hypothesis that all items are valid in measuring the construct. In other words, the hypothesis is built on uni-dimensionality, where only one factor is measured by all items.
3. Calculating the correlation matrix between items, known as the S matrix, which is used to estimate the model-implied correlation matrix.
4. Testing construct validity through a hypothesis test by checking the Chi-Square test and RMSEA. A Chi-Square showing a non-significant value (p > .05) means that the null hypothesis (H0) is not rejected. This indicates that the theory that all items measure only one construct is consistent with the data. Likewise, an RMSEA below the chosen cut-off supports the fit of the one-factor model to the data.
5. If the unidimensional (one-factor) model is shown to fit the data, selecting or evaluating items using three criteria:
a. Items with a non-significant factor loading (p > .05) are dropped, as they do not provide statistically meaningful information.
b. Items with a negative factor loading coefficient are also dropped, as they measure the opposite of the defined construct. Before dropping such an item, researchers should check whether its statement is unfavourably worded, in which case the score is reversed. This applies especially to items with no right or wrong answer (e.g., personality, motivation, perception, etc.).
c. Items can also be eliminated if their residual (measurement error) correlates with more than three items; this indicates that the item measures a construct other than the one intended.
Uni-dimensionality indicates that the measuring instrument only measures one construct, in this case, psychological well-being. After the construct validity test shows a fitting unidimensional model, with RMSEA as the index of fit, IRT is applied. IRT serves in the construction of measurement instruments, the linking and equating of measurements, and the evaluation of test bias and differential item functioning, and is therefore an essential aspect of instrument measurement. This study accordingly employed IRT on the WHO-5, using Mplus software.

Results
Researchers present an additional comparison of means for both demographic variables, namely gender and age. The comparison is intended to check the level of psychological well-being across gender and age, providing additional information on the differences in psychological well-being between female, male, and self-described individuals and between people in early, mid-, and late adulthood. The means comparison is shown in Table 3.
The first model of the Confirmatory Factor Analysis (CFA) of the 5-item World Health Organization Well-Being Index (WHO-5) is shown in Figure 1. The results in Figure 1 show Chi-Square = 91.58, df = 5, p < .001, and RMSEA = 0.126. The fit indexes indicate that the model does not fit the data (Jöreskog et al., 2016). Therefore, the model is modified by allowing the items' measurement errors (theta-delta) to correlate in order to reach a fitting model. The modification, based on the modification indices in LISREL, is shown in Figure 2.
The Confirmatory Factor Analysis obtained model fit after three modifications that allowed the measurement errors of items to correlate. Furthermore, researchers used Phi standardisation as the standardising technique. After obtaining the fitting model for WHO-5, the items were tested for significance and validity to check whether any item must be dropped. Table 6 shows the results of the significance test used to consider item dropping. There are three main requirements for declaring an item valid: (1) favourable factor loading, (2) t-value > 1.96, and (3) fewer than three modifications for the item. According to Table 6, all items of WHO-5 show favourable factor loadings, or lambda; this indicates that all items measure the intended construct. Furthermore, all t-values are greater than 1.96, so all p-values are below 0.05, meaning that all items are significant in measuring psychological well-being. The final verdict is that all items are valid based on the factor loading, t-value, and p-value of each item of the 5-item World Health Organization Well-Being Index (WHO-5). In conclusion, the construct validity test using Confirmatory Factor Analysis shows a fitting unidimensional model, meaning that all items of WHO-5 measure only one construct: psychological well-being.
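The relationship between the reported Chi-Square and RMSEA values can be checked with the usual point-estimate formula, RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))). The sketch below is an illustration under that assumption; LISREL's exact computation is not described in the paper, and some programs divide by N rather than N − 1. The function name `rmsea` is hypothetical. The inputs are the first-model values reported in the text and the sample size of 1,084.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from the model chi-square, its df, and sample size."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # The numerator is floored at zero, so a chi-square smaller than
    # its degrees of freedom yields an RMSEA of exactly 0.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    # First (unmodified) one-factor model reported in the text.
    print(round(rmsea(91.58, 5, 1084), 3))  # 0.126
```

Under the same formula, any model whose Chi-Square is smaller than its degrees of freedom has an RMSEA of zero, because the numerator is floored at zero.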
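The link between the t-value threshold of 1.96 and the 0.05 significance level can be illustrated with a short calculation. The sketch assumes that, with a sample of this size, the t-value can be treated as a standard normal statistic and that the test is two-sided; the function name is hypothetical and this is not taken from the LISREL output.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for t_value in (1.96, 2.58, 5.0):
        print(t_value, two_sided_p_from_z(t_value))
```

Any t-value larger than 1.96 in absolute value therefore corresponds to a p-value below 0.05 under this approximation.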

Item Response Theory
Item Response Theory analysis shows that the Chi-Square test of the model for the binary and ordered categorical (ordinal) outcomes is non-significant (p-value > 0.05); the p-value of the Chi-Square is 1.000. Thus, based on this test, the model fits. The log-likelihood for the model is -6,770.245, which the analysis treats as indicating a fitting model because its magnitude is less than 10,000.
Furthermore, the IRT parameterisation results provide two pieces of information, namely item discriminations and item locations. Table 7 shows the item discriminations, and Table 8 shows the item locations. Item discrimination is the ability of each item to differentiate among respondents according to how high or low they are on the construct measured, in this case, psychological well-being. Table 7 shows that the item discriminations are all significant (p-value < 0.05) and that all estimates are positive, further supporting the validity of each item. Furthermore, all the discrimination indexes are above .60. In conclusion, all items have good item discrimination. Item location, also known as item difficulty, estimates how much of a particular ability or psychological construct, in this case psychological well-being, a respondent must possess to pick the correct answer or, for a non-ability scale such as the WHO-5, to endorse the item. Table 8 shows that all estimates of item difficulty are within the normal range: ideally they should lie between -2 and 2, and in this study they range from -0.072 to 0.272. In conclusion, item W1 is the easiest item with an estimate of -0.072, and W4 is the most difficult item with an estimate of 0.272.
To further analyse items using Item Response Theory, researchers present the Item Characteristic Curve (ICC) for all items over the range of -3 SD to +3 SD, shown in Figure 3. In the diagram, the further to the left an item's curve lies, the easier the item is, and the further to the right, the more difficult it is (Muthén & Muthén, n.d.). According to Figure 3, the items in WHO-5 tend to be difficult, as all of the curves are on the right side of the diagram. This further shows that most respondents have low psychological well-being amidst the pandemic.
Table 3 shows that males have the highest mean of psychological well-being, meaning that, on average, males have higher psychological well-being compared to females and self-described people. Self-described people have the lowest mean of psychological well-being, indicating that, on average, they have lower psychological well-being compared to females and males.
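To make the roles of discrimination and location concrete, the sketch below evaluates a two-parameter logistic curve, P(θ) = 1 / (1 + exp(−a(θ − b))). This is an assumption made for illustration: the paper does not state the exact functional form fitted to the ordinal items, so the curve should be read as the probability of passing a single threshold. The function name and the discrimination value of 2.0 are hypothetical; the two locations are the W1 and W4 estimates reported in the text.

```python
import math


def item_response_probability(theta: float, discrimination: float, location: float) -> float:
    """Two-parameter logistic curve evaluated at trait level theta.

    discrimination (a) controls how steeply the curve rises;
    location (b) is the trait level at which the probability equals 0.5.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - location)))


if __name__ == "__main__":
    a = 2.0  # hypothetical discrimination, for illustration only
    locations = {"W1": -0.072, "W4": 0.272}  # locations reported in the text
    for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
        row = {
            item: round(item_response_probability(theta, a, b), 3)
            for item, b in locations.items()
        }
        print(theta, row)
```

At any given trait level, the item with the lower location (W1) has the higher probability, which is what "easier" means in this context.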
Late adulthood has the highest mean of psychological well-being in the age group, meaning that, on average, people in late adulthood have higher psychological well-being than people in their early and middle adulthood (Table 3).Early adulthood has the lowest mean of psychological well-being.On average, people in early adulthood have lower psychological well-being than those in mid and late adulthood.

Standard Setting
As mentioned earlier, the author uses Rasch standard setting to classify people on the scale of well-being. The results of the standard setting can be seen in Table 9. According to the Rasch analysis, a logit value of 0.00 is approximately equal to a total score of 13. Therefore, respondents who obtained scores below 13 are classified as having an absence of well-being, and those with scores greater than or equal to 13 as having high well-being. Using this categorisation, 620 (57.3%) participants belong to the absence of well-being category, and 464 (42.7%) participants fall into the high well-being category.
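The two-category rule described above can be written as a small helper. This is a sketch with hypothetical function names and hypothetical example scores; it assumes that the total score is expressed on the same scale as the cut score of 13 reported in the text, and it does not reproduce the Rasch calibration that produced that cut score.

```python
ABSENCE = "absence of well-being"
HIGH = "high well-being"


def classify_well_being(total_score: int, cut_score: int = 13) -> str:
    """Classify a total score using the cut score from the standard setting."""
    return HIGH if total_score >= cut_score else ABSENCE


def category_shares(scores, cut_score: int = 13):
    """Return the proportion of respondents falling in each category."""
    if not scores:
        raise ValueError("scores must not be empty")
    n_high = sum(1 for s in scores if s >= cut_score)
    return {
        ABSENCE: (len(scores) - n_high) / len(scores),
        HIGH: n_high / len(scores),
    }


if __name__ == "__main__":
    example_scores = [5, 12, 13, 20]  # hypothetical respondents
    print([classify_well_being(s) for s in example_scores])
    print(category_shares(example_scores))
```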

Discussion
Psychological well-being emphasizes how and why a person lives life in positive ways, including cognitive judgments and affective reactions. It includes studies that have used various aspects such as happiness, satisfaction, morale, and positive influence (Diener, 2009). The main goals of the state, society, and people are to understand and accept that human well-being is a fundamental, foundational, and indispensable condition of a healthy society and of its successful development and prosperity (Alatartseva & Barysheva, 2015).
The importance of psychological well-being, especially amidst the COVID-19 pandemic, encourages studies on developing instruments that measure it. Thus, this study aims to validate and test the WHO-5 using Item Response Theory (IRT), which supports the construction of measurement instruments, the linking and equating of measurements, and the evaluation of test bias and differential item functioning. First, the Confirmatory Factor Analysis (CFA) was administered. The model fit the data after three modifications that allowed the measurement errors of items to correlate with each other (allowing theta-delta to correlate in LISREL) (Chi-Square = 0.32, df = 2, p = .85, and RMSEA = 0.000). The significance test of each item also shows that all items are valid in measuring the construct of psychological well-being. The validity is shown through positive factor loadings and t-values > 1.96, which correspond to p-values < 0.05. The results indicate that the WHO-5, a unidimensional instrument, consists of five useful items measuring psychological well-being. Prior studies among hypertension patients and diabetes mellitus patients also found similar validity results, stating that the WHO-5 is a valid unidimensional scale (Hindoro et al., 2018; Larasati & Kristina, 2020; Soewondo et al., 2010).
Second, the Item Response Theory (IRT) analysis shows that item parameterization resulted in a fitting model for item discrimination and difficulty. All estimates for item discriminations are positive. All items can differentiate among respondents according to how high or low they are on the construct measured, in this case, psychological well-being. Item W3 has the strongest discrimination among the items, with the highest estimate of 2.68. Item W1 is the easiest item with an estimate of -0.072, and W4 is the most difficult item with an estimate of 0.272. Because the WHO-5 is a non-ability scale measuring well-being, it has no right or wrong answers; an easier item is simply one that respondents are more likely to endorse, and vice versa. All items have estimates within the standard range. Furthermore, the Item Characteristic Curve (ICC) shows that all items of WHO-5 tend to lie further on the right side of the diagram, meaning that all items tend to be difficult. This further shows that most respondents have low psychological well-being amidst the pandemic, as also shown in previous public health studies conducted before COVID-19 (Hindoro et al., 2018; Larasati & Kristina, 2020; Soewondo et al., 2010).
Findings from this validation study might differ from those of previous validation studies conducted before the COVID-19 pandemic, for example, among the elderly (Heun et al., 2001), employees in the 6th European Working Conditions Survey in 2015 (Sischka et al., 2020), and university students in China in 2018-2019 (Fung et al., 2022).

Conclusion
In conclusion, the WHO-5 is a psychometrically fit and valid brief measure that consists of simple and non-invasive questions measuring the psychological well-being of respondents. The present study suggests that future studies implement advanced psychometric and statistical measures to validate the instrument. The study's strength is the large sample drawn from across Indonesia, meaning that the sample may represent the actual condition amidst the COVID-19 pandemic. The study has two limitations. First, since the number of items on the WHO-5 scale is not enough for Rasch Model analysis, further advanced analysis could not be conducted. Second, the predominance of early-adulthood participants may be due to the online method used, which could increase self-selection bias.

Figure 2. Path diagram of WHO-5 with Modification

Table 1. Participants

Furthermore, the raw data are analysed to obtain the mean and standard deviation for age, as shown in Table 2 below:

Table 2.

The minimum age of respondents is 16 years old, and the maximum age is 71 years old. The mean age shows a value of 24.15 with a standard deviation of 8.37. In other studies during the COVID-19 pandemic, the mean age varied. In a well-being index study among nurses in Spain, Chile, and Norway, ages ranged from 36 to 48 years, with a mean age of 39.3 years and an SD of 12.1 (Lara-Cabrera et al., 2022). A Nigerian study of doctors and nurses reported a mean age of 39.85 (SD 8.49), with ages ranging from 21 to 63 years (Seb-Akahomen et al., 2021). Moreover, in the Austrian survey, the mean age is 40.22 years with SD 11.60, and ages range from 18 to 79 years (Simon et al., 2021).

http://journal.uinjkt.ac.id/index.php/jp3i
This is an open access article under CC-BY-SA license (https://creativecommons.org/licenses/by-sa/4.0/)

Table 4. Participants' Demographic of Provinces

Table 5. Model Comparison

Table 6. Significance Test

Table 8. Item Location

Table 9. Rasch Standard Setting