Academic Editor: Adamu Ahmed, Department of Surgery, Ahmadu Bello University Teaching Hospital, Zaria, Nigeria.
Checked for plagiarism: Yes
Review by: Single-blind
The Construct Validity of the CES-D among HIV-Infected Perinatal Thai Women: Exploratory and Confirmatory Factor Analysis
It is important to measure depressive symptoms in HIV-infected individuals because depressive symptoms have been found to be correlated with faster progression to AIDS. Worldwide, the CES-D has been used to assess depressive symptoms and examined for its construct validity. However, no previous studies have investigated the CES-D’s construct validity among HIV-infected perinatal women. Therefore, the objective of this study was to examine the construct validity of the CES-D using both exploratory and confirmatory factor analysis among HIV-positive perinatal women in Thailand. Results showed that, overall, the CES-D is a 4-factor instrument with good construct validity and can be used to evaluate depressive symptoms among HIV-positive perinatal Thai women. However, some items in our study loaded on the 4 factors differently than in Radloff’s model. Finally, the CES-D can be used as a general-factor scale without being compromised.
HIV/AIDS is one of the most significant health problems worldwide, with serious impact on mortality, morbidity, the use of health care services, and the overall quality of life among those infected and the families and communities surrounding them.1 Affecting over 33 million people worldwide, including at least 600,000 Thai adults (ages 15-49), the epidemiology of HIV is changing globally.2 Overall, women of childbearing age are the fastest-growing group of individuals to be infected with HIV. In Thailand, over 21,000 pregnant women are infected.2 Generally, HIV infections cause greater problems for people in developing countries such as Thailand than for those in developed countries, partly because of a lack of antiretroviral medications. Studies have shown that depressive symptoms are associated with greater non-adherence to antiretroviral treatment,3 faster disease progression,4, 5 and poorer quality of life.6
Maternal depression is a significant cause of morbidity among childbearing women in resource-poor countries.7, 8 Thus, early detection of depression in the perinatal period is important. However, few studies of depressive symptoms in HIV-positive individuals have focused on pregnant and postpartum women, although the perinatal period is a time in which women are particularly vulnerable to depressive symptoms, partly due to hormonal changes.9, 10 Previous depressive symptoms,11 perceived stress, social isolation, disengagement coping,12 and drug use13 have also been found to be psychosocial and behavioral predictors of perinatal depressive symptoms. Studies that have examined depressive symptoms using the Center for Epidemiologic Studies Depression Scale (CES-D)14 in HIV-positive pregnant women in Thailand have demonstrated excellent internal consistency of the tool.15, 16, 17, 18 The construct validity of the tool in this target population, however, has not yet been examined in detail.
Most research that examined the construct validity of the CES-D using confirmatory factor analysis (CFA) showed that, regardless of the type of population (age, gender, culture/ethnicity, community/clinical setting, types of illness), the classic 4-factor structure proposed by Radloff held true.19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 Van Lieshout, Cleverley, Jenkins, and Georgiades compared postpartum immigrant and non-immigrant women using confirmatory factor analysis and found that both groups conceptualize depressive symptoms in similar ways.31 Canaday, Stommel, and Holzman found an almost identical factor model across white and African American pregnant women.27 Conversely, studies that examined the construct validity of the CES-D using exploratory factor analysis (EFA) showed various results, ranging from a 1-factor to a 4-factor structure. Kim suggests that potential reasons for differing CES-D responses include “cultural or racial/ethnic differences in conceptualization, meaning, and symptom expression of depression” (p.382).23 Scoring trends seen in Asian cultures relate to a cultural inhibition of the expression of positive emotion, which might be seen as immodest or boastful.26 Even with these trends, Zhang et al. confirmed Radloff’s four-factor structure of the CES-D when comparing depressive symptoms between elderly Chinese and Dutch respondents.33 EFA and CFA supply two different kinds of information. In EFA one can see the magnitudes of cross loadings, which can contribute to model specification, while CFA allows one to test and compare models related to a specific hypothesis and theory.30
To our knowledge, only two studies in Thailand have examined the construct validity of the CES-D: one in college students34 and the other in community-dwelling elders.25 The study among college students using EFA supported the 4-factor structure of the CES-D,34 although some items loaded on factors different from those proposed by Radloff. The study among elderly Thais using CFA, however, revealed that the CES-D could be used either as a general-factor or as a 4-factor structure.25 The objective of this study was thus to examine the construct validity of the CES-D Thai version among HIV-infected perinatal women in Thailand using both EFA and CFA.
We collected two data sets, one of pregnant (n = 127) and the other of postpartum (n = 85) HIV-infected women, between 2004 and 2007 in eastern Thailand. The original correlational studies for which the data sets were collected examined factors predicting depressive symptoms among perinatal Thai women.15, 16 Five institutional review boards in Thailand and the USA approved the correlational study protocols. Eligible participants were Thai women who were at least 18 years old, able to read and write in Thai, and diagnosed with an HIV infection. Demographic characteristics of the participants in the two data sets were not statistically different in terms of age, education, income, and marital status using Chi-square or independent t-tests. In total, ninety percent were married or living with a partner, sixty-one percent had sufficient family income, fifty-four percent were employed, and forty-five percent had schooling through junior high.14 Data in both groups were also collected at the same four hospitals. Thus, we deemed it logical to combine the two data sets for the present study because the women from both groups were similar in terms of their socioeconomic status and geographical location. Participants filled in questionnaires in a private hospital room after informed consent was signed. Data collectors checked the completeness of all of the respondents’ questionnaires on the spot. If some parts were found incomplete, they asked the respondents to complete the parts they had inadvertently left out. This practice made our data nearly complete, with overall missing values across all measures of <.001%. No identifiable information is used in our results.
The original studies used the 20-item CES-D14 in a Thai version (with back translation) to measure depressive symptoms. The CES-D asks respondents about their Depressed Affect (7 items), Interpersonal Relationships (2 items), Positive Affect (4 items), and Somatic Complaints (7 items) in the past week, with responses on a 4-point Likert scale ranging from “rarely or none of the time” (0) to “most or all of the time” (3). Possible total scores range from 0 to 60, with higher scores indicating more depressive symptoms. The CES-D is considered a screening tool and not a diagnostic tool for depression.14 The Cronbach's alpha for this combined data set was .90, indicating high internal consistency and giving a first indication that the CES-D performed well psychometrically in our study.35
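The scoring and reliability computations described above can be sketched in a few lines. This is a minimal illustration rather than the procedure used in the original studies: the function names are hypothetical, and the reverse-scoring of the four Positive Affect items (items 4, 8, 12, and 16) is standard CES-D practice that the text does not state explicitly.

```python
from statistics import pvariance

# Assumption: the four Positive Affect items are reverse-scored, as in
# standard CES-D scoring; the paper does not spell this step out.
POSITIVE_ITEMS = {4, 8, 12, 16}


def cesd_total(responses):
    """Sum 20 item responses (each coded 0-3), reversing the positive items."""
    if len(responses) != 20 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected 20 responses coded 0-3")
    return sum(3 - r if item in POSITIVE_ITEMS else r
               for item, r in enumerate(responses, start=1))


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-respondent item-score lists.

    Items are assumed to be already reverse-scored where needed.
    """
    k = len(rows[0])
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

A respondent who answers "most or all of the time" to every negative item and "rarely or none of the time" to every positive item reaches the maximum score of 60.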
The measures of self-esteem and emotional support (Thai version with back translation) used in the original studies are also described below. Correlations between the CES-D and these related conceptual measures were performed and will be presented in the results and discussion. The 10-item Rosenberg Self-Esteem scale 36 was used to measure self-esteem with a 4-point Likert scale ranging from strongly disagree (1) to strongly agree (4). Possible scores range from 10 to 40 with higher scores indicating higher self-esteem. The Cronbach's alpha for this combined data set was .78. Emotional support was measured by the Multidimensional Scale of Perceived Social Support (MSPSS)37, 38 with a 7-point Likert scale ranging from strongly disagree (1) to strongly agree (7). Possible scores range from 12 to 84 with higher scores indicating more emotional support. The Cronbach's alpha for this combined data set was .87.
Both EFA and CFA were performed to examine the construct validity of the CES-D. EFA was performed first because the factor structure of the scale had not been studied among HIV-positive perinatal women before, while CFA was performed using AMOS version 21 to test the EFA results by verifying model fit. In EFA, principal component analysis (PCA), the most widely used data reduction technique, was used to extract factors using SPSS version 20. Varimax rotation followed to maximize the difference between low and high factor loadings for clear interpretation of the factors.39
In CFA, maximum likelihood estimation was used. A hypothesized graphical 4-factor structure of the CES-D based on the EFA results was drawn and run using AMOS Graphics. To examine whether the model fit the data well, factor loadings, correlations among factors, standardized residuals, and several model fit indices were scrutinized.
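For readers who want to see the mechanics of the extraction and rotation steps, the sketch below extracts principal component loadings from a correlation matrix and applies a plain varimax rotation. It is an illustration in NumPy, not the SPSS procedure used in the study: the function names are hypothetical, and statistical packages often apply Kaiser normalization before rotating, which this sketch omits.

```python
import numpy as np


def pca_loadings(corr, n_factors):
    """Unrotated loadings: eigenvectors scaled by the square root of eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]    # keep the largest ones
    return eigvecs[:, order] * np.sqrt(eigvals[order])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (no Kaiser normalization)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        gradient = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                            # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):    # no further improvement
            break
        criterion = new_criterion
    return loadings @ rotation
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors changes.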
Finally, we further examined the construct validity of the CES-D in relation to 2 related constructs: emotional support and self-esteem. Pearson’s Product Moment Correlation was performed to examine the relationship between the CES-D scores and the MSPSS scores and between the CES-D scores and the RS-E scores.
In factor analysis, the sample size considered adequate for factoring is controversial. While Sapnas and Zeller found that a sample size as small as 25-50 subjects was adequate in their study, others have recommended larger sample sizes ranging from 100 to over 1,000.35, 39, 40, 41, 42, 43, 44 Some scholars recommend that the sample-to-variable (indicator) ratio be used to ensure adequate sample size, with suggested ratios ranging from 3:1 to 20:1.35, 42, 43, 44, 45, 46 However, there is evidence that sample size is not the sole indicator of factorability. With high correlations among indicators, a small sample size is adequate for factor analysis.43, 47, 48 In factor analysis, correlations among indicators of at least .30 should be present.35, 39 If no correlation is greater than .30, factor analysis should not be performed.35, 45
Our data set contains 212 cases with no missing values on the CES-D. Our sample to variable ratio is slightly over 10:1 (212 cases: 20 variables/indicators). Using Pearson’s correlations, almost 60% (110/190 = 58.9%) of the correlation values were at least .30 with .65 as the highest value. These results indicated that our data were likely to be factorable.
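The two screening quantities used here, the sample-to-variable ratio and the share of inter-item correlations at or above .30, are simple to compute from a correlation matrix. The sketch below is illustrative only: the function name is hypothetical, and comparing the absolute value of each correlation against the cutoff is an assumption, since the text does not say how negative correlations were handled.

```python
from itertools import combinations


def factorability_summary(corr, n_cases, cutoff=0.30):
    """Summarize basic factorability checks for a square correlation matrix."""
    k = len(corr)
    pairs = list(combinations(range(k), 2))          # unique off-diagonal pairs
    strong = sum(1 for i, j in pairs if abs(corr[i][j]) >= cutoff)
    return {
        "cases_per_variable": n_cases / k,
        "n_pairs": len(pairs),
        "share_at_or_above_cutoff": strong / len(pairs),
    }
```

With 20 indicators there are 20 × 19 / 2 = 190 unique correlations, and 212 cases give 10.6 cases per indicator, which is the "slightly over 10:1" ratio reported above.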
We performed an EFA without specifying the number of factors, using PCA and varimax rotation. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) was .91, indicating excellent factorability. It has been advised that a KMO value should be greater than .60 and that a value of >.90 is most preferable.35, 39 Although it is suggested that Bartlett's Test of Sphericity, which is sensitive to sample size, be used only when the sample-to-indicator ratio is < 5:1, we wanted to check its value in our sample and found that the test was significant (Chi-square = 1,675.7, df = 190, p <.001), thus favoring factorability.35, 39 Because there is no universal consensus on the best criterion to determine the number of factors in EFA, we applied 5 criteria to guide our decision-making: eigenvalues, Scree test, percentage of explained variance for each factor, cumulative percentage of explained variance, and rotated factor loadings. Table 1 shows that 4 factors are recommended using the eigenvalue cutoff of > 1,49 while the Scree test/plot50 suggests 3 factors (an analogy is that, when you flex your elbow, the number of factors is shown starting at the elbow plus those along the forearm; see the highlighted line in Figure 1). As for the percentage of explained variance, 4 factors should be retained, as it is recommended that a factor explaining at least 5% of the variance be kept.35, 39 For social and health science research, a cumulative percentage of explained variance of at least 50% is considered adequate in EFA.35, 44, 51 Thus, a 4-factor structure was determined.
Table 1. Eigenvalues, percentage of explained variance, and cumulative percentage of explained variance
|Total Variance Explained|
|Component||Initial Eigenvalues||Extraction Sums of Squared Loadings||Rotation Sums of Squared Loadings|
|Total||% of Variance||Cumulative %||Total||% of Variance||Cumulative %||Total||% of Variance||Cumulative %|
|Extraction Method: Principal Component Analysis.|
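The KMO statistic and Bartlett's test statistic reported above can both be computed directly from the item correlation matrix. The following NumPy sketch uses the usual textbook formulas and hypothetical function names; it is not the SPSS routine used in the study, and it returns only the Bartlett statistic and its degrees of freedom, which would then be referred to a Chi-square distribution.

```python
import numpy as np


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)          # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    p2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + p2)


def bartlett_sphericity(corr, n_cases):
    """Bartlett's test statistic and degrees of freedom."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n_cases - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return statistic, df
```

For 20 indicators the degrees of freedom are 20 × 19 / 2 = 190, matching the value reported for the test in this study.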
Finally, we examined the rotated factor loadings generated from varimax rotation, suppressing any loading <.40 for a clear presentation. Table 2 shows 4 factors along with their respective items. However, 3 items (Depressed, Good, and Failure) had cross loadings and did not load cleanly because the difference between their loadings on 2 factors was <.20.35, 39 We placed Depressed in Factor 1 and Good in Factor 2 because these placements make sense theoretically. We placed Failure in Factor 1 (Depressed Affect) rather than Factor 2 (Positive Affect) because Failure loaded substantially on both factors, and the loading difference is trivial (.512 - .447 = .065). Importantly, Failure is a negative concept, so it theoretically belongs to Depressed Affect rather than Positive Affect. See Table 3 for a comparison of item loadings on the four factors in Radloff’s study, Vorapongsathorn et al.’s study, and the current study. At this point, we decided that a 4-factor structure was the best model for our data. CFA was performed next.
Table 2. Rotated component matrix using Varimax rotation (suppressed factor loadings ≤ .40)
|CESD18: I felt sad. (Sad)||.801|
|CESD14: I felt lonely. (Lonely)||.725|
|CESD19: I felt that people dislike me. (Dislike)||.723|
|CESD17: I had crying spells. (Cry)||.717|
|CESD10: I felt fearful. (Fearful)||.660|
|CESD15: People were unfriendly. (Unfriendly)||.629|
|CESD20: I could not get “going” (Get going)||.571|
|CESD6: I felt depressed. (Depressed)||.568||.473|
|CESD13: I talked less than usual. (Talk)||.543|
|CESD11: My sleep was restless. (Sleep)||.437|
|CESD8: I felt hopeful about the future. (Hopeful)||.713|
|CESD16: I enjoyed life. (Enjoy)||.684|
|CESD12: I was happy. (Happy)||.663|
|CESD4: I felt that I was just as good as other people. (Good)||.598||.404|
|CESD9: I thought my life had been a failure. (Failure)||.447||.512|
|CESD7: I felt that everything I did was an effort. (Effort)||.635|
|CESD2: I did not feel like eating: my appetite was poor. (Appetite)||.616|
|CESD5: I had trouble keeping my mind on what I was doing. (Mind)||.611|
|CESD1: I was bothered by things that usually don’t bother me. (Bothered)||.757|
|CESD3: I felt that I could not shake off the blues even with help from my family or friends. (Blues)||.577|
|Radloff’s (1977) Factors||Radloff’s (1977) items: Community dwellers||Vorapongsathorn et al.’s (1990) items: Thai college students||Our items: HIV-positive perinatal Thai women|
|Depressed Affect||Blues, Depressed, Failure, Fearful, Lonely, Cry, Sad (7)||Get Going, Dislike, Depressed, Failure, Fearful, Lonely, Cry, Sad, Unfriendly, Mind, Effort, Blues (12)||Get Going, Dislike, Depressed, Failure, Fearful, Lonely, Cry, Sad, Unfriendly, Sleep, Talk (11)|
|Positive Affect||Good, Hopeful, Happy, Enjoy (4)||Good, Hopeful, Happy, Enjoy (4)||Good, Hopeful, Happy, Enjoy (4)|
|Somatic Complaints||Bothered, Appetite, Mind, Effort, Sleep, Talk, Get going (7)||Bothered, Appetite, Sleep (3)||Appetite, Mind, Effort (3)|
|Interpersonal Relationship||Unfriendly, Dislike (2)||Talk (1)||Bothered, Blues (2)|
With CFA, we used the EFA results to create a 4-factor structure of the CES-D using AMOS Graphics and hypothesized that: 1) the CES-D is a 4-factor model; 2) substantial correlations exist among the factors; and 3) all indicators load substantially onto their respective factors. Results showed that all of the relationships among factors and those between indicators and factors (factor loadings) were significant and substantial (Figure 2). No multicollinearity was found, as none of the correlations was higher than .90.43, 52 Standardized factor loadings ranged from .46 to .82. Correlations among factors were moderate to strong and appropriate, ranging from .44 to .82 (Figure 2). The average explained variance for each factor was calculated by summing the squared factor loadings of the factor’s respective indicators and dividing by the number of indicators comprising the factor.35 The resulting factor variances ranged from 26.50% to 46.03% (Figure 2). Ideally, a value of at least .50 (50%) is desirable.35 Nevertheless, our explained factor variances were higher than those of the previous study of Thai elders, whose variances ranged from 10.1% to 43.8%.22
Next, we examined the standardized residuals (results not shown), which function similarly to Z scores: each fitted residual is divided by its standard error.35 A value of 0 indicates a perfect fit of the model.40 A standardized residual of over 2.58 indicates a model misfit for the particular variance or covariance.40, 53 There is no cutoff for the acceptable percentage of misfitting residuals.43 However, we found that only 3 of the 190 residuals (1.6%) in our study had a value of over 2.58. These values were 2.65, 2.71, and 2.89, not far above 2.58. Thus, the residual values indicate a good fit of the model.
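The average explained variance described above is the mean of the squared standardized loadings of a factor's indicators. A minimal sketch follows; the function name and the loadings used in the usage line are hypothetical and are not values from Figure 2.

```python
def average_explained_variance(loadings):
    """Mean of squared standardized loadings for one factor's indicators."""
    if not loadings:
        raise ValueError("a factor needs at least one indicator")
    return sum(value * value for value in loadings) / len(loadings)


# Hypothetical factor with three indicators; 0.50 is the desirable minimum.
example = average_explained_variance([0.60, 0.70, 0.80])
```

A factor whose indicators all load at about .71 would reach the 50% benchmark, which shows why factors with several loadings in the .46 to .60 range fall below it.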
Subsequently, we examined model fit indices using Chi-square, the Normed Fit Index (NFI), the Incremental Fit Index (IFI), the Comparative Fit Index (CFI), and the Root Mean Squared Error of Approximation (RMSEA) along with its confidence interval and PCLOSE. These results are shown in Figure 2. Given that the Chi-square value is extremely sensitive to sample size, we did not use Chi-square to judge model fit. Using a formerly recommended value of >.90 as the cutoff for good fit,40 the NFI, IFI, and CFI values indicated poor fit. The RMSEA (<.08) and its narrow confidence interval were promising, indicating that the RMSEA was estimated precisely. However, the test of close fit (PCLOSE) was significant (p <.05), meaning that the hypothesis of close fit was rejected,40 thus demonstrating poor fit.
To search for clues to improve model fit, we investigated modification indices (MIs) and found that the largest value was for a covariance between the error terms for Cry and Sad, indicating that these two indicators share something in common and may measure a similar trait.40, 43 This makes sense because Cry and Sad are very close theoretically. The MI was 32.02 with an estimated parameter change (Par Change) of .147. A model respecification was thus warranted on both theoretical and empirical grounds.40 Therefore, we added a covariance between the error terms of Cry and Sad and reran the analysis. Factor covariances and factor loadings were significant and substantial (results not shown). Model fit results showed that all values were acceptable except for PCLOSE, with a value of .036, suggesting that we should look into the MIs for further respecification of the model. The largest MI (16.35 with a Par Change of .145) was found between the error terms for Failure and Fearful, so we correlated them and reran the analysis. This time, all model fit indices and PCLOSE taken together indicated model fit (Figure 3).
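For orientation, the RMSEA point estimate can be recovered from a model's Chi-square, its degrees of freedom, and the sample size. The sketch below uses the common single-group formula with N - 1 in the denominator; the function name is hypothetical, and the usage line plugs in the Chi-square of 331.63 on 164 degrees of freedom that this paper reports for the hypothesized model, with N = 212. The result is a back-of-the-envelope check, not a value read from Figure 2.

```python
import math


def rmsea(chi_square, df, n_cases):
    """RMSEA point estimate for a single-group model (uses N - 1)."""
    noncentrality = max(chi_square - df, 0.0)        # cannot be negative
    return math.sqrt(noncentrality / (df * (n_cases - 1)))


hypothesized = rmsea(331.63, 164, 212)               # roughly 0.07
```

A value of about .07 sits below the .08 cutoff but above .05, which is consistent with an acceptable RMSEA accompanied by a significant test of close fit.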
The Chi square difference test, comparing the revised model with the hypothesized model, also showed a significant result. This test subtracts the revised model’s Chi square from the hypothesized model’s Chi square (331.63 - 280.19 = 51.44, the Chi square difference or Chi square change). The difference in degrees of freedom is calculated the same way and in our study was 2 (164 - 162). Using the Chi square table, the p-value for the Chi square difference was <.001, indicating that the revised model fits the data significantly better than the hypothesized model.
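The table lookup described above can be reproduced numerically. The sketch below is a standard-library illustration with hypothetical function names; it uses the closed-form upper-tail probability of the Chi-square distribution, which holds only for even degrees of freedom, so it covers the df = 2 comparison here but is not a general-purpose routine.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a Chi-square variable with positive, even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def chi2_difference(chi2_restricted, df_restricted, chi2_revised, df_revised):
    """Chi-square change, df change, and p-value for two nested models."""
    delta = chi2_restricted - chi2_revised
    delta_df = df_restricted - df_revised
    return delta, delta_df, chi2_upper_tail_even_df(delta, delta_df)


# Hypothesized model versus revised model, using the values reported above.
delta, delta_df, p_value = chi2_difference(331.63, 164, 280.19, 162)
```

With a change of 51.44 on 2 degrees of freedom the p-value is far below .001, in line with the conclusion drawn from the Chi square table.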
Also, results from this respecified model revealed that the factor loadings remained significant and substantial (Figure 3). The average explained variances for the factors ranged from 26.50% to 45.31% (Figure 3). Based on our visual inspection of the standardized residual matrix, we found only 3 values over 2.58 (2.99, 3.01, and 3.06), all smaller than the cutoff of 4.0, indicating good fit. The model fit indices improved significantly compared with the hypothesized model (Figure 3). Even though NFI is smaller than .90, it is suggested that CFI be used over NFI because CFI takes the sample size into account.54 If the most recent recommendation of >.95 is used,55 then the IFI and CFI values indicate only moderate model fit. However, we have reasons to believe that our data fit the model well. First, our results are substantial and meaningful, as evidenced by moderate-to-strong factor loadings and moderate-to-strong correlations among factors. Second, our CFA was performed using a small sample size, so it is reasonable to use the cutoff of .90 instead of .95 to justify model fit.36, 48 CFA usually requires a larger sample size than EFA, which is true in our case. The sample-to-variable ratio is slightly over 10:1 for EFA but much smaller for CFA in our study. Based on Figure 3, the number of parameters to be estimated is 69: 16 regression coefficients, 9 covariances, 24 factor and indicator variances, and 20 error term variances. Thus, the sample-to-parameter ratio is 212/69, or about 3:1, which is small.
Because the previous study among elders in Thailand reported that a general factor of the CES-D fit as well as the 4-factor structure using CFA,25 we next tested a general factor of the CES-D in our HIV population using second-order CFA, based on the revised model. The second-order general factor was scaled to one so that all paths from it to each first-order factor could be calculated.43 The general factor is measured indirectly through the 4 first-order factors. Figure 4 shows that all indicators loaded substantially onto their respective first-order factors. The average explained variances for the first-order factors ranged from 26.20% to 45.30% (Figure 4). The standardized residual matrix showed 3 values over 2.58 (3.00, 3.01, and 3.06), not far above 2.58 and much smaller than the 4.0 cutoff, suggesting good fit. The model fit indices revealed that the second-order CFA model fit the data well but slightly less well than the previous model (Figure 4). The Chi-square difference test indicates that this difference is trivial and not statistically significant (Chi square change = 5.45, df = 2, p >.05). When this general-factor, second-order CFA model was compared to the hypothesized model, we found that it fit the data better than the hypothesized model (Chi square change = 60.59, df = 1, p <.001).
To cross-check whether the 4-factor structure as proposed by Radloff holds true, we constructed a visual diagram of the CES-D based on Radloff’s recommendation and ran an analysis. Results in Figure 5 revealed that 2 of the average explained variances were <50%, while the other two were >50%. Model fit indices indicated only moderate model fit (Figure 5). However, multicollinearity exists between 2 pairs of factors, each with a Pearson’s correlation of >.90,43 namely Depressed Affect and Somatic Complaints, and Somatic Complaints and Interpersonal Relationships. The high correlation values indicate that these 3 factors measure largely the same concept and should be restructured. Therefore, it is evident that the 4-factor CES-D model based on Radloff’s recommendation does not fit our data well.56
In sum, both our EFA and CFA results supported Radloff’s model in that the CES-D comprises 4 factors. These results are consistent with some previous studies using EFA and most studies using CFA in other cultures such as African-American, American Indian, Anglo American, Australian, Mexican American, Canadian, Dutch, Chinese, Indonesian, Myanmar, North Korean, Sri Lankan, Taiwanese, and Thai.19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 57 However, several of the indicators in our study loaded onto factors differently from Radloff’s study. This shows that, in general, HIV-positive perinatal women in Thailand manifest depressive symptoms the same way as other populations around the world. However, the subscales of the CES-D captured different depressive symptoms in our population.
Also, our study showed that a general factor of the CES-D fit the data as well as our 4-factor model. This result is consistent with the previous study among elders in Thailand.25 Therefore, the CES-D could be used as a general factor or as a 4-factor scale among HIV-positive perinatal women in Thailand.
To further investigate the construct validity of the CES-D, we ran Pearson’s correlations between the CES-D and the MSPSS (emotional support) and between the CES-D and the RS-E (self-esteem). Results showed that both relationships were negative, with Pearson’s r of -.248 and -.519, respectively. These results indicate divergent validity between the CES-D and emotional support because the absolute r is less than .50,35 while there is convergent validity between the CES-D and self-esteem because the absolute r is greater than .50.35 These results further suggest that the CES-D has good construct validity, as it is appropriately inversely correlated with the positive concepts of emotional support and self-esteem.58
In general, it seems that the CES-D’s overall construct validity is relatively stable across cultures and subjects. Results from our study show that the CES-D is a valid measure with good construct validity which can be used either as a general factor or as a 4-factor instrument among HIV-infected perinatal women in Thailand. Some items in the CES-D loaded differently in our study than in Radloff’s study. Therefore, when subscales are used in the target population, the different item loadings between Radloff’s study and ours should be taken into account.
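As a rough check on whether correlations of this size are distinguishable from zero, a p-value can be approximated from r and the sample size with Fisher's z transformation. The sketch below is a standard-library approximation with a hypothetical function name; it assumes all 212 participants contributed to both correlations, which the text does not state for the MSPSS and RS-E, and it is not the test reported by the authors.

```python
import math
from statistics import NormalDist


def pearson_r_p_value(r, n_cases):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z."""
    z = math.atanh(r) * math.sqrt(n_cases - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Assumption: n = 212 for both correlations.
p_support = pearson_r_p_value(-0.248, 212)
p_esteem = pearson_r_p_value(-0.519, 212)
```

Under that assumption both approximate p-values fall below .001, so the weaker correlation with emotional support is small in magnitude but unlikely to be a chance finding.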
There is support in the literature for combining both EFA and CFA.30 Our study found such a combination to be effective. Based on our study, it is logical and helpful to perform EFA first, followed by CFA. Even though the CES-D is a well-established instrument and is used around the world, no study had examined the construct validity of the CES-D among HIV-positive perinatal women in Thailand before ours. People in different cultures with different health conditions may experience different clusters of depressive symptoms. EFA helped us explore the structure of the CES-D and extract appropriate factors along with their associated indicators in our target population. Because tests for model fit in EFA are not available, CFA based on EFA results was used to verify model fit using standardized residuals and model fit indices. We recommend that future studies use combined EFA and CFA methods when a new tool is examined or when a new population is studied using a well-established tool.
There is a limitation to our study in that our participants were recruited from only one region in Thailand. Therefore, generalizability might be limited to only the eastern Thai region. Future studies should examine the construct validity of the CES-D among HIV-positive perinatal women in other regions of Thailand and in other countries beyond Thailand.