Academic Editor: Ian James Martins, Edith Cowan University
Checked for plagiarism: Yes
Review by: Single-blind
Development of a Chronic Obstructive Pulmonary Disease Severity Classification System Using a Japanese Health Insurance Claims Database
Healthcare services provided to patients should vary depending on disease severity. However, disease severity bias, a type of selection bias, is a commonly encountered problem in administrative database studies. Herein, we selected chronic obstructive pulmonary disease (COPD), which commonly affects elderly Japanese citizens, for the development and validation of a severity classification system based on a health insurance claims database.
Patients who received COPD-related diagnostic codes in 2011 were selected from a commercially based health insurance claims database. COPD patients were randomly divided into two groups to develop and validate severity scores. A principal component analysis was used to estimate factor loadings used to weight calculations of COPD severity scores. Score validity was evaluated using a linear trend test to predict COPD treatment costs and acute exacerbation events.
Using records from 880 patients, ten variables were created: acute exacerbation events; emphysema diagnoses; laboratory tests; oxygen therapy procedures; prescriptions of anticholinergic, inhaled corticosteroid (ICS), short-acting beta-agonist (SABA), and long-acting beta-agonist (LABA) agents; asthma diagnoses; and patient birth year. The factor loadings for LABA and ICS prescriptions had the strongest impact on the estimated severity scores (0.50 and 0.49, respectively). Among the 300 patients in the validation group, higher scores were associated with increasing trends in median costs and exacerbation risk (p for trend < 0.05).
Chronic obstructive pulmonary disease (COPD) is a progressive disease characterized by chronic dyspnea, cough, and sputum production, and is mainly attributed to long-term exposure to tobacco smoke. COPD is the tenth most common cause of death in Japan, and the number of associated deaths has shown an increasing trend 1. A previous study estimated that 5.3 million individuals aged ≥40 years were at risk of COPD in 2001 (estimated prevalence rate: 8.6%) 2. In addition, statistical surveys reported that patients with COPD accounted for expenditures totaling 151 billion yen (approximately 0.4% of total Japanese medical expenditures) in 2011 3.
Health insurance claims databases, which reflect real-world clinical environments, are important research tools for drug safety monitoring, epidemiology, and health economic studies. These databases include information about provided medical services, including disease diagnoses, procedures, and prescribed medications. Under the universal health insurance coverage system in Japan, patients have equal access to all available services, which allows these databases to collect comprehensive information on patients living in Japan 4. However, important clinical information, such as clinical test results and disease severity, is not included in these databases. Appropriate treatment is provided according to a patient’s medical needs, which are determined by the disease condition and/or severity 5. In the absence of such information, treatment effects estimated through database studies are often biased due to confounding by indication 6. Thus, when using health insurance claims, summary variables indicative of disease conditions or severity must be created using diagnostic codes and/or prescribed medications. For example, the Charlson comorbidity index was developed to predict mortality 7, and the Elixhauser comorbidity measure was developed to predict health-related outcomes 8. COPD severity is generally assessed according to the results of respiratory function tests, using the Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria 9.
In the present study, we focused on COPD-related disease scores as a method of evaluating disease-specific costs and health service utilization. Various COPD severity scores for administrative databases have been reported 10, 11, 12; from among these, we selected a scoring system developed by Wu and colleagues that corresponded to our research purposes 10. This score, which was developed in the United States (US) for patients with COPD who experience acute exacerbation events, was used to estimate drug utilization and medical costs related to COPD severity without relying on respiratory function test data 13. However, the patients evaluated in that study might have had more severe disease than the general population of Japanese patients with COPD. Moreover, drug selections for COPD management differ between the US and Japanese clinical environments: for example, the transdermal tulobuterol patch, a long-acting β2-agonist (LABA), has been frequently used in Japan for the long-term management of stable patients with COPD 14. Therefore, we decided to adapt the severity scoring system developed by Wu and colleagues to the Japanese clinical environment and validate this adaptation using a Japanese administrative database.
This study used a health insurance claims database maintained by the Japan Medical Data Center (JMDC, Tokyo, Japan), which contains inpatient, outpatient, and pharmacy prescription records collected for approximately 2 million individuals since January 2005 4. A majority of insured persons listed in the database are employees of large companies or their family members 15. This database is a useful resource for evaluating healthcare resource usage, as it includes ICD-10 (International Classification of Diseases, Tenth Revision) diagnostic codes, ATC (Anatomical Therapeutic Chemical Classification System) pharmaceutical product codes, and codes for laboratory tests and other medical procedures. All information is recorded on a monthly basis for insurance claim processing.
We extracted data for patients with at least one COPD-related diagnosis code (ICD-10 code J42: chronic bronchitis, J43: emphysema, or J44: chronic obstructive pulmonary disease) during the period from January to December 2011. Patients were subsequently excluded if they had less than 12 months of continuous follow-up data, only one COPD diagnosis during follow-up, an age <40 years at diagnosis, no evidence of COPD confirmation (e.g., respiratory function test, x-ray photography, or computerized tomography (CT) imaging), or a cancer diagnosis with any surgical procedures. We randomly divided eligible patients into two groups: a developing group and a validating group. Analyses were performed using SAS software, version 9.3 (SAS Institute Inc., Cary, NC, USA), and were initiated after obtaining approval from the institutional review board at Meiji Pharmaceutical University.
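The exclusion criteria and random split described above can be sketched as a sequence of filters. This is a minimal illustration in Python, not the SAS code used in the study; the `Patient` fields, the function names, and the fixed random seed are hypothetical and do not correspond to actual JMDC column names.

```python
import random
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical, chosen for illustration only.
    patient_id: str
    followup_months: int    # continuous follow-up available, in months
    copd_claims: int        # monthly claims carrying J42, J43, or J44
    age_at_diagnosis: int   # age at the first COPD diagnosis
    has_confirmation: bool  # respiratory function test, x-ray, or CT on record
    cancer_surgery: bool    # cancer diagnosis with any surgical procedure


def is_eligible(p: Patient) -> bool:
    """Apply the exclusion criteria; True means the patient is kept."""
    return (
        p.followup_months >= 12
        and p.copd_claims >= 2          # more than one COPD diagnosis
        and p.age_at_diagnosis >= 40
        and p.has_confirmation
        and not p.cancer_surgery
    )


def split_cohort(patients, n_validating, seed=0):
    """Randomly split eligible patients into (developing, validating)."""
    eligible = [p for p in patients if is_eligible(p)]
    if n_validating > len(eligible):
        raise ValueError("validating group larger than eligible cohort")
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(eligible)
    return eligible[n_validating:], eligible[:n_validating]
```

Seeding the generator is a design choice made here so that the same split can be regenerated; the paper does not state how its randomization was performed.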
The first claim involving a COPD diagnosis (ICD-10 codes: J42–J44) in 2011 was identified, and its month was defined as the index month. Next, information about COPD-related services provided to patients during the 12-month period after the index month was extracted. We created nine variables: acute exacerbation event, emphysema diagnosis, asthma diagnosis with medication, laboratory testing (respiratory function tests, x-ray photography, or CT imaging), oxygen therapy, and prescribed anticholinergic, inhaled corticosteroid (ICS), short-acting beta-agonist (SABA), and LABA agents 10. As the insurance claims data contain no direct code indicating an acute exacerbation event, we created a proxy variable defined as an outpatient visit with a macrolide antibiotic/oral corticosteroid prescription, or an emergency department visit with a respiratory-related diagnosis, as used in previous studies 11. For each variable, we counted the number of monthly claims in which it appeared during the 12-month follow-up period (range for each variable: 0–12). In addition, a continuous variable indicating patient age at COPD diagnosis was created using the patients’ birth-year data.
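Because claims are recorded monthly, each count variable is the number of distinct months in which the corresponding service appears. The following Python sketch illustrates that counting step. The variable labels, the `(month, variable)` input layout, and the choice to treat the window as 12 consecutive months starting at the index month are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical labels for the nine count variables described in the text.
VARIABLES = (
    "acute_exacerbation", "emphysema", "asthma", "laboratory_test",
    "oxygen_therapy", "anticholinergic", "ics", "saba", "laba",
)


def month_number(year, month):
    """Map a calendar month to a running integer so months can be compared."""
    return year * 12 + (month - 1)


def count_variable_months(claims, index_month, window=12):
    """Count, per variable, the distinct months with at least one claim.

    claims: iterable of (month_number, variable_label) pairs for one patient.
    Assumption: the window covers `window` consecutive months starting at
    the index month, so every count lies between 0 and `window`.
    """
    months_seen = defaultdict(set)
    for month, variable in claims:
        if index_month <= month < index_month + window:
            months_seen[variable].add(month)
    return {v: len(months_seen[v]) for v in VARIABLES}
```

For example, two LABA claims in the same month contribute one to the LABA count, and a claim 12 or more months after the index month is ignored.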
A principal component analysis (PCA) was conducted to calculate factor loadings for use as weights 16. PCA is a multivariate technique that is often used to reduce the number of variables. The value of each relevant variable (age plus the nine variables indicated above) was multiplied by its factor loading on the first principal component; these values were then summed to yield a severity score. For example, if the LABA factor loading was 0.6 and six LABA prescriptions were given in a year, the LABA score would be calculated as 0.6 × 6 = 3.6 points. Points for the other variables were similarly calculated and summed to yield each patient’s original severity score. These scores were then standardized (mean: 50, standard deviation: 10) to account for the different units of the variables. The reliability of the estimated scores was assessed using Cronbach's alpha 17. In addition, we divided the severity scores into quartiles and calculated the mean value of each variable in each quartile to confirm increasing trends.
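The scoring arithmetic in this paragraph (a loading-weighted sum followed by rescaling to a mean of 50 and a standard deviation of 10) can be written out as a short Python sketch. The function names are hypothetical, and the use of the population standard deviation is an assumption; the paper does not say which variant was used.

```python
from statistics import mean, pstdev


def raw_score(values, loadings):
    """Sum of each variable's value multiplied by its factor loading.

    values and loadings are dicts keyed by the same variable labels.
    """
    return sum(loadings[name] * values[name] for name in loadings)


def standardize(scores, target_mean=50.0, target_sd=10.0):
    """Rescale scores to the target mean and standard deviation.

    Assumption: the population standard deviation (pstdev) is used.
    """
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; cannot standardize")
    return [target_mean + target_sd * (s - m) / sd for s in scores]


# Worked example from the text: loading 0.6, six LABA prescriptions.
laba_points = raw_score({"laba": 6}, {"laba": 0.6})  # 0.6 x 6 = 3.6 points
```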
COPD severity score validity was confirmed by estimating the annual costs of COPD treatment and probability of an acute exacerbation event. We assumed that the calculated severity scores would be able to predict increased trends in these values. Many studies indicated that COPD costs and exacerbation risks were positively associated with the severity of a patient’s condition 18, 19, 20. We re-calculated severity scores in the validating group, using the 12-month values of each variable multiplied by factor loadings (as weights) estimated from the developing group. The scores were then divided into three severity groups according to epidemiological estimates of the COPD severity distribution in Japan (mild: 56%, moderate: 38%, severe/very severe: 6%) 2. A linear trend test was conducted to confirm increased trends in COPD treatment costs and acute exacerbation risks 21, 22.
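Dividing scores into three groups according to a fixed severity distribution amounts to ranking patients by score and cutting at the cumulative proportions (56%, 94%, 100%). The Python sketch below shows one way to do this; the function name and the arbitrary handling of tied scores are assumptions, not details taken from the study.

```python
# Proportions reported in the text for mild, moderate, severe/very severe.
SEVERITY_SHARES = (
    ("mild", 0.56),
    ("moderate", 0.38),
    ("severe/very severe", 0.06),
)


def assign_severity_groups(scores, shares=SEVERITY_SHARES):
    """Label each score by rank so group sizes follow the given shares.

    Assumption: tied scores are split by their input order, which is
    arbitrary; a real analysis may need an explicit tie rule.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # lowest score first
    labels = [None] * n
    cumulative = 0.0
    start = 0
    for k, (label, share) in enumerate(shares):
        cumulative += share
        # The last group absorbs any rounding remainder.
        end = n if k == len(shares) - 1 else round(cumulative * n)
        for i in order[start:end]:
            labels[i] = label
        start = end
    return labels
```

With 300 patients, these shares yield groups of 168, 114, and 18 patients.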
We identified 1,784 patients with COPD diagnostic codes in 2011 (Figure 1). The following patients were excluded: 186 who did not receive another COPD diagnostic code during the 12-month follow-up, 224 without adequate continuous follow-up data, 23 who were younger than 40 years, 156 with no evidence of COPD-related laboratory tests or prescriptions, and 15 who underwent surgery related to a cancer diagnosis. The remaining 1,180 patients with COPD were included in the study and assigned randomly to a developing group (n = 880) or validating group (n = 300). Demographic characteristics of patients in each group are described and compared in Table 1. The distributions of age categories, sex, and comorbid conditions were similar between the groups. Patients in both groups received an average of approximately eight COPD claims.
Table 1. Demographic characteristics of the developing and validating groups
| Variables | Development group | Validation group |
|---|---|---|
| Total COPD patients | 880 | 300 |
| Total COPD claims | 7,066 | 2,379 |
| Claims per patient | 8 | 8 |
| Age categories (at first COPD diagnosis) | | |
| Mean age (SD) | 56 (9) | 55 (9) |
| 70 years or older, n (%) | 83 (9) | 26 (9) |
| Comorbid conditions (ICD-10) * | | |
| Ischemic heart disease (I20–I25), n (%) | 96 (11) | 27 (9) |
PCA results are summarized in Table 2. Among the ten variables, the factor loadings of LABA and ICS were most reflective of the severity of COPD conditions (0.50 and 0.49, respectively). Cronbach’s alpha was 0.60. Patients were categorized into quartiles according to their scores (Q1 and Q4 were the lowest and highest scores, respectively). The mean value of the counted variables was calculated for each quartile (Table 3), and the results indicated increasing trends in higher severity score quartiles.
Table 2. Factor loadings estimated from a principal component analysis.
| No. | Variables | Factor loading | Cronbach's alpha removing each variable |
|---|---|---|---|
| 3 | Laboratory tests (number of claims) | 0.31 | 0.57 |
| 4 | Anticholinergics (number of claims) | 0.30 | 0.56 |
| 5 | SABA (number of claims) | 0.21 | 0.60 |
| 6 | LABA (number of claims) | 0.50 | 0.53 |
| 7 | Inhaled corticosteroids (number of claims) | 0.49 | 0.53 |
| 8 | Oxygen therapy (number of claims) | 0.23 | 0.59 |
| 9 | AECB (number of episodes) | 0.28 | 0.58 |
| 10 | Asthma (number of claims) | 0.27 | 0.59 |
| | Total Cronbach's alpha | | 0.60 |
Table 3. Mean values of the counted variables by severity score quartile (Q1: lowest scores, Q4: highest scores).
| Variables | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| Laboratory tests (number of claims) | 0.82 | 1.44 | 2.66 | 3.17 |
| Anticholinergics (number of claims) | 0.07 | 0.49 | 1.75 | 3.40 |
| SABA (number of claims) | 0.25 | 0.44 | 0.87 | 1.70 |
| LABA (number of claims) | 0.50 | 1.00 | 2.04 | 6.78 |
| Inhaled corticosteroids (number of claims) | 0.33 | 1.11 | 2.41 | 6.61 |
| Oxygen therapy (number of claims) | 0.01 | 0.15 | 0.26 | 0.84 |
| Acute exacerbation (number of episodes) | 0.70 | 1.19 | 2.05 | 3.63 |
| Asthma (number of claims) | 0.31 | 0.60 | 1.18 | 2.88 |
Severity scores were re-calculated in the validating group (n = 300), using factor loadings from the development step. Scores ranged from 4 to 34, and their distributions were similar between the developing and validating groups, as shown in Figure 2. When severity scores were divided into three categories (mild, moderate, and severe/very severe), the median costs were 79,027 yen, 204,445 yen, and 422,463 yen, respectively, indicating an increasing trend (p for trend < 0.05, Figure 3). In addition, a similar increasing trend was observed for the risk of an acute exacerbation event (48%, 61%, and 83%, respectively; Figure 4).
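For a proportion such as exacerbation risk across ordered severity groups, one commonly used linear trend test is the Cochran-Armitage test. The paper does not name the specific test it used, so the Python sketch below is an assumption about one possible choice, and the counts in the usage example are invented for illustration rather than taken from the study. Trends in a continuous outcome such as cost would need a different test and are not covered here.

```python
from math import erfc, sqrt


def cochran_armitage(events, totals, scores=None):
    """Cochran-Armitage test for a linear trend in proportions.

    events[i] / totals[i] is the event proportion in ordered group i.
    scores are the group scores (default 0, 1, 2, ...).
    Returns (z, two_sided_p) using the normal approximation.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n = sum(totals)
    p = sum(events) / n  # pooled event proportion
    t_stat = sum(s * (r - m * p) for s, r, m in zip(scores, events, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p * (1 - p) * (s2 - s1 * s1 / n)
    if variance <= 0:
        raise ValueError("trend variance is zero; test is undefined")
    z = t_stat / sqrt(variance)
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical counts (not study data): three ordered groups of 100 patients.
z, p_value = cochran_armitage(events=[40, 50, 60], totals=[100, 100, 100])
```

A positive `z` indicates that the event proportion rises with the group score.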
This study developed a COPD severity classification method using a Japanese administrative database and validated the performance of this method. Score validity was confirmed by estimating COPD treatment costs and acute exacerbation risks, with higher scores indicating worse COPD conditions. Accordingly, this severity classification system could be used as a risk adjustment factor to control for potential confounders in administrative database studies.
Few attempts to classify COPD conditions in an administrative database have been published. Notably, Macaulay and colleagues reported that they had classified COPD patients into three severity groups according to spirometry test results and GOLD criteria in a study based on an electronic health records database linked to a health care claims database 12. As their database included respiratory function test results, the authors were able to define COPD severity based on GOLD criteria. In addition, Mapel and colleagues developed a method for identifying and characterizing COPD 11. These authors stratified patients according to comorbid respiratory conditions and medical procedures but used coding systems unique to the US (such as ICD-9 and CPT-4 codes), with no counterparts in Japanese claims systems, to define COPD severity. Moreover, Eisner and colleagues created COPD severity scores that used patient survey data but did not require respiratory function tests 23. That scoring system, however, required health-related quality of life and physical disability-related information that are rarely included in administrative databases. As a result, a coding system that required neither the results of respiratory function tests nor patient-reported outcomes was required.
Wu and colleagues previously developed a classification method using a claims database in the US 10. Their research included 2,068 patients with an acute exacerbation of chronic bronchitis due to COPD. Twelve variables were selected to calculate COPD severity scores: number of days of hospitalization due to acute exacerbation; number of claims for oxygen therapy, acute exacerbation, emphysema, spirometry test, pulmonologist visit; prescriptions of anticholinergic, oral corticosteroid, ICS, SABA, and LABA agents; and patient age. The method developed by Wu and colleagues was later used to examine the utilization and cost of medical services according to COPD severity 13, and was validated using another administrative database, although no direct comparison of respiratory function test values was performed 24.
In our study, we added an asthma variable and excluded three variables (hospitalization due to acute exacerbation, pulmonologist visit, and use of oral corticosteroid) from the method described by Wu and colleagues to increase score reliability, for the following reasons. First, asthma is an important risk factor for COPD, and approximately 50% of our study population had an asthma diagnosis. Second, our database included long-term hospitalized patients who required no aggressive treatments. Third, not all patients received COPD services from pulmonologists; some occasionally received services from doctors in other departments. In addition, when patients received COPD services from large hospitals, codes indicative of the doctors’ specialties were often missing. Last, the oral corticosteroid variable was already used to define an acute exacerbation event.
In our study, prescriptions of anticholinergic, LABA, and ICS agents were strongly associated with higher severity scores. This pattern contrasts with the findings of Wu et al., who reported the strongest effects for prescriptions of anticholinergic, SABA, and LABA agents. The discrepancy might be attributable to differences in clinical guideline recommendations. The GOLD criteria regarding stable COPD management recommend the use of anticholinergics or SABA for mild conditions and LABA for moderate conditions 4. On the other hand, the Japanese guideline recommends initiating LABA for mild conditions and adding ICS for more severe conditions. Our estimated scores reflect these differences in drug treatment options.
Our method yielded a lower Cronbach’s alpha than that obtained by Wu et al. (0.60 vs. 0.71). Cronbach’s alpha assesses reliability (internal consistency) among variables. A value of 1 indicates completely consistent variables, whereas a value of 0 indicates no correlation among variables. Values of 0.7–0.8 are regarded as satisfactory 17; our Cronbach’s alpha did not reach this range. Infrequently observed variables such as oxygen therapy could explain this result: because such variables had small correlation coefficients with the other variables, they lowered Cronbach’s alpha. However, we did not remove these variables because they are very important indicators of COPD severity.
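Cronbach's alpha for k variables is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total. The Python sketch below computes this raw (unstandardized) form together with the "alpha removing each variable" diagnostic of the kind reported in Table 2. Whether the study used the raw or the standardized coefficient is not stated, so the raw form here is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Raw Cronbach's alpha; rows[i][j] is variable j for patient i."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two variables and two patients")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_removed(rows):
    """Alpha recomputed with each variable left out in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[x for j, x in enumerate(row) if j != drop]
                        for row in rows])
        for drop in range(k)
    ]
```

If removing a variable raises alpha above the overall value, that variable correlates weakly with the rest, which is the pattern the paragraph attributes to infrequently observed variables.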
One advantage of this study was the ability to create severity scores without retrieving respiratory function test data from an administrative database. A multivariable analysis (e.g., logistic regression analysis) is usually used to create a score by setting a dependent variable, such as a health or economic outcome, and then calculating the weights of the independent variables 12. In the case of COPD severity, respiratory function test results and symptoms are needed to define the dependent variable. Unfortunately, these data are not available in Japanese administrative databases. However, using the PCA method, it was possible to calculate the weight (factor loading) of each independent variable without setting a specific dependent variable. This statistical technique is one way to overcome this weakness of administrative databases.
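To illustrate how PCA derives weights from the predictor variables alone, the Python sketch below extracts the first principal component of the correlation matrix by power iteration. It is a teaching sketch under stated assumptions, not the SAS procedure used in the study: the use of the correlation matrix, the unit-length eigenvector as the weight vector, and the sign convention are all assumptions, and statistical software may scale loadings differently (for example, by the square root of the eigenvalue).

```python
from math import sqrt
from statistics import mean, pstdev


def correlation_matrix(rows):
    """Pearson correlation matrix; rows[i][j] is variable j for patient i."""
    n, k = len(rows), len(rows[0])
    z = []
    for j in range(k):
        col = [row[j] for row in rows]
        m, sd = mean(col), pstdev(col)
        if sd == 0:
            raise ValueError(f"variable {j} is constant")
        z.append([(x - m) / sd for x in col])
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]


def first_component(matrix, iterations=1000, tol=1e-12):
    """Return (eigenvalue, unit eigenvector) of the dominant component."""
    k = len(matrix)
    v = [1.0 + j for j in range(k)]  # asymmetric start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector is orthogonal to the component")
        v = [x / norm for x in w]
        mv = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        new_eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
        converged = abs(new_eigenvalue - eigenvalue) < tol
        eigenvalue = new_eigenvalue
        if converged:
            break
    if sum(v) < 0:  # sign convention: make the weights sum to a positive value
        v = [-x for x in v]
    return eigenvalue, v
```

No outcome variable appears anywhere in this computation, which is the property the paragraph above relies on.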
In addition, the data included in the JMDC database were collected through the insurance reimbursement process; therefore, information is rarely missing. Moreover, under the Japanese national health insurance program, all services provided to COPD patients should be almost fully covered. For these reasons, our classification method was developed using all records of COPD treatment provided to Japanese patients.
However, this study also had limitations common to administrative database studies 10, 12. Notably, we did not consider smoking history as a risk factor because this variable was not available in our database. Even without these data, our method was capable of describing COPD severity in terms of age and treatment procedures.
Approximately 50% of patients in our study had comorbid asthma. It is difficult to clinically distinguish COPD from asthma, and these diagnoses therefore often overlap (asthma-COPD overlap syndrome). Previous research showed that the prevalence of asthma-COPD overlap syndrome ranged from 1.8% to 56.0% 25, 26, 27, 28. Therefore, we did not remove patients with asthma from the study population.
PCA often faces problems related to the low reproducibility of factor loadings as the basis of a scoring system. Reproducibility depends on the treatment patterns in a database. Therefore, when using different data sources, researchers should re-calculate the factor loadings, as demonstrated in our study. Furthermore, additional studies that apply our findings to clinical practice are needed. We will compare COPD severity scores with clinical conditions using electronic health records at a large-scale hospital in Japan. The severity scores calculated from factor loadings in this study are relative values and cannot be used to determine the distribution of COPD severity (i.e., the proportions of mild vs. more severe conditions). Therefore, we will set cut-off values according to the GOLD criteria. These criteria allow the classification of COPD conditions into four severity categories depending on the values of respiratory function tests. We will evaluate the scores using the c-statistic, positive predictive value, or negative predictive value according to the electronic health records database. These techniques have been used previously to assess model discrimination and validate severity classification methods 29, 30, 31.
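The evaluation measures named above can be sketched in Python as follows. This illustration assumes a binary reference standard (for example, severe versus not severe according to clinical data), which simplifies the four GOLD categories, and the function names and cut-off handling are hypothetical.

```python
def c_statistic(scores, outcomes):
    """Probability that a random positive case scores above a random negative.

    outcomes are truthy for positive cases; tied scores count as one half.
    """
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


def predictive_values(scores, outcomes, cutoff):
    """Return (PPV, NPV) when scores >= cutoff are classified as positive.

    A value is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, outcomes):
        if s >= cutoff:
            if y:
                tp += 1
            else:
                fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if tp + fp else None
    npv = tn / (tn + fn) if tn + fn else None
    return ppv, npv
```

A c-statistic of 0.5 corresponds to no discrimination and 1.0 to perfect separation of positive and negative cases.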
In this study, a COPD severity classification method based on an administrative database in Japan was developed. This method is able to estimate COPD conditions without requiring laboratory test or clinical symptom data. For clinical implementation, we will confirm the validity of this classification system through comparison with medical information, including laboratory data. This classification method is a very important step in the adjustment of potential outcome risk factors according to administrative databases.
We would like to thank Mr. Kosuke Iwasaki of Milliman, Inc., who provided helpful comments and suggestions regarding the PCA method.