Plasma proteomics and lung function in four community-based cohorts

Background: The underlying mechanisms leading to impaired lung function are incompletely understood. Objectives: To investigate whether protein profiling can provide novel insights into mechanisms leading to impaired lung function. Methods: We used four community-based studies (n = 2552) to investigate associations between 79 cardiovascular/inflammatory proteins and forced expiratory volume in 1 s percent predicted (FEV 1 %) assessed by spirometry. We divided the cohorts into discovery and replication samples and used risk factor-adjusted linear regression corrected for multiple comparisons (false discovery rate of 5%). We performed Mendelian randomization analyses using genetic and spirometry data from the UK Biobank (n = 421,986) to assess causality. Measurements and main results: In cross-sectional analysis, 22 proteins were associated with lower FEV 1 % in both the discovery and replication sample, regardless of stratification by smoking status. The combined proteomic data cumulatively explained 5% of the variation in FEV 1 %. In longitudinal analyses (n = 681), higher plasma levels of growth differentiation factor 15 (GDF-15) and interleukin 6 (IL-6) predicted a more rapid 5-year decline in lung function (change in FEV 1 % per standard deviation of protein level: -1.4 (95% CI, -2.5 to -0.3) for GDF-15 and -0.8 (95% CI, -1.5 to -0.2) for IL-6). Mendelian randomization analysis in the UK Biobank provided support for a causal effect of increased GDF-15 levels on reduced FEV 1 %. Conclusions: Our combined approach identified GDF-15 as a potential causal factor in the development of impaired lung function in the general population. These findings encourage additional studies evaluating the role of GDF-15 as a causal factor for impaired lung function.


Introduction
Individuals with reduced lung function and chronic obstructive pulmonary disease (COPD) have increased risks of cardiovascular disease and mortality [1][2][3][4][5] that cannot entirely be attributed to established risk factors such as smoking, hypertension and diabetes [5][6][7]. These individuals represent a heterogeneous group with differences in disease severity, rate of progression and impact on quality of life [8]. As the current Global Initiative for Chronic Obstructive Lung Disease (GOLD) criteria for COPD are insufficient for predicting the long-term outcomes in these patients [9], new diagnostic tools are needed to identify high-risk individuals who would benefit from targeted prevention.
Recent proteomics methods have made it possible to simultaneously measure large numbers of proteins in small quantities of blood. Previous studies investigating the association between circulating proteins and impaired lung function are scarce, have shown disparate results and have usually been based on small samples of patients with prevalent COPD [10][11][12][13], while community-based studies are lacking. Further studies are needed to find biomarkers that could help clinicians stratify and predict outcomes among individuals with impaired lung function [14]. Given that the development of lung and cardiovascular disease shares many underlying mechanisms, we hypothesized that proteins known to be involved in cardiovascular disease pathology may also be involved in impaired lung function.
Using several measurements of lung function in our analyses would increase the number of tests and the risk of spurious associations (type 1 error). Forced expiratory volume in 1 s (FEV 1 ) has been a robust indicator of mortality in both smokers [2,7] and never smokers [6]. We therefore used only FEV 1 percent predicted (from here on denoted FEV 1 %) as the primary lung function outcome in the present study.
We aimed to investigate the cross-sectional associations between 79 proteins implicated in cardiovascular or inflammatory disease and FEV 1 % in four independent cohort studies using a discovery and replication approach. We further aimed to assess whether the proteomic profile associated with impaired lung function differed depending on smoking status, and to identify proteins that were associated with a more rapid lung function decline over a 5-year time period. Lastly, we aimed to assess causality of proteins associated with lower lung function values. Since observational studies cannot provide unbiased evidence of causal relationships because of possible reverse causation and confounding, Mendelian randomization (MR) was used. MR uses genetic variants associated with an exposure as instrumental variables to assess the causal effect of that exposure on an outcome [15]. MR minimizes bias from confounding or reverse causation because genetic variants are randomly allocated at conception. Therefore, MR can be used to provide information about causal relationships in observational data.

Materials and methods
A more detailed version of this section is available as an online supplement.

Discovery stage
Two independent community-based cohorts with a similar study protocol, recruited in Uppsala, Sweden, were used in our discovery stage: the Prospective investigation of Obesity, Energy and Metabolism (POEM) study, described in detail elsewhere [16], and the Prospective Study of the Vasculature in Uppsala Seniors (PIVUS), also described in detail elsewhere [17]. A total of 1337 participants (479 from POEM and 858 from PIVUS) fulfilled the inclusion criteria and formed the discovery cohort (Fig. 1). In PIVUS, 681 participants had adequate data on lung function and proteomics at the follow-up examination at age 75 years (Fig. 1).

Replication stage
Two independent cohorts were selected from the Study of Atherosclerosis in Västmanland: a healthy control group (SAVa-controls) and patients with peripheral artery disease (Peripheral Arterial Disease in Västmanland, PADVa). A total of 1215 participants (800 from SAVa-controls, 415 from PADVa) had complete data on proteomics, smoking and lung function and comprised our replication stage (Fig. 1).

Spirometry, smoking, and exercise data
All spirometry was performed in accordance with the American Thoracic Society recommendations [18]. FEV 1 values are expressed as percent of predicted values (FEV 1 %), adjusted for age, sex and height according to the Global Lung Function Initiative formula [19]. To ensure high quality of the spirometry data, participants with extreme absolute values (FEV 1 > 7 L, forced vital capacity (FVC) > 7 L) or an obviously false ratio (FEV 1 /FVC > 1) were excluded. Data on smoking history, including pack years, were collected with questionnaires. Exercise levels were arbitrarily divided into four groups: sedentary:<2 light exercises (no

Multiplex proteomics
The Proseek Multiplex CVD I 96x96 assay (Olink, Uppsala, Sweden) measuring 92 cardiovascular disease-related human proteins with the proximity extension assay method [20] was used in this study. After quality control in all cohorts, 79 out of the 92 proteins were available in all four study cohorts and included in this study.

Observational analyses
For the first analysis, a multivariable linear regression model was used, adjusting for age, sex, cohort, BMI, exercise level, smoking status (current/previous vs never smoker) and pack years, to assess cross-sectional associations between the proteins (independent variables) and percent of predicted values of FEV 1 (FEV 1 %, dependent variable) in the discovery sample. All proteins associated with FEV 1 % at a false discovery rate <5% in the discovery stage were tested in the replication stage using nominal p-values as the significance cut-off (Fig. 2). In the secondary cross-sectional analyses, we investigated associations between significant (and replicated) proteins and FEV 1 % stratified by smoking status (never smokers vs current/previous smokers) using the same multivariable model as in the first stage (Fig. 2). In our third analysis, we investigated longitudinal associations between the significantly replicated proteins from the first analysis and FEV 1 decline during 5 years in the PIVUS cohort. The same multivariable model as in the first and second analyses was used, with the addition of FEV 1 % values at baseline (Fig. 2). Analyses were done in Stata 14.2 and R version 3.3.
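The two-stage testing scheme described above can be sketched as follows. This is a minimal illustration in plain Python, not the Stata or R code used in the study. It assumes that per-protein p-values from the adjusted regression models are already available, and it assumes the Benjamini-Hochberg step-up procedure as the false discovery rate method, which the text does not name. The protein labels and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return the names whose p-values pass the Benjamini-Hochberg step-up test.

    pvalues: dict mapping a protein name to its discovery-stage p-value.
    """
    m = len(pvalues)
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / m * fdr:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is declared a discovery.
    return {name for name, _ in ranked[:cutoff_rank]}


def discover_and_replicate(discovery_p, replication_p, fdr=0.05, alpha=0.05):
    """Stage 1: FDR control in discovery. Stage 2: nominal p < alpha in replication."""
    discovered = benjamini_hochberg(discovery_p, fdr)
    replicated = {
        name for name in discovered if replication_p.get(name, 1.0) < alpha
    }
    return discovered, replicated


# Hypothetical p-values for four placeholder proteins (not study data).
discovery_p = {"A": 0.001, "B": 0.012, "C": 0.030, "D": 0.200}
replication_p = {"A": 0.004, "B": 0.300, "C": 0.020, "D": 0.010}

discovered, replicated = discover_and_replicate(discovery_p, replication_p)
print(sorted(discovered), sorted(replicated))
```

In this toy input, protein D has a small replication p-value but is never carried forward because it did not pass the discovery stage, which is the point of restricting replication tests to discovered proteins.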

Selection of genetic instruments
For proteins associated with FEV 1 % decline (IL-6 and GDF-15), we selected independent genetic variants in the coding region with genome-wide association study (GWAS)-significant associations with plasma levels. Three intergenic variants were used for each of GDF-15 [21,22] and IL-6 [23]. Genetic associations were scaled to standard deviation units of GDF-15 level. IL-6 associations are expressed on the natural log scale. For additional details, please see the online supplementary methods.

Genetic associations with outcomes
We used the GWAS results in the UK Biobank [24] published on February 20, 2019 (https://data.bris.ac.uk/data/dataset/pnoat8cxo0u52p6ynfaekeigi). Mitchell et al. [24] carried out the GWAS adjusted for sex and genotyping array using the BOLT-LMM method, which accounts for relatedness and population stratification. The outcome was the best measured FEV 1 as absolute value (n = 421,986), measured using a Vitalograph Pneumotrac 6800 and standardized assessment procedures.

Mendelian randomization
We used the inverse variance weighted (IVW) method adjusting for genetic correlations [25]. Correlations between SNPs were estimated in the 1000 Genomes reference panel, and analyses were carried out using TwoSampleMR in R. The IVW method for correlated genetic variants combines the variants' associations with the outcome into a weighted average scaled by their genetic associations with the protein level.
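As an illustration of the estimator described above, the sketch below uses the generalized weighted regression form commonly given for fixed-effect IVW with correlated variants: the estimate is (bx' W by) / (bx' W bx), where W is the inverse of the matrix with entries se_i * se_j * rho_ij. It is written in plain Python for readability and is not the R package's implementation. The function names (`solve`, `ivw_correlated`) and all input values are hypothetical.

```python
import math


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def ivw_correlated(beta_x, beta_y, se_y, rho):
    """Fixed-effect IVW estimate and standard error for correlated variants.

    beta_x: variant-exposure associations (e.g. per SD of protein level)
    beta_y: variant-outcome associations
    se_y:   standard errors of beta_y
    rho:    symmetric correlation matrix between the variants
    """
    n = len(beta_x)
    omega = [[se_y[i] * se_y[j] * rho[i][j] for j in range(n)] for i in range(n)]
    weights = solve(omega, beta_x)  # omega^{-1} beta_x (omega is symmetric)
    denom = sum(weights[i] * beta_x[i] for i in range(n))
    estimate = sum(weights[i] * beta_y[i] for i in range(n)) / denom
    return estimate, math.sqrt(1.0 / denom)


# Hypothetical summary statistics for three correlated variants (not study data).
beta_x = [0.30, 0.20, 0.25]
beta_y = [-0.006, -0.004, -0.005]
se_y = [0.002, 0.002, 0.002]
rho = [[1.0, 0.2, 0.1],
       [0.2, 1.0, 0.3],
       [0.1, 0.3, 1.0]]

estimate, std_error = ivw_correlated(beta_x, beta_y, se_y, rho)
print(estimate, std_error)
```

When `rho` is the identity matrix, the expression reduces to the familiar IVW weighted average for independent variants.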

Ethical permission
Participants provided written informed consent and the study was conducted according to the Declaration of Helsinki.

Results
The baseline characteristics of the discovery sample (PIVUS and POEM, n = 1337) and the replication sample (SAVa and PADVa, n = 1215) are described in Table 1. Participants in POEM were younger, less likely to smoke, and had a more active lifestyle and higher lung function compared with the other cohorts. SAVa and PADVa included a larger proportion of men, and PADVa had the highest number of participants with a smoking history.

Discovery and replication of the cross-sectional association between proteins and FEV 1
In the discovery sample, 32 of the 79 proteins were inversely associated with FEV 1 % at FDR <5% in multivariable linear regression adjusting for age, sex, cohort, BMI, exercise level, smoking status (current/previous vs never smoker) and pack years (Supplementary Table 1). Of these 32 proteins, 22 were also significantly associated with FEV 1 % in the replication sample (p < 0.05). For example, a one standard deviation increase in plasma leptin (LEP) was associated with a 4% decrease in FEV 1 % (Table 2). Taken together, the 22 proteins explained 5% of the total variation of FEV 1 % in the total sample (adjusted R-squared 0.14 vs 0.19).
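The 5% figure can be read as the difference between the two adjusted R-squared values. The worked equation below takes 0.14 as the model with covariates only and 0.19 as the model that also includes the 22 proteins; this reading of the order of the two values is an assumption. The second expression is the standard definition of adjusted R-squared, with n the number of participants and p the number of predictors.

```latex
\Delta R^2_{\mathrm{adj}}
  = R^2_{\mathrm{adj}}(\text{covariates} + 22\ \text{proteins})
  - R^2_{\mathrm{adj}}(\text{covariates})
  = 0.19 - 0.14 = 0.05,
\qquad
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```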

Stratification by smoking status
We merged the discovery and replication samples, performed analyses stratified by smoking status (never smokers, n = 1145; previous/current smokers, n = 1407) and investigated the 22 proteins significantly replicated from the first stage. The results were similar in the two strata, with 20 out of 22 proteins significantly associated with FEV 1 % in never smokers (osteoprotegerin and vascular endothelial growth factor D were not) and 22 out of 22 proteins in previous/current smokers (Supplementary Table 2).

Longitudinal associations between the proteins and 5-year change in FEV 1 %
In PIVUS, a second spirometry examination was performed after 5 years in 681 participants. The mean decline in FEV 1 % during follow-up was 5.7 ± 7.3%. Of the 22 proteins identified in the cross-sectional analysis, IL-6 and GDF-15 levels at baseline were significantly associated with a steeper FEV 1 % decline during the 5-year follow-up. A one SD increase in IL-6 was associated with a change in FEV 1 % of -0.8% (95% CI -1.5, -0.2), and a one SD increase in GDF-15 was associated with a change in FEV 1 % of -1.4% (95% CI -2.5, -0.3) (p < 0.05 for both; Supplementary Table 3).

Principal findings
Using four independent cohorts, we discovered associations between higher levels of 22 circulating cardiovascular proteins and lower lung function independent of conventional risk factors and regardless of smoking status. Higher levels of IL-6 and GDF-15 at baseline were associated with more rapid 5-year decline in FEV 1 %. Mendelian randomization analyses supported a causal relationship between higher plasma levels of GDF-15 and lower FEV 1 , while no causality was indicated for the effect of IL-6 on FEV 1 .
GDF-15 is a stress response cytokine, and increased levels have previously been associated with several different diseases such as cardiovascular disease, diabetes and chronic kidney disease [26]. Moreover, increased GDF-15 levels are associated with inflammation and oxidative stress [26], two important underlying factors leading to COPD [10,11,27,28]. Verhamme et al. have previously shown that GDF-15 levels are increased in lung tissue among smokers and COPD patients. Additionally, the GDF-15 levels were correlated with lower FEV 1 [29], supporting the results in the present study. GDF-15 levels have also been associated with increased cardiovascular risk among COPD patients free of overt cardiovascular disease [30]. Thus, GDF-15 seems to be a biomarker associated with both impaired lung function and cardiovascular disease. One plausible connection between the two could be endothelial dysfunction. GDF-15 could have a protective role for endothelial cells and is upregulated in vitro by shear stress [31]. We have previously reported an association between lower endothelium-dependent vasodilation in forearm resistance arteries and lower FEV 1 % among lung-healthy individuals [32], and Lind et al. have previously shown that lower endothelium-dependent vasodilation in resistance arteries is associated with an increased risk of cardiovascular disease [33].
We are not the first to report an association between increased GDF-15 levels and lung function decline. Husebo et al. have previously shown that high levels of GDF-15 (as a dichotomous variable) at baseline among COPD patients (GOLD 2-4) were associated with lung function decline during a 3-year follow-up [34]. Increased circulating levels of GDF-15 have also been reported in early COPD [27] as well as in COPD patients with accelerated decline in FEV 1 [34].
GDF-15 has also recently been reviewed by Verhamme et al. from a pulmonary medicine perspective. One conclusion was that GDF-15 is involved in the progression of COPD [31]. Thus, our data provide additional support for a causal negative effect of increased plasma levels of GDF-15 on lung function.
Existing proteomics studies on lung function mostly involve COPD and asthma patients and provide varying results [35,36]. To the best of our knowledge, we are the first to test associations between plasma proteomics and FEV 1 % assessed by spirometry in the community and to report proteomic data and FEV 1 % decline over five years among individuals without overt lung disease. Additional studies are warranted to evaluate whether GDF-15 modifying therapies could halt pulmonary inflammation and deterioration of lung function.
Increased levels of the inflammatory marker IL-6 among COPD patients have previously been reported in the literature [37,38]. Increased IL-6 levels have also been associated with FEV 1 decline among COPD patients [39]. Our longitudinal results are supported by these previous studies and extend them to a non-COPD setting. However, our MR analysis did not support a causal role of IL-6 in FEV 1 decline. This is in accordance with a previous small MR study in 134 post-MI patients, which also did not support a causal link between circulating IL-6 and impaired lung function [40].
Given the small effect sizes on FEV 1 due to genetically predicted changes in GDF-15 (<0.01), our study may have missed existing causal effects of IL-6 due to limited power (80% power to detect MR effects larger than 0.017 FEV 1 units per SD-unit change in IL-6).

Clinical relevance
Although a large number of proteins were statistically significantly associated with impaired lung function, individual proteins were weakly associated with FEV 1 in our community-based sample. In fact, taken together, the 22 proteins explained only 5% of the total variation of FEV 1 %. One possible explanation is that the proteins on the assay were selected for their importance in cardiovascular pathology and not in lung disease, but it could also imply that plasma protein levels are of limited importance for lung function. Regardless, our findings encourage large-scale proteomics studies to fully elucidate the clinical utility of proteomics assessment for identifying patients at higher risk of rapid disease progression.

Strengths and limitations
Strengths of our study include the large study samples, using a discovery and replication approach, and availability of one cohort with repeated spirometry measurements. We used a large separate sample in MR, but any causal inference in MR requires that specific assumptions are met, such as the absence of pleiotropy [15].
The main limitation of our study is that three out of the four cohorts lacked longitudinal data on lung function. Multiple cohorts with repeated spirometry measurements would have provided better possibilities to identify novel risk markers for rapid lung function decline. Also, no reversibility test was performed on spirometry. Even though our Mendelian randomization analyses suggest a potential causal effect of circulating GDF-15 on impaired lung function, our observational data do not provide any insights into the underlying mechanisms for these associations. Additional experimental studies are warranted.
Another inherent problem with proteomic studies is the large amount of data, which makes the statistical approach challenging [8]. In the present study, analyses were performed in the discovery and replication cohorts using FDR 5% and nominal p-values, respectively, in order to take multiple testing into account. This approach has previously been shown to keep the number of false positives low while not being overly conservative. Using a more conservative method such as Bonferroni in both the discovery and replication sample would increase type 2 error while having only a modest effect on type 1 errors [42].

Conclusions
This study found an association between 22 proteins and lower FEV 1 %, with no substantial difference between never smokers and previous/current smokers. Only two proteins at baseline were associated with steeper FEV 1 % decline over 5 years. Mendelian randomization provided evidence of a potentially causal association between circulating levels of GDF-15 and FEV 1 (absolute value). Our findings encourage continued research on the role of GDF-15 in lung disease pathophysiology.

Summary conflict of interest statements
AR has received lecturing fees from AstraZeneca. JÄ has received lecturing fees from AstraZeneca and Novartis and has served on advisory boards for AstraZeneca and Boehringer Ingelheim for subjects unrelated to the present manuscript. CJ has received payments for educational activities from AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, Novartis and Teva and has served on advisory boards arranged by AstraZeneca, Boehringer Ingelheim, Chiesi, GlaxoSmithKline, Novartis and Teva. B.S. has received honoraria for educational activities and lectures from AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, MEDA, Chiesi and TEVA and has served on advisory boards arranged by AstraZeneca, Novartis, GlaxoSmithKline, Boehringer Ingelheim and MEDA. Erik Ingelsson is currently an employee at GlaxoSmithKline. None of the funding sources had any influence on this study.