Computational STAT4 rSNP analysis, transcriptional factor binding sites and disease

Purpose - Signal Transducer and Activator of Transcription 4 (STAT4) is important for signaling by interleukins (IL-12 and IL-23) and type 1 interferons and has several single nucleotide polymorphisms (SNPs) associated with human disease. STAT4 SNPs were computationally examined with respect to changes in potential transcriptional factor binding sites (TFBS), and these changes are discussed in relation to human disease. Methods - The JASPAR CORE and ConSite databases were used to identify the TFBS. The Vector NTI Advance 11.5 computer program was employed to locate all the TFBS in the STAT4 gene from 4 kb upstream of the transcriptional start site to 8.3 kb past the 3’ UTR. The JASPAR CORE database was also used to compute each nucleotide occurrence (%) within the TFBS. Results - The STAT4 SNPs in the 70 kb intron between exons 2 and 3 are in linkage disequilibrium and have previously been found to be significantly associated with several vasculitis diseases as well as diabetes. The SNP alleles were found to alter the DNA landscape to which potential transcriptional factors (TFs) attach, resulting in changes in TFBS and thereby altering which TFs potentially regulate the STAT4 gene. These STAT4 SNPs should be considered regulatory (r) SNPs. Conclusion - The alleles of each rSNP were found to generate unique TFBS, resulting in potential changes in TF regulation of STAT4. These regulatory changes are discussed with respect to changes in human health that result in disease.


INTRODUCTION
The Janus Kinase-Signal Transducers and Activators of Transcription (JAK-STAT) pathways play a critical role in immune, neuronal, hematopoietic and hepatic systems [1]. JAK-STAT is a principal signal transduction pathway in cytokine and growth factor signaling and regulates various cellular processes such as cell proliferation, differentiation, migration and survival [2]. It provides the principal intracellular signaling mechanism required for a wide array of cytokines [3,4]. The STAT portion of the signaling cascade has seven mammalian family members: STAT1, 2, 3, 4, 5a, 5b and 6 [3,4]. These STATs bind thousands of transcriptional factor binding sites (TFBS) in the genome and regulate the transcription of many protein-coding genes, miRNAs and long noncoding RNAs [4]. The STAT4 gene, which is important for signaling by interleukins (IL-12 and IL-23) and type 1 interferons [4], has several single nucleotide polymorphisms (SNPs) associated with human disease [5][6][7][8][9][10][11][12]. STAT4 transduces IL-12, IL-23 and type 1 interferon-mediated signals into helper T (Th) cell (Th1 and Th17) differentiation, monocyte activation, and interferon-gamma production [12,13].
The rs7574865 STAT4 SNP has been found to be significantly associated with diabetes [11], hepatitis B virus-related hepatocellular carcinoma [6,10,19,20], inflammatory bowel disease [21], juvenile idiopathic arthritis [22], primary biliary cirrhosis and Crohn's disease [23], severe renal insufficiency in lupus nephritis [8], systemic lupus erythematosus [5] and ulcerative colitis [24]. The rs11889341 STAT4 SNP has been found to be significantly associated with diabetes [11], hepatitis B virus (HBV) infection, HBV-related cirrhosis and hepatocellular carcinoma [23], severe renal insufficiency in lupus nephritis [8], and systemic lupus erythematosus [5]. The rs8179673 STAT4 SNP has been found to be significantly associated with diabetes [11], HBV infection, HBV-related cirrhosis and hepatocellular carcinoma [23] and systemic lupus erythematosus [5]. The rs7582694 STAT4 SNP has been found to be significantly associated with HBV infection, HBV-related cirrhosis and hepatocellular carcinoma [23] and severe renal insufficiency in lupus nephritis [8]. The rs7574070 and rs7572482 STAT4 SNPs have been found to be significantly associated with Behcet's disease [18]. The rs7572482 STAT4 SNP is located in the promoter region, while the remaining SNPs are located in the large 70 kb intron between exons 2 and 3. The reports listed above indicate that these SNPs are in strong linkage disequilibrium (LD) with each other.
Single nucleotide changes that affect gene expression by impacting gene regulatory sequences such as promoters, enhancers, and silencers are known as regulatory SNPs (rSNPs) [25][26][27][28]. An rSNP within a transcriptional factor binding site (TFBS) can change a transcriptional factor's (TF) ability to bind its TFBS [29][30][31][32], in which case the TF would be unable to effectively regulate its target gene [33][34][35][36][37]. This concept is examined for the above STAT4 rSNPs and their allelic association with TFBS, where computational analyses [38][39][40][41] were used to identify TFBS alterations created by the STAT4 rSNPs. Recent reports have also introduced the concept of modeling epigenetic modifications to transcriptional factor binding sites in the control of gene expression [42,43]. In this report, the rSNP associations with changes in potential TFBS are discussed with their possible relationship to these diseases in humans.
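The idea that one allele of an rSNP can fit a TFBS better than the other can be illustrated with a minimal Python sketch. It converts a count matrix into per-position nucleotide occurrence (%) and then compares two alleles at the SNP position. The count matrix, the example site `GATAAC`, the SNP position and the mean-occurrence score are all hypothetical choices made for illustration; they are not taken from JASPAR CORE and are not the scoring procedure of JASPAR, ConSite or Vector NTI.

```python
BASES = "ACGT"

# Hypothetical 6-bp count matrix (one row per position; columns A, C, G, T).
# The counts are invented for illustration and are NOT a JASPAR profile.
COUNTS = [
    [2, 1, 16, 1],
    [18, 0, 1, 1],
    [1, 1, 1, 17],
    [15, 2, 2, 1],
    [16, 1, 2, 1],
    [4, 6, 5, 5],
]


def occurrence_percent(counts):
    """Convert raw counts into per-position nucleotide occurrence (%)."""
    table = []
    for row in counts:
        total = sum(row)
        table.append({b: 100.0 * c / total for b, c in zip(BASES, row)})
    return table


def site_score(site, table):
    """Mean occurrence (%) of the observed nucleotides across the site.

    This is a deliberately simple stand-in score (an assumption of this sketch).
    """
    return sum(table[i][b] for i, b in enumerate(site.upper())) / len(site)


def compare_alleles(site, snp_index, alleles, table):
    """Score the same site with each allele substituted at snp_index."""
    scores = {}
    for allele in alleles:
        variant = site[:snp_index] + allele + site[snp_index + 1:]
        scores[allele] = site_score(variant, table)
    return scores


table = occurrence_percent(COUNTS)
# Hypothetical site with a T/G SNP at the third position (index 2).
scores = compare_alleles("GATAAC", snp_index=2, alleles=("T", "G"), table=table)
for allele, score in scores.items():
    print(f"allele {allele}: occurrence at SNP = {table[2][allele]:.0f}%, "
          f"mean site occurrence = {score:.1f}%")
```

In this toy matrix the T allele occupies a position where T occurs in 85% of the aligned sites, whereas G occurs in only 5%, so the G allele would be expected to weaken the match to this hypothetical motif.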

METHODS
The JASPAR CORE database [44,45] and ConSite [46] were used to identify the potential STAT4

TFs in red differ between the SNP alleles. Upper-case nucleotides designate the 90% conserved BS region, and red marks the SNP location of the alleles in the TFBS. Below each TFBS is the nucleotide occurrence (%) obtained from the JASPAR CORE database. Also listed is the number (#) of binding sites in the gene for the given TF. Note: TFs can bind to more than one nucleotide sequence.

Appendix. Transcriptional factor (TF) descriptions.

AR
The protein functions as a steroid-hormone activated transcription factor. Upon binding the hormone ligand, the receptor dissociates from accessory proteins, translocates into the nucleus, dimerizes, and then stimulates transcription of androgen responsive genes. They are expressed in bone marrow, mammary gland, prostate, testicular and muscle tissues where they exist as dimers coupled to Hsp90 and HMGB proteins. The protein encoded by this gene is a nuclear basic leucine zipper protein that belongs to the AP-1/ATF superfamily of transcription factors. The leucine zipper of this protein mediates dimerization with members of the Jun family of proteins. This protein is thought to be a negative regulator of AP-1/ATF transcriptional events.

Bhlhe40
This gene encodes a basic helix-loop-helix protein expressed in various tissues. The encoded protein can interact with ARNTL or compete for E-box binding sites in the promoter of PER1 and repress CLOCK/ ARNTL's transactivation of PER1. Transcriptional repressor involved in the regulation of the circadian rhythm by negatively regulating the activity of the clock genes and clock-controlled genes.

BRCA1
This gene encodes a nuclear phosphoprotein that plays a role in maintaining genomic stability, and it also acts as a tumor suppressor.

CDX2
This gene is a member of the caudal-related homeobox transcription factor gene family. The encoded protein is a major regulator of intestine-specific genes involved in cell growth and differentiation.

CEBPA
C/EBP is a DNA-binding protein that recognizes two different motifs: the CCAAT homology common to many promoters and the enhanced core homology common to many enhancers.

CEBPB
Important transcriptional activator regulating the expression of genes involved in immune and inflammatory responses. Binds to regulatory regions of several acute-phase and cytokine genes and probably plays a role in the regulation of acute-phase reaction, inflammation and hemopoiesis.

CRX
The protein encoded by this gene is a photoreceptor-specific transcription factor which plays a role in the differentiation of photoreceptor cells. This homeodomain protein is necessary for the maintenance of normal cone and rod function.

FOXP2
Transcriptional repressor that may play a role in the specification and differentiation of lung epithelium. May also play a role in developing neural, gastrointestinal and cardiovascular tissues.

GATA1
The protein plays an important role in erythroid development by regulating the switch of fetal hemoglobin to adult hemoglobin.

GATA2
A member of the GATA family of zinc-finger transcription factors that are named for the consensus nucleotide sequence they bind in the promoter regions of target genes and play an essential role in regulating transcription of genes involved in the development and proliferation of hematopoietic and endocrine cell lineages.

GATA3
Plays an important role in endothelial cell biology.

GATA4
This protein is thought to regulate genes involved in embryogenesis and in myocardial differentiation and function. Promotes cardiac myocyte enlargement.

HIF1a::ARNT
HIF1 is a heterodimeric basic helix-loop-helix structure composed of HIF1a, the alpha subunit, and the aryl hydrocarbon receptor nuclear translocator (Arnt), the beta subunit. The protein encoded by HIF1 is a Per-Arnt-Sim (PAS) transcription factor found in mammalian cells growing at low oxygen concentrations. It plays an essential role in cellular and systemic responses to hypoxia.

HNF1A
Transcriptional activator that regulates the tissue specific expression of multiple genes, especially in pancreatic islet cells and in liver.

HNF4a
The encoded protein controls the expression of several genes, including hepatocyte nuclear factor 1 alpha, a transcription factor which regulates the expression of several hepatic genes.

HNF4g
Transcription factor. Has a lower transcription activation potential than HNF4-alpha.

HOXA5
Sequence-specific transcription factor which is part of a developmental regulatory system that provides cells with specific positional identities on the anterior-posterior axis.

JUN (var.2)
This gene is the putative transforming gene of avian sarcoma virus 17. It encodes a protein which is highly similar to the viral protein, and which interacts directly with specific target DNA sequences to regulate gene expression. Transcription factor that binds to GC box promoter elements. Activates transcription of genes.

LHX3
This gene encodes a member of a large protein family that carries the LIM domain, a unique cysteine-rich zinc-binding domain. The encoded protein is a transcription factor that is required for pituitary development and motor neuron specification.

MAFB
The encoded nuclear protein represses ETS1-mediated transcription of erythroid-specific genes in myeloid cells. This protein plays an essential role in the regulation of hematopoiesis and may play a role in tumorigenesis.

MAFF
The protein encoded by this gene is a basic leucine zipper (bZIP) transcription factor that lacks a transactivation domain. Interacts with the upstream promoter region of the oxytocin receptor gene.

MAFK
Since they lack a putative transactivation domain, the small Mafs behave as transcriptional repressors when they dimerize among themselves. However, they seem to serve as transcriptional activators by dimerizing with other (usually larger) basic-zipper proteins and recruiting them to specific DNA-binding sites. Small Maf proteins heterodimerize with Fos and may act as competitive repressors of the NF-E2 transcription factor.

MAX
The protein encoded by this gene is a member of the basic helix-loop-helix leucine zipper (bHLHZ) family of transcription factors.

MEF2C
Transcription activator which binds specifically to the MEF2 element present in the regulatory regions of many muscle-specific genes. Controls cardiac morphogenesis and myogenesis, and is also involved in vascular development.

MZF1_1-4
Binds to target promoter DNA and functions as a transcription regulator. May be one regulator of transcriptional events during hemopoietic development. Isoforms of this protein have been shown to exist at protein level.

MZF1_5-13
Binds to target promoter DNA and functions as a transcription regulator. May be one regulator of transcriptional events during hemopoietic development. Isoforms of this protein have been shown to exist at protein level.

NFE2L1::MAFG
Nuclear factor erythroid 2-related factor (Nrf2) coordinates the up-regulation of cytoprotective genes via the antioxidant response element (ARE). MafG is a ubiquitously expressed small Maf protein that is involved in cell differentiation of erythrocytes. It dimerizes with P45 NF-E2 protein and activates expression of a- and b-globin.

NKX2-5
This gene encodes a member of the NK family of homeobox-containing proteins. Transcriptional repressor that acts as a negative regulator of chondrocyte maturation.

NKX3-1
This gene encodes a homeobox-containing transcription factor. This transcription factor functions as a negative regulator of epithelial cell growth in prostate tissue.

Nr1h3::Rxra
The protein encoded by this gene belongs to the NR1 subfamily of the nuclear receptor superfamily. The NR1 family members are key regulators of macrophage function, controlling transcriptional programs involved in lipid homeostasis and inflammation. This protein is highly expressed in visceral organs, including liver, kidney and intestine. It forms a heterodimer with retinoid X receptor (RXR), and regulates expression of target genes containing retinoid response elements. Studies in mice lacking this gene suggest that it may play an important role in the regulation of cholesterol homeostasis.

NR3C1
Glucocorticoids regulate carbohydrate, protein and fat metabolism, modulate immune responses through suppression of chemokine and cytokine production and have critical roles in constitutive activity of the CNS, digestive, hematopoietic, renal and reproductive systems.

PAX2
Probable transcription factor that may have a role in kidney cell differentiation.

PDX1
Activates insulin, somatostatin, glucokinase, islet amyloid polypeptide and glucose transporter type 2 gene transcription. Particularly involved in glucose-dependent regulation of insulin gene transcription.

POU5F1::SOX2
This gene encodes a transcription factor containing a POU homeodomain that plays a key role in embryonic development and stem cell pluripotency. Aberrant expression of this gene in adult tissues is associated with tumorigenesis. Forms a trimeric complex with SOX2 on DNA and controls the expression of a number of genes involved in embryonic development such as YES1, FGF4, UTF1 and ZFP206.

PRRX2
The DNA-associated protein encoded by this gene is a member of the paired family of homeobox proteins. Expression is localized to proliferating fetal fibroblasts and the developing dermal layer, with downregulated expression in adult skin.

RFX1
This gene is a member of the regulatory factor X gene family, which encodes transcription factors that contain a highly conserved winged helix DNA-binding domain. The protein encoded by this gene is structurally related to regulatory factors X2, X3, X4, and X5. Regulatory factor essential for MHC class II gene expression. Binds to the X boxes of MHC class II genes.

RFX5
Activates transcription from class II MHC promoters. Recognizes X-boxes.

RORA_1
The protein encoded by this gene is a member of the NR1 subfamily of nuclear hormone receptors. Orphan nuclear receptor. Binds DNA as a monomer to hormone response elements (HRE) containing a single core motif half-site preceded by a short A-T-rich sequence.

RUNX2
Transcription factor involved in osteoblastic differentiation and skeletal morphogenesis. Essential for the maturation of osteoblasts and both intramembranous and endochondral ossification.

RXRa
Retinoid X receptors (RXRs) and retinoic acid receptors (RARs) are nuclear receptors that mediate the biological effects of retinoids by their involvement in retinoic acid-mediated gene activation. The encoded protein may act as a transcriptional regulator after forming a protein complex with other proteins and may play a role in chondrogenesis.

SOX6
The encoded protein is a transcriptional activator that is required for normal development of the central nervous system, chondrogenesis and maintenance of cardiac and skeletal muscle cells.

SOX9
Plays an important role in normal skeletal development. May regulate the expression of other genes involved in chondrogenesis by acting as a transcription factor for these genes.

SOX10
This gene encodes a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of the cell fate.

SOX17
Acts as a transcription regulator that binds target promoter DNA and bends the DNA.

SP1
Can activate or repress transcription in response to physiological and pathological stimuli. Regulates the expression of a large number of genes involved in a variety of processes such as cell growth, apoptosis, differentiation and immune responses.

SPI1
This gene encodes an ETS-domain transcription factor that activates gene expression during myeloid and B-lymphoid cell development.

SPIB
The protein encoded by this gene is a transcriptional activator that binds to the PU-box (5'-GAGGAA-3') and acts as a lymphoid-specific enhancer.

SRF
This gene encodes a ubiquitous nuclear protein that stimulates both cell proliferation and differentiation. This protein binds to the serum response element (SRE) in the promoter region of target genes. Required for cardiac differentiation and maturation.

SREBF2
This gene encodes a ubiquitously expressed transcription factor that controls cholesterol homeostasis by regulating transcription of sterol-regulated genes. The encoded protein contains a basic helix-loop-helix-leucine zipper (bHLH-Zip) domain and binds the sterol regulatory element 1 motif.

SRY
Transcriptional regulator that controls a genetic switch in male development. It is necessary and sufficient for initiating male sex determination by directing the development of supporting cell precursors.

STAT1
The protein encoded by this gene is a member of the STAT protein family. In response to cytokines and growth factors, STAT family members are phosphorylated by the receptor associated kinases, and then form homo-or heterodimers that translocate to the cell nucleus where they act as transcription activators. This protein can be activated by various ligands including interferon-alpha, interferon-gamma, EGF, PDGF and IL6. This protein mediates the expression of a variety of genes, which is thought to be important for cell viability in response to different cell stimuli and pathogens.

STAT3
Signal transducer and transcription activator that mediates cellular responses to interleukins, KITLG/SCF and other growth factors.

STAT4
Carries out a dual function: signal transduction and activation of transcription. Involved in IL12 signaling. This protein is essential for mediating responses to IL12 in lymphocytes and regulating the differentiation of T helper cells. Mutations in this gene may be associated with systemic lupus erythematosus and rheumatoid arthritis.

STAT5A::STAT5B
Carries out a dual function: signal transduction and activation of transcription. Regulates the expression of milk proteins during lactation.

STAT6
This protein plays a central role in exerting IL4-mediated biological responses. It is found to induce the expression of BCL2L1/BCL-X(L), which is responsible for the anti-apoptotic activity of IL4. Carries out a dual function: signal transduction and activation of transcription. Involved in IL4/interleukin-4- and IL3/interleukin-3-mediated signaling.

T
The protein encoded by this gene is an embryonic nuclear transcription factor that binds to a specific DNA element, the palindromic T-site. It binds through a region in its N-terminus, called the T-box, and effects transcription of genes required for mesoderm formation and differentiation.

TBP
General transcription factor that functions at the core of the DNA-binding multiprotein factor TFIID. Binding of TFIID to the TATA box is the initial transcriptional step of the pre-initiation complex (PIC), playing a role in the activation of eukaryotic genes transcribed by RNA polymerase II.

TFAP2C
Sequence-specific DNA-binding protein that interacts with inducible viral and cellular enhancer elements to regulate transcription of selected genes. AP-2 factors bind to the consensus sequence 5'-GCCNNNGGC-3' and activate genes involved in a large spectrum of important biological functions including proper eye, face, body wall, limb and neural tube development.

THAP1
DNA-binding transcription regulator that regulates endothelial cell proliferation and G1/S cell-cycle progression.

ZBTB33
This gene encodes a transcriptional regulator with bimodal DNA-binding specificity, which binds to methylated CGCG and also to the non-methylated consensus KAISO-binding site TCCTGCNA. The protein contains an N-terminal POZ/BTB domain and 3 C-terminal zinc finger motifs. It recruits the N-CoR repressor complex to promote histone deacetylation and the formation of repressive chromatin structures in target gene promoters. It may contribute to the repression of target genes of the Wnt signaling pathway, and may also activate transcription of a subset of target genes by the recruitment of catenin delta-2 (CTNND2).

ZNF263
Might play an important role in basic cellular processes as a transcriptional repressor.

ZNF354C
May function as a transcription repressor.

Figure 2. ... can bind their respective DNA sequence either above (+) or below (-) the duplex (cf. Table 2). The rs11889341 rSNP common STAT4-C allele is found in each of these TFBS. As shown, this rSNP is located in the 70 kb intron between exons 2 and 3 of the STAT4 gene. Also included with each potential TFBS is its % sequence homology to the duplex.

Human diseases or conditions can be associated with rSNPs of the STAT4 gene, as illustrated above. A change in the rSNP allele can alter the DNA landscape around the SNP to which potential TFs attach and regulate a gene. As an example, the potential TFBS associated with the rs7574865 rSNP STAT4-T allele from Table 2 are illustrated in Figure 1, and those associated with the rs11889341 common rSNP STAT4-C allele are illustrated in Figure 2. As can be seen in Table 2, these potential TFBS change when an individual carries the alternate allele. The importance of this has been illustrated in Table 2.
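The way a single allele change can swap the set of potential TFBS, on either strand of the duplex, can be sketched in Python as follows. The sketch scans the (+) strand and, via the reverse complement, the (-) strand for windows whose percent identity to a consensus meets a threshold, then compares the resulting TF sets between alleles. The motif names `TF_A` and `TF_B`, their consensus strings, the flanking sequences and the use of simple percent identity in place of "% sequence homology" are all hypothetical and serve only as an illustration; this is not the Vector NTI procedure.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def percent_identity(window, consensus):
    """Percent of positions where the window matches the consensus."""
    matches = sum(1 for w, c in zip(window, consensus) if w == c)
    return 100.0 * matches / len(consensus)


def scan(seq, consensus, threshold=80.0):
    """Return (start, strand, percent) hits on the (+) and (-) strands."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    # A (-) strand site reads as the reverse complement on the (+) strand.
    for strand, target in (("+", consensus), ("-", reverse_complement(consensus))):
        for i in range(len(seq) - k + 1):
            pct = percent_identity(seq[i:i + k], target)
            if pct >= threshold:
                hits.append((i, strand, pct))
    return hits


def allele_specific_sites(flank5, flank3, alleles, motifs, threshold=80.0):
    """Map each allele to the set of motif names with at least one hit."""
    result = {}
    for allele in alleles:
        seq = flank5 + allele + flank3
        result[allele] = {
            name for name, cons in motifs.items() if scan(seq, cons, threshold)
        }
    return result


# Hypothetical consensus motifs and flanking sequence (not real STAT4 data).
motifs = {"TF_A": "GATAA", "TF_B": "TCTCA"}
sites = allele_specific_sites("CCTGA", "AAGTC", ("T", "G"), motifs, threshold=100.0)
for allele in ("T", "G"):
    print(allele, sorted(sites[allele]))
```

With these toy inputs and an exact-match threshold, the T allele creates a (+) strand site for `TF_A` only, while the G allele creates a (-) strand site for `TF_B` only, mirroring the kind of allele-dependent TFBS change summarized in Table 2.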

COMPETING INTERESTS
The author has declared that no competing interests exist.