Glycan Statistics in the PDB
As of December 2017, 11,844 of the 137,178 PDB entries (8.63%) contain at least one glycan chain. Of these, 11,626 (98.16%) were solved by X-ray crystallography; the remainder were determined by solution NMR, electron microscopy, fiber diffraction, powder diffraction, neutron diffraction, or solid-state NMR, or are theoretical models (Figure 1A).
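The entry-level fractions quoted above follow directly from the three counts. The short Python sketch below recomputes them as a sanity check; the variable and function names are illustrative and are not part of any PDB tool.

```python
# Counts quoted in the text (PDB snapshot, December 2017).
TOTAL_ENTRIES = 137_178        # all PDB entries
GLYCAN_ENTRIES = 11_844        # entries with at least one glycan chain
XRAY_GLYCAN_ENTRIES = 11_626   # glycan-containing entries solved by X-ray


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


glycan_share = percent(GLYCAN_ENTRIES, TOTAL_ENTRIES)
xray_share = percent(XRAY_GLYCAN_ENTRIES, GLYCAN_ENTRIES)
# Entries determined by any method other than X-ray crystallography.
other_methods = GLYCAN_ENTRIES - XRAY_GLYCAN_ENTRIES

print(f"glycan-containing entries: {glycan_share}%")
print(f"solved by X-ray:           {xray_share}%")
print(f"other methods:             {other_methods} entries")
```

The difference between the two counts leaves 218 glycan-containing entries for all non-X-ray methods combined.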
Among the 49,539 glycan chains, 28,364 are N-linked (57.26%), 1,400 are O-linked (2.83%), 19,660 are free glycan ligands (39.69%), and 115 are covalently linked to Asp or Glu residues (0.23%) (Figure 1B). N-acetyl-D-glucosamine (GlcNAc) is the most abundant monosaccharide in the PDB (38,991; 49.78%), followed by D-mannose (15,934; 17.68%) and D-glucose (15,825; 17.56%) (Figure 1C).
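As a consistency check on the chain-level breakdown, the following Python sketch tallies the four linkage categories, confirms that they sum to the stated total of 49,539 chains, and recomputes each share. The dictionary keys are shorthand labels chosen here for illustration.

```python
# Glycan chain counts by linkage category, as quoted in the text.
CHAIN_COUNTS = {
    "N-linked": 28_364,
    "O-linked": 1_400,
    "free ligand": 19_660,
    "Asp/Glu-linked": 115,
}
TOTAL_CHAINS = 49_539

# The four categories should partition the full set of glycan chains.
assert sum(CHAIN_COUNTS.values()) == TOTAL_CHAINS

# Share of each category, in percent, rounded to two decimals.
chain_shares = {
    name: round(100.0 * count / TOTAL_CHAINS, 2)
    for name, count in CHAIN_COUNTS.items()
}

# Print the categories from most to least frequent.
for name, share in sorted(chain_shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {CHAIN_COUNTS[name]:>6,d} {share:6.2f}%")
```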
Figure 1D shows the glycan chemical modifications that occur more than 20 times in the PDB. Note that Figure 1D does not include functional groups that are intrinsic to a sugar residue type, such as the N-acetyl group at C2 of GlcNAc or the carboxylate at C6 of GlcA. Sulfation, phosphorylation, and O-methylation are the most frequent modifications in the PDB. Sulfation is frequently found at C2 and C6 of sugar residues within glycosaminoglycans (with N-linked sulfate at C2); at C6 of glycolipids such as glycosyl ceramide and sulfoquinovosyl diacylglycerol; and in several hexopyranoses such as N-acetyl-D-glucosamine-6-sulfate and O3-sulfonyl galactose. Phosphorylation is frequently found at C1, C4, and C6 of hexopyranoses, C2 and C6 of fructofuranose, C1 and C5 of ribose, C7 of mannoheptose, and C8 of Kdo in lipopolysaccharides (LPS). O-methylation has been found at C2 of Neu5Ac and Kdo, at the carboxylate at C5 of GlcA or IdoA, and in general hexopyranoses.
Figure 1E shows the frequencies of glycolipids in the PDB. Detergents (DET) are the most prevalent glycolipids in the PDB. They usually consist of a monosaccharide or disaccharide with an octyl or dodecyl chain. Diacylglycerols (DAG) are found mostly in photosystems, such as bacterial rhodopsin, in the form of monogalactosyl diacylglycerol, digalactosyl diacylglycerol, and sulfoquinovosyl diacylglycerol. 2,3-di-phytanylglycerols (DPHG) are similar in chemical composition to DAG and are also found in photosystems. Ceramides (CER) are found in glycosphingolipids (e.g., cerebrosides and gangliosides such as GM1). Lipid A (as part of LPS) anchors LPS in the outer membrane of Gram-negative bacteria and has an archetypal structure of a β-(1→6)-linked D-GlcN disaccharide that is acylated with four to eight fatty acids of different lengths. Lipid A from certain bacterial species also carries complex chemical substitutions.