Laboratoire d’Ingénierie des Systèmes Biologiques et des ... 2020/Toulouse/Toulouse... ·...

Laboratoire d’Ingénierie des Systèmes Biologiques et des Procédés
UMR INSA/CNRS 5504 – UMR INSA/INRA 792
LISBP/INSA – 135 Avenue de Rangueil – 31077 Toulouse cedex 4 (France)
Tel.: +33 (0) 5 61 55 94 01 – Fax: +33 (0) 5 61 55 94 00 – Email: [email protected] – www.lisbp.fr

CSC 2020

PhD project: Engineering of oligosaccharide transporters

Supervisor: Dr Gabrielle Potocki-Veronese

Laboratory: LISBP, Toulouse, France

Project description

Glycan catabolism is a crucial function, both for natural and artificial microbial ecosystems and for the functioning of chassis strains used in synthetic biology. In bacteria, glycan utilization pathways involve complex machineries for glycan sensing, binding, transport and degradation. While carbohydrate-active enzymes, which constitute the catalytic part of these multi-protein systems, are relatively easy to characterize, the specificity of transporters is much more difficult to decipher, owing to their transmembrane location, the multiplicity of transport systems in native strains, and the lack of genetic tools for many species (especially the uncultured organisms that make up the major part of microbial ecosystems). Yet transporters are crucial biotechnological tools and important determinants of the metabolic ability of bacteria. In the past few years, the LISBP has demonstrated that the molecular characterization of transporters from uncultured bacteria (including their transmembrane components) can be performed in E. coli, and has developed several new technologies to screen and characterize their specificity.

This PhD project aims to engineer the specificity of glycoside transporters previously identified by the LISBP through functional metagenomics of the human and bovine gut microbiomes. Combinatorial protein engineering approaches will be used to:
- analyze the structure-function relationships of the different protein components of transporters involved in the degradation of host and dietary glycans in gut microbiomes
- design new artificial channels capable of transporting oligosaccharides of complex structure for synthetic biology

The project builds on the team’s expertise in protein engineering, ultra-high-throughput functional screening, and structural biology. It targets various applications in synthetic biology, as well as the control of microbial ecosystem functioning, including that of the human gut microbiota, in which glycan-mediated interrelationships between bacteria and the host play key roles in human health.


Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes
Lena Tasse, Juliette Bercovici, Sandra Pizzut-Serin, et al.
Genome Res. 2010 20: 1605-1612; originally published online September 14, 2010
Access the most recent version at doi: 10.1101/gr.108332.110

Supplemental Material: http://genome.cshlp.org/content/suppl/2010/08/09/gr.108332.110.DC1.html

References: This article cites 53 articles, 22 of which can be accessed free at: http://genome.cshlp.org/content/20/11/1605.full.html#ref-list-1


Copyright © 2010 by Cold Spring Harbor Laboratory Press

Downloaded from genome.cshlp.org on November 29, 2010 - Published by Cold Spring Harbor Laboratory Press


Method

Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes

Lena Tasse,1,2,7 Juliette Bercovici,1,2,7 Sandra Pizzut-Serin,1,2 Patrick Robe,3 Julien Tap,4 Christophe Klopp,5 Brandi L. Cantarel,6 Pedro M. Coutinho,6 Bernard Henrissat,6 Marion Leclerc,4 Joel Dore,4 Pierre Monsan,1,2 Magali Remaud-Simeon,1,2 and Gabrielle Potocki-Veronese1,2,8

1Universite de Toulouse, INSA, UPS, INP, LISBP, F-31077 Toulouse, France; 2UMR5504, UMR792 Ingenierie des Systemes Biologiques et des Procedes, CNRS, INRA, F-31400 Toulouse, France; 3LibraGen S.A., F-31400 Toulouse, France; 4INRA UEPSD, bat 405, Domaine de Vilvert, F-78352 Jouy en Josas Cedex, France; 5Plateforme Bio-informatique Toulouse Genopole, UBIA INRA, BP 52627, F-31326 Castanet-Tolosan Cedex, France; 6Architecture et Fonction des Macromolecules Biologiques, UMR6098, CNRS, Universites Aix-Marseille I & II, F-13288 Marseille, France

The human gut microbiome is a complex ecosystem composed mainly of uncultured bacteria. It plays an essential role in the catabolism of dietary fibers, the part of plant material in our diet that is not metabolized in the upper digestive tract, because the human genome does not encode adequate carbohydrate active enzymes (CAZymes). We describe a multi-step functionally based approach to guide the in-depth pyrosequencing of specific regions of the human gut metagenome encoding the CAZymes involved in dietary fiber breakdown. High-throughput functional screens were first applied to a library covering 5.4 × 10^9 bp of metagenomic DNA, allowing the isolation of 310 clones showing beta-glucanase, hemicellulase, galactanase, amylase, or pectinase activities. Based on the results of refined secondary screens, sequencing efforts were reduced to 0.84 Mb of nonredundant metagenomic DNA, corresponding to 26 clones that were particularly efficient for the degradation of raw plant polysaccharides. Seventy-three CAZymes from 35 different families were discovered. This corresponds to a fivefold target-gene enrichment compared to random sequencing of the human gut metagenome. Thirty-three of these CAZyme-encoding genes are highly homologous to prevalent genes found in the gut microbiome of at least 20 individuals for whom metagenomic data are available. Moreover, 18 multigenic clusters encoding complementary enzyme activities for plant cell wall degradation were also identified. Gene taxonomic assignment is consistent with horizontal gene transfer events in dominant gut species and provides new insights into the human gut functional trophic chain.

[Supplemental material is available online at http://www.genome.org. The sequence data from this study have been

submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) under accession nos. GU942928–GU942942 and

GU942944–GU942954.]

The human intestinal microbiome is the dense and complex ecosystem that resides in the distal part of our digestive tract. Its role in metabolizing dietary constituents (Sonnenburg et al. 2005; Flint et al. 2008; Ley et al. 2008) and in protecting the host against pathogens (Rakoff-Nahoum et al. 2004) is crucial to human health (Macdonald and Monteleone 2005; McGarr et al. 2005; Manichanh et al. 2006; Turnbaugh and Gordon 2009). It is mainly composed of commensal bacteria from the Bacteroidetes, Firmicutes, Proteobacteria, and Actinobacteria phyla (five), and of several archaeal and eukaryotic species. With up to 10^12 cells per gram of feces, the bacterial abundance is estimated to reach 1000 operational taxonomic units (OTUs) per individual, 70% to 80% of the most dominant ones being subject-specific (Zoetendal et al. 1998; Tap et al. 2009). However, only 20% of the bacterial species have been successfully cultured so far (Eckburg et al. 2005). Large-scale analyses of genomic and metagenomic sequences have provided gene catalogs and statistical evidence on protein families involved in the predominant functions of the human gut microbiome (Gill et al. 2006; Kurokawa et al. 2007; Flint et al. 2008; Turnbaugh et al. 2009; Qin et al. 2010), among which the catabolism of dietary fibers is of particular interest in human nutrition and health. Dietary fibers are the components of vegetables, cereals, leguminous seeds, and fruits that are not digested in the stomach or in the small intestine, but are fermented in the colon by the gut microbiome and/or excreted in feces (Grabitske and Slavin 2008). Chemically, dietary fibers are mainly composed of complex plant cell wall polysaccharides and their associated lignin (Selvendran 1984), along with storage polysaccharides such as fructans and resistant starch (Institute of Medicine 2005). Dietary fibers have been identified as a strong positive dietary factor in the prevention of obesity, diabetes, and cardiovascular diseases (World Health Organization 2003). Because of the wide structural diversity of dietary fibers, the human gut bacteria produce a huge panel of carbohydrate active enzymes (CAZymes), with widely different substrate specificities, to degrade these compounds into metabolizable monosaccharides and disaccharides. The functions and the evolutionary relationships of CAZyme-encoding genes of the human gut microbiome are being extensively studied through functional and structural genomics investigations (Flint et al. 2008; Lozupone et al. 2008; Mahowald et al. 2009; Martens et al. 2009), which are nevertheless restricted to cultivated bacterial species. CAZyme diversity has also been described in three metagenomics studies focused on this microbiome (Gill et al. 2006; Turnbaugh et al. 2009, 2010), and these revealed the presence of at least 81 families of glycoside-hydrolases, making the human gut metagenome one of the richest sources of CAZymes (Li et al. 2009). However, obtaining proof of function for annotated genes from metagenomes remains a goal for enzyme discovery. This can be addressed by functional screening of metagenomic libraries, in order to retrieve genes of interest. Numerous studies have provided conclusive evidence of the potential of such an approach for the identification of novel glycoside-hydrolases from various ecosystems such as soil (Rondon et al. 2000; Richardson et al. 2002; Voget et al. 2003; Pang et al. 2009), lakes (Rees et al. 2003), hot springs (Tang et al. 2006, 2008), rumen (Ferrer et al. 2005; Guo et al. 2008; Liu et al. 2008; Duan et al. 2009), rabbit (Feng et al. 2007), and insect guts (Brennan et al. 2004; for review, see Ferrer et al. 2009; Li et al. 2009; Simon and Daniel 2009; Uchiyama and Miyazaki 2009). In all cases, the identification of the gene responsible for the screened activity was carried out by sequencing only a few kilobases of metagenomic DNA. Collectively these studies have established an experimental proof of function for 35 glycoside hydrolases (from eight families) from metagenomes (data from the CAZy database; http://www.cazy.org/), a number that is very small considering the known CAZy diversity. Here, we examined the potential of high-throughput functional screening of large insert libraries to guide in-depth pyrosequencing of specific regions of the human gut metagenome that encode the enzymatic machinery involved in dietary fiber catabolism.

7These authors contributed equally to this work.
8Corresponding author. E-mail [email protected]; fax 33-5-61-55-94-00.
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.108332.110.

Genome Research 20:1605–1612 © 2010 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/10; www.genome.org

Results and Discussion

Function-based strategy to target novel CAZymes

The overall strategy (Fig. 1) relies on the screening of a large metagenomic library constructed from the feces of a healthy adult volunteer who followed a fiber-rich diet, to easily isolate genes encoding enzymes able to break down raw and mostly insoluble plant polysaccharides. First, the library was screened at a throughput of 200,000 clones assayed per week and per activity, using both commercial and home-made polysaccharides (Supplemental Table S1). In the secondary step, all positive clones were screened again using a panel of 15 raw and chemically modified polysaccharides of various structures (Supplemental Table S1), to distinguish different enzyme specificities toward glycosidic linkages among clones that were able to degrade the same polysaccharide in the primary screens. In parallel, enzyme pH dependency and thermostability were assayed. Then, in-depth pyrosequencing of the metagenomic DNA insert from the most interesting clones was carried out. To identify the enzymes responsible for plant polysaccharide breakdown and their microbial origin, sequence analysis was focused on taxonomic annotation of the DNA inserts and CAZyme-encoding gene annotation.

Multi-step functional screening

The initial library consisted of 156,000 Escherichia coli fosmid clones, covering in total 5.46 × 10^9 bp of metagenomic DNA, each clone comprising a 30–40-kb DNA insert. The library was screened for the ability to hydrolyze five different polysaccharides, namely, beta-glucan, xylan, beta-(1-4)-galactan, pectin, and amylose. In total, 704,000 tests were performed, and 310 positive clones were obtained. Hit frequency varied from 0.05% to 0.8% (Supplemental Table S1). No clone degraded more than one of the substrates included in the primary screens. Secondary screening results allowed the clustering of the 310 positive clones on the basis of their ability to break down various polysaccharide structures (Supplemental Table S2). One hundred and forty-two clones were able to degrade only the polysaccharide used in the primary screen, while the others could also cleave polysaccharides carrying modifications in the main chain and in the various side chains. In addition, the enzymes’ ability to work at extreme pH and high temperature was investigated for their potential use in industrial processes. Enzyme stability is related to tight protein structural features, and not only to the thermotolerance of the organism the enzymes originate from. Here, eight of the 310 positive clones maintained enzyme activity at pH 4 and/or 9, and three were still active after a 55°C heat shock, even though they originate from an ecosystem regulated at 37°C. A total of 26 clones were selected from the two screening steps for their efficiency in degrading particularly resistant substrates, like native heteroxylans, beta-glucans, or resistant starches, and/or for their stability at various pH values or high temperatures. The percentage of clones being sequenced was thus not related to hit frequency.
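The library figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, and the mid-range insert size of 35 kb is an assumption (the text gives only the 30–40-kb range).

```python
# Figures taken from the text; names are illustrative only.
N_CLONES = 156_000             # fosmid clones in the library
INSERT_KB_RANGE = (30, 40)     # reported insert size range, in kb
N_TESTS = 704_000              # primary screening tests performed
N_HITS = 310                   # positive clones obtained


def library_coverage_bp(n_clones, insert_kb):
    """Total metagenomic DNA covered, in bp, for a given mean insert size."""
    return n_clones * insert_kb * 1_000


low_bp = library_coverage_bp(N_CLONES, INSERT_KB_RANGE[0])
high_bp = library_coverage_bp(N_CLONES, INSERT_KB_RANGE[1])
# Assumption: the mean insert size is the midpoint of the reported range.
mid_bp = library_coverage_bp(N_CLONES, sum(INSERT_KB_RANGE) / 2)

# Overall number of positive clones per 1,000 primary tests.
hits_per_thousand_tests = 1_000 * N_HITS / N_TESTS

print(f"coverage: {low_bp:.2e} to {high_bp:.2e} bp (midpoint {mid_bp:.2e} bp)")
print(f"overall hits per 1,000 tests: {hits_per_thousand_tests:.2f}")
```

With a 35-kb mean insert the midpoint reproduces the 5.46 × 10^9 bp stated for the library. Per-substrate hit frequencies are given in Supplemental Table S1 and are not recomputed here.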

Pyrosequencing and gene prediction

The third step of our work consisted of pyrosequencing the inserts from the 26 selected positive clones. Read assembly resulted in 27 large contigs obtained with a mean sequencing depth of 44×. Two large contigs were found for clone 4. Surprisingly, three cases of partial sequence redundancy occurred for beta-glucanase, xylanase, and galactanase active clones, respectively. Excluding the vector sequences, these 27 large contigs, ranging in size from 8.3 to 43.8 kb, included 843,256 nt of nonredundant metagenomic DNA. The high sequencing depth allowed accurate gene prediction, gene organization, and taxonomic assignment. The total number of predicted genes of at least 60 nt was 665 (622 complete genes). Among the 622 complete protein sequences reported here, 349 were assigned to clusters of orthologous groups of proteins (COGs). The distribution pattern of COG-assigned proteins (Fig. 2; Supplemental Table S4) highlights the dominance of the G cluster, corresponding to proteins predicted to be involved in carbohydrate transport and metabolism. The G cluster was found to contain 23% of COG-assigned proteins, which is drastically higher than what was previously obtained from random sequencing of the human gut metagenome (Kurokawa et al. 2007; Turnbaugh et al. 2009; Qin et al. 2010). This demonstrates the power of the functional screening steps to isolate large metagenomic DNA fragments that are enriched in genes encoding the enzymatic machinery for dietary fiber digestion.

Figure 1. Overall strategy based on the use of multi-step functional screens for gene discovery from metagenomic sequences.
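The share of each COG category among COG-assigned proteins, as discussed above, is a simple tally. The sketch below assumes a mapping from protein ID to a single COG category letter, with None for unassigned proteins. The function name, the toy protein IDs and their categories are hypothetical and are not data from the study; only the counts 622, 349 and 23% come from the text.

```python
from collections import Counter


def cog_category_shares(assignments):
    """Return {COG category letter: percent of COG-assigned proteins}.

    Proteins mapped to None (no COG assignment) are excluded from the total.
    """
    counts = Counter(cat for cat in assignments.values() if cat is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100 * n / total for cat, n in counts.items()}


# Hypothetical toy input, for illustration only.
toy_assignments = {"p1": "G", "p2": "G", "p3": "L", "p4": None, "p5": "K"}
toy_shares = cog_category_shares(toy_assignments)

# Figures from the text: 349 of 622 complete proteins were COG-assigned,
# and the G cluster held 23% of the COG-assigned proteins.
assigned_pct = 100 * 349 / 622
approx_g_proteins = round(0.23 * 349)

print(toy_shares)
print(f"{assigned_pct:.1f}% of complete proteins assigned to a COG")
print(f"about {approx_g_proteins} proteins in the G cluster")
```

The last figure is only an estimate derived from the rounded 23% value; the exact count is in Supplemental Table S4.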

Taxonomic assignment of metagenomic DNA

To obtain new insights into the relationships between bacterial taxonomy and the role of bacteria in fiber metabolization, the bacterial origin of the metagenomic DNA inserts was predicted on the basis of sequence homology with the protein sequences contained in the nonredundant (NR) protein sequence database of the NCBI. The amount of assignable and unassignable metagenomic DNA fragments is biased by the number of bacterial genome sequences present in the NR database, and it is related to the highly stringent criteria (Kurokawa et al. 2007) that we used to avoid false taxonomic assignment. For all clones, the metagenomic sequences contained some genes encoding proteins without high sequence identity to any known protein (Supplemental Fig. S1). We thus conclude that they originate from microorganisms whose genome sequence is not (or not yet) available. Moreover, using the chosen criteria, 13 large contigs were nonassignable, one was assigned to a bacterial order, seven were assigned to a bacterial genus, and six to a bacterial species (Fig. 3). Among them, nine corresponded to bacteria from the Bacteroidetes phylum and five to Gram-positive bacteria. This indicates that a significant number of genes originating from these bacteria were successfully expressed and produced functional proteins, even if some expression bias probably occurred by using E. coli as the recombinant host for functional screening (Gabor et al. 2004; Chen et al. 2007). Indeed, it appears that some genes that were correctly expressed in E. coli (based on the transposon mutagenesis results) were located up to 30 kb from any possible upstream vector-borne promoters. These genes came, among others, from contigs assigned to Bacteroides (i.e., prot ID ADD61481, clone 14, 30 kb; ADD61507, clone 16, 14 kb) and the Gram-positive Eubacterium (ADD61840, clone 3, 20 kb) (Supplemental Table S3). In the E. coli host, transcription of these genes was probably initiated from the native Bacteroides and Eubacterium promoters.

Additionally, we compared the taxonomic assignment of contigs with that of the total metagenomic DNA used for constructing the library (based on 4530 16S rDNA gene sequences) (Supplemental Fig. S2). The total bacterial diversity of the originating sample, estimated by Chao index on 16S rDNA library data sets (Supplemental Fig. S3), is consistent with the average diversity in fecal samples from healthy individuals, cumulatively reaching 9940 OTUs for 17 individuals (Tap et al. 2009). In the initial sample, the most abundant 16S rDNA sequences were assigned to five OTUs: two Eubacterium rectale (1207 sequences), Ruminococcus sp. (710 sequences), Bacteroides sp. (367 sequences), and Ruminococcus bromii (125 sequences). Surprisingly, none of the bacterial species assigned to the contigs corresponded to these five OTUs. In addition, based on 16S rDNA sequencing, some of the metagenomic fragments originated from species representing <1% of the initial sample: only one 16S rDNA sequence corresponded to Bacteroides stercoris, Bacteroides thetaiotaomicron, and Bacteroides uniformis, while 29 16S rDNA sequences corresponded to Bifidobacterium longum. Even if some cloning (Temperton et al. 2009) and expression (Gabor et al. 2004; Chen et al. 2007) biases may have occurred, and considering only taxonomic assignment to the genus level, it can be concluded that the present functionally guided strategy allows the isolation of DNA fragments from bacteria representing only a few percent of the dominant gut bacteria (like Bifidobacteria), provided that one is capable of exploring a sufficiently large sequence space.
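The text refers to a Chao index without giving its form. The sketch below assumes the classic Chao1 richness estimator, S_obs + F1^2 / (2 F2), with the usual bias-corrected fallback when no doubletons are present; this choice is an assumption, not a statement of what the authors computed. The toy OTU counts are hypothetical. The relative abundances use only the sequence counts quoted in the paragraph above.

```python
def chao1(otu_counts):
    """Chao1 richness estimate from per-OTU sequence counts.

    Uses S_obs + F1^2 / (2 * F2), where F1 and F2 are the numbers of OTUs
    seen exactly once and twice; falls back to S_obs + F1 * (F1 - 1) / 2
    when there are no doubletons.
    """
    s_obs = sum(1 for c in otu_counts if c > 0)
    f1 = sum(1 for c in otu_counts if c == 1)
    f2 = sum(1 for c in otu_counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def relative_abundance_pct(n_sequences, total_sequences):
    """Percent of the 16S rDNA library represented by n_sequences."""
    return 100 * n_sequences / total_sequences


# Hypothetical toy OTU table, for illustration only.
toy_counts = [10, 5, 2, 2, 1, 1, 1, 1]
toy_estimate = chao1(toy_counts)

# Counts quoted in the text (4530 16S rDNA sequences in total).
TOTAL_16S = 4530
b_longum_pct = relative_abundance_pct(29, TOTAL_16S)
e_rectale_pct = relative_abundance_pct(1207, TOTAL_16S)

print(f"toy Chao1 estimate: {toy_estimate}")
print(f"B. longum: {b_longum_pct:.2f}% of sequences")
print(f"E. rectale OTUs: {e_rectale_pct:.1f}% of sequences")
```

The 29 Bifidobacterium longum sequences correspond to well under 1% of the library, in line with the statement in the text.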

Because the frequent occurrence of horizontal gene transfer (HGT) is thought to help gut bacteria to share their advantages when facing common challenges (Roberts et al. 2008), taxonomic assignment based on sequence identity may be inconsistent with that based on 16S rDNA. It has been shown previously that the human gut metagenome is rich in conjugative transposons, integrases, and recombinases (Jones and Marchesi 2007; Kurokawa et al. 2007; Qu et al. 2008). Based on the data available in 2008, Tamames and Moya (2008) predicted that 1%–2.5% of contigs of the human gut metagenome contain probable HGT events. Moreover, the analysis of 36 bacterial gut genomes revealed that CAZyme convergence was largely due to HGT (Lozupone et al. 2008). Here, based on the analysis of only 0.84 Mb of nonredundant metagenomic sequences, we identified 11 genes predicted to encode transposases, recombinases, and integrases, assigned to COG families 3385, 4584, 5433, 3464, 3547, 4973, and 4974 (COG category L) (Supplemental Table S4). Moreover, in five cases, we observed a drastic change of DNA taxonomic assignment based on sequence homology around the gene encoding transposase, integrase, or recombinase (Fig. 4). In the case of clones 2, 11, 12, and 14/15, the first part of the contigs presented a perfect synteny with a genomic fragment from one gut bacterium, while the second part showed synteny with a fragment of a different gut bacterial genome. In the case of clone 16, the synteny with the B. uniformis ATCC 8492 genome is lost for seven genes in the middle of the contig that are not even highly similar to any B. uniformis ATCC 8492 gene. We thus hypothesize that, as for the other clones mentioned in Figure 4, such a gene organization results from gene transfers between bacterial species. For these clone sequences, the genomic heterogeneity was also confirmed by tetranucleotide frequency analysis (Supplemental Fig. S4). This provides conclusive evidence of human gut metagenome plasticity. Such a demonstration was made possible by the in-depth sequencing of large metagenomic DNA fragments, which provided both reliable information about gene organization and the proof that the contigable sequences originated from a single bacterial genome.

Figure 2. Distribution pattern of COG-assigned proteins. The genes not assignable to any COGs are not shown in this figure. (C) Energy production and conversion. (D) Cell cycle control, mitosis, and meiosis. (E) Amino acid transport and metabolism. (F) Nucleotide transport and metabolism. (G) Carbohydrate transport and metabolism. (H) Coenzyme transport and metabolism. (I) Lipid transport and metabolism. (J) Translation. (K) Transcription. (L) Replication, recombination, and repair. (M) Cell wall/membrane biogenesis. (N) Cell motility. (O) Post-translational modification, protein turnover, chaperones. (P) Inorganic ion transport and metabolism. (Q) Secondary metabolite biosynthesis, transport, and catabolism. (R) General function prediction only. (S) Function unknown. (T) Signal transduction mechanisms. (U) Intracellular trafficking and secretion. (V) Defense mechanisms. (Z) Cytoskeleton.
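Tetranucleotide frequency analysis, mentioned above as a check on genomic heterogeneity, compares the 4-mer composition of sequence segments. The sketch below is a minimal illustration and not the procedure behind Supplemental Fig. S4: the function names, the plain Euclidean distance, and the decision not to merge reverse complements are all assumptions, and the test sequence is a toy string.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 tetranucleotides in seq.

    Overlapping windows are counted; windows containing characters other
    than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two tetranucleotide frequency profiles."""
    return sum((freq_a[k] - freq_b[k]) ** 2 for k in freq_a) ** 0.5


# Toy sequences, for illustration only.
profile_a = tetranucleotide_frequencies("ACGTACGT")
profile_b = tetranucleotide_frequencies("AAAAAAAA")
print(profile_a["ACGT"], euclidean_distance(profile_a, profile_b))
```

In practice the two halves of a contig on either side of a putative transfer point would be profiled separately; a large distance between the profiles relative to the distance within each half would suggest different genomic origins.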

Identification and organization of CAZyme-encoding genes

The detection of genes encoding CAZymes, which are responsible for polysaccharide degradation, was the last step of the strategy (Fig. 1). A BLAST-based sequence comparison against the CAZy database identified 73 CAZyme proteins, encoded by 65 full-length and eight truncated genes (SI). Several proteins were multimodular, resulting in a total of 86 modules assigned to 35 known CAZy families (Supplemental Table S3), corresponding mainly to polysaccharide-degrading activities, including 20 glycoside-hydrolase (GH), seven carbohydrate-esterase (CE), and one polysaccharide lyase (PL) families. To identify the gene responsible for the activity detected in the primary screens, we performed transposon mutagenesis of the fosmid inserts. All of the proteins (labeled in Supplemental Table S3) for which an experimental proof of function is provided were identified as CAZymes by sequence-based analysis. They all contain a catalytic module belonging to a known GH or CE family whose activity, as described in the CAZy database, agrees with the activity we screened for. We did not obtain any inactivated clones by transposon mutagenesis of clones 1, 5, 8, and 9. This indicates that several enzymes encoded by these fosmids may be involved in the detected activity.

In addition, many CAZymes involved in the breakdown of plant polysaccharides display a modular structure in which the catalytic domain carries one or several ancillary domains that can be catalytic, carbohydrate-binding, or of as-yet-unknown function. Four known families of carbohydrate-binding modules (CBM) and one fibronectin (FN) module were also found to be associated with catalytic modules, presumably for the attachment of enzymes to their substrates. Moreover, five of the 73 identified CAZymes (marked in Supplemental Table S3) harbored additional modules with no similarity to any known CAZy family. These modules of unknown function potentially represent five novel CAZy families. The precise function of these novel protein modules will be investigated by rational truncation of the corresponding proteins, in order to identify the catalytic or carbohydrate-binding function of the modules in question.

Figure 3. CAZy gene clusters for each clone sequence from 1 to 26. Below the clone number is the activity for which each clone has been screened. (Blue) CAZy-encoding genes; (yellow) SusD homolog–encoding genes; (green) transport system protein–encoding genes; (purple) other genes. 14/15 shows the CAZy gene clusters of assembled sequences from these clones. Clones 10 and 11 and clones 17 and 18 have the same CAZy gene clusters; these sequences are not assembled together. On top of each bar is the taxonomic assignment of the clone when assignable; other clones are nonassigned. (*) Synteny with Roseburia intestinalis L1-82 (1); Bacteroides uniformis ATCC 8492 (2); Bacteroides stercoris ATCC 43183 (3); Bacteroides eggerthii DSM 20697 (4).

Among the 622 complete nonredundant genes, 19% were predicted to encode a signal peptide. This number increased to 38% when considering only the CAZyme-encoding genes. This is consistent with the role of these enzymes in vivo in the digestion of polysaccharide substrates that are impossible to internalize by bacterial cells. It is probable that most of the CAZymes were not secreted by E. coli cells used here as the recombinant host. Instead, CAZyme access to the insoluble polysaccharides of the functional screens was most likely due to the release of cytoplasmic proteins by E. coli cell lysis.

As demonstrated by the G COG-cluster enrichment, the pres-

ent function-based strategy was very powerful in focusing the se-

quencing only on metagenomic DNA fragments rich in CAZyme

modules. One module was found every 10 kb, with a fivefold

higher frequency than that observed from random sequencing

(Turnbaugh et al. 2009). The enrichment in catabolic genes can

also be estimated by the glycoside hydrolase/glycosyltransferase

(GH/GT) ratio. The functional screen strategy that we used led to

a GH/GT ratio of 33, much higher than the 1.5 ratio obtained in

the analysis of complete genomes from gut bacteria (Lozupone

et al. 2008) or even the 3.4 ratio within metagenomics short reads

(Turnbaugh et al. 2009). Our strategy for target-gene enrichment in

metagenomes is even more efficient than those based on DNA isolation

from enrichment cultures grown on polysaccharides (Grant

et al. 2004) or on labeling DNA through stable isotope probing

(Kalyuzhnaya et al. 2008).
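The enrichment quoted above can be expressed as a fold change between GH/GT ratios. The following is a minimal sketch that uses only the three ratios given in the text; the function and constant names are illustrative assumptions and do not come from the study.

```python
# Fold-enrichment of the GH/GT ratio, computed from the ratios quoted in the text.
# All names below are hypothetical and chosen for illustration only.

def fold_enrichment(observed_ratio: float, reference_ratio: float) -> float:
    """Return how many times larger the observed ratio is than the reference."""
    if reference_ratio <= 0:
        raise ValueError("reference ratio must be positive")
    return observed_ratio / reference_ratio


GH_GT_FUNCTIONAL_SCREEN = 33.0  # functional screen strategy (this study)
GH_GT_COMPLETE_GENOMES = 1.5    # complete gut bacterial genomes (Lozupone et al. 2008)
GH_GT_SHORT_READS = 3.4         # metagenomic short reads (Turnbaugh et al. 2009)

fold_vs_genomes = fold_enrichment(GH_GT_FUNCTIONAL_SCREEN, GH_GT_COMPLETE_GENOMES)
fold_vs_reads = fold_enrichment(GH_GT_FUNCTIONAL_SCREEN, GH_GT_SHORT_READS)

print(f"{fold_vs_genomes:.1f}-fold versus complete genomes")
print(f"{fold_vs_reads:.1f}-fold versus short reads")
```

Under these figures the functional screen gives a GH/GT ratio roughly 22 times that of complete genomes and close to 10 times that of short reads.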

The study of the organization of CAZyme-encoding genes

identified here is of particular interest. Among the 73 CAZyme-

encoding genes, 48 were found to constitute 18 multigenic clus-

ters, possibly representing operon-like systems including other

genes involved in carbohydrate transport and/or binding like SusD

homologs and putative proteins from the TonB-dependent receptor

family (Fig. 3; Martens et al. 2009). In five cases, a striking

synteny was obtained with similar gene clusters from genomes of

gastrointestinal tract bacteria, for which the biochemical proof of

function has never been described to our knowledge. For the first

time using a screening-based metagenomics approach, we describe

CAZyme gene clusters involved in dietary fiber catabolism by the

human gut microbiome.

Interestingly, the distribution of CAZyme gene clusters and

the number of CAZyme modules and families were highly variable

among the clones and found to depend on their activities. Indeed,

metagenomic DNA inserts from clones able to degrade starch

contained only one to three CAZyme modules corresponding

mainly to family GH13. In comparison, the DNA fragments in-

serted in clones able to degrade beta-glucans and xylan contained

up to 17 CAZyme modules corresponding to 13 different CAZyme

families. All the functions of these CAZyme modules (cellulases,

hemicellulases, carbohydrate-esterases, and associated carbohy-

drate-binding modules) are required in vivo for the complete

degradation of plant cell wall polysaccharides, whose structures are

much more complex than that of starch. These operon-like clusters

probably reflect the adaptation of the genetic potential of gut

bacteria to the degradation of highly complex polysaccharide

structures.

Finally, in order to assess how prevalent the genes we iden-

tified are among the gut microbiomes worldwide, we compared

our data to the metagenome sequences currently available, issued

from 124 European (Qin et al. 2010), 13 Japanese (Kurokawa et al.

2007), and 46 U.S. individuals (Gill et al. 2006; Turnbaugh et al.

2009, 2010). None of the genes we identified in our contigs was

found in the U.S. and Japanese individual data sets. This was

probably because we used highly stringent criteria for searching

similarities with our full-length protein sequences (E-value = 0;

identity ≥ 90%), in order to avoid any overestimation of the

gene prevalence. In contrast, when comparing our data to the 3.3-

million-gene catalog obtained from the European individuals, we

identified 154 highly prevalent genes, detected in 20 individuals

or more (identity ≥ 90%) (Supplemental Table S4). Among them,

33 encode CAZymes. In addition, among the 65 complete CAZyme-encoding

genes of the present study, 32 matched with 100%

identity to genes present in at least one individual, and six in at

least 12 individuals (protein ID ADD61840, clone 3; ADD62008,

clone 10; ADD62010, clone 10; ADD62011, clone 10; ADD61504,

clone 16; ADD61689, clone 22) (Supplemental Table S3). These six

CAZymes were found in the gut microbiomes of individuals with

very distinct body mass index (lean, overweight, obese), and with

different clinical status (healthy, inflammatory bowel–diseased

patients). Moreover, in many cases (for clones 3, 4, 9, 10/11, 14/15,

16, 19, 20, 26), the genes surrounding these highly prevalent

CAZymes were also present in several individuals. These results

show the power of such an activity-based functional meta-

genomics approach, even when applied on a single sample, to

provide an experimental proof of function to highly prevalent

genes and gene clusters of the human gut microbiome. This also

underlines the interest of coupling sequence-based and activity-

based metagenomics to investigate the gut microbiota functions

and to measure the prevalence and abundance of targeted genes.
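The prevalence criterion described above (a gene counted as highly prevalent when detected in 20 individuals or more at an identity of at least 90%) can be sketched as a simple counting step. This is a hedged illustration, not the authors' pipeline: the `Hit` record, the function names, and the toy data are all assumptions, and the E-value criterion is presumed to have been applied upstream.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Hit(NamedTuple):
    gene_id: str      # hypothetical identifier of a query protein
    individual: str   # hypothetical identifier of one individual's data set
    identity: float   # percent identity of the match


def prevalence(hits: Iterable[Hit], min_identity: float = 90.0) -> Dict[str, int]:
    """Count, per gene, the distinct individuals with a match at or above min_identity."""
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for hit in hits:
        if hit.identity >= min_identity:
            carriers[hit.gene_id].add(hit.individual)
    return {gene: len(people) for gene, people in carriers.items()}


def highly_prevalent(counts: Dict[str, int], min_individuals: int = 20) -> List[str]:
    """Return the genes detected in at least min_individuals individuals."""
    return sorted(gene for gene, n in counts.items() if n >= min_individuals)


# Toy data only; these are not results from the study.
toy_hits = (
    [Hit("gene_A", f"ind{i}", 95.0) for i in range(25)]
    + [Hit("gene_B", f"ind{i}", 99.0) for i in range(3)]
    + [Hit("gene_C", f"ind{i}", 80.0) for i in range(30)]
)
counts = prevalence(toy_hits)
print(highly_prevalent(counts))
```

In the toy data, `gene_C` is found in many individuals but never reaches the identity threshold, so it is not counted at all.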

Figure 4. Evidence of horizontal gene transfers (HGTs) in human gut metagenomic sequences. HGTs were identified when rupture was observed in gene synteny between the genes present in the metagenomic DNA fragments and their best BLASTP hits issued from sequenced genomes. For each clone, the first line represents the clone metagenomic sequence, and the second line represents the genome part in synteny with it. Each arrow represents a gene. (Red arrows) Genes encoding putative transposases or integrases; (black arrows) CAZy-encoding genes; (stars within black arrows) genes encoding the CAZymes involved in the activity detected in the primary screens, as proven by transposon insertion in the fosmid inserts.

Metagenome screening to boost enzyme discovery

Concluding remarks

This study demonstrates that the rational design of a multi-step

functional screening procedure to guide sequencing is a very

powerful strategy to accelerate enzyme discovery in metagenomes.

Here, it was applied to identify highly prevalent genes encoding

enzymes that are involved in the catabolism of the dietary fibers by

the human gut microbiome and provided new insights into the

gastrointestinal tract functional trophic chain. Besides, our pro-

cedure appears to efficiently identify clusters of potentially com-

plementary activities for the complete breakdown of complex

plant polysaccharides, which can be of prime interest for bio-

refinery processes and white biotechnologies. Their potential for

such applications will have to be evaluated in future work. Finally,

we note that the strategy reported here, which coupled functional

screens and sequence-based metagenomics, is highly generic and

can be applied to mine other ecosystems known to be highly

specialized for raw biomass degradation (e.g., rumen and insect gut

microbiomes) for novel biocatalysts.

Methods

Construction of the metagenomic library

The fecal sample was collected from a healthy 30-yr-old male who

followed a vegetarian and fish-eating diet. His ascendants were

omnivorous. The individual did not eat any functional food such

as prebiotics or probiotics, nor did he receive any antibiotics or

other drugs during the 6 mo before sampling. The bacterial fraction

was recovered from 2 g of feces by a gradient density technique

using Nycodenz as previously described (Courtois et al. 2003). The

bacterial cell fraction was collected, washed with ultra-pure water,

then centrifuged for 10 min at 12,000g. The cell pellet was resus-

pended in a 50 mM Tris (pH 8), 100 mM EDTA buffer and then

incorporated in low-melt-point agarose before a gentle enzy-

matic lysis, as described by Ginolhac et al. (2004). High-molecular-

weight bacterial DNA trapped in agarose plugs was immediately

inserted into the wells of a 0.8% low-melting-temperature gel (Bio-

Rad) and separated for 18 h by pulsed-field gel electrophoresis at

4.5 V/cm with 5- to 40-sec pulse times with a CHEF-DR III apparatus

(Bio-Rad). DNA fragments with size ranging from 30 to 40 kb were

isolated and recovered from the gel with GELase (Epicentre Tech-

nologies). Phylogenetic analysis of the extracted metagenomic

DNA using 16S rDNA sequencing was performed according to Tap

et al. (2009). The GenBank accession numbers for the 16S rDNA

molecular inventory are HM475513–HM480042. The correspon-

dence between the bacterial clone numbers appearing in Supple-

mental Figure S2 and the corresponding GenBank accession num-

bers is mentioned in Supplemental Table S5.

The metagenomic DNA was then cloned into fosmids by us-

ing the pCC1FOS fosmid library production kit (Epicentre Tech-

nologies) as recommended by the manufacturer. Recombinant

colonies were transferred to 384-well microtiter plates containing

freezing medium (Luria-Bertani, 8% glycerol complemented with

12.5 µg/mL chloramphenicol), using an automated colony picker

(QpixII; Genetix). After 22 h of growth at 37°C without any agitation,

the plates were stored at −80°C.

High-throughput functional screens

Metagenomic clones were screened for polysaccharide digestion

activities by spotting them on 22 cm × 22 cm bioassay trays

containing solid agar and the target polysaccharide, using a QPixII

(Genetix) colony picker. Solid agar was either PLA (agar-supplemented

LB buffered to pH 6.6 by addition of 5.4 g/L Na2HPO4·12H2O

and 4.8 g/L NaH2PO4·H2O) or, in the case of media containing

starch-related polysaccharides, terrific broth (TB). All

media were supplemented with 12.5 mg/L chloramphenicol and

with polysaccharides (beta-glucans, xylans, pectin, amylose, gal-

actan) as listed in Supplemental Table S1. The assay plates were

incubated for 7 d at 37°C, except for plates containing AZCL-

amylose, which were incubated for only 3 d to avoid interference

with E. coli host starch-degrading activities. A final throughput of

200,000 clones assayed per week and per substrate was achieved.

After incubation on plates containing chromogenic poly-

saccharides, positive clones were visually detected by the presence

of a blue or red halo resulting from the production of colored oli-

gosaccharides that diffused around the bacterial colonies. For

pectin assays, the plates were colored for 20 min with an aqueous

solution of Ruthenium Red (0.5% m/v) at room temperature. After

removing excess Ruthenium Red solution by aspiration, clear

halos were observed around the positive clones.

Secondary screens

All positive clones were further screened for hydrolysis efficiency

and specificity toward various polysaccharide structures, by

screening them on solid agar containing polysaccharides of vari-

ous structures (Supplemental Table S1). Native polysaccharides

were added to the sterile agar media at 50°C to conserve their

crystalline structure. Ten microliters of overnight liquid cultures of

the positive clones were placed on the agar surface, and the plates

were incubated for 3 to 7 d at 37°C. Plates containing non-

chromogenic beta-glucans and xylan were stained with an aque-

ous solution of Congo Red (0.05% m/v) followed by an overnight

exposure to 1 M NaCl. Digestion zones were visible as clear halos

around the positive colonies, except the deep brown halos ob-

served for carboxymethyl cellulose. Nonchromogenic amylose-

(Potocki-Veronese et al. 2005) and starch-containing plates were

stained by exposure to iodine vapor, revealing unstained halos

around positive colonies. Nonchromogenic pectic polysaccharides

were stained with Ruthenium Red as described in the previous

section.

To measure enzyme thermostability and activities at various

pH values, positive clones were grown in liquid cultures in 96-well

microplates. Cell lysis was performed by addition of 0.5 mg/mL

lysozyme and one cycle of freeze/thaw at −20°C. For thermostability

assays, cell extracts were incubated for 15 min at 55°C. Cell

extracts were incubated in 20 mM citrate-phosphate buffer at pH 4,

7, and 9, supplemented with 0.1% AZCL-polysaccharides (same as

used in primary screens), for 24 h at 37°C. Polysaccharide hydro-

lysis resulting in the release of soluble blue oligosaccharides was

quantified by measuring absorbance at 590 nm.

Transposon mutagenesis of the DNA inserts from the 26 se-

lected clones was performed using the EZ-Tn5 <oriV/KAN-2> In-

sertion Kit (Epicentre). Inactivated clones were identified by plat-

ing isolated colonies on agar-supplemented LB containing 12.5

mg/L chloramphenicol, 50 mg/L kanamycin, and the polysaccharide

used in the primary screens. Sanger sequencing was performed

outward from the nested transposon using the primers

supplied in the kit.

Pyrosequencing, read assembly, and gene prediction

Pyrosequencing of whole fosmid inserts was performed on a 454

Life Sciences (Roche) GS FLX system by the Genoscope sequencing

facility (Evry, France), yielding in total 186,762 contigable reads.

Read assembly was done using CAP3 (Huang and Madan 1999),

a DNA Sequence Assembly Program, and resulted in 106 contigs

sizing between 113 bp and 51,798 bp, covering in total 1,002,117

bp. Ninety-eight percent of the sequenced nucleotides were included

in 27 large contigs of at least 8343 nt, obtained with a mean

sequencing depth of 44×. Two large contigs were found for clone

4. These 27 large contigs were further used for analysis. pCC1FOS

sequences were identified using Crossmatch (http://bozeman.mbt.washington.edu/phredphrapconsed.html),

discarded, and replaced

by NNN. Excluding the vector sequences, these 27 large

contigs included 881,473 nt of metagenomic DNA. The compari-

son of these sequences with themselves revealed three cases of

partial sequence redundancy, which always occurred between

clones presenting the same enzymatic activity detected using the

primary screens. In the first two cases, the 5′ extremity of a contig

was identical to the 3′ extremity of a contig from another clone

(clones 14/15 and 17/18), which allowed manual assembly of

them to provide up to 71.3 kb of metagenomic DNA issued from

one unique gut bacterium. In the case of beta-glucanase active

clones, one sequence fragment (20.9 kb) from clone 10 was also

found in the contig sequence from clone 11, without any ho-

mologies of the contig extremities. As described in this report, this

particular sequence redundancy phenomenon may be due to

HGTs. The Metagene program (http://metagene.cb.k.u-tokyo.ac.jp/metagene)

was used to predict open reading frames (ORFs ≥ 20

amino acids) from the resulting sequences. No frameshift was

detected in the gene sequences by using BLASTX comparison to

the Uniref100 database, reflecting the reliability of read assembly

and gene detection. For each of the 26 clones, the large contig

sequence has been deposited in DDBJ/EMBL/GenBank under ac-

cession numbers GU942928–GU942942 and GU942944–GU942954.
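The statement that 98% of the sequenced nucleotides fall in the large contigs amounts to a length-weighted fraction over the assembly. The sketch below shows that calculation on hypothetical contig lengths; only the 8343-nt cutoff and the two extreme lengths (113 and 51,798 bp) are taken from the text, and the function name is an assumption.

```python
from typing import Sequence


def fraction_in_large_contigs(lengths: Sequence[int], min_length: int) -> float:
    """Return the share of assembled bases carried by contigs of at least min_length."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("no assembled bases")
    large = sum(n for n in lengths if n >= min_length)
    return large / total


# Hypothetical contig lengths for illustration; not the study's 106 contigs.
toy_lengths = [51798, 40000, 8343, 900, 113]
share = fraction_in_large_contigs(toy_lengths, min_length=8343)
print(f"{share:.1%} of bases are in contigs of at least 8343 nt")
```

Weighting by length rather than counting contigs explains how 27 of 106 contigs can carry nearly all of the sequence.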

ORF analysis

COG assignment of predicted gene products was made using RPS-BLAST

analysis against the reference COG data set. COG assignment

was taken into account only for E-values ≤ 10⁻⁸. When

a predicted gene product was assigned to multiple COGs, this hit

was counted as divided by the number of assigned COGs, and

the value was distributed evenly to each COG. Signal peptide prediction

was performed using PHOBIUS (http://www.ebi.ac.uk/Tools/phobius/).

CAZyme-encoding genes were identified by

BLAST analysis of the nucleotide sequences from the 106 contigs

against the amino acid sequences derived from the CAZy database

(http://www.cazy.org) using a cut-off E-value of 7 × 10⁻⁶. Other

genes were manually annotated using NCBI-BLASTP against the

NR database (E-value < 10⁻⁸, identity > 35%, query length coverage

≥ 50%). Gene prevalence in the human gut microbiome was

detected by using a TBLASTN comparison of the protein sequences

identified in this study to the metagenomic data sets available for

124 European (Qin et al. 2010), 13 Japanese (Kurokawa et al. 2007),

and 46 U.S. individuals (Gill et al. 2006; Turnbaugh et al. 2009,

2010) (E-value = 0, identity ≥ 90% or identity = 100%).
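The rule for gene products assigned to several COGs (one hit split evenly among the assigned COGs) can be illustrated as follows. This is a minimal sketch under the assumption that the E-value filter has already been applied; the gene names and category letters are toy values, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def fractional_cog_counts(assignments: Iterable[List[str]]) -> Dict[str, float]:
    """Split each gene product's single count evenly across its assigned COGs."""
    counts: Dict[str, float] = defaultdict(float)
    for cogs in assignments:
        unique = sorted(set(cogs))
        if not unique:
            continue  # gene products without any COG assignment are not counted
        share = 1.0 / len(unique)
        for cog in unique:
            counts[cog] += share
    return dict(counts)


# Toy assignments (hypothetical): gene -> COG labels retained after filtering.
toy_assignments = {
    "gene1": ["G"],
    "gene2": ["G", "M"],
    "gene3": ["M", "P"],
    "gene4": [],
}
cog_counts = fractional_cog_counts(toy_assignments.values())
print(cog_counts)
```

With this scheme the counts sum to the number of gene products that received at least one assignment, so no gene is counted more than once.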

Taxonomic assignment of metagenomic sequences

Two methodologies were used. The first was based on protein se-

quence similarities with proteins of sequenced genomes, using a

BLASTP analysis against the nonredundant protein sequence da-

tabase of the NCBI. For each protein of each metagenomic DNA

fragment, the microbial origin of the best BLAST hit was assigned

only for matches covering at least 50% of the protein length, with

an E-value better than 10⁻⁸ and an identity of at least 90%. Proteins

that did not pass those criteria were assigned to the “no hits”

category. We assigned a class, genus, or species to the DNA frag-

ment issued from one clone when at least 50% of the putative

proteins encoded by this fragment presented a best BLAST hit

issued from the same microbe. Also, if putative proteins encoded

from the same DNA fragment had the best BLAST hit issued from

microbes of different classes, we considered the entire fragment as

unassignable. The second approach was based on tetranucleotide

frequency count, an analysis related to genomic signatures, by

using Ocount software (Teeling et al. 2004) connected to a pre-

viously designed pipeline allowing a normalization of tetranu-

cleotide frequency according to sequence length (Tap et al. 2009).

The 26-fosmid insert sequences were analyzed as divided into

10-kb fragments. Genetic diversity, recorded as 256-tetranucleotide

distribution, was represented by a principal component analysis

(PCA) using R software (Chessel et al. 2004). Only the first two PCA

components, representing 49.7% of the total genetic diversity,

were used to illustrate this analysis.
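The first, BLASTP-based assignment rule described above can be sketched as a small decision function. This is one possible reading of the criteria, not the authors' implementation: the `BestHit` record and function names are assumptions, the 50% threshold is taken over all putative proteins of the fragment (including those without a qualifying hit), and assignment is shown at a single taxonomic level.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class BestHit(NamedTuple):
    organism: str     # source microbe of the best BLASTP hit
    tax_class: str    # taxonomic class of that microbe
    coverage: float   # fraction of the query protein length covered (0 to 1)
    evalue: float
    identity: float   # percent identity


def passes(hit: Optional[BestHit]) -> bool:
    """Apply the stated criteria: coverage >= 50%, E-value < 1e-8, identity >= 90%."""
    return (
        hit is not None
        and hit.coverage >= 0.5
        and hit.evalue < 1e-8
        and hit.identity >= 90.0
    )


def assign_fragment(best_hits: List[Optional[BestHit]]) -> Optional[str]:
    """Return the organism assigned to a fragment, or None when it is unassignable."""
    kept = [hit for hit in best_hits if passes(hit)]
    if not kept:
        return None
    # Best hits from microbes of different classes make the fragment unassignable.
    if len({hit.tax_class for hit in kept}) > 1:
        return None
    organism, n = Counter(hit.organism for hit in kept).most_common(1)[0]
    # At least 50% of all putative proteins must share the same source microbe.
    return organism if n / len(best_hits) >= 0.5 else None


# Toy fragments (hypothetical values, for illustration only).
good = BestHit("Bacteroides uniformis", "Bacteroidia", 0.9, 1e-50, 97.0)
other_class = BestHit("Roseburia intestinalis", "Clostridia", 0.9, 1e-50, 95.0)
assigned = assign_fragment([good, good, good, None])
mixed = assign_fragment([good, good, other_class, None])
sparse = assign_fragment([good, None, None, None])
print(assigned, mixed, sparse)
```

Treating proteins in the "no hits" category as part of the denominator is the stricter interpretation; restricting the denominator to proteins with qualifying hits would assign more fragments.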

Acknowledgments

The high-throughput screening work was performed at the Labo-

ratory for BioSystems & Process Engineering (Toulouse, France)

with the ICEO automated facility. ICEO is supported by grants

from the Region Midi-Pyrenees, France, the European Regional

Development Fund, and the Institut National de la Recherche

Agronomique, France (the French National Institute for Agricul-

tural Research). We thank Sophie Bozonnet and Sandrine Laguerre

for their assistance. This work was carried out with the financial

support of the ANR—Agence Nationale de la Recherche—The

French National Research Agency under the Programme National

de Recherche en Alimentation et nutrition humaine, project ANR-

06-PNRA-024. Pyrosequencing was funded by the French National

Institute for Agricultural Research.

References

Brennan Y, Callen WN, Christoffersen L, Dupree P, Goubet F, Healey S, Hernandez M, Keller M, Li K, Palackal N, et al. 2004. Unusual microbial xylanases from insect guts. Appl Environ Microbiol 70: 3609–3617.

Chen S, Bagdasarian M, Kaufman MG, Bates AK, Walker ED. 2007. Mutational analysis of the ompA promoter from Flavobacterium johnsoniae. J Bacteriol 189: 5108–5118.

Chessel D, Dufour AB, Thioulouse J. 2004. The ade4 package - I: One-table methods. R News 4: 5–10.

Courtois S, Cappellano CM, Ball M, Francou FX, Normand P, Helynck G, Martinez A, Kolvek SJ, Hopke J, Osburne MS, et al. 2003. Recombinant environmental libraries provide access to microbial diversity for drug discovery from natural products. Appl Environ Microbiol 69: 49–55.

Duan CJ, Xian L, Zhao GC, Feng Y, Pang H, Bai XL, Tang JL, Ma QS, Feng JX. 2009. Isolation and partial characterization of novel genes encoding acidic cellulases from metagenomes of buffalo rumens. J Appl Microbiol 107: 245–256.

Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, Gill SR, Nelson KE, Relman DA. 2005. Diversity of the human intestinal microbial flora. Science 308: 1635–1638.

Feng Y, Duan CJ, Pang H, Mo XC, Wu CF, Yu Y, Hu YL, Wei J, Tang JL, Feng JX. 2007. Cloning and identification of novel cellulase genes from uncultured microorganisms in rabbit cecum and characterization of the expressed cellulases. Appl Microbiol Biotechnol 75: 319–328.

Ferrer M, Golyshina OV, Chernikova TN, Khachane AN, Reyes-Duarte D, Santos VA, Strompl C, Elborough K, Jarvis G, Neef A, et al. 2005. Novel hydrolase diversity retrieved from a metagenome library of bovine rumen microflora. Environ Microbiol 7: 1996–2010.

Ferrer M, Beloqui A, Timmis KN, Golyshin PN. 2009. Metagenomics for mining new genetic resources of microbial communities. J Mol Microbiol Biotechnol 16: 109–123.

Flint HJ, Bayer EA, Rincon MT, Lamed R, White BA. 2008. Polysaccharide utilization by gut bacteria: Potential for new insights from genomic analysis. Nat Rev Microbiol 6: 121–131.

Gabor EM, Alkema WB, Janssen DB. 2004. Quantifying the accessibility of the metagenome by random expression cloning techniques. Environ Microbiol 6: 879–886.

Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE. 2006. Metagenomic analysis of the human distal gut microbiome. Science 312: 1355–1359.

Ginolhac A, Jarrin C, Gillet B, Robe P, Pujic P, Tuphile K, Bertrand H, Vogel TM, Perriere G, Simonet P, et al. 2004. Phylogenetic analysis of polyketide synthase I domains from soil metagenomic libraries allows selection of promising clones. Appl Environ Microbiol 70: 5522–5527.

Grabitske HA, Slavin JL. 2008. Low-digestible carbohydrates in practice. J Am Diet Assoc 108: 1677–1681.

Grant S, Sorokin DY, Grant WD, Jones BE, Heaphy S. 2004. A phylogenetic analysis of Wadi el Natrun soda lake cellulase enrichment cultures and identification of cellulase genes from these cultures. Extremophiles 8: 421–429.

Guo H, Feng Y, Mo X, Duan C, Tang J, Feng J. 2008. [Cloning and expression of a beta-glucosidase gene umcel3G from metagenome of buffalo rumen and characterization of the translated product]. Sheng Wu Gong Cheng Xue Bao 24: 232–238.

Huang X, Madan A. 1999. CAP3: A DNA sequence assembly program. Genome Res 9: 868–877.

Institute of Medicine. 2005. Dietary reference intakes. National Academy of Sciences, Washington, DC.

Jones BV, Marchesi JR. 2007. Transposon-aided capture (TRACA) of plasmids resident in the human gut mobile metagenome. Nat Methods 4: 55–61.

Kalyuzhnaya MG, Lapidus A, Ivanova N, Copeland AC, McHardy AC, Szeto E, Salamov A, Grigoriev IV, Suciu D, Levine SR, et al. 2008. High-resolution metagenomics targets specific functional types in complex microbial communities. Nat Biotechnol 26: 1029–1034.

Kurokawa K, Itoh T, Kuwahara T, Oshima K, Toh H, Toyoda A, Takami H, Morita H, Sharma VK, Srivastava TP, et al. 2007. Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res 14: 169–181.

Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS, Schlegel ML, Tucker TA, Schrenzel MD, Knight R, et al. 2008. Evolution of mammals and their gut microbes. Science 320: 1647–1651.

Li LL, McCorkle SR, Monchy S, Taghavi S, van der Lelie D. 2009. Bioprospecting metagenomes: Glycosyl hydrolases for converting biomass. Biotechnol Biofuels 2: 10. doi: 10.1186/1754-6834-2-10.

Liu JR, Duan CH, Zhao X, Tzen JT, Cheng KJ, Pai CK. 2008. Cloning of a rumen fungal xylanase gene and purification of the recombinant enzyme via artificial oil bodies. Appl Microbiol Biotechnol 79: 225–233.

Lozupone CA, Hamady M, Cantarel BL, Coutinho PM, Henrissat B, Gordon JI, Knight R. 2008. The convergence of carbohydrate active gene repertoires in human gut microbes. Proc Natl Acad Sci 105: 15076–15081.

Macdonald TT, Monteleone G. 2005. Immunity, inflammation, and allergy in the gut. Science 307: 1920–1925.

Mahowald MA, Rey FE, Seedorf H, Turnbaugh PJ, Fulton RS, Wollam A, Shah N, Wang C, Magrini V, Wilson RK, et al. 2009. Characterizing a model human gut microbiota composed of members of its two dominant bacterial phyla. Proc Natl Acad Sci 106: 5859–5864.

Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E, Frangeul L, Nalin R, Jarrin C, Chardon P, Marteau P, et al. 2006. Reduced diversity of faecal microbiota in Crohn’s disease revealed by a metagenomic approach. Gut 55: 205–211.

Martens EC, Koropatkin NM, Smith TJ, Gordon JI. 2009. Complex glycan catabolism by the human gut microbiota: The Bacteroidetes Sus-like paradigm. J Biol Chem 284: 24673–24677.

McGarr SE, Ridlon JM, Hylemon PB. 2005. Diet, anaerobic bacterial metabolism, and colon cancer: A review of the literature. J Clin Gastroenterol 39: 98–109.

Pang H, Zhang P, Duan CJ, Mo XC, Tang JL, Feng JX. 2009. Identification of cellulase genes from the metagenomes of compost soils and functional characterization of one novel endoglucanase. Curr Microbiol 58: 404–408.

Potocki-Veronese G, Putaux JL, Dupeyre D, Albenne C, Remaud-Simeon M, Monsan P, Buleon A. 2005. Amylose synthesized in vitro by amylosucrase: Morphology, structure, and properties. Biomacromolecules 6: 1000–1011.

Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, et al. 2010. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464: 59–65.

Qu A, Brulc JM, Wilson MK, Law BF, Theoret JR, Joens LA, Konkel ME, Angly F, Dinsdale EA, Edwards RA, et al. 2008. Comparative metagenomics reveals host specific metavirulomes and horizontal gene transfer elements in the chicken cecum microbiome. PLoS ONE 3: e2945. doi: 10.1371/journal.pone.0002945.

Rakoff-Nahoum S, Paglino J, Eslami-Varzaneh F, Edberg S, Medzhitov R. 2004. Recognition of commensal microflora by toll-like receptors is required for intestinal homeostasis. Cell 118: 229–241.

Rees HC, Grant S, Jones B, Grant WD, Heaphy S. 2003. Detecting cellulase and esterase enzyme activities encoded by novel genes present in environmental DNA libraries. Extremophiles 7: 415–421.

Richardson TH, Tan X, Frey G, Callen W, Cabell M, Lam D, Macomber J, Short JM, Robertson DE, Miller C. 2002. A novel, high performance enzyme for starch liquefaction. Discovery and optimization of a low pH, thermostable alpha-amylase. J Biol Chem 277: 26501–26507.

Roberts AP, Chandler M, Courvalin P, Guedon G, Mullany P, Pembroke T, Rood JI, Smith CJ, Summers AO, Tsuda M, et al. 2008. Revised nomenclature for transposable genetic elements. Plasmid 60: 167–173.

Rondon MR, August PR, Bettermann AD, Brady SF, Grossman TH, Liles MR, Loiacono KA, Lynch BA, MacNeil IA, Minor C, et al. 2000. Cloning the soil metagenome: A strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66: 2541–2547.

Selvendran RR. 1984. The plant cell wall as a source of dietary fiber: Chemistry and structure. Am J Clin Nutr 39: 320–337.

Simon C, Daniel R. 2009. Achievements and new knowledge unraveled by metagenomic approaches. Appl Microbiol Biotechnol 85: 265–276.

Sonnenburg JL, Xu J, Leip DD, Chen CH, Westover BP, Weatherford J, Buhler JD, Gordon JI. 2005. Glycan foraging in vivo by an intestine-adapted bacterial symbiont. Science 307: 1955–1959.

Tamames J, Moya A. 2008. Estimating the extent of horizontal gene transfer in metagenomic sequences. BMC Genomics 9: 136. doi: 10.1186/1471-2164-9-136.

Tang K, Utairungsee T, Kanokratana P, Sriprang R, Champreda V, Eurwilaichitr L, Tanapongpipat S. 2006. Characterization of a novel cyclomaltodextrinase expressed from environmental DNA isolated from Bor Khleung hot spring in Thailand. FEMS Microbiol Lett 260: 91–99.

Tang K, Kobayashi RS, Champreda V, Eurwilaichitr L, Tanapongpipat S. 2008. Isolation and characterization of a novel thermostable neopullulanase-like enzyme from a hot spring in Thailand. Biosci Biotechnol Biochem 72: 1448–1456.

Tap J, Mondot S, Levenez F, Pelletier E, Caron C, Furet JP, Ugarte E, Munoz-Tamayo R, Paslier DL, Nalin R, et al. 2009. Towards the human intestinal microbiota phylogenetic core. Environ Microbiol 11: 2574–2584.

Teeling H, Waldmann J, Lombardot T, Bauer M, Glockner FO. 2004. TETRA: A web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics 5: 163. doi: 10.1186/1471-2105-5-163.

Temperton B, Field D, Oliver A, Tiwari B, Muhling M, Joint I, Gilbert JA. 2009. Bias in assessments of marine microbial biodiversity in fosmid libraries as evaluated by pyrosequencing. ISME J 3: 792–796.

Turnbaugh PJ, Gordon JI. 2009. The core gut microbiome, energy balance and obesity. J Physiol 587: 4153–4158.

Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, et al. 2009. A core gut microbiome in obese and lean twins. Nature 457: 480–484.

Turnbaugh PJ, Quince C, Faith JJ, McHardy AC, Yatsunenko T, Niazi F, Affourtit J, Egholm M, Henrissat B, Knight R, et al. 2010. Organismal, genetic, and transcriptional variation in the deeply sequenced gut microbiomes of identical twins. Proc Natl Acad Sci 107: 7503–7508.

Uchiyama T, Miyazaki K. 2009. Functional metagenomics for enzyme discovery: Challenges to efficient screening. Curr Opin Biotechnol 20: 616–622.

Voget S, Leggewie C, Uesbeck A, Raasch C, Jaeger KE, Streit WR. 2003. Prospecting for novel biocatalysts in a soil metagenome. Appl Environ Microbiol 69: 6235–6242.

World Health Organization. 2003. Diet, nutrition and the prevention of chronic disease. Technical Report Series no. 916. http://whqlibdoc.who.int/trs/who_TRS_916.pdf.

Zoetendal EG, Akkermans AD, De Vos WM. 1998. Temperature gradient gel electrophoresis analysis of 16S rRNA from human fecal samples reveals stable and host-specific communities of active bacteria. Appl Environ Microbiol 64: 3854–3859.

Received March 25, 2010; accepted in revised form July 29, 2010.

Functional characterization of a gene locus from an uncultured gut Bacteroides conferring xylo-oligosaccharides utilization to Escherichia coli

Alexandra S. Tauzin,1,2 Elisabeth Laville,1 Yao Xiao,3 Sébastien Nouaille,1 Pascal Le Bourgeois,1 Stéphanie Heux,1 Jean-Charles Portais,1 Pierre Monsan,2 Eric C. Martens,3 Gabrielle Potocki-Veronese1 and Florence Bordes1*

1LISBP, CNRS, INRA, INSAT, Université de Toulouse, Toulouse, France.

2TWB, INRA, Ramonville Saint-Agne, France.

3Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI, USA.

Summary

In prominent gut Bacteroides strains, sophisticated

strategies have evolved to achieve the complete

degradation of dietary polysaccharides such as xylan,

which is one of the major components of the plant cell

wall. Polysaccharide Utilization Loci (PULs) consist of

gene clusters encoding different proteins with a vast

arsenal of functions, including carbohydrate binding,

transport and hydrolysis. Transport is often attributed

to TonB-dependent transporters, although major facili-

tator superfamily (MFS) transporters have also been

identified in some PULs. However, until now, few of

these transporters have been biochemically character-

ized. Here, we targeted a PUL-like system from an

uncultivated Bacteroides species that is highly

prevalent in the human gut metagenome. It encodes

three glycoside-hydrolases specific for xylo-

oligosaccharides, a SusC/SusD tandem homolog and

a MFS transporter. We combined PUL rational engi-

neering, metabolic and transcriptional analysis in

Escherichia coli to functionally characterize this

genomic locus. We demonstrated that the SusC and

the MFS transporters are specific for internalization of

linear xylo-oligosaccharides of polymerization degree

up to 3 and 4 respectively. These results were

strengthened by the study of growth dynamics and

transcriptional analyses in response to XOS induction

of the PUL in the native strain, Bacteroides vulgatus.

Introduction

Due to scarcity of genes coding for complex

polysaccharide-degrading enzymes (the so-called Car-

bohydrate Active enZymes, or CAZymes), humans

depend on the symbiotic microorganisms within their

digestive tract to break down dietary glycans that are

recalcitrant to digestion in the upper parts of the gut.

These glycans are mainly plant cell wall components,

consisting of a cellulose scaffold cross-linked with hemi-

celluloses and pectins. The structural complexity and

diversity make the complete degradation of these gly-

cans a complex issue.

To face this complexity, bacteria from various genera

have developed sophisticated systems involving bat-

teries of CAZymes and carbohydrate transporters,

encoded by genes co-localized on specific loci. In Bac-

teroides strains, which are the most prominent glycan

degraders in the intestine, Polysaccharide Utilization

Loci (PULs) encode all the proteins involved in sensing,

binding, transport and hydrolysis, that are required to

achieve the complete breakdown and uptake of glycans

(Hehemann et al., 2010; Larsbrink et al., 2014;

Rogowski et al., 2015; Cuskin et al., 2015). In the

archetypal PUL system specific to starch utilization

(SUS) from Bacteroides thetaiotaomicron, a TonB-

dependent transporter (SusC) works in synergy with

binding proteins (SusD, SusE and SusF) to internalize

the oligosaccharides derived from the hydrolysis of

starch by the cell surface α-amylase (SusG). The

TonB-dependent transporter in complex with ExbB and ExbD

proteins allows the transport of macromolecules across

the outer membrane of Gram-negative bacteria via

energy derived from the proton motive force (for review

see ref. Ferguson and Deisenhofer, 2002; Schauer

et al., 2008).

Accepted 8 August, 2016. *For correspondence. E-mail bordes@insa-toulouse.fr; Tel. +33 5 61 55 94 39; Fax +33 5 61 55 94 00.

© 2016 The Authors. Molecular Microbiology Published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

Molecular Microbiology (2016) 00(00), 00–00. doi:10.1111/mmi.13480. First published online 2016


Xylan is a major component of plant cell walls and is

highly abundant in cereal-derived human foods. In the

human gut, most of the xylanolytic bacteria were identi-

fied among the Bacteroides genus (Dodd et al., 2011;

Martens et al., 2011). To date, only two xylan PULs from

Bacteroides ovatus (PUL-XylS and PUL-XylL) were thor-

oughly studied but their characterization focused exclu-

sively on glycoside hydrolases and carbohydrate binding

proteins (Rogowski et al., 2015). Interestingly, the PUL-

XylL exhibits two SusC-like transporters while the

PUL-XylS possesses a SusC and a major facilitator

superfamily (MFS) transporter. MFS, which is located in

the inner membrane in Bacteroides species, is a second-

ary transporter of small molecules, including carbohy-

drates, in response to electrochemical potentials (for

review see ref. Yan, 2015). Few PULs have been identi-

fied harbouring both a SusC/D transport system and a

MFS transporter, such as the glycosaminoglycan and the

N-glycan PULs of Bacteroides thetaiotaomicron or the

sialic acid cluster of Bacteroides fragilis (Martens et al.,

2008; Stafford et al., 2012; Phansopa et al., 2014).

Nevertheless, the specificity of each of these proteins in

carbohydrate harvesting has not been deeply studied.

The characterization of transporters in native strains

indeed faces several bottlenecks: (i) the deletion of a

targeted gene might not be sufficient to confirm its

functionality due to functional redundancy ensured by other native

proteins; (ii) in a PUL-like system, the activation of the

system requires sensing of a specific glycan in periplasm,

which is usually different from the internalized oligosac-

charides obtained by extracellular hydrolysis, and (iii)

despite the huge efforts dedicated to bacterial genetics,

genome engineering remains a challenge for numerous

species, and is even impossible for uncultivated ones.

During the last decade, significant efforts have been

put into functional genomics and metagenomics

(Turnbaugh et al., 2007; Hess et al., 2011; Nielsen et al.,

2014) in order to elucidate the main functionalities of

microbiomes. Functional metagenomics is a powerful tool

to decipher the diversity of functions present within the

uncultured gut bacteria fraction, which represents up to

70% of the human gut microbiota. From activity-based

screening approaches emerge large metagenomic DNA

fragments (25–40 kb) containing full multigenic clusters

such as PULs that contain putative transporters (Tasse

et al., 2010). However, the diversity of carbohydrate

transporter specificities remains largely under-explored.

In this context, we decided to extend the characteriza-

tion of carbohydrate transporters to those harboured by

uncultured gut bacteria. We thus studied the recombinant

expression and functional capabilities of a PUL issued

from a highly prevalent uncultured Bacteroides strain,

involved in the metabolism of xylo-oligosaccharides

(XOS). This assembly of genes was identified from a fecal

metagenomic library screened for prebiotic degradation

(Cecchini et al., 2013). Here, by combining a transcrip-

tomic analysis of each gene of the metagenomic insert

in Escherichia coli with the biochemical characterization

of glycan hydrolysis and transport specificities, we

showed that this highly conserved PUL-like system pos-

sesses a complete functional arsenal for XOS metabo-

lism in the E. coli recombinant host. It is composed of

two transporters, one of them working in synergy with a

carbohydrate binding protein and a battery of PUL-

associated glycoside hydrolases (GH) allowing XOS

hydrolysis into xylose, which is further metabolized by

the cells. These results were strengthened by the study

of growth dynamics and transcriptional analyses in

response to XOS induction of the PUL in the native

strain, Bacteroides vulgatus.

Results and discussion

Sequence analysis reveals a PUL involved in XOS

utilization

Previously, Cecchini et al. (2013) identified the metage-

nomic clone F5, which was able to hydrolyze XOS up to

a degree of polymerization of 6 (DP6) (Cecchini et al.,

2013). The metagenomic DNA insert, sizing 39093 bp,

was assigned to Bacteroides vulgatus strain. Over 93%

of the F5 sequence showed 99% sequence identity with

a part of the B. vulgatus ATCC 8482 genome (Fig. 1A).

Its functional annotation revealed a PUL containing

genes encoding a truncated glycoside hydrolase of fam-

ily 43 (GH43_t), a hybrid two-component system

(HTCS), a TonB-dependent porin (SusC), a binding pro-

tein (SusD), two members of the glycoside hydrolase

family 43 (GH43A and GH43B), a member of the glyco-

side hydrolase family 10 (GH10), a MFS transporter and

a member of the glycoside hydrolase family 16 (GH16)

(Cecchini et al., 2013).

By comparison with the genome of B. vulgatus ATCC

8482, this metagenomic locus was interrupted within the

gh43 gene upstream of the HTCS. This interruption

might imply that at least the gh43 gene and likely the

other PUL genes upstream have been truncated during

the library construction process.

Within fully and partially sequenced genomes of gut

bacteria (Joint Genome Institute, Markowitz et al.,

2012), such PUL organization is closely conserved

throughout B. vulgatus strains and their phylogenetically

related neighbors such as B. dorei, B. sartorii and B.

massiliensis (Fig. 1A). While the PULs from B. sartorii

and B. massiliensis are not listed in the Polysaccharide-

Utilization Loci Database (PULDB), the PULs from B.

vulgatus and B. dorei are listed as predicted with some

length differences (Terrapon et al., 2015). The closest


PUL characterized so far, in terms of functionality, is the

small xylan PUL (PUL-XylS) from B. ovatus ATCC 8483

(Rogowski et al., 2015). This PUL encodes an HTCS, a

tandem SusC/D, a surface glycan binding protein

(SGBP), two GH10s, a MFS transporter, a GH43 and a

GH67 (Fig. 1A). It is induced by wheat arabinoxylan,

glucuronoxylan and linear XOS (Martens et al., 2008;

Rogowski et al., 2015). Immediately downstream of the

SusD-like, PULs usually encode a SGBP contributing to

the additional binding of the substrates (Cameron et al.,

2012; Rogowski et al., 2015) which is absent in the F5

PUL.

In addition to the arranged SusC- and D proteins to

potentially bind and transport glycans, the clone F5

exhibited a gene encoding a MFS transporter. In E. coli,

the sialic acid uptake is due to a specific MFS

transporter (NanT) (Vimr and Troy, 1985) while in other

bacteria the sialic-acid-targeting PULs display MFS

transporters that are sometimes associated with the

SusC/D transport system (NanO/U) such as in Tanner-

ella forsythia and B. fragilis (Roy et al., 2010; Stafford

et al., 2012; Phansopa et al., 2014). As introduced

above, such an association has also been observed in

other Bacteroides PULs and was demonstrated as being

part of the operon. Examples include the glycosamino-

glycan and the N-glycan PULs of B. thetaiotaomicron

and, more recently, the PUL-XylS of B. ovatus (Martens

et al., 2008; Rogowski et al., 2015).

Fig. 1. Representation of the PUL-like system.

A. Organization of the XOS utilization locus based on selected annotated Bacteroides genomes. Genes encoding known and predicted functionalities are colour-coded: glycoside hydrolase (GH) with family number in blue; hybrid two component system (HTCS) in red; SusC in orange; SusD in yellow; surface glycan binding protein (SGBP or SusE-positioned) in light yellow; transporter of the major facilitator superfamily (MFS) in purple; transposase (Tnp) in green and unknown in grey. Synteny (corresponding to 99% identity at the DNA level) between the sequence of clone F5 and the genome locus of B. vulgatus ATCC 8482 is shown by grey bars. Black arrows represent putative transcription units in the Bacteroides natural host, according to the consensus promoter sequence of Bacteroides strain.

B. The reduced constructs of F5 used in the present work.


Finally, five transposase sequences are present in the

F5 sequence (Fig. 1A). Three are located between the

htcs- and the susC-like genes, one between the gh10

and the mfs transporter genes and one between the

gh16 and the mfs transporter genes.

Metagenomic gene expression in E. coli

We (Tasse et al., 2010) and others previously showed

(Ferrer et al., 2005; Wang et al., 2012; Strachan et al.,

2014) that the phenotype of fosmid/cosmid metagenomic

clones is often related to the presence of several genes

encoding enzymes with various, and often complementary

activities. This is particularly true for clones harbouring

PUL-like multigenic systems issued from Bacteroidetes.

These clones encode synergistic CAZymes that are able

to completely break down complex polysaccharidic

structures (Tasse et al., 2010). However, functional expression

of such metagenomic genes in E. coli, which is still the

predominantly used host for activity-based metagenomics,

has never been experimentally investigated. Here, for the

first time, the ability of the E. coli system to host and express

a heterologous multigenic system that is involved in XOS

metabolism from uncultured Bacteroides has been

explored at the transcriptional level.

To further investigate the level of induction/expression

of the 27 genes present on the F5 metagenomic insert,

the transcriptional level of each open reading frame

(ORF) in E. coli has been measured by quantitative RT-

PCR (Fig. 2). In the LB medium, among the 27 genes

that are present on the metagenomic DNA insert, only

10 were not expressed or were expressed at a very low level

(including the truncated gh43, htcs and gh16 encoding

genes from the PUL cluster). The 17 other genes were

either expressed at significant levels comparable to the

endogenous E. coli housekeeping gene (ihfB) or even at

a level close to the highly expressed fosmidic cam gene

(chloramphenicol acetyltransferase) used for

chloramphenicol selection. The strongest expression was

detected for genes encoding SusD, SusC and a drug

efflux protein, at a level over three-fold relative to the

expression of ihfB. The genes coding for GH43A,

GH43B, GH10 and MFS were expressed at levels

1.7- to 3.5-fold lower than ihfB. In addition, more than

half of the genes contained on the metagenomic insert

were transcribed at various expression levels, demon-

strating that the quantified gene expressions were due

to the recognition of distinct promoter sequences by the

recombinant host. A bioinformatics analysis of the full

F5 sequence revealed 6 putative Bacteroides promoters,

of which 3 are within the PUL. In the native Bacteroides

strain, the F5 PUL could be expressed as an operon

from mfs or gh43A to susD and regulated by the htcs,

which is expressed separately (Fig. 1A). Nevertheless,

the Bacteroides promoters cannot be responsible for

heterologous expression in E. coli, since Mastropaolo

et al. (2009) showed that Bacteroides promoters are not

recognized by E. coli (Mastropaolo et al., 2009). Using

Fig. 2. Gene expression analysis of the clone F5 on LB (black), xylose (grey) or XOS (white) growth conditions. The mean of the biological triplicates is represented with ± the standard error of the mean. Hybrid two-component system (HTCS), major facilitator superfamily transporter (MFS), integration host factor β-subunit (ihfB). The dashed line represents an arbitrary threshold of expression.


the BPROM program, which predicts with an accuracy of

80% consensus σ70 promoter sequences for E. coli,

about 184 putative promoters were identified within the

metagenomic insert and 34 only within the PUL (Sup-

porting Information Fig. S1). Recently, Lam and Charles

(2015) suggested that metagenomic genes, especially

those issued from Bacteroides species, are spuriously

transcribed in E. coli thanks to the random presence of

E. coli rpoD/σ70 promoter sequences on metagenomic

DNA inserts, that would also be responsible for cloning

bias in metagenomic libraries (Lam and Charles, 2015).

They counted only around 10 promoter sequences/Mb

of metagenomic DNA, but they focused exclusively on

the most common consensus sequence TTGACA(-35)

and TATAAT(-10). This specific promoter sequence was

not detected within the F5 sequence. However, it is

noteworthy that σ70 promoters vary in their sequence,

the absence/presence of the -35 box and the length of

the spacer between the -10 and -35 sequences

(Shultzaberger et al., 2007; Singh et al., 2011). The

BPROM tool, which allows the identification of

degenerate promoter sequences, thus seems more pertinent to

identify putative σ70 promoters, since the predictions are

in good agreement with the present transcriptomic

results. In functional metagenomics, the recurrent

bottleneck is to access the full potential of metagenomes and

different strategies have been developed to overcome

this limitation, mostly by improving cloning strategies

and the screening host, i.e. E. coli (for perspective see

ref. Lam et al., 2015). However, our data point out that a

significant proportion of metagenomic genes can be effi-

ciently transcribed in E. coli, and that spurious transcrip-

tion would be more advantageous than deleterious for

heterologous expression of multiple genes.
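The difference between a strict consensus search and a degenerate-promoter predictor can be illustrated with a minimal sketch. The function below is not BPROM and is not the procedure of Lam and Charles (2015); it only scans the forward strand for the exact hexamers TTGACA and TATAAT quoted above. The function name and the 15 to 19 bp spacer window are assumptions made for the illustration, and the test sequence is synthetic.

```python
import re

# Exact consensus hexamers quoted in the text.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def find_canonical_sigma70(seq, min_spacer=15, max_spacer=19):
    """Return (start, spacer_length) for each exact -35/-10 consensus pair.

    Only the forward strand is scanned. The spacer window is an assumed
    range, not a value taken from the article.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(
        "(?=(%s([ACGTN]{%d,%d}?)%s))"
        % (MINUS_35, min_spacer, max_spacer, MINUS_10)
    )
    return [(m.start(), len(m.group(2))) for m in pattern.finditer(seq)]


# Synthetic example: a perfect consensus with a 17 bp spacer.
perfect = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "CC"
# Same sequence with a single mismatch in the -35 box.
degenerate = "GG" + "TTGACT" + "A" * 17 + MINUS_10 + "CC"

hits_perfect = find_canonical_sigma70(perfect)
hits_degenerate = find_canonical_sigma70(degenerate)
```

A single mismatch in the -35 box is enough for the strict search to report nothing, which is consistent with the observation that the exact consensus was absent from F5 although many degenerate candidates were predicted.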

Finally, comparison of the expression data obtained

during growth on LB medium to results obtained using

xylose- and XOS-grown cultures showed that the genes

belonging to the PUL were not differentially expressed

between all the conditions tested (Fig. 2), suggesting a

lack of regulation in E. coli. It has been shown in Bacter-

oidetes that the ability to modulate the PUL expression

depends on sensor-regulator systems such as the

hybrid two-component systems in response to their tar-

geted glycan (Bolam and Koropatkin, 2012). In our sys-

tem and in all culture media, the htcs gene was expressed

at a very low level in E. coli. Thus we postulate that the

PUL expression is regulated neither by the presence

of XOS nor by that of xylose in E. coli.
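As an illustration of how expression relative to a housekeeping gene such as ihfB can be computed from quantitative RT-PCR data, here is a minimal sketch. It assumes the common 2^(-ΔCt) model with identical amplification efficiencies; the quantification model actually used for Fig. 2 is not stated in this section. The Ct values, the function names and the two-fold threshold are hypothetical.

```python
def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Expression of a target gene relative to a reference gene.

    Assumes both amplicons amplify with the same efficiency
    (2.0 means a perfect doubling per cycle).
    """
    return efficiency ** (ct_reference - ct_target)


def is_differentially_expressed(rel_a, rel_b, fold_threshold=2.0):
    """Flag a gene whose relative expression differs between two
    conditions by at least fold_threshold (an arbitrary cut-off)."""
    ratio = rel_a / rel_b
    return ratio >= fold_threshold or ratio <= 1.0 / fold_threshold


# Hypothetical Ct values, for illustration only.
CT_IHFB = 20.0
susc_lb = relative_expression(18.0, CT_IHFB)   # two cycles earlier than ihfB
susc_xos = relative_expression(18.2, CT_IHFB)  # nearly identical on XOS
regulated = is_differentially_expressed(susc_lb, susc_xos)
```

With such inputs a gene would be reported as unregulated between LB and XOS, which is the pattern described for the PUL genes in E. coli.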

Functional potential of the hydrolases in E. coli

Previously, we demonstrated that the cell extracts of the

clone F5 were able to hydrolyze XOS up to DP 6

(Supporting Information Fig. S1 from ref. Cecchini et al.,

2013). To characterize the catabolic potential of the

CAZymes contained in the multiproteic system to break-

down hemicellulose, a more detailed screening of hydro-

lytic activities was carried out on a panel of

polysaccharides, oligosaccharides and chromogenic

substrates representing the different components of

plant cell wall. Xylan and arabinoxylan were both hydro-

lyzed by the F5 cytoplasmic extracts with a preference

for unbranched xylan which was cleaved more efficiently

(Table 1). The cytoplasmic extracts have also been

tested on synthetic substrates and showed activity on

pNP-β-D-xylopyranose and pNP-α-L-arabinofuranose

(Table 1). These detected xylanase, xylosidase and ara-

binofuranosidase activities are consistent with the

known activities of CAZy families GH10 and GH43

identified in the PUL. Conversely, no activity was detected

on β-glucan or xyloglucan, the likely substrates of the GH16

enzyme, a result consistent with the absence of

transcription of the gh16 gene (Fig. 2). Considering the

previous sequence annotation of the transcription units in

Bacteroides, the gene expression analysis, and the

activity results of the clone F5, the GH16 seems unlikely

to be part of the PUL.

To confirm that the enzymatic activities were due to

the GH43A, GH43B and GH10 from the PUL, we gener-

ated a reduced construct named F5min_GH containing

only the three gh genes (Fig. 1B). The enzymatic activ-

ities of the F5min_GH cell extracts were similar to those

obtained for F5, implying that the enzymes responsible

for these activities are encoded by the genes gh43A,

gh43B and gh10 (Table 1). The high number of genes to

Table 1. Activities of the cell extracts on complex polysaccharides and synthetic substrates.

Substrate                      Clone F5        Clone F5min_GH
Complex polysaccharides
  Xylan                        176.6 ± 4.2     329.8 ± 14.2
  Arabinoxylan                 +               n.d.
  Arabinan                     /               n.d.
  Arabinogalactan              /               n.d.
  β-glucan                     /               n.d.
  Xyloglucan                   /               n.d.
Synthetic substrates
  pNP β-D-Xylopyranose         399.7 ± 10.0    609.3 ± 52.1
  pNP α-D-Xylopyranose         /               n.d.
  pNP α-L-Arabinofuranose      239.9 ± 10.2    353.7 ± 28.3
  pNP β-L-Arabinopyranose      /               n.d.
  pNP α-L-Arabinopyranose      /               n.d.

Activities were expressed in mU (with 1 U = 1 µmol/min) per litre of culture. Mean of three biological replicates.

Abbreviations: + = residual activity after 24 h; / = no activity detected; n.d. = not determined.


be expressed in the clone F5 could explain the lower

values in the observed activity compared to the clone

F5min_GH. The enzymes are essential to hydrolyze

XOS to xylose that is further metabolized by the strain.

This result also confirmed that the GH16 is not respon-

sible for any activity detected for F5 extracts.
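The size of the difference between the two clones can be made explicit with a short calculation on the mean activities of Table 1 (xylan, pNP β-D-xylopyranose and pNP α-L-arabinofuranose, in mU per litre of culture). This is only an arithmetic sketch: the dictionary layout and function name are illustrative, and the standard errors are ignored.

```python
# Mean activities from Table 1, in mU per litre of culture.
ACTIVITY_MU_PER_L = {
    "xylan": {"F5": 176.6, "F5min_GH": 329.8},
    "pNP-beta-D-xylopyranose": {"F5": 399.7, "F5min_GH": 609.3},
    "pNP-alpha-L-arabinofuranose": {"F5": 239.9, "F5min_GH": 353.7},
}


def fold_change(substrate):
    """Ratio of F5min_GH activity to F5 activity for one substrate."""
    row = ACTIVITY_MU_PER_L[substrate]
    return row["F5min_GH"] / row["F5"]


fold_changes = {name: fold_change(name) for name in ACTIVITY_MU_PER_L}
```

For all three substrates the reduced construct is roughly 1.5 to 1.9 times more active than the full clone, so the activity profiles are the same while the absolute values are lower in F5.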

The theoretical subcellular localization of the GHs pro-

duced by F5 was determined using LipoP 1.0 server

(Juncker et al., 2003). For GH43B, no signal peptide or

N-terminal lipidation could be assigned, indicating a

cytoplasmic location. GH43A and GH10 exhibited a

putative signal peptidase II cleavage site and were pre-

dicted as N-terminally lipidated proteins indicating their

potential attachment to the bacterial membrane. Bacte-

rial lipoproteins are membrane proteins present both in

Gram-negative and Gram-positive bacteria. In Bacteroi-

detes some lipoproteins, including the hydrolase (SusG,

GH13) from SUS systems are known to be transported

to the outer surface of the outer membrane (Shipman

et al., 1999). In E. coli the lipoproteins are anchored

either to the inner or to the outer membrane, and ori-

ented towards the periplasmic space (Tokuda and

Matsuyama, 2004). The +2 rule postulates that the

residue at the N-terminal second position is critical for the

membrane specificity of lipoproteins in E. coli (for review

see Okuda and Tokuda, 2011). An Asp in position +2

maintains the lipoprotein in the inner membrane. GH43A

exhibits a Ser which is characteristic of an outer membrane

sorting signal (Yamaguchi et al., 1988). GH10

possesses a Gly in position +2 which could imply an

‘ambiguous’ sorting signal (both inner and outer mem-

brane facing periplasm) as observed for the periplasmic

maltose-binding protein expressed in E. coli (Seydel

et al., 1999).
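The +2 rule as summarized in this paragraph can be written as a small lookup. The sketch below encodes only the three residues discussed here (Asp, Ser, Gly); it is not LipoP and does not reflect the full sorting determinants of E. coli lipoproteins. The function name and the example peptides are hypothetical and are not the GH43A or GH10 sequences.

```python
# Sorting outcome for the residue following the lipidated cysteine,
# restricted to the cases discussed in the text.
PLUS2_SORTING = {
    "D": "inner membrane",
    "S": "outer membrane",
    "G": "ambiguous (inner and outer membrane)",
}


def predict_lipoprotein_sorting(mature_sequence):
    """Predict E. coli membrane sorting from the +2 residue.

    mature_sequence is the processed lipoprotein, which must start
    with the lipidated Cys at position +1.
    """
    seq = mature_sequence.strip().upper()
    if len(seq) < 2 or seq[0] != "C":
        raise ValueError("mature lipoprotein must start with Cys (+1)")
    return PLUS2_SORTING.get(seq[1], "not covered by this sketch")


# Hypothetical mature N-termini, for illustration only.
example_ser = predict_lipoprotein_sorting("CSNDE")
example_asp = predict_lipoprotein_sorting("CDKTA")
example_gly = predict_lipoprotein_sorting("CGQQP")
```

Under this simplified rule a Ser at +2, as reported for GH43A, is routed to the outer membrane, while a Gly, as reported for GH10, is left ambiguous.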

To examine the subcellular localization of the GHs in

E. coli, enzymatic assays were performed on secreted,

soluble intracellular, periplasmic and membrane protein

fractions (Supporting Information Fig. S2). No secreted

or soluble periplasmic activities were detected. Xylanase,

arabinosidase and xylosidase activities were detected in

both soluble intracellular and membrane fractions. These

results are consistent with the theoretical subcellular

localization of the GHs, according to which two of the three should be

attached to the membrane and one should be cytoplasmic.

XOS uptake in E. coli

To evidence and characterize the transport ability of the

clone F5, its growth and that of different truncated variants

have been monitored over 24 h in liquid minimal media

(MM) supplemented with different xylose-containing gly-

cans as sole carbon sources (Figs 3 and 4A, C and E).

As a control strain, we used a metagenomic clone (clone

F4) able to hydrolyze a mixture of XOS into xylose and

xylobiose (due to the presence of GH8, GH43 and GH120)

but without any transporter encoding gene within its meta-

genomic insert, as reported by Cecchini et al. (2013) (see

Fig. S1 from ref. Cecchini et al., 2013). The ability

of the E. coli host strain to metabolize xylose as sole

carbon source had previously been confirmed.

When we compared the growth of F5 and F4 in MM

containing a mixture of linear XOS of DP 2 to 6, only F5

could grow, even if both clones produce hydrolases able

to hydrolyze XOS (Fig. 3). Similarly, F5min_GH that is

Fig. 3. Growth of the clones F5 (blue), F5min_SUS (red), F5min_MFS (orange), F5min_SUSΔSusD (purple), F5min_GH (green), F4 (cyan) and empty pCC1fos (Epi, grey) on a mixture of xylo-oligosaccharides at 0.5% (m/v). The data represent the average of at least biological triplicates.


deleted of the genes encoding transporters did not grow

on XOS (Fig. 3). Moreover, E. coli clone F5 displayed

no ability to grow on xylan, although cell extracts were

able to hydrolyze this substrate (Table 1 and

Supporting Information Table S2). These results confirm, as

discussed above, that the hydrolases of F5 are not

secreted in E. coli, and suggest that they are unlikely to be

anchored to the outer surface of the outer membrane.

However, we cannot exclude the hypothesis that F5min_GH

has a GH facing out and that the hydrolysis of the XOS

in the medium is so slow that it does not support

growth over 24 h. These results also demonstrate that functional

transporters are required, in addition to functional XOS

degrading GHs, to confer to recombinant E. coli the

ability to grow on these oligosaccharides. Hence, we

conclude that the clone F5 is able to metabolize XOS

due to the XOS internalization mediated by one or sev-

eral transporters, followed by their subsequent intracel-

lular hydrolysis into xylose.

The specificity of the transporters has been character-

ized by using MM containing XOS with DP ranging from 2

to 5 as sole carbon source (Fig. 4A). While the clone F5

was able to grow in MM with DP 2 to 4, no growth has

been detected in MM containing XOS larger than DP 4.

Thus the internalization of xylose-containing glycans is

possible up to DP 4. As both xylan and arabinoxylan could

be hydrolyzed by F5 cell extracts, we tested arabino-xylo-

oligosaccharides from 2 to 4 xylosyl residues branched

Fig. 4. Growth and xylo-oligosaccharides uptake of the clones F5, F5min_SUS and F5min_MFS. Growth curves of the clone F5 (A), F5min_SUS (C) and F5min_MFS (E) supplemented with xylopentaose, xylotetraose, xylotriose and xylobiose. HPAEC-PAD analysis of the culture supernatants of the clones F5 (B), F5min_SUS (D) and F5min_MFS (F) to measure uptake of xylotetraose, xylotriose and xylobiose. The squares indicate the sampling time points. The data represent the average of at least biological duplicates.


with 1 or 2 arabinosyl residues as sole carbon source. No

growth has been observed after 24 h of incubation with

these arabino-xylo-oligosaccharides (Supporting Informa-

tion Table S2). Thus the oligosaccharide transport in F5

seems specific to unbranched XOS in the E. coli host.

The PUL encodes two transport systems, a MFS and

a SusC/D pair. MFS and SusC are known as membrane

proteins and SusD is a lipidated protein predicted to be

anchored to the outer membrane of Bacteroides species.

To assess the transport ability and specificity of the

potential transporters, new F5 variants were constructed

(Fig. 1B). The first variant, named F5min_SUS, harbours

the hydrolases and the arranged SusC/D homologs. The

second variant, named F5min_MFS, harbours the hydro-

lases and the MFS transporter. The growth on a mixture

of XOS (from DP 2 to DP 6) has been investigated.

F5min_SUS was able to grow on linear XOS mixture but

with a level of growth lower than F5 (Fig. 3). We thus

confirmed that the SusC/D system is an active

XOS transporter. However, this transporter was not suffi-

cient to completely restore the growth of the E. coli strain

harbouring the full F5 insert, suggesting the involvement

of another transporter. On the XOS mixture, F5min_MFS

showed a growth curve similar to F5 confirming the func-

tionality of the MFS transporter (Fig. 3). To grow on

XOS, F5 required at least one of its two functional trans-

porters, the SusC or the MFS transporter.

We investigated further the function of the SusD by

deleting the corresponding gene in the F5min_SUS

clone. This construct was named F5min_SUSΔSusD

(Fig. 1B). This deletion completely abolished the ability

of E. coli to grow on XOS. As the susD gene is

located downstream of susC, any transcriptional effect of the deletion on susC

can be excluded, suggesting that the presence of SusD

was essential for the functionality of the SusC

transporter and for metabolism of XOS (Fig. 3). This result is

in agreement with previous studies demonstrating that,

in Bacteroides strains, ΔSusD mutants were unable to

grow on their targeted poly- and oligosaccharides

(Koropatkin et al., 2008; Cameron et al., 2014; Tauzin et al.,

2016). The function of SusD is not restricted to its

binding ability, its physical presence being essential for strain

growth on its targeted glycan. SusD physical presence

is sufficient, since supplementation of the ΔSusD strain

with the SusD* variant, which is a SusD mutant unable

to bind glycan, was enough to restore the growth of the

bacteria (Cameron et al., 2014; Tauzin et al., 2016).

To characterize more precisely the transport specific-

ity of each F5 transport system, namely SusC/D and

MFS, we monitored the variant growth on individual lin-

ear XOS of various lengths (Fig. 4C and E). While

F5min_SUS was able to grow on XOS up to DP 3,

F5min_MFS was able to grow on XOS up to DP 4. To

visualize the kinetics of internalization of XOS of different

DPs (from DP 2 to DP 3 for F5min_SUS and from DP 2

to 4 for F5 and F5min_MFS), we monitored their disap-

pearance from the culture supernatants using HPAEC-

PAD analysis. F5 and F5min_MFS, grown on XOS of

specified DP (2, 3 and 4), consumed each oligosaccha-

ride to completion in 24 h (Fig. 4B and F). The rate of

utilization was similar for xylobiose and xylotriose which

were both consumed faster than xylotetraose (Fig. 4B

and F). In contrast, F5min_SUS left residual xylobiose

and xylotriose in the culture supernatant after 24 h,

while the culture had already reached the stationary

phase which remained at a final OD600nm lower than

with F5 and F5min_MFS (Fig. 4C and D).
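The growth phenotypes reported for the different constructs can be summarized as a simple rule: growth on a linear XOS requires the three hydrolases plus at least one functional transporter able to import that degree of polymerization, with SusC being functional only when SusD is present. The sketch below encodes that rule as a qualitative model. The gene sets are simplified to the genes discussed in this section (the full F5 insert carries 27 genes), the construct key `F5min_SUSdSusD` stands for the SusD deletion variant, and growth rates and residual sugars are not modelled.

```python
# Hydrolase genes shared by all reduced constructs.
HYDROLASES = frozenset({"gh43A", "gh43B", "gh10"})

# Simplified gene content of each construct (assumption: only the genes
# discussed in this section are listed).
CONSTRUCTS = {
    "F5": HYDROLASES | {"susC", "susD", "mfs"},
    "F5min_SUS": HYDROLASES | {"susC", "susD"},
    "F5min_MFS": HYDROLASES | {"mfs"},
    "F5min_SUSdSusD": HYDROLASES | {"susC"},
    "F5min_GH": HYDROLASES,
}


def max_importable_dp(genes):
    """Largest linear XOS (degree of polymerization) that can be imported."""
    limits = []
    if {"susC", "susD"} <= genes:  # SusC requires SusD to function
        limits.append(3)
    if "mfs" in genes:
        limits.append(4)
    return max(limits, default=0)


def grows_on_linear_xos(construct, dp):
    """Qualitative prediction of growth on a single linear XOS."""
    genes = CONSTRUCTS[construct]
    return HYDROLASES <= genes and 2 <= dp <= max_importable_dp(genes)


predictions = {
    name: [dp for dp in range(2, 6) if grows_on_linear_xos(name, dp)]
    for name in CONSTRUCTS
}
```

The model reproduces the qualitative pattern described above: F5 and F5min_MFS use XOS up to DP 4, F5min_SUS up to DP 3, and the constructs lacking a functional transporter do not grow.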

To date, two other XOS MFS transporters, specific to

linear XOS of either DP 3 or 6, have been reported in

Klebsiella spp. (Qian et al., 2003; Shin et al., 2010). In

addition, a TonB-dependent (SusC-like) transporter

essential for growth on XOS has been identified in Xan-

thomonas campestris but its specificity has still not been

investigated (Déjean et al., 2013). In contrast, no

functional characterization of SusC and MFS transporters

specific to XOS has been reported in Bacteroides so

far, even if several xylan PULs have been described in

gut bacteria. SusC (BACOVA_4393) and MFS

(BACOVA_4388) contained in the PUL-XylS from B. ovatus

share only 37% and 30% identity with the F5 SusC and

MFS, respectively (Rogowski et al., 2015). These

transporters are to date the best homologs to the F5

transporters in validated PULs targeting XOS, but again,

their function has not been experimentally validated,

nor has their transport specificity been investigated.

Intriguingly, while no oligosaccharide was released

throughout the XOS uptake by F5min_SUS, residual xylo-

biose and xylose were observed throughout the growth of

F5 and F5min_MFS on xylobiose and xylotriose (Support-

ing Information Figs S3–S5). The released xylobiose and

xylose were consumed thereafter as well as the XOS ini-

tially present in the culture supernatant. The observed

exchanges of carbohydrates between the intra- and the

extra-cellular compartments are in line with the character-

istics of the transporters. The members of the MFS trans-

porter family indeed transport the compounds depending

on their concentration gradient while the TonB-dependent

transporters depend on energy coupling. We hypothesized that the accumulation of xylose and xylobiose inside the cells changed the concentration gradient, leading to the release of these compounds outside the cells by the MFS transporter.

It is clear from the results presented on Figs 3 and 4

that in E. coli, the MFS transporter allows a higher growth

rate than the SusC transporter. This is not due to transcription levels, because in E. coli grown on LB the MFS transporter gene is expressed at a sevenfold lower level than SusC (Fig. 2). Three hypotheses may explain this phenomenon.

© 2016 The Authors. Molecular Microbiology Published by John Wiley & Sons Ltd., Molecular Microbiology, 00, 00–00

First, it may be possible that the affinity for XOS is higher

for the MFS transporter than for the SusC/D transporter.

Another possibility would be that the SusC/D transport sys-

tem would not be perfectly functional when expressed in E.

coli compared to the native organism. The Ser residue at

the N-terminal second position of SusD suggests a localiza-

tion at the E. coli outer membrane (Yamaguchi et al., 1988;

Okuda and Tokuda, 2011). This is in accordance with our

results showing that SusD is necessary for SusC transport

function. However, its orientation towards the outer mem-

brane could not be confirmed. The last explanation could

be related to the different characteristics of the two transporters. The TBDTs indeed require energy derived from the proton motive force through their interaction with the TonB-ExbB-ExbD complex (Noinaj et al., 2010), which is provided by the E. coli host as described by Phansopa et al. (2014). Transport through SusC/D might thus be energy-consuming for the E. coli host. In contrast, transport through MFS is passive and driven only by the concentration gradient (Yan, 2015). Whatever the mechanism of XOS transport in the heterologous E. coli host, it cannot be extrapolated to the native organism, as the cellular localization of the transporters could be different.

As explained above, in Bacteroides, it is probable that the

SusC/D transporter sits in the outer membrane, working in

coordination with the MFS transporter located in the inner

membrane. Based on the growth ability of the F5 clone

and all of its variants, we assume that in E. coli the MFS

protein would be located in the outer membrane, allowing

the XOS internalization within the periplasmic space, where

they would be hydrolyzed by the GHs bound to the inner and/or outer membranes and oriented towards the periplasmic space. Nevertheless, further experiments will be necessary to demonstrate the exact localization of these

transporters in the native strains and when produced in

E. coli.

The PUL in B. vulgatus is involved in XOS utilization

We studied the biological function of the PUL in B. vulga-

tus as the metagenomic clone has high homology to

genes in B. vulgatus ATCC 8482. Growth dynamics of B.

vulgatus on XOS and a variety of xylans revealed that B.

vulgatus was able to grow on the XOS and the two enzy-

matically digested products of wheat arabinoxylan, AX2

and A2X4. B. vulgatus growth on XOS is very robust,

while on the branched oligosaccharides the growth is

delayed and at a lower rate. In contrast to the growth

dynamics on xylo- and arabino-xylo-oligosaccharides, B.

vulgatus was unable to utilize complex, heavily decorated xylans including RAX, SAX and CAX, or the arabinoxylan WAX (Fig. 5A–C). Although the intrinsic turbidity of birchwood and beechwood xylans resulted in an elevated initial absorbance at 600 nm compared with other xylans, the marginal growth rate suggested a poor and delayed utilization of these simple wood glucuronoxylans by B. vulgatus. Thus we hypothesize that gene clusters in B. vulgatus have evolved to target shorter xylose polymers, i.e. xylo-oligosaccharides, rather than xylans.

Fig. 5. Growth dynamics of B. vulgatus ATCC 8482 on xylo-oligosaccharides and various xylans, and transcriptional responses of B. vulgatus to xylo-oligosaccharides. Growth curves of B. vulgatus (n = 12) on xylo-oligosaccharides (A), two wood xylans (B) and arabinoxylans from various sources (C). XOS, xylo-oligosaccharides; A2X4, 2³,3³-di-α-L-arabinofuranosyl-xylotetraose; AX2, 3²-α-L-arabinofuranosyl-xylobiose; RAX, rice arabinoxylan; CAX, corn arabinoxylan; SAX, sorghum arabinoxylan; WAX, wheat arabinoxylan. Transcription of B. vulgatus PUL genes in response to xylo-oligosaccharides (n = 3) (D).

To investigate whether this PUL contributes to the utilization of XOS, we performed transcriptional analyses of the PUL in response to XOS induction. Expression of the SusC- and SusD-like genes, the two gh43 genes, the gh10 gene and the mfs gene is highly induced by XOS, suggesting that these genes of the PUL are responsible for XOS utilization. The SusC- and SusD-like genes and the mfs gene are the most highly induced (more than 1000-fold), compared with the genes encoding the glycoside hydrolases, which are nevertheless induced at least 40-fold relative to growth on glucose or xylose (Fig. 5D). Interestingly, no induction was observed for the gh16 gene, indicating a break in the gene cluster between MFS and GH16. Our results suggest that this PUL in B. vulgatus is indeed involved in XOS utilization. This finding further corroborates our functional analysis of the metagenomic clone in an E. coli recombinant host.

Conclusions

Metagenomics is a powerful tool to explore the diversity and specificity of gut bacteria, as well as an extensive genetic source for discovering new functions. However, the rapid production of metagenomic data is vastly outpacing functional studies, which underscores the critical need for protein biochemical characterization and structural enzymology to inform bioinformatics and systems biology (André et al., 2014). As shown here, the direct study of metagenomic clones for the characterization of new functions and/or new protein families could be a good strategy not only to save time, but also to study complex mechanisms that require the synergistic action of different proteins. Previously, some multigenic systems derived from metagenomic libraries have been used to optimize the ability of E. coli to produce ethanol or antifungal activity (Chung et al., 2008; Loaces et al., 2015), but no biochemical characterization of PULs heterologously expressed in E. coli has been published so far. Here, we demonstrated that E. coli is an interesting recombinant host for characterizing the individual components of PUL systems from Bacteroides strains. The present work constitutes the first experimental study of the expression in E. coli of a metagenomic multigenic cluster cloned in fosmids. Our results suggest that E. coli may be able to recognize its own promoter sequences within the metagenomic inserts, in particular in DNA derived from Bacteroidetes (Lam and Charles, 2015). This spurious transcription can be highly advantageous for studying the synergistic action of proteins encoded on the same metagenomic locus.

In the present work, we characterized a PUL-like system that confers on E. coli the ability to metabolize XOS.

Taking into account the possible inability of E. coli to

produce extracellular or cell surface attached proteins,

this new functionality requires the coordinated action of

at least 2 activities: (i) functional transport to internalize

oligosaccharides and (ii) oligosaccharide hydrolysis to

release monomers that will be used for E. coli growth.

The study of the transport system was based on growth

screening and required the presence of both transport

and hydrolytic functions.

To conclude, the present results pave the way for boosting the functional characterization of individual components of PULs, especially transporters derived from cultured and uncultured Bacteroidetes. The generic approach we developed could be extended to study other catabolic pathways that are crucial for host and dietary glycan harvesting by prominent gut bacteria, and even for the metabolism of other bioactive compounds. Moreover, the construction and characterization of recombinant E. coli strains that are able to metabolize plant cell wall components opens the way to further metabolic engineering work to develop microbial cell factories dedicated to bio-sourced product synthesis.

Experimental procedures

Cloning

The metagenomic clones F5 (Genbank accession number HE717017) and F4 (control, HE717016) were obtained from a metagenomic library derived from a human fecal sample as previously described (Cecchini et al., 2013). Both are metagenomic fragments cloned into the pCC1fos fosmid and transformed into EPI100 E. coli cells (Epicentre Technologies). The minimal variants of the clone F5 were constructed by using the In-Fusion HD Plus cloning kit (Clontech) following the manufacturer’s instructions. The primers used in this study are listed in Supporting Information Table S1. Considering that the expression of the genes might be driven by a promoter sequence potentially located in the sequence of the upstream gene, each variant was cloned by amplifying the CDS and the cognate ~600 bp to 1 kb upstream sequence.

Growth study

Media. All E. coli cells were grown on minimal synthetic medium (Na2HPO4·12H2O 17.4 g l⁻¹, KH2PO4 3.03 g l⁻¹, NH4Cl 2.04 g l⁻¹, NaCl 0.51 g l⁻¹, MgSO4 0.49 g l⁻¹, CaCl2 4.38 mg l⁻¹, Na2EDTA·2H2O 15 mg l⁻¹, ZnSO4·7H2O 4.5 mg l⁻¹, CoCl2·6H2O 0.3 mg l⁻¹, MnCl2·4H2O 1 mg l⁻¹, H3BO3 1 mg l⁻¹, Na2MoO4·2H2O 0.4 mg l⁻¹, FeSO4·7H2O 3 mg l⁻¹, CuSO4·5H2O 0.3 mg l⁻¹, thiamine 0.1 g l⁻¹ and leucine 0.02 g l⁻¹) containing an appropriate carbon source and supplemented with 12.5 mg l⁻¹ chloramphenicol. After a first growth step in LB medium supplemented with 12.5 mg l⁻¹ chloramphenicol, an overnight culture in minimal synthetic medium containing xylose was used to inoculate 0.5 ml of minimal synthetic medium containing xylose at an optical density (OD) at 600 nm of 0.05 in a 48-well microplate. Growth was followed by measuring the OD600 over 24 h at 37 °C using the FLUOStar Optima (BMG Labtech).

Bacteroides vulgatus ATCC 8482 was routinely grown in tryptone-yeast extract-glucose (TYG) medium (Holdeman et al., 1977) or in type-1 minimal medium (Urs, Pudlo and Martens, unpublished data). Carbon sources were added to a final concentration of 5 mg ml⁻¹ unless otherwise stated. Cultures were grown at 37 °C in an anaerobic chamber (10% H2, 5% CO2, and 85% N2; Coy Manufacturing, Grass Lake, MI).

To quantify growth dynamics of B. vulgatus on various carbon sources, the increase in culture absorbance (600 nm) in 200 µl cultures was measured every 10 min on an automated plate reader (Martens et al., 2011). Growth curves show the average of 12 replicates for each carbon source.
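The handling of plate-reader data described above can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function names, the sliding-window size and the synthetic wells are assumptions made for the example, and the source does not state how growth rates were derived from the curves.

```python
import math
from statistics import mean

def average_replicates(replicates):
    """Average OD600 across replicate wells at each time point."""
    return [mean(values) for values in zip(*replicates)]

def max_specific_growth_rate(times_h, od, window=5):
    """Largest least-squares slope of ln(OD) versus time (per hour)
    over any run of `window` consecutive readings (window size is an assumption)."""
    log_od = [math.log(v) for v in od]
    best = 0.0
    for i in range(len(od) - window + 1):
        t = times_h[i:i + window]
        y = log_od[i:i + window]
        t_bar, y_bar = mean(t), mean(y)
        sxy = sum((a - t_bar) * (b - y_bar) for a, b in zip(t, y))
        sxx = sum((a - t_bar) ** 2 for a in t)
        best = max(best, sxy / sxx)
    return best

# Synthetic data (hypothetical): three wells growing at 0.5 per hour,
# read every 10 min for 2 h, as with a 10-min plate-reader interval.
times_h = [i / 6 for i in range(13)]
wells = [[od0 * math.exp(0.5 * t) for t in times_h] for od0 in (0.04, 0.05, 0.06)]
mean_curve = average_replicates(wells)
mu_max = max_specific_growth_rate(times_h, mean_curve)
print(round(mu_max, 3))
```

With real data, the replicate list would hold the 12 wells recorded for each carbon source.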

Growth substrates. Growth experiments with E. coli and B. vulgatus were performed on minimal media supplemented with a variety of oligosaccharide and polysaccharide carbon sources. We used a mixture of XOS (WAKO and IOR-TAIHE, from DP 2 to DP 7). Individual XOS from DP 2 to 5 and arabino-xylo-oligosaccharides (3²-α-L-arabinofuranosyl-xylobiose, AX2; 2³-α-L-arabinofuranosyl-xylotriose, AX3; 2³,3³-di-α-L-arabinofuranosyl-xylotriose, A2X3; 2³,3³-α-L-arabinofuranosyl-xylotetraose, AX4; and 2³,3³-di-α-L-arabinofuranosyl-xylotetraose, A2X4) were purchased from Megazyme. Simple xylans, with sparsely decorated structures, were purchased from Sigma (beechwood xylan and birchwood glucuronoxylan) and from Megazyme (wheat arabinoxylan, WAX). More complex, heavily decorated glucuronoarabinoxylans (rice, RAX; sorghum, SAX; and corn, CAX) were kind gifts of Dr. Bruce Hamaker (Purdue University).

Gene expression analyses

RNA extraction and reverse transcription into cDNA. From three independent cultures of the F5 clone, total RNAs were extracted as previously described (Nouaille et al., 2009). Briefly, 10 ml of exponentially growing cells (OD600 = 1) in LB medium were collected and centrifuged, and the pellets were immediately frozen in liquid nitrogen. Cells were disrupted through high-speed shaking with stainless steel beads. Total RNAs were extracted using an RNeasy mini kit (Qiagen) following the manufacturer’s instructions. RNAs were quantified using a NanoDrop™ and their quality was controlled using a Bioanalyzer RNA kit (Agilent Technologies).

The equivalent of 50 µg of RNA was subjected to DNAse treatment and purified with an RNeasy Mini spin column (Qiagen). Then 5 µg of RNA were reverse transcribed using the SuperScript® II RT (ThermoFisher Scientific) according to the manufacturer’s protocol, and the cDNA was purified using illustra™ MicroSpin™ G-25 columns (GE Healthcare).

For the transcriptional analysis in B. vulgatus, total RNA was extracted using the RNeasy mini kit (Qiagen) from 5 ml of exponentially growing B. vulgatus culture in minimal medium containing 5 mg ml⁻¹ of XOS. Contaminating DNA was removed with the TURBO DNA-free™ Kit (Ambion). Reverse transcription was performed with 1 µg of RNA using SuperScript® III Reverse Transcriptase (Thermo Fisher Scientific) and random primers (Invitrogen) according to the manufacturer’s instructions. cDNA quantification was performed with a Mastercycler® ep realplex (Eppendorf), using a homemade SYBR® qPCR mix containing Hot-start Taq Polymerase (NEB) and 400 nM primers (62.5 nM for the 16S rRNA primers), for 40 cycles of 95 °C for 3 s, 52 °C for 20 s and 68 °C for 20 s, followed by a melting step to determine amplicon purity. All transcript levels were normalized to 16S rRNA abundance. The expression of each gene in the PUL is reported relative to its transcript level during growth on glucose or xylose.

Primer design. The primers used for real-time quantitative PCR of each gene on the F5 metagenomic clone insert were designed with Bio-Rad Beacon Designer software to have lengths from 18 to 22 bases, GC contents of more than 50% and melting temperatures of about 60 °C, and to amplify PCR products between 83 and 148 bases long (Supporting Information Table S1).

The primers used for real-time quantitative PCR of each gene in the B. vulgatus PUL were designed using Primer 3. These primers range from 18 to 24 bases, with GC contents between 40% and 60%. The melting temperatures lie around 60 °C and the amplicon sizes range between 80 and 150 bases (Supporting Information Table S1).
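The length and GC-content criteria stated above can be checked with a few lines of code. The sketch below is only an illustration: the function names and the example primer are hypothetical, and the melting-temperature criterion is deliberately left out because the Tm model used by Beacon Designer or Primer 3 is not given in the text.

```python
def gc_content(seq):
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)

def passes_f5_criteria(primer):
    """F5 insert primers: 18 to 22 bases and GC content above 50%."""
    return 18 <= len(primer) <= 22 and gc_content(primer) > 50.0

def passes_bvulgatus_criteria(primer):
    """B. vulgatus PUL primers: 18 to 24 bases and GC content of 40% to 60%."""
    return 18 <= len(primer) <= 24 and 40.0 <= gc_content(primer) <= 60.0

# Hypothetical 20-mer, not taken from Supporting Information Table S1.
example = "ATGCGTACCGGTCAGCTGAC"
print(len(example), gc_content(example))
print(passes_f5_criteria(example), passes_bvulgatus_criteria(example))
```

A real screen would add the Tm and amplicon-size checks using whichever thermodynamic model the design software applies.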

High throughput real-time quantitative PCR. High throughput real-time quantitative PCR was carried out using the 48.48 dynamic array™ IFCs and the BioMark™ HD System (Fluidigm Corporation, CA, USA) following the manufacturer’s protocol (Spurgeon et al., 2008) and performed at GeT-PlaGe facilities (Castanet, France). Prior to RNA expression analysis, primer specificity and the absence of genomic DNA contamination in extracted total RNAs were checked.

In total, 1044 data points were collected from the qPCR analyses, combining 4 technical replicates (used at 3 different dilutions) from 3 biological samples and the 29 primer pairs corresponding to the 27 genes of the metagenomic insert and the 2 additional control genes (ihfB and cam).
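The reported total can be reproduced by multiplying the design factors. The snippet below is one reading of the design (each technical replicate measured at each dilution), chosen because it matches the stated figure; the variable names are illustrative.

```python
technical_replicates = 4
dilutions = 3
biological_samples = 3
primer_pairs = 27 + 2  # 27 insert genes plus the ihfB and cam controls

# Each biological sample contributes replicates x dilutions sample inlets.
sample_inlets = technical_replicates * dilutions * biological_samples
total_data_points = sample_inlets * primer_pairs
print(sample_inlets, total_data_points)
```

Under this reading, 36 sample inlets and 29 assays would fit on a single 48.48 array, although the text does not state how the chip was actually loaded.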

Data analysis. Relative mRNA expression means were calculated from the biological triplicates after initial raw data analysis with the Fluidigm real-time PCR analysis software v.4.1.2. The PCR efficiency was checked for each primer pair and was close to 100%. The comparative ΔΔCt method was used to calculate the change in transcript levels with correction (Livak and Schmittgen, 2001). As the best alternative, the mean expression of the five least expressed genes was used as a baseline, and a gene was considered significantly expressed when its level exceeded a threshold fixed at fivefold this baseline.
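The threshold rule described above (fivefold the mean of the five least expressed genes) can be written out as a short sketch. The gene names and expression values below are hypothetical and serve only to show the calculation.

```python
def expression_threshold(levels, n_lowest=5, fold=5.0):
    """Threshold = fold x mean relative expression of the n least expressed genes."""
    lowest = sorted(levels.values())[:n_lowest]
    return fold * sum(lowest) / len(lowest)

def significantly_expressed(levels, n_lowest=5, fold=5.0):
    """Names of genes whose relative expression exceeds the threshold."""
    threshold = expression_threshold(levels, n_lowest, fold)
    return sorted(gene for gene, value in levels.items() if value > threshold)

# Hypothetical relative expression values for eight genes.
levels = {"g1": 1.0, "g2": 1.0, "g3": 2.0, "g4": 2.0,
          "g5": 4.0, "g6": 9.0, "g7": 20.0, "g8": 100.0}
threshold = expression_threshold(levels)
expressed = significantly_expressed(levels)
print(threshold, expressed)
```

Here the five lowest values average 2.0, so the threshold is 10.0 and only g7 and g8 pass.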

Two reference genes were used for data normalization between samples: the integration host factor β-subunit gene (ihfB), which is one of the commonly used reference genes in E. coli (Weglenska et al., 1996) as its expression remains constant throughout growth, and the chloramphenicol resistance gene (cam), which encodes chloramphenicol acetyltransferase, is present on the recombinant vector and is essential for antibiotic resistance.
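The comparative ΔΔCt calculation with two reference genes can be sketched as below. Two points are assumptions rather than statements from the text: the reference Ct is taken as the arithmetic mean of the ihfB and cam Ct values, and amplification efficiency is taken as exactly 100% (a doubling per cycle). The Ct values and the choice of calibrator are hypothetical.

```python
def delta_ct(ct_target, ct_references):
    """Ct of the target minus the mean Ct of the reference genes."""
    return ct_target - sum(ct_references) / len(ct_references)

def fold_change(ct_target_sample, ct_refs_sample,
                ct_target_calibrator, ct_refs_calibrator):
    """Relative expression 2^(-ddCt), assuming 100% PCR efficiency."""
    ddct = (delta_ct(ct_target_sample, ct_refs_sample)
            - delta_ct(ct_target_calibrator, ct_refs_calibrator))
    return 2.0 ** (-ddct)

# Hypothetical Ct values; the references are listed as (ihfB, cam).
sample_fold = fold_change(22.0, (18.0, 20.0), 25.0, (18.0, 20.0))
print(sample_fold)
```

In this example the target crosses the threshold three cycles earlier than in the calibrator at equal reference levels, which corresponds to an eightfold higher transcript level.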

Enzymatic assays

The cells were grown in LB supplemented with 12.5 mg l⁻¹ chloramphenicol, inoculated from an overnight culture at an OD600 of 0.05. When the OD600 reached 1, the cells were harvested and the pellet was suspended in 50 mM potassium phosphate buffer pH 7.0 containing lysozyme (0.5 mg ml⁻¹ final concentration) to reach an OD600 of 80. After incubation at 37 °C for one hour, the suspension was frozen for 15 min at −80 °C and then thawed. The samples were then centrifuged and the supernatant (cell extract) was used to perform activity tests. All reactions were carried out at 37 °C in 50 mM potassium phosphate buffer pH 7.0.

The activities against complex polysaccharides (xylan, arabinoxylan, arabinan, arabinogalactan, β-glucan, xyloglucan) were measured using the 3,5-dinitrosalicylic acid (DNS) reducing-sugar assay. Reaction samples (250 µl of cell extract incubated with 5 mg ml⁻¹ of the specified substrate) were added to an equal volume of DNS reagent to terminate the reaction, and the colour was developed by boiling for 5 min. Enzymatic activities on various para-nitrophenol (pNP) sugar derivatives were also measured. After incubation of 150 µl of cell extract with 1 mM pNP-glycosides (pNP-α-D-xylopyranose, pNP-β-D-xylopyranose, pNP-α-L-arabinofuranose, pNP-α-L-arabinopyranose and pNP-β-L-arabinopyranose), the reaction was stopped by raising the pH to 11.0 through the addition of an equal volume of 0.2 M Na2CO3. The release of reducing sugars (DNS assay) and of pNP was measured in an Optima (TECAN) at A540nm and A405nm, respectively. A standard curve was used to calculate product concentration.
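Converting an absorbance reading into a product concentration through a standard curve can be sketched as a linear fit followed by inversion. The standards and the reading below are hypothetical, and a linear response over the working range is an assumption; the text does not describe the curve itself.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to obtain product concentration."""
    return (absorbance - intercept) / slope

# Hypothetical pNP standards (mM) and their A405nm readings.
standards_mm = [0.0, 0.25, 0.5, 1.0]
readings = [0.05, 0.30, 0.55, 1.05]
slope, intercept = fit_line(standards_mm, readings)
unknown_mm = concentration_from_absorbance(0.45, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(unknown_mm, 3))
```

The same two functions would apply to the DNS assay with reducing-sugar standards read at 540 nm.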

Cellular localization of proteins

The control (empty pCC1fos) and F5min_GH clones were grown in 250 ml of LB at 37 °C until the OD600 reached 0.9. The cells were collected by centrifugation at 4400 g for 10 min at 4 °C. The supernatant was filtered (0.22 µm) and tested for secreted activity. The other protein fractions were obtained from the different treatments of the pellet as described by Larsbrink et al. (2011).

Briefly, the periplasmic proteins were collected using an osmotic shock. The cells were washed with 10 ml of 50 mM Tris-HCl (pH 7.7) and collected by centrifugation at 4400 g for 10 min at 4 °C. The pellet was resuspended in 50 ml of 30 mM Tris-HCl, 20% (w/v) sucrose and 1 mM EDTA (pH 8.0), and the cells were incubated at room temperature for 10 min. The cells were then collected by centrifugation at 4400 g for 15 min at 4 °C. The pellet was resuspended in ice-cold 5 mM MgSO4, and the cells were incubated on ice for 10 min. The cells were collected by centrifugation at 14 000 g for 10 min at 4 °C. The supernatant was retained and contained the periplasmic proteins. The pellet was resuspended in 50 mM sodium phosphate buffer (pH 7.4) and sonicated to lyse cells. The lysate was centrifuged at 5000 g for 10 min at 4 °C. Using an ultracentrifuge, the supernatant was centrifuged at 100 000 g for 1 h at 4 °C to recover the cytoplasmic proteins. The pellet of the lysate was resuspended in 100 mM sodium carbonate buffer (pH 9.0) and centrifuged at 100 000 g for 1 h at 4 °C. The supernatant from this step contained the potential trapped soluble proteins and/or weakly membrane-associated proteins. The pellet, containing the membrane proteins, was resuspended in 50 mM sodium phosphate buffer (pH 7.4).

XOS uptake

To assay XOS uptake, cells were grown at 37 °C in M9 medium supplemented with XOS of specified chain length or with a mixture of XOS. Growth was monitored by measuring the A600nm. During growth, samples were collected at regular time points and centrifuged. The supernatants were filtered and stored at −20 °C. The amount of XOS present in the culture supernatants was analyzed by HPAEC-PAD on a Dionex ICS-3000 system (Dionex) equipped with a CarboPac PA100 column. The analyses were carried out at 30 °C with a flow rate of 0.5 ml min⁻¹ with the following multistep gradient: 0–30 min (0–60% B), 30–31 min (60–0% B) and 31–36 min (0% B). Solvents were 150 mM NaOH (eluent A) and 150 mM NaOH, 500 mM CH3COONa (eluent B). To quantify the XOS remaining in the culture supernatant, the respective commercial oligosaccharides (Megazyme) were used as standards.
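The multistep gradient above can be expressed as a small lookup that returns the eluent composition at any time of the run. This is a sketch assuming linear ramps between the stated time points; the function and constant names are illustrative.

```python
# (time in min, % eluent B) break points taken from the gradient description.
GRADIENT_POINTS = [(0.0, 0.0), (30.0, 60.0), (31.0, 0.0), (36.0, 0.0)]
ACETATE_IN_B_MM = 500.0  # sodium acetate concentration in eluent B

def percent_b(t_min):
    """Percentage of eluent B at time t_min, by linear interpolation."""
    if not GRADIENT_POINTS[0][0] <= t_min <= GRADIENT_POINTS[-1][0]:
        raise ValueError("time outside the 0-36 min programme")
    for (t0, b0), (t1, b1) in zip(GRADIENT_POINTS, GRADIENT_POINTS[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

def acetate_mm(t_min):
    """Sodium acetate concentration (mM) delivered to the column at t_min."""
    return ACETATE_IN_B_MM * percent_b(t_min) / 100.0

print(percent_b(15.0), acetate_mm(15.0), percent_b(33.0))
```

Because both eluents contain 150 mM NaOH, only the acetate concentration changes along the run, from 0 to 300 mM at 30 min.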

Bioinformatic analyses

Promoter consensus sequences used to identify promoters from E. coli (rpoD/σ70) and Bacteroides (σABfr) were TTGACA-N15–19-TATAAT and TTTG-N19–21-TA-N2-TTTG, respectively (Mastropaolo et al., 2009). The BPROM program was used to identify the putative promoters in E. coli (Solovyev and Salamov, 2011).
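A strict search for the two consensus sequences can be sketched with regular expressions. This reads each consensus as fixed boxes separated by a variable spacer of any bases (an assumption about the notation), and it only finds exact matches, so it is far more stringent than BPROM and does not reproduce its predictions. The example sequences are synthetic.

```python
import re

# Lookahead with a capture group so that overlapping hits are all reported.
SIGMA70 = re.compile(r"(?=(TTGACA[ACGT]{15,19}TATAAT))")
BACTEROIDES = re.compile(r"(?=(TTTG[ACGT]{19,21}TA[ACGT]{2}TTTG))")

def find_promoters(sequence, pattern):
    """Return (start position, matched sequence) for every exact consensus hit."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Synthetic test sequences (hypothetical, not from the F5 insert).
ecoli_like = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
bacteroides_like = "TTTG" + "C" * 20 + "TA" + "GC" + "TTTG"

sigma70_hits = find_promoters(ecoli_like, SIGMA70)
bacteroides_hits = find_promoters(bacteroides_like, BACTEROIDES)
print(sigma70_hits)
print(bacteroides_hits)
```

Natural promoters rarely match a consensus perfectly, so a practical scan would allow mismatches or use position weight matrices instead.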

LipoP and SignalP servers were used to determine the presence and location of lipoprotein and other protein signal peptide cleavage sites, respectively (Juncker et al., 2003; Petersen et al., 2011).

Acknowledgements

This research was funded by the French National Center of Excellence Toulouse White Biotechnology. We cordially thank Amandine Deroite, Nathan Davidenko, Adrien Guibert, Clarisse Lozano and Nelly Monties for their technical assistance. The analytic work was carried out at the Laboratory for Bio-Systems & Process Engineering (Toulouse, France) with the equipment of the ICEO facility. MetaToul (Metabolomics & Fluxomics Facilities, Toulouse, France, www.metatoul.fr) and its staff members are gratefully acknowledged for technical support and access to the microplate reader. MetaToul is part of the national infrastructure MetaboHUB-ANR-11-INBS-0010 (The French National infrastructure for metabolomics and fluxomics, www.metahub.fr). MetaToul is supported by grants from the Région Midi-Pyrénées, the European Regional Development Fund, SICOVAL, the Infrastructures en Biologie Santé et Agronomie (IBiSa, France), the Centre National de la Recherche Scientifique (CNRS) and the Institut National de la Recherche Agronomique (INRA). The work on Bacteroides was supported by the grant GM090080 from NIH.

Conflict of interest

The authors have no conflict of interest to declare.

References

André, I., Potocki-Véronese, G., Barbe, S., Moulis, C., and Remaud-Siméon, M. (2014) CAZyme discovery and design for sweet dreams. Curr Opin Chem Biol 19: 17–24.

Bolam, D.N., and Koropatkin, N.M. (2012) Glycan recogni-

tion by the Bacteroidetes Sus-like systems. Curr Opin

Struct Biol 22: 563–569.

Cameron, E.A., Maynard, M.A., Smith, C.J., Smith, T.J.,

Koropatkin, N.M., and Martens, E.C. (2012) Multidomain

carbohydrate-binding proteins involved in Bacteroides

thetaiotaomicron starch metabolism. J Biol Chem 287:

34614–34625.

Cameron, E.A., Kwiatkowski, K.J., Lee, B.H., Hamaker,

B.R., Koropatkin, N.M., and Martens, E.C. (2014) Multi-

functional nutrient-binding proteins adapt human symbi-

otic bacteria for glycan competition in the gut by

separately promoting enhanced sensing and catalysis.

mBio 5: e01441-14.

Cecchini, D.A., Laville, E., Laguerre, S., Robe, P., Leclerc,

M., Doré, J., et al. (2013) Functional metagenomics

reveals novel pathways of prebiotic breakdown by human

gut bacteria. PLoS One 8: e72766.

Chung, E.J., Lim, H.K., Kim, J., Choi, G.J., Park, E.J., Lee,

M.H., et al. (2008) Forest soil metagenome gene cluster

involved in antifungal activity expression in Escherichia

coli. Appl Environ Microbiol 74: 723–730.

Cuskin, F., Lowe, E.C., Temple, M.J., Zhu, Y., Cameron,

E.A., Pudlo, N.A., et al. (2015) Human gut Bacteroidetes

can utilize yeast mannan through a selfish mechanism.

Nature 517: 165–169.

Déjean, G., Blanvillain-Baufumé, S., Boulanger, A., Darrasse,
A., Bernonville, T.D.D., Girard, A.L., et al. (2013) The xylan

utilization system of the plant pathogen Xanthomonas

campestris pv campestris controls epiphytic life and

reveals common features with oligotrophic bacteria and

animal gut symbionts. New Phytol 198: 899–915.

Dodd, D., Mackie, R.I., and Cann, I.K.O. (2011) Xylan deg-

radation, a metabolic property shared by rumen and

human colonic Bacteroidetes. Mol Microbiol 79: 292–304.

Ferguson, A.D., and Deisenhofer, J. (2002) TonB-depend-

ent receptors-structural perspectives. Biochim Biophys

Acta 1565: 318–332.

Ferrer, M., Golyshina, O.V., Chernikova, T.N., Khachane,

A.N., Reyes-Duarte, D., Santos, V.A., et al. (2005) Novel

hydrolase diversity retrieved from a metagenome library of

bovine rumen microflora. Environ Microbiol 7: 1996–2010.

Hehemann, J.H., Correc, G., Barbeyron, T., Helbert, W.,

Czjzek, M., and Michel, G. (2010) Transfer of

carbohydrate-active enzymes from marine bacteria to

Japanese gut microbiota. Nature 464: 908–912.

Hess, M., Sczyrba, A., Egan, R., Kim, T.W., Chokhawala,

H., Schroth, G., et al. (2011) Metagenomic discovery of

biomass-degrading genes and genomes from cow rumen.

Science 331: 463–467.

Holdeman, L.V., Cato, E.D., and Moore, W.E.C. (1977)

Anaerobe Laboratory Manual, 4th ed. Blacksburg, VA:

Virginia Polytechnic Institute and State University.

Juncker, A.S., Willenbrock, H., Heijne, G.V., Brunak, S.,

Nielsen, H., and Krogh, A. (2003) Prediction of lipoprotein

signal peptides in Gram-negative bacteria. Protein Sci

12: 1652–1662.

Koropatkin, N.M., Martens, E.C., Gordon, J.I., and Smith,

T.J. (2008) Starch catabolism by a prominent human gut

symbiont is directed by the recognition of amylose heli-

ces. Structure 16: 1105–1115.

Lam, K.N., and Charles, T.C. (2015) Strong spurious tran-

scription likely contributes to DNA insert bias in typical

metagenomic clone libraries. Microbiome 3: 22.

Lam, K.N., Cheng, J., Engel, K., Neufeld, J.D., and

Charles, T.C. (2015) Current and future resources for

functional metagenomics. Front Microbiol 6: 1196.

Larsbrink, J., Izumi, A., Ibatullin, F.M., Nakhai, A., Gilbert,

H.J., Davies, G.J., and Brumer, H. (2011) Structural and

enzymatic characterization of a glycoside hydrolase family 31 α-xylosidase from Cellvibrio japonicus involved in

xyloglucan saccharification. Biochem J 567–580.

Larsbrink, J., Rogers, T.E., Hemsworth, G.R., McKee, L.S.,

Tauzin, A.S., Spadiut, O., et al. (2014) A discrete genetic

locus confers xyloglucan metabolism in select human gut

Bacteroidetes. Nature 506: 498–502.

Livak, K.J., and Schmittgen, T.D. (2001) Analysis of relative

gene expression data using real-time quantitative PCR and

the 2^(−ΔΔC(T)) method. Methods 25: 402–408.

Loaces, I., Amarelle, V., and Muñoz-Gutierrez, I. (2015)

Improved ethanol production from biomass by a rumen meta-

genomic DNA fragment expressed in Escherichia coli MS04

during fermentation. Appl Environ Microbiol 99: 9049–9060.

Markowitz, V.M., Chen, I.M.A., Palaniappan, K., Chu, K.,

Szeto, E., Grechkin, Y., et al. (2012) IMG: the integrated

microbial genomes database and comparative analysis

system. Nucl Acids Res 40: D115–D122.

Martens, E.C., Chiang, H.C., and Gordon, J.I. (2008) Muco-

sal glycan foraging enhances fitness and transmission of

a saccharolytic human gut bacterial symbiont. Cell Host

Microbe 4: 447–457.

Martens, E.C., Lowe, E.C., Chiang, H., Pudlo, N. A., Wu,

M., McNulty, N.P., et al. (2011) Recognition and degrada-

tion of plant cell wall polysaccharides by two human gut

symbionts. PLoS Biol 9: e1001221.

Mastropaolo, M.D., Thorson, M.L., and Stevens, A.M.

(2009) Comparison of Bacteroides thetaiotaomicron and

Escherichia coli 16S rRNA gene expression signals.

Microbiology 155: 2683–2693.

Nielsen, H.B., Almeida, M., Juncker, A.S., Rasmussen, S.,

Li, J., Sunagawa, S., et al. (2014) Identification and

assembly of genomes and genetic elements in complex

metagenomic samples without using reference genomes.

Nat Biotechnol 32: 822–828.

Noinaj, N., Guillier, M., Barnard, T.J., and Buchanan, S.K.

(2010) TonB-dependent transporters: regulation, struc-

ture, and function. Annu Rev Microbiol 64: 43–60.

Nouaille, S., Even, S., Charlier, C., Loir, Y.L., Cocaign-

Bousquet, M., and Loubiere, P. (2009) Transcriptomic

response of Lactococcus lactis in mixed culture with Staphy-

lococcus aureus. Appl Environ Microbiol 75: 4473–4482.

Okuda, S., and Tokuda, H. (2011) Lipoprotein sorting in

bacteria. Annu Rev Microbiol 65: 239–259.

Petersen, T.N., Brunak, S., Heijne, G.V., and Nielsen, H.

(2011) SignalP 4.0: discriminating signal peptides from

transmembrane regions. Nat Methods 8: 785–786.

Phansopa, C., Roy, S., Rafferty, J.B., Douglas, C.W.I.,

Pandhal, J., Wright, P.C., et al. (2014) Structural and func-

tional characterization of NanU, a novel high-affinity sialic

acid-inducible binding protein of oral and gut-dwelling Bac-

teroidetes species. Biochem J 458: 499–511.

Qian, Y., Yomano, L.P., Preston, J.F., Aldrich, H.C., and

Ingram, L.O. (2003) Cloning, characterization, and func-

tional expression of the Klebsiella oxytoca xylodextrin uti-

lization operon (xynTB) in Escherichia coli. Appl Environ

Microbiol 69: 5957–5967.

Rogowski, A., Briggs, J.A., Mortimer, J.C., Tryfona, T.,

Terrapon, N., Lowe, E.C., et al. (2015) Glycan complexity

dictates microbial resource allocation in the large intes-

tine. Nat Commun 6: 7481.

Roy, S., Douglas, C.W.I., and Stafford, G.P. (2010) A novel

sialic acid utilization and uptake system in the periodontal

pathogen Tannerella forsythia. J Bacteriol 192: 2285–2293.

Schauer, K., Rodionov, D.A., and Reuse, H. D. (2008) New

substrates for TonB-dependent transport: do we only see

the “tip of the iceberg?”. Trends Biochem Sci 33: 330–338.

Seydel, A., Gounon, P., and Pugsley, A.P. (1999) Testing

the ‘+2 rule’ for lipoprotein sorting in the Escherichia

coli cell envelope with a new genetic selection. Mol

Microbiol 34: 810–821.

Shin, H., Mcclendon, S., Vo, T., and Chen, R.R. (2010)

Escherichia coli binary culture engineered for direct fer-

mentation of hemicellulose to a biofuel. Appl Environ

Microbiol 76: 8150–8159.

Shipman, J.A., Cho, K.H., Siegel, H.A., and Salyers, A.A.

(1999) Physiological characterization of SusG, an outer

membrane protein essential for starch utilization by Bac-

teroides thetaiotaomicron. J Bacteriol 181: 7206–7211.

Shultzaberger, R.K., Chen, Z., Lewis, K.A., and Schneider,

T.D. (2007) Anatomy of Escherichia coli σ70 promoters.

Nucleic Acids Res 35: 771–788.

Singh, S.S., Typas, A., Hengge, R., and Grainger, D.C.

(2011) Escherichia coli σ70 senses sequence and conformation of the promoter spacer region. Nucleic Acids

Res 39: 5109–5118.

Solovyev, V., and Salamov, A. (2011) Automatic annotation

of microbial genomes and metagenomic sequences. In

Metagenomics and its Applications in Agriculture, Biome-

dicine and Environmental Studies. Li, R.W. (ed.). New

York: Nova Science Publishers.

Spurgeon, S.L., Jones, R.C., and Ramakrishnan, R. (2008)

High throughput gene expression measurement with real

time pcr in a microfluidic dynamic array. PLoS One 3:

e1662.

Stafford, G., Roy, S., Honma, K., and Sharma, A. (2012)

Sialic acid, periodontal pathogens and Tannerella for-

sythia: stick around and enjoy the feast!. Mol Oral Micro-

biol 27: 11–22.

Strachan, C.R., Singh, R., VanInsberghe, D.,

Ievdokymenko, K., Budwill, K., Mohn, W.W., et al. (2014)

Metagenomic scaffolds enable combinatorial lignin trans-

formation. Proc Natl Acad Sci USA 111: 10143–10148.

Tasse, L., Bercovici, J., Pizzut-Serin, S., Robe, P., Tap, J.,

Klopp, C., et al. (2010) Functional metagenomics to mine

the human gut microbiome for dietary fiber catabolic

enzymes. Genome Res 20: 1605–1612.

Tauzin, A.S., Kwiatkowski, K.J., Orlovsky, N.I., Smith, C.J.,

Creagh, A.L., Haynes, C.A., et al. (2016) Molecular dis-

section of xyloglucan recognition in a prominent human

gut symbiont. mBio 7: e02134-15.

Terrapon, N., Lombard, V., Gilbert, H.J., and Henrissat, B.

(2015) Automatic prediction of polysaccharide utilization

loci in Bacteroidetes species. Bioinformatics 31: 647–655.

Tokuda, H., and Matsuyama, S.I. (2004) Sorting of lipopro-

teins to the outer membrane in E. coli. Biochim Biophys

Acta 1693: 5–13.

Turnbaugh, P.J., Ley, R.E., Hamady, M., Fraser-Liggett,

C.M., Knight, R., and Gordon, J.I. (2007) The human

microbiome project. Nature 449: 804–810.

Vimr, E.R., and Troy, F.A. (1985) Identification of an inducible catabolic system for sialic acids (nan) in Escherichia

coli. J Bacteriol 164: 845–853.

Wang, Y., Chen, Y., Zhou, Q., Huang, S., Ning, K., Xu, J.,

et al. (2012) A culture-independent approach to unravel

uncultured bacteria and functional genes in a complex

microbial community. PLoS One 7: e47530.

Weglenska, A., Jacob, B., and Sirko, A. (1996) Transcriptional pattern of Escherichia coli ihfB (himD) gene

expression. Gene 181: 85–88.

Yamaguchi, K., Yu, F., and Inouye, M. (1988) A single

amino acid determinant of the membrane localization of

lipoproteins in E. coli. Cell 53: 423–432.

Yan, N. (2015) Structural biology of the major facilitator

superfamily transporters. Annu Rev Biophys 44: 257–283.

Supporting information

Additional supporting information may be found in the

online version of this article at the publisher’s web-site.


ISSN: 1399-0047

journals.iucr.org/d

Structural bases for N-glycan processing by mannoside phosphorylase

Simon Ladeveze, Gianluca Cioci, Pierre Roblin, Lionel Mourey, Samuel Tranier and Gabrielle Potocki-Veronese

Acta Cryst. (2015). D71, 1335–1346


This open-access article is distributed under the terms of the Creative Commons Attribution Licence http://creativecommons.org/licenses/by/2.0/uk/legalcode, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.


research papers

Acta Cryst. (2015). D71, 1335–1346. http://dx.doi.org/10.1107/S1399004715006604

Received 23 January 2015

Accepted 1 April 2015

Edited by Z. S. Derewenda, University of

Virginia, USA

Keywords: GH130 enzymes; N-glycans; glycoside phosphorylases; human gut microbiota.

PDB references: Uhgb_MP, apo, 4udi; complex with mannose, 4udj; complex with N-acetylglucosamine, 4udg; complex with mannose and N-acetylglucosamine, 4udk

Supporting information: this article has

supporting information at journals.iucr.org/d

Structural bases for N-glycan processing by mannoside phosphorylase

Simon Ladeveze (a,b,c), Gianluca Cioci (a,b,c), Pierre Roblin (d), Lionel Mourey (e,f), Samuel Tranier (e,f)* and Gabrielle Potocki-Veronese (a,b,c)*

(a) Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, 31077 Toulouse, France, (b) CNRS, UMR5504, 31400 Toulouse, France, (c) INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés, 31400 Toulouse, France, (d) Synchrotron SOLEIL, L’Orme des Merisiers, BP 48, Saint Aubin, 91192 Gif-sur-Yvette CEDEX, France, (e) Institut de Pharmacologie et de Biologie Structurale (IPBS), Centre National de la Recherche Scientifique (CNRS), 205 Route de Narbonne, BP 64182, 31077 Toulouse, France, and (f) Université de Toulouse, Université Paul Sabatier, IPBS, 31077 Toulouse, France. *Correspondence e-mail: [email protected], veronese@insa-toulouse.fr

The first crystal structure of Uhgb_MP, a β-1,4-mannopyranosyl-chitobiose phosphorylase belonging to the GH130 family which is involved in N-glycan degradation by human gut bacteria, was solved at 1.85 Å resolution in the apo form and in complex with mannose and N-acetylglucosamine. SAXS and crystal structure analysis revealed a hexameric structure, a specific feature of GH130 enzymes among other glycoside phosphorylases. Mapping of the −1 and +1 subsites in the presence of phosphate confirmed the conserved Asp104 as the general acid/base catalytic residue, which is in agreement with a single-step reaction mechanism involving Man O3 assistance for proton transfer. Analysis of this structure, the first to be solved for a member of the GH130_2 subfamily, revealed Met67, Phe203 and the Gly121–Pro125 loop as the main determinants of the specificity of Uhgb_MP and its homologues towards the N-glycan core oligosaccharides and mannan, and the molecular bases of the key role played by GH130 enzymes in the catabolism of dietary fibre and host glycans.

1. Introduction

N-linked glycans are present in many living organisms, notably

eukaryotes, and play a key role in major processes, including

cell signalling and recognition, protein stability and activity

tuning (Varki et al., 2009). These oligosaccharides, which are

covalently linked to the asparagine residues of glycoproteins,

display relatively restricted structural diversity (Lehle et al.,

2006; Larkin & Imperiali, 2011). In eukaryotes, N-glycans

share a common core structure composed of the β-D-Manp-1,4-β-D-GlcpNAc-1,4-D-GlcpNAc (Man-GlcNAc2) trisaccharide, carrying decorations on the nonreducing β-linked

mannosyl residue to form more complex structures (Aebi et

al., 2010; Nagae & Yamaguchi, 2012). Although the whole

pathways of human N-glycan synthesis and maturation have

been well described, little is known about their degradation,

especially by bacteria or fungi, despite the fact that degrada-

tion is a key factor in microbe–host interactions (Suzuki &

Harada, 2014). Until 2013, only glycoside hydrolases (GHs)

had been shown to be implicated in N-glycan breakdown in

the CAZy database (http://www.cazy.org/; Lombard et al.,

2014). Huge efforts have been made in recent years to

understand exactly how this hydrolytic process takes place,

particularly among gut inhabitants, as the alteration of host

glycans by microbes is thought to be related to intestinal

disorders, including Crohn’s disease and other inflammatory


bowel diseases (IBDs; Martens et al., 2008; Sheng et al., 2012).

The aim of these studies was to identify a broad consortium of

enzymes acting on different parts of the N-glycan structure,

such as exo-mannosidases and endo-mannosidases or endo-N-acetyl-β-D-glucosaminidases produced by gut commensals and

pathogens (Roberts et al., 2000; Burnaugh et al., 2008; Renzi et

al., 2011), particularly by Bacteroides species (Martens et al.,

2009; Zhu et al., 2010).

In 2013, the first evidence for N-glycan breakdown by

phosphorolysis was published, which involved gut bacterial

mannoside phosphorylases belonging to glycoside hydrolase

family 130 (GH130; Nihira et al., 2013; Ladeveze et al., 2013).

Only two enzymes, namely the mannoside phosphorylase (EC

2.4.1.–) Uhgb_MP, an enzyme produced by an uncultivated

Bacteroides bacterium, and Bt1033, produced by B. thetaiotaomicron VPI-5482, are known to catalyze the conversion of β-D-Manp-1,4-β-D-GlcpNAc (Man-GlcNAc) or β-D-Manp-1,4-β-D-GlcpNAc-1,4-D-GlcpNAc (Man-GlcNAc2) and inorganic phosphate into α-D-mannopyranose-1-phosphate and GlcNAc or GlcNAc2, respectively (Nihira et al., 2013; Ladeveze et al., 2013). CAZy subfamilies are subgroups found

within a family that share a more recent ancestor and that are

usually more uniform in molecular function, reflecting a high

degree of conservation in their active site (Aspeborg et al.,

2012). In the GH130 family, at least two enzyme subfamilies

have been identified (Ladeveze et al., 2013). Subfamily

GH130_1 gathers enzymes that are highly specific for β-D-Manp-1,4-D-Glc, while GH130_2 contains enzymes that are

much more flexible towards mannosides. Uhgb_MP and

Bt1033 are classified in the GH130_2 subfamily, together with

40 other GH130 sequences, among which 15 originate from gut

bacterial genomes. Integration of metagenomic and genomic

data on the scale of the entire human gut microbiota revealed

that GH130_2 enzymes, especially Uhgb_MP and Bt1033,

probably play a critical role in alteration of the intestinal

barrier, as their encoding genes are particularly prevalent in

the human gut microbiome of patients suffering from IBDs

(Ladeveze et al., 2013). Based on genomic context analysis and

on the in silico detection of signal peptides, the physiological

role of Uhgb_MP and Bt1033 would be the intracellular

phosphorolysis of β-D-Manp-1,4-D-GlcNAc, which can be

internalized in the cell after extracellular hydrolysis of

N-glycans by glycoside hydrolases belonging to the GH18,

GH92 and possibly also the GH97 families (Ladeveze et al.,

2013). In addition to Uhgb_MP and Bt1033, subfamily

GH130_2 contains only one other biochemically characterized

enzyme, the RaMP2 enzyme from the ruminal bacterium

Ruminococcus albus 7. It has been suggested that this enzyme

is involved in mannan catabolism in the bovine rumen, as it

catalyzes the phosphorolysis of β-1,4-manno-oligosaccharides

(Kawahara et al., 2012). In vitro, these three enzymes present a

relaxed substrate specificity compared with all other known

mannoside phosphorylases. This property makes them extre-

mely interesting biocatalytic tools for the synthesis of diverse

manno-oligosaccharides by reverse phosphorolysis. In parti-

cular, Uhgb_MP is extremely efficient at producing N-glycan

core oligosaccharides such as β-D-Manp-1,4-D-GlcNAc and β-D-Manp-1,4-β-D-GlcpNAc-1,4-D-GlcpNAc, the current commercial price of which exceeds $10 000 per milligram (Ladeveze et al., 2014). Moreover, it is the only known phosphorylase to act on mannans and long manno-oligosaccharides. Uhgb_MP-based β-mannoside synthesis processes are

highly attractive, thanks to its flexible specificity and because it

is the only known enzyme able to produce such high added

value compounds from a hemicellulose constituent as a

substrate. Indeed, a one-pot reaction would allow Uhgb_MP

to produce β-D-Manp-1,4-β-D-GlcpNAc directly from N-acetylglucosamine and mannan following two reaction steps: a first step of mannan phosphorolysis releasing α-D-Man-1-phosphate, and a second step of reverse phosphorolysis converting α-D-Man-1-phosphate and N-acetylglucosamine into β-D-Manp-1,4-β-D-GlcpNAc.

Currently, six GH130 enzyme structures are available in the

RCSB Protein Data Bank, sharing a common five-bladed β-propeller fold. The crystal structure of BfMGP, a B. fragilis

NCTC 9343 enzyme classified with 78 other sequences into the

GH130_1 subfamily, was recently solved in complex with the

genuine substrates 4-O-β-D-mannosyl-D-glucose and phosphate and the product α-D-mannose-1-phosphate (Nakae et

al., 2013; PDB entries 3wat, 3was, 3wau and 4kmi). This

enzyme, which is involved in the final steps of mannan cata-

bolism in the human gut, exhibits a very narrow specificity

towards β-D-Manp-1,4-D-Glc (Senoura et al., 2011), like the

other characterized members of the GH130_1 subfamily

(RaMP1 from the ruminal bacterium R. albus 7 and the

RmMGP protein from the marine bacterium Rhodothermus

marinus DSM4252; Jaito et al., 2014). This pioneering study on

BfMGP highlighted a probably unique reaction mechanism

among known disaccharide phosphorylases, as the invariant

residue Asp131, which is assumed to be the general acid/base,

was not found close to the glycosidic O atom, which should be

protonated in the catalytic reaction.

The five other three-dimensional structures of GH130

enzymes available to date are the apo forms of (i) four

proteins belonging to the GH130_NC cluster, which gathers

enzymes that are not classified into the GH130_1 and

GH130_2 subfamilies, namely BACOVA_03624 and

BACOVA_02161 from B. ovatus ATCC 8483 (PDB entries

3qc2 and 4onz; Joint Center for Structural Genomics,

unpublished work), BDI_3141 from Parabacteroides distasonis

ATCC 8503 (PDB entry 3taw; Joint Center for Structural

Genomics, unpublished work) and BT_4094 from B. thetaiotaomicron VPI-5482 (PDB entry 3r67; Joint Center for Structural Genomics, unpublished work), and (ii) Tm1225 from

Thermotoga maritima MSB8 (PDB entry 1vkd; Joint Center

for Structural Genomics, unpublished work), which belongs to

the GH130_2 subfamily. No function has yet been attributed

to these five proteins, thus limiting our understanding of their

structure–specificity relationships. Until now, nothing has been

established regarding the molecular bases of the relaxed

specificity of the enzymes classified into the GH130_2 family.

In addition, no structural feature has been identified to explain

the efficiency of Uhgb_MP and Bt1033 in binding and

breaking down N-glycan core oligosaccharides.


Here, we present the first crystal structure of an N-glycan

phosphorolytic enzyme, Uhgb_MP, solved by X-ray crystallo-

graphy in complex with inorganic phosphate, mannose and

N-acetylglucosamine. This study made it possible to review the

previously published three-dimensional model of Uhgb_MP

and provides key information to understand its catalytic

mechanism. Comparative analysis of this new tertiary and

quaternary structure with other GH130 structures allowed us

to identify structural features specific to GH130 subfamilies

that could explain their functional specificities and hence their

key role in mannose foraging in the human gut. This work

therefore paves the way for enzyme optimization by rational

engineering to fit industrial needs as well as for the design of

specific inhibitors to investigate, and potentially to control,

interactions between host and gut microbes.

2. Materials and methods

2.1. Recombinant production of Uhgb_MP and enzyme purification

Uhgb_MP was produced in Escherichia coli BL21-AI cells

(Invitrogen) after its encoding gene had been cloned into the

pET-28a vector, yielding an N-terminally hexahistidine-tagged

protein (detailed procedures are provided as Supporting

Information). After purification by His-tag affinity chroma-

tography and gel filtration, the enzyme was stored in 20 mM

potassium phosphate pH 7.0, 150 mM NaCl (see Supporting

Information).

2.2. Activity measurements

Phosphorolytic activity was assessed using two substrates, pNP-β-D-mannopyranose and β-D-mannopyranosyl-1,4-D-mannose. All reactions were carried out with 0.1 mg ml⁻¹ purified enzyme at 37°C (the optimal temperature for Uhgb_MP) in 20 mM Tris–HCl pH 7.0 (the optimal pH for Uhgb_MP). For measurement of the activity in the presence of 10 mM inorganic phosphate and 1 mM pNP-β-D-mannopyranose, the pNP release rate was monitored at 405 nm using a Cary-100 UV–visible spectrophotometer (Agilent Technologies). The release rate of α-D-mannopyranose-1-phosphate from 10 mM inorganic phosphate and 10 mM β-D-mannobiose (Megazyme, Ireland) was measured by quantification of α-D-mannopyranose-1-phosphate using high-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD) as described previously (Ladeveze et al., 2013).

2.3. Size-exclusion chromatography multi-angle laser light scattering (SEC-MALLS) experiments

A 30 µl sample of gel-filtered Uhgb_MP at a concentration of 6 mg ml⁻¹ in 20 mM potassium phosphate pH 7.0, 150 mM NaCl was loaded onto a Superdex 200 HR 10/300 column (GE Healthcare, Massy, France) using an Agilent 1260 Infinity LC chromatographic system (Agilent Technology) coupled to a multi-angle laser light scattering (MALLS) detection system. The protein was centrifuged for 5 min at 4°C at 10 000g before the sample was loaded. The column was equilibrated with a 0.1 µm filtered buffer composed of 20 mM potassium phosphate pH 7.0, 150 mM NaCl. Separation was performed at a flow rate of 0.4 ml min⁻¹ at 15°C. Data were collected using a DAWN HELEOS 8+ (eight-angle) light-scattering detector and an Optilab T-rEX refractive-index detector (Wyatt Technology, Toulouse, France). The results were analyzed using the ASTRA v.6.0.2.9 software (Wyatt Technology).

2.4. Protein crystallization

Purified Uhgb_MP protein was concentrated using polyethersulfone Vivaspin concentrators (Vivascience, Sartorius, Göttingen, Germany). The concentration was determined by measuring the A280 nm using a NanoDrop instrument (Wilmington, Delaware, USA). All crystallization experiments were carried out at 12°C by the sitting-drop vapour-diffusion method using MRC 96-well microplates (Molecular Dimensions, Newmarket, England) and a Nanodrop ExtY crystallization instrument (Innovadyne Technologies, Santa Rosa, USA) to prepare 400 nl droplets. The best Uhgb_MP crystals were obtained within a week with a 1:1 (v:v) ratio of protein (9–12 mg ml⁻¹ in 20 mM potassium phosphate pH 7.0, 150 mM NaCl supplemented with 5 mM mannose and/or 5 mM N-acetylglucosamine for the co-crystallization assays) to precipitant solution [17.5–20% (w/v) polyethylene glycol 3350, 0.175–0.2 M ammonium chloride]. Uhgb_MP crystals grew to dimensions of 0.2 × 0.08 × 0.02 mm in a week. They diffracted to a maximum resolution of 1.80 Å, while those obtained by co-crystallization with mannose, N-acetylglucosamine or both diffracted to maximum resolutions of 1.94, 1.60 and 1.76 Å, respectively.

2.5. Data collection and determination of the structure

X-ray experiments were carried out at 100 K. Crystals of Uhgb_MP were soaked for a few seconds in reservoir solution supplemented with 15% (v/v) glycerol (apo) or 15% (v/v) PEG 300 (complexes) prior to flash-cooling. Apo Uhgb_MP diffraction data sets were collected on beamline ID23-1 at the European Synchrotron Radiation Facility (ESRF), Grenoble, France, while those for the complexes were collected on the XALOC beamline at the ALBA Synchrotron, Cerdanyola del Vallès, Spain (Juanhuix et al., 2014). The diffraction intensities were integrated and scaled using XDS (Kabsch, 2010) and 5% of the scaled amplitudes were randomly selected and excluded from the refinement procedure. All crystals belonged to the orthorhombic space group P2₁2₁2₁, with six molecules per asymmetric unit, giving Matthews coefficients of 2.22 and 2.11 Å³ Da⁻¹ and solvent contents of 44 and 42% for the apo form and the three complexes, respectively. The structures were solved by the molecular-replacement method using Phaser (McCoy et al., 2007) from the CCP4 software suite (Potterton et al., 2003) and chain A of the crystal structure of Tm1225 from T. maritima MSB8 (PDB entry 1vkd) as a search model for the apo form. The final translation-function Z-score was 42.8 and the R and Rfree values of the refined structure were 0.155 and 0.190, respectively. Once solved, the apo structure was then used to solve the protein–ligand structures. The structures of Uhgb_MP in complex with mannose, with N-acetylglucosamine and with mannose and N-acetylglucosamine were refined to final R/Rfree values of 0.150/0.193, 0.154/0.183 and 0.158/0.192, respectively, using REFMAC5 (Murshudov et al., 2011). Models were built manually in σA-weighted electron-density maps using Coot (Emsley & Cowtan, 2004). Water molecules were manually checked after automatic assignment and ligand molecules were manually fitted in residual maps. Refinement statistics are listed in Table 1.
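The Matthews coefficient and solvent content quoted above can be related to the unit cell with a short calculation. The sketch below uses the apo cell parameters from Table 1 and the fact that P2₁2₁2₁ has four symmetry operators; the monomer molecular weight is not stated in this section, so `ASSUMED_MW_DA` is a hypothetical placeholder, and the constant 1.23 is the usual Matthews estimate rather than a value taken from the paper.

```python
# Sketch: Matthews coefficient and solvent content for an orthorhombic crystal.
# ASSUMED_MW_DA is an illustrative placeholder, NOT a value given in the text.

def orthorhombic_cell_volume(a, b, c):
    """Unit-cell volume in cubic angstroms when all cell angles are 90 degrees."""
    return a * b * c

def matthews_coefficient(cell_volume, n_symops, n_per_asu, mol_weight_da):
    """V_M = V_cell / (Z * MW), where Z = symmetry operators x molecules per asymmetric unit."""
    return cell_volume / (n_symops * n_per_asu * mol_weight_da)

def solvent_fraction(v_m, protein_const=1.23):
    """Common Matthews estimate: solvent fraction = 1 - 1.23 / V_M."""
    return 1.0 - protein_const / v_m

# Apo cell parameters (a, b, c in angstroms) as listed in Table 1.
volume = orthorhombic_cell_volume(84.1, 141.2, 176.2)

ASSUMED_MW_DA = 39_000.0  # hypothetical monomer mass in daltons
v_m = matthews_coefficient(volume, n_symops=4, n_per_asu=6, mol_weight_da=ASSUMED_MW_DA)

print(f"V_cell = {volume:.0f} A^3")
print(f"V_M (assumed MW) = {v_m:.2f} A^3/Da")
print(f"solvent from reported V_M = 2.22: {100 * solvent_fraction(2.22):.1f}%")
print(f"solvent from reported V_M = 2.11: {100 * solvent_fraction(2.11):.1f}%")
```

Feeding the reported coefficients (2.22 and 2.11 Å³ Da⁻¹) into `solvent_fraction` gives values close to the solvent contents reported in the text, which is the main point of the check; the `v_m` computed from the assumed mass is only as good as that assumption.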

2.6. SAXS measurements

Small-angle X-ray scattering (SAXS) experiments were performed on the SWING beamline at the SOLEIL synchrotron, Gif-sur-Yvette, France. The wavelength was set to 1.033 Å. A 17 × 17 cm Aviex CCD detector was positioned 1800 mm from the sample, with the direct beam off-centred. The resulting exploitable q-range was 0.006–0.6 Å⁻¹, where q = 4π sin θ/λ, considering 2θ as the scattering angle. The samples were circulated in a thermostated quartz capillary with a diameter of 1.5 mm and 10 µm wall thickness positioned inside a vacuum chamber. An 80 µl volume of sample was injected onto a size-exclusion column (Bio SEC3 300, Agilent) equilibrated in phosphate-based buffer (20 mM potassium phosphate pH 7.0, 150 mM NaCl) or Tris-based buffer (20 mM Tris–HCl pH 7.0, 300 mM NaCl supplemented with 10% glycerol) using an Agilent high-performance liquid-chromatography (HPLC) system and eluted directly into the SAXS capillary cell at a flow rate of 200 µl min⁻¹ at a temperature of 10°C. Samples were separated from the pushing liquid (water) by two air volumes of 6 µl each, as described previously (David & Perez, 2009). SAXS data were collected online throughout the elution time and a total of 149 frames, each lasting 2 s, were recorded separated by a dead time of 0.5 s between frames. The transmitted intensity was continuously measured with an accuracy of 0.1% using a diode embedded in the beam stop. For each sample, the stability of the associated radius of gyration and the global curve shape in the frames corresponding to the main elution peak were checked, and the resulting selection of curves was averaged as described previously (David & Perez, 2009). The recorded curves were normalized to the transmitted intensity and subsequently averaged using Foxtrot, a dedicated in-house application. The same protocol was applied to buffer scattering. Rg values were determined by a Guinier fit of the one-dimensional curves using the ATSAS package (Petoukhov et al., 2007). The P(r) function was calculated using the GNOM program and the corresponding ab initio envelopes were calculated using the GASBOR program. Rigid-body SAXS modelling was performed using the CORAL program.
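To make the two quantities used in this section concrete, the following minimal sketch evaluates the momentum transfer q = 4π sin θ/λ and performs a Guinier fit, ln I(q) = ln I0 − (Rg²/3) q², by ordinary least squares. It is a hand-written stand-in for illustration only, not the ATSAS procedure used by the authors; the synthetic curve, `TRUE_RG` and the chosen q points are assumptions introduced here.

```python
import math

def momentum_transfer(two_theta_deg, wavelength_a):
    """q = 4*pi*sin(theta)/lambda, where 2*theta is the scattering angle in degrees."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_a

def guinier_fit(q_values, intensities):
    """Fit a straight line to ln I versus q^2 and return (Rg, I0).

    Guinier approximation: ln I(q) = ln I0 - (Rg^2 / 3) * q^2 (valid at small q).
    """
    xs = [q * q for q in q_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)

# Synthetic, noise-free curve with a made-up Rg of 40 angstroms and I0 = 1.
TRUE_RG = 40.0
qs = [0.006 + 0.002 * k for k in range(12)]  # keeps q * Rg below about 1.3
intens = [math.exp(-((q * TRUE_RG) ** 2) / 3.0) for q in qs]

rg, i0 = guinier_fit(qs, intens)
print(f"fitted Rg = {rg:.2f} A, I0 = {i0:.3f}")
print(f"q at 2theta = 1 degree, lambda = 1.033 A: {momentum_transfer(1.0, 1.033):.4f} A^-1")
```

On real data the fit would be restricted to the low-q region and applied to buffer-subtracted, averaged frames; the noise-free curve here only shows that the slope of ln I against q² encodes Rg.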


Table 1. Data-collection and refinement statistics for Uhgb_MP.

Values in parentheses are for the outer resolution shell.

| | Native (Pi) | Mannose + Pi | N-Acetylglucosamine + Pi | Mannose + N-acetylglucosamine + Pi |
|---|---|---|---|---|
| Data collection | | | | |
| Space group | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ | P2₁2₁2₁ |
| Unit-cell parameters (Å, °) | a = 84.1, b = 141.2, c = 176.2, α = β = γ = 90 | a = 83.9, b = 140.8, c = 168.7, α = β = γ = 90 | a = 83.8, b = 140.9, c = 168.6, α = β = γ = 90 | a = 83.7, b = 140.9, c = 168.8, α = β = γ = 90 |
| No. of molecules in asymmetric unit | 6 | 6 | 6 | 6 |
| Matthews coefficient (Å³ Da⁻¹) | 2.22 | 2.11 | 2.11 | 2.11 |
| Solvent content (%) | 44.66 | 41.82 | 41.74 | 41.75 |
| Wavelength (Å) | 0.96863 | 0.97949 | 0.97949 | 0.97949 |
| Resolution range (Å) | 48.16–1.80 (1.91–1.80) | 75.11–1.94 (1.98–1.94) | 46.68–1.60 (1.64–1.60) | 45.43–1.76 (1.80–1.76) |
| No. of unique reflections | 190253 (27574) | 148300 (21410) | 261650 (41727) | 197652 (31069) |
| No. of observed reflections | 927534 (101770) | 1217594 (168817) | 2453350 (392431) | 1818257 (276166) |
| Completeness (%) | 98.00 (88.70) | 99.66 (98.85) | 99.84 (98.42) | 99.64 (95.24) |
| Multiplicity | 4.88 (3.69) | 8.20 (7.90) | 9.37 (7.00) | 9.19 (8.88) |
| ⟨I/σ(I)⟩ | 10.46 (1.63) | 9.30 (3.40) | 14.49 (2.07) | 13.70 (2.54) |
| Rmerge (%) | 9.1 (70.0) | 15.5 (58.9) | 9.3 (103.5) | 11.9 (86.5) |
| Refinement | | | | |
| Rwork/Rfree | 0.157/0.190 | 0.152/0.193 | 0.155/0.183 | 0.158/0.192 |
| R.m.s.d., bond lengths (Å) | 0.0188 | 0.0181 | 0.0191 | 0.0183 |
| R.m.s.d., bond angles (°) | 1.9218 | 1.8528 | 1.8905 | 1.9235 |
| Ramachandran plot, favoured (%) | 91.6 | 91.3 | 91.3 | 91.1 |
| Ramachandran plot, allowed (%) | 8.1 | 7.9 | 8.5 | 8.6 |
| B factors (Å²) | | | | |
| Wilson B | 24 | 18 | 22 | 22 |
| Mean | 35 | 20 | 22 | 21 |
| Main chain | 33 | 18 | 20 | 19 |
| Side chain | 37 | 21 | 24 | 23 |
| Ligand/water | 28/37 | 23/26 | 26/30 | 27/26 |
| PDB code | 4udi | 4udj | 4udg | 4udk |
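As a rough consistency check on the data-collection statistics, multiplicity can be recomputed as the number of observed reflections divided by the number of unique reflections. The sketch below does this for the overall (not outer-shell) values of the four data sets; the dictionary layout and function name are illustrative choices, and small differences from the tabulated values are expected from rounding.

```python
# Overall (unique, observed, reported multiplicity) per data set, keyed by PDB code.
datasets = {
    "4udi": (190253, 927534, 4.88),
    "4udj": (148300, 1217594, 8.20),
    "4udg": (261650, 2453350, 9.37),
    "4udk": (197652, 1818257, 9.19),
}

def multiplicity(n_observed, n_unique):
    """Average number of measurements per unique reflection."""
    return n_observed / n_unique

for pdb_id, (unique, observed, reported) in datasets.items():
    computed = multiplicity(observed, unique)
    print(f"{pdb_id}: computed {computed:.2f}, reported {reported:.2f}")
```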


3. Results and discussion

3.1. Conformational stability optimization of Uhgb_MP

Previous work on Uhgb_MP allowed the production and purification of a recombinant form of the protein in amounts suitable for crystallization (Ladeveze et al., 2013). However, owing to enzyme instability, an optimized production system was set up by subcloning the open reading frame of Uhgb_MP into the pET-28a vector (Supporting Information §S1). Subcloning into pET-28a made it possible to produce a recombinant protein with a thrombin-cleavable N-terminal hexahistidine tag and a five-residue shortened linker between the tag and the N-terminal extremity of the native enzyme. After protein purification and processing in the same buffer as previously described (Ladeveze et al., 2013), the activity on pNP-β-D-mannopyranose was increased by 73% to 10.9 × 10⁻³ µmol min⁻¹ mg⁻¹, indicating that the 16-amino-acid linker used in the initial construct negatively impacted on the Uhgb_MP activity. To avoid the use of Tween 80, which is not suitable for crystallization assays, we then screened for an optimized buffer composition by differential scanning fluorimetry (DSF; Supporting Information §S1). In-house-prepared 96 deep-well screens adapted from Ericsson et al. (2006) were used to assess the effect of buffer nature, pH and NaCl concentration on protein thermal stability. The denaturation curves revealed two melting temperatures, Tm1 = 65.7°C and Tm2 = 70.8°C, in the initial Tris–HCl buffer, indicating that Uhgb_MP may adopt different oligomeric states in solution. The DSF results showed unambiguously that phosphate-based buffers (sodium and potassium phosphate) largely stabilize the Uhgb_MP structure at all of the assayed pH values (5.0, 5.5, 6.0, 6.5 and 7.0) and NaCl concentrations (136, 159, 287 and 439 mM). The best result was observed using 100 mM potassium phosphate pH 6.0 with 136 mM NaCl, leading to an increase in Tm1 and Tm2 of 6.82 ± 0.07 and 8.47 ± 0.07°C, respectively. We therefore chose to use potassium phosphate buffer supplemented with 150 mM NaCl to purify and store the protein produced using pET-28a::Uhgb_MP. In addition, the pH was set to 7.0 to allow sufficient separation efficiency of the protein in the affinity-purification step (Tm1 and Tm2 increased by 4.80 ± 0.30 and 6.45 ± 0.30°C, respectively, at this pH value). Under these optimal conditions, the protein production yield reached 90 mg pure protein per litre of culture. Finally, the β-1,4-D-mannobiose phosphorolytic activity of the enzyme stored in these conditions was increased tenfold compared with that of enzyme previously expressed in pDEST17 and purified in Tris buffer (Ladeveze et al., 2013).
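The numbers in this paragraph can be turned into absolute values with simple arithmetic. The sketch below assumes that the reported Tm shifts are measured relative to the two transitions observed in the initial Tris–HCl buffer, and that the 73% gain refers to the activity of the earlier construct; both readings are interpretations of the text, and the function and variable names are illustrative.

```python
# Tm1 and Tm2 (degrees C) in the initial Tris-HCl buffer, as quoted in the text.
TM_TRIS = (65.7, 70.8)

# Reported shifts (degrees C); ASSUMED here to be relative to the Tris-HCl values.
SHIFTS = {
    "100 mM potassium phosphate pH 6.0, 136 mM NaCl": (6.82, 8.47),
    "potassium phosphate pH 7.0": (4.80, 6.45),
}

def shifted_tm(reference, delta):
    """Add each reported shift to the matching reference melting temperature."""
    return tuple(round(t + d, 2) for t, d in zip(reference, delta))

def activity_before(activity_after, percent_increase):
    """Back-calculate the earlier activity from the new value and the percent gain."""
    return activity_after / (1.0 + percent_increase / 100.0)

for buffer_name, delta in SHIFTS.items():
    tm1, tm2 = shifted_tm(TM_TRIS, delta)
    print(f"{buffer_name}: Tm1 = {tm1:.2f} C, Tm2 = {tm2:.2f} C")

# Activity is kept in the units quoted in the text (per minute per mg of enzyme).
previous = activity_before(10.9e-3, 73.0)
print(f"implied activity of the earlier construct: {previous:.2e}")
```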

3.2. Crystallographic structure of Uhgb_MP subunits

The overall structure of Uhgb_MP was determined by

molecular replacement using the structure of Tm1225 from

T. maritima MSB8 (PDB entry 1vkd), which shares 60%

identity with Uhgb_MP, as a model. The crystal structure of


Figure 1. Monomeric Uhgb_MP fold, substrates and interacting residues. Like other GH130 enzymes, the Uhgb_MP monomer has a five-bladed β-propeller fold with a central catalytic furrow. The Pi, mannose and N-acetylglucosamine molecules present at the catalytic site are shown as sticks, while interacting residues are shown as lines. The catalytic Asp104 is shown in red, Pi-interacting residues in green, mannose-interacting residues in blue and N-acetylglucosamine-interacting residues in orange. The Asn44 and Asp104 side chains are shown in the B conformation, i.e. the conformation that is catalytically active. Water molecules 436, 457 and 656, which mediate interactions between the N-acetylglucosamine and residues Lys212 and Tyr242, Asp304 and His235, respectively, are shown as black crosses. Protein–Pi interactions: NH2 of Arg150 contacts Pi O4 and O2, while NH2 of Asn151 contacts Pi O4. The ω-NH2 of the Arg168 side chain binds to Pi O1 and its ω′-NH2 interacts with Pi O2. The side-chain amine of Lys212 binds to Pi O2 and His231 Nε2 is at a hydrogen-bond distance from Pi O1. Finally, the hydroxyl of the Tyr242 side chain is in contact with Pi O3.

Figure 2. Alternative conformations of Asn44, Ser45 and the catalytic Asp104 upon mannose binding in the −1 subsite. Superposition of the apo structure of Uhgb_MP (PDB entry 4udi) and Uhgb_MP complexed with mannose (PDB entry 4udj), illustrating the movement of the catalytic residues when mannose is bound at the −1 subsite. The backbone of the apo form is shown in grey (A conformation), while the backbone of the complexed, catalytically active form (B conformation) is in green. Water, phosphate and glycerol molecules in the apo form are shown. 2Fo − Fc electron-density maps are shown (contoured at 1.0σ) for the catalytic residues in the apo and mannose-bound forms. Interatomic distances are labelled in Å.

electronic reprint


Uhgb_MP revealed a homohexameric organization. The

homohexamer consisted of a trimer of dimers with D3

symmetry, with six molecules per asymmetric unit. The apo

structure was refined to 1.80 Å resolution, while the

complexes with mannose, with N-acetylglucosamine and with

the two combined were refined to 1.94, 1.60 and 1.76 Å,

respectively (Table 1). The electron-density maps did not

enable construction of the N-terminal extremity of the poly-

peptide chains. The N-terminal hexahistidine tag and the

following 6–8 first residues have thus been omitted from the

final model. The overall fold of each Uhgb_MP protomer

consists of a five-bladed β-propeller (Fig. 1). The catalytic

centre is located in the central cleft as previously hypothesized

(Ladeveze et al., 2013), as a phosphate ion (Pi) and mannose

and N-acetylglucosamine were observed in the central furrow.

Pi is deeply buried in the catalytic site, strongly stabilized by

hydrogen bonds and ionic interactions with the surrounding

residues (Fig. 1). Compared with other glycoside phosphor-

ylases, Pi appeared to be quite strongly bound, since the Pi

dissociation constant values previously determined for the

ternary enzyme–Pi–mannobiose and enzyme–Pi–β-D-Manp-

1,4-β-D-GlcpNAc-1,4-D-GlcpNAc complexes (0.64 and

0.13 mM, respectively; Ladeveze et al., 2013) are more than

200 times lower than that determined for RaMP1 (belonging

to the GH130_1 subfamily), the only other GH130 enzyme for

which a Pi dissociation constant has been determined. In the

apo structure, a molecule of glycerol, which was used as a

cryoprotectant, occupied the −1 subsite. This glycerol molecule

was hydrogen-bonded to the side-chain carboxylate

moiety of Asp304 (O1 to O3 and O2 to O1), while its O2 atom

was hydrogen-bonded to the Pi, mimicking the interactions

between mannose and the surrounding amino acids that were

observed in the protein–ligand complexes. Interestingly, in the

Uhgb_MP structures where binding of mannose in the −1

subsite occurred (PDB entries 4udj and 4udk in Table 1), the

mannose ring was found in a stressed B2,5 boat conformation

stabilized by hydrogen bonding to Asp304 (O1 to the C6

hydroxyl and O2 to the C4 hydroxyl); the mutation of this

critical residue to asparagine abolishes 96% of the activity

(Ladeveze et al., 2013). This unusual conformation of mannose

was present in all six chains (Supplementary Fig. S1) and was

previously observed in the BfMGP structures (PDB entry

3was, for which in cristallo activity was observed, and PDB

entry 3wat). It should be noted that this B2,5 boat conforma-

tion is less unstable in mannose compared with other mono-

saccharides owing to the pseudo-equatorial position of the C2–

OH, which is in an anti configuration to the ring O atom, thus

bending the C3–OH towards the β-glycosidic O atom, in a syn

axial position. The binding of mannose in the −1 subsite

induced a large conformational movement in the active site.

Indeed, in the apo form, where a glycerol molecule was

observed in place of mannose, the amino moiety of Asn44

interacts with the C3 hydroxyl of glycerol through a water

molecule (conformation A in Fig. 2). Upon mannose binding

(conformation B), the peptide bond between Phe43 and

Asn44 flips in order to allow direct interaction of the Asn44

side chain with the C3 and C4 hydroxyls of the sugar. The side

chain of the catalytic residue Asp104, mutation of which to

asparagine completely abolished the activity (Ladeveze et al.,

2013), is also moved towards mannose in a position that is

occupied by two water molecules in the apo form. In this B

conformation, Asn44 stabilizes the catalytic Asp104 through

hydrogen bonding, thereby imposing selection of the rotamer

facing the mannose C3 hydroxyl, which acts as a proton relay

during catalysis (Fig. 2). This is the first time that such a

concerted movement upon substrate binding in the −1 subsite

has been reported for a glycoside phosphorylase. It must be

noted that the B conformation of the catalytic Asp104 is

probably the one that is active since it has also been observed

for BfMGP, which was demonstrated to be catalytically active

in cristallo (Nakae et al., 2013). The A conformation of the

catalytic residue Asp104 of Uhgb_MP was observed in the

structure of the GH130_2 Tm1225 protein in the apo form

(PDB entry 1vkd). In the Uhgb_MP structures, the B

conformation was only observed when mannose was bound in

the −1 subsite, and is therefore independent of the presence

of N-acetylglucosamine in the +1 subsite. Indeed, the A

conformation was observed in the complex with N-acetylglucosamine

alone (PDB entry 4udg, with glycerol in the −1

subsite and N-acetylglucosamine in the +1 subsite), while the

B conformation was observed in the complex with mannose

and N-acetylglucosamine (PDB entry 4udk, with mannose in

the −1 subsite and N-acetylglucosamine in the +1 subsite). In

the A and B conformations, N-acetylglucosamine bound in the

+1 subsite was found in a 4C1 relaxed chair conformation,

stacked with Tyr103, and bound through hydrogen bonding to

the C6 hydroxyl group, the His174 imidazole ring, the Lys212

side chain and the Tyr242 hydroxyl group via a water mole-

cule. The C3 hydroxyl interacts with the Arg59 side-chain

amine moiety, as well as with Asp304 O1 through a water

molecule. The N-acetyl moiety is also involved in binding

through hydrogen bonding between its NH group and the S

atom of Met67 and between its carbonyl moiety and the

His235 carbonyl via a water molecule. The hydrophobic

methyl moiety faces the side chain of Ala207 and Met67, with

these residues forming a hydrophobic pocket (Fig. 1).

These data enabled us to revise our previous Uhgb_MP

model, which was built using the atomic coordinates of the

Tm1225 protein from T. maritima MSB8 (PDB entry 1vkd) as

a structural template, considering a monomeric form of the

enzyme and using geometrical constraints provided by a

classical inverting GH-like single-displacement mechanism. In

this model, we previously hypothesized a +1 subsite formed by

the Tyr103, Asp304, His174, Tyr240 and Phe283 residues,

which are specifically conserved in the GH130_2 family, while

the +2 subsite would be delineated by Tyr242, Pro279, Asn280

and Asp304. In this configuration, the exit of the catalytic

tunnel would be orientated towards the inside of the oligo-

meric structure, thereby reducing access to the catalytic site. In

addition, the conserved His235–Tyr240 loop from a cognate

monomer would block the furrow that we have hypothesized.

These new data emphasize the importance of taking into

account the quaternary organization when modelling oligo-

meric enzymes, using SAXS data when possible to define the


low-resolution envelope and avoiding restraining the possible

docking modes of substrates too much to envisage original

reaction mechanisms. Here, thanks to the high-resolution

crystallographic structures of hexameric Uhgb_MP, which is

catalytically active (as detailed in the next section), in complex

with mannose and N-acetylglucosamine, we propose a revised

active-site topology in which the oligosaccharide chain rotates

by 180°, inverting the −1 and +1 subsite positions. This new

orientation allows the substrate to enter from the open side of

the tunnel, and is in complete accordance with the orientation

found in the crystal structure of BfMGP in complex with its

substrates (Nakae et al., 2013).

As previously shown by kinetic analysis of Uhgb_MP,

phosphorolysis of the N-glycan oligosaccharide core follows a

mixed-type sequential random Bi-Bi mechanism (Ladeveze et

al., 2013). However, the order of substrate binding was not

determined. Functional and structural data now lead us to

suggest that the phosphorolytic catalytic mechanism is

composed of a first step in which the inorganic phosphate is

conveyed to the catalytic centre, followed by entry of the

substrate to be phosphorolyzed. Indeed, the Pi binding site is

located deeper in the catalytic site than the glycoside

substrate, meaning that Pi could not bind after the disaccharide.

Thus, mannose binding in the −1 subsite would

induce a flip of the Phe43–Asn44 peptide bond from confor-

mation A to conformation B, thus maintaining the side chain

of Asp104 in a catalytically competent configuration. In

reverse phosphorolysis, entry of mannose-1-phosphate would

be the first step, followed by conformational changes of Phe43,

Asn44, Ser45 and Asp104. Entrance of the acceptor would

lead to the reverse phosphorolysis reaction. Regarding the

phosphorolytic reaction itself, our data show that when the

mannosyl moiety was present in the catalytic site, no water

molecule was located where it could relay the proton from the

catalytic Asp104 to the interosidic O atom. In addition,

comparison between the apo and complexed forms showed

that the change in Asp104 from conformation A to confor-

mation B did not allow the catalytic aspartic acid to be at a

hydrogen-bonding distance from the interosidic O atom,

indicating that this residue is not directly involved in proton

transfer to the interosidic O atom, as seen in the BfMGP

structures. In the latter case, Nakae and coworkers suggested a

catalytic mechanism different from that of known inverting

glycoside phosphorylases (GPs), involving the assistance of

C3–OH to relay proton transfer, because, like us, they did not

observe a water molecule at the catalytic site in any of their

structures (Supplementary Fig. S2). The stressed B2,5 boat

conformation of mannose bound in the −1 subsite and the B

configuration of Asp104, which is only stabilized when

mannose is present in the −1 subsite, are compatible with such a

mechanism. Phosphorolysis would thus involve (i)

nucleophilic attack of Pi on C1 of the mannosyl moiety bound

in the −1 subsite and (ii) a two-step protonation through

Asp104 O2–Man O3–GlcNAc O4, with the Asp104 side-chain

carboxylic acid being located at 2.5 Å from the C3–OH

(Supplementary Fig. S2). However, even if the catalytic

mechanism of Uhgb_MP and BfMGP appears to be identical,

clear differences in the substrate specificity of GH130


Figure 3. Uhgb_MP homohexameric structure. (a) Uhgb_MP is a hexameric structure formed by a trimer of dimers. Individual monomers are shown in a single colour and are labelled A, B, C, A′, B′ and C′ for clarity. Each pair of dimers is coloured in pale/dark blue, green and red. (b) Close-up of the catalytic tunnel of a single Uhgb_MP protomer in the hexameric structure. The quaternary-structure assembly imposes structural constraints on active-site accessibility. The inorganic phosphate, the mannose and the N-acetylglucosamine molecules present in the catalytic site of protomer A are shown as sticks deeply buried in the catalytic site, which is accessible by a tunnel whose sides are formed by different protomers. The extremity of the tunnel is located at the centre of the plane formed by four Uhgb_MP molecules, in this case A, A′, B and B′.


subfamilies have been identified owing to structural motifs

located in other parts of the protein, as further detailed below.

3.3. Uhgb_MP quaternary structure

In the Uhgb_MP crystal structure, each subunit of the

homohexamer is roughly globular in shape. There is a large

cavity inside the homohexamer and large holes at the centre of

each of the three lateral planes formed by the homohexamer

assembly (Fig. 3a). No discretely bound solvent molecules

were found in these cavities, which are 15 Å in diameter at

their narrowest point, indicating that these holes are large

enough to enable substrate access to the active site of each

protomer. The association of the surrounding subunits in the

homohexamer caps the furrow of each protomer, giving rise to

a funnel whose entrance is orientated towards the lateral

aperture (Fig. 3b). The catalytic site is therefore deeply buried,

with the phosphate ion and the −1 subsite located at the

bottom of the tunnel.

Each dimer is formed of a large buried surface area of

1300 Å² involving two Uhgb_MP molecules linked by twofold

symmetry (Fig. 4a). The interactions promoting dimer asso-

ciation involve the side chains of the residues at the interface,

such as His123 and His196 stacking, and several hydrogen

bonds or salt bridges involving side-chain atoms, such as

between Arg195 and Gln142, Glu142 and Arg193, Asp93 and

Arg195, and Glu189 and Arg193. Other polar interactions

involve main-chain atoms of Ala144 and His194, Tyr122 and

His196, and Trp191 and a symmetry mate. The homohexamer

is formed by the association of three dimers arranged around a

pseudo-threefold axis. Each dimer is related to its neighbours

through symmetrical interactions involving each of the two

protomers (covering an interaction surface of 840 Å² each;

Fig. 4b). These interactions involve T-shaped stacking between

the imidazole groups of His174 and His235 and hydrogen-bond

interactions between the side chains of Asn40 and

Tyr264 and of Ser64 and Pro263. Main-chain carbonyl

and amino groups are also involved in the assembly. More

precisely, the Thr63 carbonyl is in contact with Tyr264 NH,

while the side chains of Asn238 and Asn280 interact with the

carbonyl group of Gly276 and the amide N atom of Tyr240,

respectively. Finally, the side chain of Asn238 makes contact

with the Pro279 carbonyl moiety.
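The trimer-of-dimers arrangement described above can be illustrated with a small geometric sketch. The Python below builds the six rotations of an ideal D3 point group (a threefold axis along z combined with a twofold axis along x) and applies them to one protomer to generate six copies. The axis choice, the function names and the toy coordinates are assumptions made for illustration only; in the crystal the threefold axis is a pseudo-symmetry axis, so the real hexamer may deviate slightly from this ideal picture.

```python
import math

def rot_z(deg):
    """3x3 rotation matrix (nested tuples) for a rotation of `deg` degrees about z."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

# Twofold rotation about x: (x, y, z) -> (x, -y, -z)
TWOFOLD_X = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def apply(op, point):
    """Apply a 3x3 operator to an (x, y, z) point."""
    return tuple(sum(op[i][k] * point[k] for k in range(3)) for i in range(3))

def d3_operators():
    """The six proper rotations of D3: three about z, each also combined with the twofold."""
    threefold = [rot_z(angle) for angle in (0.0, 120.0, 240.0)]
    return threefold + [matmul(r, TWOFOLD_X) for r in threefold]

def build_hexamer(protomer_xyz):
    """Return six symmetry copies of a list of (x, y, z) coordinates."""
    return [[apply(op, p) for p in protomer_xyz] for op in d3_operators()]

# Hypothetical protomer reduced to a single pseudo-atom placed off both axes.
hexamer = build_hexamer([(10.0, 2.0, 5.0)])
```

Each copy produced by an operator that includes the twofold corresponds to the partner protomer within a dimer, while the 120° rotations relate the three dimers to one another.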

SEC-MALLS and SAXS analysis confirmed the hexameric

organization of Uhgb_MP in solution. The protein apparent

molecular mass determined by SEC-MALLS was 240 kDa

(n = 6.16; Supplementary Fig. S3). Guinier analysis of the

SAXS data revealed that in phosphate and Tris–glycerol

buffers, the radius of gyration (Rg) was considerably larger

than the theoretical Rg, indicating protein aggregation. Based

on data collected in Tris–glycerol buffer in the presence of

1 mM TCEP as reducing agent, an Rg value of 37.9 ± 0.08 Å

was obtained, which is in good agreement with the theoretical

Rg calculated from the apo crystal structure (~37 Å). The pair

distribution function P(r) revealed a compact particle with a

Dmax of ~110 Å that closely matches the largest dimension of


Figure 4. Interaction surfaces between the different Uhgb_MP protomers. (a) The interaction surfaces between two Uhgb_MP protomers involved in dimer formation. (b) The interaction surfaces between Uhgb_MP dimers to form the hexamer. Only the interactions involving the three upper monomers from each dimer are shown here, in order to clarify the view, as these surfaces are symmetrical in the lower monomers.


the hexamer (Supplementary Fig. S4). The ab initio envelope

confirmed the compact shape of Uhgb_MP in solution with a

trimer-of-dimers organization superimposable with the crys-

tallographic structure. SAXS-based rigid-body modelling was

attempted by taking into account the flexibility of the

N-terminal residues missing from the crystal structure. The fit

showed that the theoretical curve closely matched the

experimental data, thus confirming the general hexameric

organization of Uhgb_MP in solution (Supplementary Fig.

S4).
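The solution measurements reported in this paragraph reduce to two simple ratios, sketched below in Python. The apparent mass (240 kDa), the protomer count (n = 6.16) and the two Rg values come from the text; the protomer mass is back-calculated from the first two numbers and is therefore an assumption, not a value quoted from the paper. In practice the sequence-derived protomer mass would be used instead. The function names are hypothetical.

```python
APPARENT_MASS_KDA = 240.0   # SEC-MALLS apparent mass reported in the text
REPORTED_N = 6.16           # number of protomers per particle reported in the text

# Protomer mass implied by the two reported numbers (back-calculated assumption).
IMPLIED_PROTOMER_KDA = APPARENT_MASS_KDA / REPORTED_N

def oligomeric_state(apparent_mass_kda, protomer_mass_kda):
    """Return the raw mass ratio and the nearest integer number of protomers."""
    ratio = apparent_mass_kda / protomer_mass_kda
    return ratio, round(ratio)

def relative_deviation(measured, reference):
    """Fractional difference between a measured value and a reference value."""
    return abs(measured - reference) / reference

ratio, n_protomers = oligomeric_state(APPARENT_MASS_KDA, IMPLIED_PROTOMER_KDA)

# Experimental Rg (37.9 A) against the approximate value calculated from the crystal structure (37 A).
rg_gap = relative_deviation(37.9, 37.0)
```

Under these assumptions the nearest integer stoichiometry is a hexamer, and the experimental and calculated Rg values differ by only a few percent.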

Considering that (i) we never observed the existence of

Uhgb_MP monomers, either in solution or in crystals, (ii) the

quaternary structures deduced from both SAXS and crystallo-

graphic data are superimposable and (iii) the intracellular Pi

concentration in bacteria is around 10 mM (Motomura et al.,

2011), meaning that Pi binds to Uhgb_MP in vivo as in the

crystals, we conclude that hexamerization is required for

enzyme activity and that the crystal structure presented here

most probably represents the organization adopted under physiological

conditions.

All other known GPs belonging to the GH3, GH13, GH65,

GH94, GH112, GT4 and GT35 families crystallize and act as

homodimers. In contrast, it is difficult to find general features

that control the oligomerization of GH130 enzymes, even for

those belonging to the same subfamily. Indeed, all of the data

that we gathered on functionally or structurally characterized

GH130 enzymes showed that no single subfamily contained

homogeneous oligomerization profiles. Enzymes belonging to

the GH130_NC group (the BACOVA_03624, BACOVA_

02161, BT4094 and BDI_3141 proteins) crystallized as

monomers, while the two functionally characterized proteins

Teth514_1788 and Teth514_1789 have been shown to be

dimeric and monomeric in solution, respectively (Chiku et al.,

2014). Various oligomeric forms (in solution or crystals) have

been found in the GH130_1 (hexameric BfMGP, pentameric

RmMGP and dimeric RaMP1) and GH130_2 (hexameric

Uhgb_MP, dimeric Tm1225, hexameric RaMP2 and tetrameric

Bt1033) subfamilies.

Moreover, the presence of the AxxxAxxxA motif in the

BfMGP N-terminal helix, which was thought to mediate

oligomerization (Nakae et al., 2013), was not found in RaMP1

or RmMGP, demonstrating that this particular motif is not the

only element that is able to promote the formation of oligo-

mers in enzymes belonging to the same GH130 subfamily.

Finally, no particular secondary-structure element appeared to

mediate interactions between the different Uhgb_MP proto-

mers, which were only associated by surface interactions,

without any involvement of secondary-structure elements, in

contrast to what was observed for BfMGP (the only GP of

known structure with a similar homohexameric conforma-

tion). Indeed, the BfMGP loop Thr42–Met68, which was

found in contact with the N-terminal tab helix and is thought

to help contact the cognate protomer, is completely absent in

Uhgb_MP, and more generally in all GH130_2 sequences.

Taken together, these data highlight the structural originality

of GH130 enzymes among glycoside phosphorylases. Nevertheless,

many more GH130 structures will need to be solved

before possible structural markers of

oligomerization can be identified.

3.4. Molecular bases of specificity towards mannosides

Uhgb_MP is the first member of the GH130_2 subfamily to

be characterized both functionally and structurally. Tm1225

has only been structurally characterized, and no structural

data are available for the functionally characterized GH130_2

members RaMP2 and Bt1033. In contrast to the known

enzymes classified into the GH130_1 subfamily (including

BfMGP, the only crystallized member of this subfamily;

Senoura et al., 2011; Nakae et al., 2013), which exhibit a narrow


Figure 5. Sequence alignment of characterized GH130 enzymes. This alignment highlights the conservation of the catalytic and substrate-interacting residues among characterized GH130 enzymes, as well as family-specific loops.


specificity towards β-D-Manp-1,4-D-Glc, GH130_2 enzymes

present a highly relaxed substrate specificity. Moreover, the

Uhgb_MP structure is the first structure of an inverting

glycoside phosphorylase that is active on a polysaccharide.

Indeed, the only enzymes of known structure that are able to

phosphorolyze polysaccharides are the retaining α-maltosyl

phosphate:α-1,4-D-glucan-4-α-D-maltosyltransferases belonging

to GH13 and glycogen or starch phosphorylases belonging to

the GT35 family (Egloff et al., 2001; Mirza et al., 2006).

Uhgb_MP structures were compared with the six structures

available for GH130 enzymes in order to identify any struc-

tural features that could explain the enzyme specificity, in

particular towards N-glycan oligosaccharides, long manno-

oligosaccharides and mannans. The residues involved in the

catalytic machinery and in substrate binding are highly

conserved in both the sequences and the three-dimensional

structures, with the notable exception of the side chains of

Asp104, Asn44 and Ser45 in the A configuration (Fig. 5 and

Fig. 6), with overall r.m.s.d. values after Cα superposition of

2.1, 2.1, 2.1, 2.0, 2.0 and 1.0 Å for BfMGP (25% identity with

Uhgb_MP), BACOVA_03624 (23% identity), BACOVA_

02161 (22% identity), Bt4094 (23% identity), BDI_3141 (25%

identity) and Tm1225 (61% identity), respectively (Fig. 6). In

addition, the Pi and glycosyl moieties in the −1 and +1 subsites

superimposed perfectly with those present in the structure of

BfMGP in complex with inorganic phosphate, mannose and

glucose. The electron density of mannose was separated from

that of N-acetylglucosamine, in contrast to what would have

been observed for the disaccharide β-D-Manp-1,4-D-GlcpNAc,

because of the impossible superimposition of mannose O1 and

N-acetylglucosamine O4. However, some structural features

that are conserved in the subfamily explain the differences in

substrate specificities observed between subfamilies (Fig. 6).
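The two similarity measures quoted in this comparison, r.m.s.d. over Cα positions and percentage sequence identity, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the coordinates are taken to be already superposed and paired residue by residue (the superposition step itself is not shown), identity is counted over aligned columns without gaps (other conventions exist), and the function names and toy inputs are hypothetical rather than the procedure used in the paper.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superposed (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy inputs: two paired points, each displaced by 1 A, and a five-column alignment.
toy_rmsd = rmsd([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], [(1.0, 0.0, 0.0), (2.0, 1.0, 1.0)])
toy_identity = percent_identity("ACD-F", "ACEGF")
```

A low r.m.s.d. combined with low sequence identity, as reported here for most GH130 structures, indicates a conserved fold despite divergent sequences.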

The most significant structural changes were identified in

the Uhgb_MP Gly121–Pro125 loop, which is 11 residues

longer in GH130_1 enzymes compared with those belonging

to the GH130_2 and GH130_NC clus-

ters (Fig. 6). This longer loop appeared

at the extremity of the catalytic tunnel

and, in the case of BfMGP, actually

filled it. Therefore, the accommodation

of longer substrates than disaccharide

would be impossible for GH130_1

enzymes, which is in accordance with

the biochemical data published to date

(Senoura et al., 2011; Kawahara et al.,

2012; Jaito et al., 2014). In GH130_2

enzymes the shorter loop would enable

the entry of longer substrates, such as

long manno-oligosaccharides or even

mannans for Uhgb_MP, into the large

cavity formed inside the quaternary

structure between the three lateral

planes of the homohexamer. Moreover,

the +1 subsite flexibility of the GH130_2

members, which are able to accom-

modate N-acetylglucosamine and the C2

epimer of glucose, would be explained

by the location of the Uhgb_MP Arg65

residue. Indeed, the arginine side chain

is at a distance of 6.43 Å from the O2

atom compared with 2.96 Å for the side

chain of the corresponding BfMGP

residue, Arg94, which would be

responsible for the specificity of the

GH130_1 members for β-D-Manp-1,4-D-

Glc through hydrogen bonding to O2 of

glucose.
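The distance argument above can be made explicit with a short Python sketch that classifies an interatomic distance against a hydrogen-bonding cutoff. The two distances are those reported in the text; the 4 Å limit is the value this paper quotes for a hydrogen-bonding distance when discussing the Y103E mutation, and stricter cutoffs are also in common use, so it should be read as an assumption. The function names and the dictionary labels are hypothetical.

```python
import math

HBOND_CUTOFF_A = 4.0  # upper limit in angstroms, taken from the text; stricter values are common

def distance(a, b):
    """Euclidean distance between two (x, y, z) points, in the units of the input."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def can_hydrogen_bond(d, cutoff=HBOND_CUTOFF_A):
    """True if a donor-acceptor distance is below the chosen cutoff."""
    return d < cutoff

# Arginine side chain to sugar O2 distances reported in the text (angstroms).
reported = {"Uhgb_MP Arg65": 6.43, "BfMGP Arg94": 2.96}
verdicts = {name: can_hydrogen_bond(d) for name, d in reported.items()}
```

With either the 4 Å limit or a stricter one, only the BfMGP arginine falls within hydrogen-bonding range of O2, which is consistent with the specificity difference proposed in the text.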

In addition, the stronger specificity of

GH130_2 towards N-acetylglucosamine

at the +1 subsite compared with glucose

or mannose is explained by hydro-

phobic interactions of the N-acetyl-

methyl moiety with Met67, which is not

conserved in the GH130_1 subfamily,


Figure 6. Superposition of GH130 structures. Superposition of Uhgb_MP (GH130_2; red) with mannosyl-glucose phosphorylase (BfMGP) from B. fragilis NCTC 9343 (GH130_1; PDB entry 4kmi; blue), and BACOVA_03624 from Bacteroides ovatus ATCC 8483 (GH130_NC; PDB entry 3qc2; green), illustrating the structural differences between GH130 subfamilies. The catalytic Asp104 is shown in the B conformation. With the exception of the BfMGP loop Thr42–Met68, which does not exist in the Uhgb_MP structure, the loops are numbered with respect to the Uhgb_MP sequence. The 11-residue longer Gly121–Pro125 loop, which is specific to GH130_1, fills the entrance to the Uhgb_MP tunnel. In place of the Asp61–Arg65 loop, an extension is observed for GH130_NC, capping the catalytic site. These two loops, which are specific to GH130_1 and GH130_NC, respectively, may explain the inability of enzymes belonging to these subfamilies to phosphorolyze long substrates. Loop Asp171–Phe177 (the so-called ‘lid loop’ in BfMGP) is shorter in GH130_NC enzymes than in GH130_1 and GH130_2, thus allowing access to the active site. In addition, in GH130_1 enzymes loop Asp171–Phe177 is very mobile because of the GSGGG motif located at its base, which is locked close to the catalytic site only when a substrate is bound, as shown for BfMGP structures. In contrast, in Uhgb_MP the loop is not so mobile and holds His174, which is conserved in GH130_2 and which has already been shown to be involved in the +1 subsite. The BfMGP loop Thr42–Met68 in contact with the N-terminal helix involved in oligomerization is completely absent in Uhgb_MP even when both proteins are assembled as homohexamers.


while being present in half of the GH130_2 members, especially

those acting on β-D-Manp-1,4-D-GlcNAc (Bt1033 and

RaMP2).

Moreover, in the structures containing N-acetyl-

glucosamine, the Phe203 side chain of a cognate monomer was

found close to Met67; these two residues form a hydrophobic

pocket that interacts with the methyl group of the N-acetyl

moiety. In contrast, in the apo form and in the structure

containing mannose alone in the −1 subsite, the Phe203 side

chain was instead rotated towards the exit of the catalytic

tunnel. Phe203, which is not conserved in the

GH130_1 subfamily but is present in 25% of GH130_2

members, would thus be a specific feature of GH130 enzymes

that are able to phosphorolyze β-D-Manp-1,4-β-D-GlcpNAc.

The C3 stereochemistry at the +1 subsite also appears to be

critical since all pyranoside inhibitors of Uhgb_MP (allose,

L-rhamnose and altrose) share an inversion of configuration at

this position compared with that of mannose. This effect is

probably owing to the proximity of Tyr103 (or the equivalent

Tyr130 in BfMGP), thus implying a steric constraint that

would select an equatorial hydroxyl at this position in the case

of a 4C1 chair, which is the case for N-acetylglucosamine in

the structure of the corresponding complex. We previously

observed that a Y103E mutation strongly destabilizes

Uhgb_MP, as is the case for the wild-type enzyme without

phosphate. The Y103E mutation also significantly increases

the ratio between hydrolysis and phosphorolysis, with the

glutamic acid playing the role of the second catalytic acid

required for hydrolysis (Ladeveze et al., 2013). The role of

Tyr103 in stabilizing the active-site conformation in the

presence of phosphate is owing to hydrogen-bonding interactions

between its side chain and that of Arg150, which

interacts with phosphate (Fig. 1). The Y103E mutation would

decrease hydrogen-bonding interactions, while positioning the

glutamic acid at a hydrogen-bonding distance (less than 4 Å)

from the interosidic O atom to allow hydrolysis to occur.

3.5. Significance

In this paper, we present the first structure of a phosphor-

olytic enzyme involved in N-glycan degradation in its apo

form and in complex with mannose and N-acetylglucosamine.

As previously highlighted by the integration of biochemical,

genomic and metagenomic data, Uhgb_MP and GH130

enzymes more generally can be considered as interesting

targets to study interactions between host and gut microbes,

especially since GH130_2 sequences are overrepresented in

the metagenomes of IBD patients. Further studies will be

needed to confirm the physiological role of these enzymes and

their potential involvement in damage to the intestinal barrier,

such as metabolomic and transcriptomic analyses of the gut

bacteria that produce them in the presence of N-glycans as a

carbon source or inoculated in model animals with and

without inhibitors. This three-dimensional structure paves the

way for such studies through the design of specific GH130_2

inhibitors that could mimic substrate binding.

In addition, analysis of tertiary and quaternary structures

led to the identification of structural features involved in the

accommodation of long oligosaccharides and polysaccharides.

This is a key feature that is unique to Uhgb_MP and is of great

biotechnological interest for the conversion of hemicellulose

into compounds with high added value. Identification of the

structural determinants of the strong specificity of Uhgb_MP

and other GH130_2 enzymes towards Man-GlcNAc also paves

the way for the rational engineering of GH130 enzymes

optimized for manno-oligosaccharide synthesis and diversifi-

cation. Functional investigations of structurally characterized

enzymes classified in the different GH130 sequence clusters

would significantly advance our knowledge of the molecular

bases of substrate specificities and improve our understanding

of their role in key catabolic pathways, especially in the

mammalian gut.

4. Related literature

The following references are cited in the Supporting Infor-

mation for this article: Studier (2005).

Acknowledgements

This work was supported by the French Ministry of Higher

Education and Research and by the French National Institute

for Agricultural Research (INRA, ‘Meta-omics of Microbial

Ecosystems’ research program). The equipment used for

protein purification (ICEO facility), biophysical (DSF, SEC-

MALLS) and crystallographic experiments are part of the

Integrated Screening Platform of Toulouse (PICT, IBiSA). We

thank Dr Valerie Guillet for technical assistance with SEC-

MALLS. We also thank the European Synchrotron Radiation

Facility (ESRF), Grenoble, France, in particular the staff of

beamline ID-23-1. Experiments were also performed on the

XALOC beamline at the ALBA Synchrotron (Barcelona,

Spain) with the collaboration of the ALBA staff (Dr Jordi

Juanhuix). Author contributions: Uhgb_MP production and

purification, SL; crystallographic studies and X-ray data

collection, SL, ST, GC and LM; SAXS experiments, PR, SL

and GC; DSF and SEC-MALLS experiments, SL and ST.

Experiments were designed by SL, ST and GPV. The manu-

script was written primarily by SL and GPV with contributions

from ST, GC and PR. SL, GC and PR prepared the figures.

The research leading to these results has received funding

from the European Community’s Seventh Framework

Programme (FP7/2007-2013) under BioStruct-X (grant

agreement No. 283570).

References

Aebi, M., Bernasconi, R., Clerc, S. & Molinari, M. (2010). Trends Biochem. Sci. 35, 74–82.

Aspeborg, H., Coutinho, P. M., Wang, Y., Brumer, H. & Henrissat, B. (2012). BMC Evol. Biol. 12, 186.

Burnaugh, A. M., Frantz, L. J. & King, S. J. (2008). J. Bacteriol. 190, 221–230.

Chiku, K., Nihira, T., Suzuki, E., Nishimoto, M., Kitaoka, M., Ohtsubo, K. & Nakai, H. (2014). PLoS One, 9, e114882.

David, G. & Perez, J. (2009). J. Appl. Cryst. 42, 892–900.


Egloff, M. P., Uppenberg, J., Haalck, L. & van Tilbeurgh, H. (2001). Structure, 9, 689–697.

Emsley, P. & Cowtan, K. (2004). Acta Cryst. D60, 2126–2132.

Ericsson, U. B., Hallberg, B. M., DeTitta, G. T., Dekker, N. & Nordlund, P. (2006). Anal. Biochem. 357, 289–298.

Jaito, N., Saburi, W., Odaka, R., Kido, Y., Hamura, K., Nishimoto, M., Kitaoka, M., Matsui, H. & Mori, H. (2014). Biosci. Biotechnol. Biochem. 78, 263–270.

Juanhuix, J., Gil-Ortiz, F., Cuní, G., Colldelram, C., Nicolas, J., Lidon, J., Boter, E., Ruget, C., Ferrer, S. & Benach, J. (2014). J. Synchrotron Radiat. 21, 679–689.

Kabsch, W. (2010). Acta Cryst. D66, 125–132.

Kawahara, R., Saburi, W., Odaka, R., Taguchi, H., Ito, S., Mori, H. & Matsui, H. (2012). J. Biol. Chem. 287, 42389–42399.

Ladeveze, S., Tarquis, L., Cecchini, D. A., Bercovici, J., Andre, I., Topham, C. M., Morel, S., Laville, E., Monsan, P., Lombard, V., Henrissat, B. & Potocki-Veronese, G. (2013). J. Biol. Chem. 288, 32370–32383.

Ladeveze, S., Tarquis, L., Henrissat, B., Monsan, P., Laville, E. & Potocki-Veronese, G. (2014). International Patent WO/2015/014973.

Larkin, A. & Imperiali, B. (2011). Biochemistry, 50, 4411–4426.

Lehle, L., Strahl, S. & Tanner, W. (2006). Angew. Chem. Int. Ed. 45, 6802–6818.

Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P. M. & Henrissat, B. (2014). Nucleic Acids Res. 42, D490–D495.

Martens, E. C., Chiang, H. C. & Gordon, J. I. (2008). Cell Host Microbe, 4, 447–457.

Martens, E. C., Koropatkin, N. M., Smith, T. J. & Gordon, J. I. (2009). J. Biol. Chem. 284, 24673–24677.

McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). J. Appl. Cryst. 40, 658–674.

Mirza, O., Skov, L. K., Sprogøe, D., van den Broek, L. A. M., Beldman, G., Kastrup, J. S. & Gajhede, M. (2006). J. Biol. Chem. 281, 35576–35584.

Motomura, K., Hirota, R., Ohnaka, N., Okada, M., Ikeda, T., Morohoshi, T., Ohtake, H. & Kuroda, A. (2011). FEMS Microbiol. Lett. 320, 25–32.

Murshudov, G. N., Skubak, P., Lebedev, A. A., Pannu, N. S., Steiner, R. A., Nicholls, R. A., Winn, M. D., Long, F. & Vagin, A. A. (2011). Acta Cryst. D67, 355–367.

Nagae, M. & Yamaguchi, Y. (2012). Int. J. Mol. Sci. 13, 8398–8429.

Nakae, S., Ito, S., Higa, M., Senoura, T., Wasaki, J., Hijikata, A., Shionyu, M., Ito, S. & Shirai, T. (2013). J. Mol. Biol. 425, 4468–4478.

Nihira, T., Suzuki, E., Kitaoka, M., Nishimoto, M., Ohtsubo, K. & Nakai, H. (2013). J. Biol. Chem. 288, 27366–27374.

Petoukhov, M. V., Konarev, P. V., Kikhney, A. G. & Svergun, D. I. (2007). J. Appl. Cryst. 40, s223–s228.

Potterton, E., Briggs, P., Turkenburg, M. & Dodson, E. (2003). Acta Cryst. D59, 1131–1137.

Renzi, F., Manfredi, P., Mally, M., Moes, S., Jeno, P. & Cornelis, G. R. (2011). PLoS Pathog. 7, e1002118.

Roberts, G., Tarelli, E., Homer, K. A., Philpott-Howard, J. & Beighton, D. (2000). J. Bacteriol. 182, 882–890.

Senoura, T., Ito, S., Taguchi, H., Higa, M., Hamada, S., Matsui, H., Ozawa, T., Jin, S., Watanabe, J., Wasaki, J. & Ito, S. (2011). Biochem. Biophys. Res. Commun. 408, 701–706.

Sheng, Y. H., Hasnain, S. Z., Florin, T. H. J. & McGuckin, M. A. (2012). J. Gastroenterol. Hepatol. 27, 28–38.

Studier, F. W. (2005). Protein Expr. Purif. 41, 207–234.

Suzuki, T. & Harada, Y. (2014). Biochem. Biophys. Res. Commun. 453, 213–219.

Varki, A., Cummings, R. D., Esko, J. D., Freeze, H. H., Stanley, P., Bertozzi, C. R., Hart, G. W. & Etzler, M. E. (2009). Editors. Essentials of Glycobiology, 2nd ed. New York: Cold Spring Harbor Laboratory Press.

Zhu, Y., Suits, M. D. L., Thompson, A. J., Chavan, S., Dinev, Z., Dumon, C., Smith, N., Moremen, K. W., Xiang, Y., Siriwardena, A., Williams, S. J., Gilbert, H. J. & Davies, G. J. (2010). Nature Chem. Biol. 6, 125–132.

Ladeveze et al. · N-Glycan processing by mannoside phosphorylase. Acta Cryst. (2015). D71, 1335–1346