Download - Modeling and docking antibody structures with Rosetta · Analytical and Physical Chemistry, Showa University School of Pharmacy, Tokyo 142-8555, Japan. 4 Centre for Immune Regulation,


Modeling and docking antibody structures with Rosetta

Brian D. Weitzner,1* Jeliazko R. Jeliazkov,2* Sergey Lyskov,1* Nicholas Marze,1 Daisuke Kuroda,1,3 Rahel Frick,4 Naireeta Biswas,1 and Jeffrey J. Gray1,5,6,7

1 Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA. 2 T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland, USA. 3 Department of Analytical and Physical Chemistry, Showa University School of Pharmacy, Tokyo 142-8555, Japan. 4 Centre for Immune Regulation, Department of Biosciences, University of Oslo, Oslo, Norway; Centre for Immune Regulation, Department of Immunology, Oslo University Hospital Rikshospitalet, University of Oslo, Oslo, Norway. 5 Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland, USA. 6 Institute for NanoBioTechnology, Johns Hopkins University, Baltimore, Maryland, USA. 7 Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland, USA. Correspondence should be addressed to J.J.G. ([email protected]). *Equal contribution authors

ABSTRACT

We describe Rosetta-based computational protocols for predicting the three-dimensional structure of an antibody from sequence and then docking the antibody–antigen complexes. Antibody modeling leverages canonical loop conformations to graft large segments from experimentally-determined structures as well as (1) energetic calculations to minimize loops, (2) docking methodology to refine the VL–VH relative orientation, and (3) de novo prediction of the elusive CDR H3 loop. To alleviate model uncertainty, antibody–antigen docking resamples CDR loop conformations and can use multiple models to represent an ensemble of conformations for the antibody, the antigen or both. These protocols can be run fully-automated via the ROSIE web server or manually on a computer with user control of individual steps. For best results, the protocol requires roughly 2,500 CPU-hours for antibody modeling and 250 CPU-hours for antibody–antigen docking. Both tasks can be completed in under a day by using public supercomputers.

INTRODUCTION

The vertebrate adaptive immune system is capable of promoting cells to degranulate or phagocytose nearly any foreign pathogen by producing immunoglobulin G (IgG) proteins (antibodies) that recognize a specific region (epitope) of a pathogenic molecule (antigen). The ability to bind diverse antigens requires a diverse population of antibodies, which is achieved through complex processes in bone marrow and lymphatic tissues, namely V(D)J recombination and somatic hypermutation. The diversity of antibodies is astonishing; the size of the theoretical naïve antibody repertoire is estimated to be >10^13 in humans [1]. In addition to their biological importance, antibodies are routinely used in biotechnology as probes and diagnostics, and there are dozens of antibodies approved as therapeutics [2].

bioRxiv preprint doi: https://doi.org/10.1101/069930; this version posted August 16, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.


Next-generation sequencing techniques have enabled rapid determination of large numbers of antibody sequences [1]. A limitation of these approaches is that no information about the specific atomic contacts between the antibody and antigen can be gleaned from these data sets. Atomic detail is required to consider specific antibody–antigen interactions, for example, in order to develop therapeutic antibodies or vaccines that are mimetics of extremely infectious antigens [3]. Although there are experimental methods capable of generating structural models in atomic detail (X-ray crystallography, NMR, neutron diffraction, cryo-EM), not all protein structures can be determined with these methods, and limited resources make it impossible to determine the structures of all of the sequences identified in high-throughput sequencing experiments. To bridge the sequence–structure gap, one must employ computational structure prediction methods. Perhaps more importantly, structure prediction methods are useful in diagnostics and drug discovery to define epitopes and help infer biological or therapeutic mechanisms.

The function of an antibody arises from its three-dimensional structure. The IgG isoform, the most common type of naturally occurring antibodies, consists of two identical sets of heavy and light chains arranged into a “Y” shape, with the four polypeptide chains joined by disulfide linkages. The heavy chain contains four domains, three adjacent constant domains (CH1, CH2, CH3) and one variable domain (VH), and the light chain consists of a single constant domain (CL) and a variable domain (VL). The CH1 and VH domains interact with the CL and VL domains to form the antigen-binding fragment (Fab) or the “arms” of the Y. Within the Fab, both variable domains are directed away from the remaining heavy chain constant domains and make up the variable fragment (FV). At the tip of the FV are three complementarity determining region (CDR) loops on each chain (CDR L1–3 and CDR H1–3) that form the region of the antibody, called the paratope, that recognizes its target. This FV structure is common to other antibody isoforms (IgA, IgE, etc.).

Antibody homology modeling

The FV is the focal point of the recombination and hypermutation events; as such, the primary difference among antibodies is the conformation, structural context, and chemical identity of their CDR loops. For this reason, antibody structure prediction methods focus on modeling the FV. The FV can be split into two regions: framework regions and CDR loops. The framework regions have a high degree of structural conservation, making it possible to generate accurate models of framework regions from template structures.

Similarly, analysis of antibody crystal structures has revealed that five of the six CDR loops (CDR L1–3, H1, H2) adopt a limited number of distinct structures, referred to as canonical loop conformations [4]. The canonical conformation of a particular CDR loop can typically be identified from its length and sequence. Like the framework regions, the CDRs L1–3, H1, and H2 are also modeled using template structures.

The remaining CDR loop, H3, does not adopt canonical conformations and must be modeled de novo. Additionally, the H3 loop lies at the interface of the two domains (VH and VL) and can interact with residues on either chain. To account for these interactions as well as the overall geometry of the paratope, the VL–VH orientation is optimized during H3 modeling. Accurately modeling CDR H3 and the VL–VH orientation are typically the most challenging aspects of antibody structure prediction.

Protein–protein docking

While accurate predictions of unbound antibody structures are informative, they are void of an important biological context: the antibody–antigen (Ab–Ag) interaction. High-resolution structures of Ab–Ag complexes give insight into the molecular mechanism by which antibodies function, a necessity for rational design of vaccines or antibody therapeutics. Structures of Ab–Ag complexes can be determined through experimental methods; however, just as with unbound antibodies, these methods are limited by their throughput and expense and are not viable for all proteins. When experimental methods cannot be used to determine complex structures, computational protein–protein interface prediction (docking) provides an alternative approach.

In general, computational docking approaches strive to sample all possible interactions between two proteins to discern the biologically-relevant interaction. Predicting a protein–protein interaction de novo is challenging due to the sheer number of possible docked conformations. However, the sample space can be made tractable with information about the interaction. In the case of Ab–Ag interactions, the search space is limited because the antibody paratope, comprised of the six CDR loops, is the binding site for the cognate antigen epitope.

The Rosetta SnugDock algorithm leverages the information about the flexible and/or uncertain regions of the antibody to perform robust Ab–Ag docking [5]. SnugDock simulates the induced-fit mechanism through simultaneous optimization of several degrees of freedom. It performs rigid-body docking of the multi-body (VL–VH)–Ag complex, as well as re-modeling of the CDR H2 and H3 loops, the latter of which typically contributes a plurality of atomic contacts to the Ab–Ag interaction. SnugDock can also simulate conformer selection by swapping either the antibody or the antigen with another member of a pre-generated structural ensemble. Because SnugDock samples most of the conformation space available to antibody paratopes, it can refine antibody homology models with inaccuracies in the difficult-to-predict VL–VH orientation and CDR H3 loop.

When docking homology models, it is best if there is experimental evidence to suggest the general location of the epitope (within ~10 Å, approximately the correct side of the antigen domain), and in this protocol paper, we describe the local docking procedure in detail. If no information is available about the epitope, there are several programs that perform global docking or epitope prediction [6]. In particular, there are two fast-Fourier transform (FFT) rigid-body docking approaches that implement antibody-specific energy potentials: PIPER [7] with the antibody-ADARS potential [8], and ZDOCK [9] with the Antibody i-Patch potential [10]. FFT rigid-body approaches are fast, but they cannot account for antibody motions upon antigen binding or compensate for errors in the initial homology model; SnugDock is the only flexible-backbone antibody docking method. It can provide a global-antigen docking alternative but it is slower and, like others, can produce false-positive epitope predictions [5]. For local docking, SnugDock has been demonstrated to produce high-quality models when using an antibody homology model or crystal structure and the unbound antigen crystal structure as input [5]. In addition, SnugDock approaches used in the CAPRI blind docking challenge [11] produced the best structure among all predictors for a flexible-loop target. CAPRI uses star-based rankings (*** = high quality, ** = medium, * = acceptable, 0 = incorrect) [12]. Examining the highest attained CAPRI quality among the ten lowest-scoring docked models (starting with a homology modeled antibody), SnugDock currently produces 1 ***, 10 **, and 4 * models in a test set of 15 antibody-antigen targets (Table 1). These performance data are improved since the original SnugDock publication [5] due to updates in the energy function [13] and a switch to the kinematic loop closure (KIC) loop modeling method [14–16].
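The benchmark statistic quoted above (the best CAPRI quality among the ten lowest-scoring models of each target) can be computed in a few lines. The following is a minimal Python sketch and not part of Rosetta; the function names, the quality labels, and the input layout (a list of score/quality pairs per target) are illustrative assumptions.

```python
# CAPRI quality classes mapped to star counts (0 = incorrect ... 3 = high).
# The label strings are hypothetical; only the star scale comes from the text.
CAPRI_STARS = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def best_quality_in_top_n(decoys, n=10):
    """Return the highest star count among the n lowest-scoring decoys.

    decoys: list of (score, quality_label) pairs; a lower score is better.
    """
    top = sorted(decoys, key=lambda d: d[0])[:n]
    return max(CAPRI_STARS[quality] for _, quality in top)


def tally(targets, n=10):
    """Count targets by the best star rating found among their top-n decoys.

    targets: dict mapping a target name to its list of decoys.
    """
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for decoys in targets.values():
        counts[best_quality_in_top_n(decoys, n)] += 1
    return counts
```

Calling `tally` on a dictionary of 15 targets would yield counts directly comparable to the "1 ***, 10 **, and 4 *" summary, provided each decoy has already been classified against the native complex.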

Protocol overview

The protocol described in this paper enables a user to generate a structural model of an antibody from its sequence and a structural model of an antibody–antigen complex from structures of the antibody and its antigen (Fig. 1).

Protocol overview: Antibody homology modeling (steps 1–8)

Generating a structural model of an antibody from sequence in RosettaAntibody uses homology modeling techniques, that is, it uses segments from known structures with similar sequences. As described in detail below, the input sequence is split into several components. For each component, RosettaAntibody searches a curated database of known structures for the closest match by sequence and then assembles those structural segments into a model. That model is then used as the input for the next stage in which the CDR H3 loop is modeled and the VL–VH orientation is optimized.

Numbering the residues in the sequence

The RosettaAntibody protocol identifies the CDRs of the input antibody sequence through regular expression matching to the Kabat CDR definition [17], and it numbers the antibody residues according to the Chothia scheme [4].

Template selection

For each structural component considered (FRL, FRH, CDRs L1–3, H1–3), templates are selected by maximum sequence similarity using a BLAST-based method with custom databases constructed from high-quality structures in the PDB. Canonical CDR conformations are based on length, so we use separate databases for each loop–length combination. For example, ten-residue H1 loops and eleven-residue H1 loops are separate BLAST-formatted databases.

The results for each structural component are sorted by BLAST bit score, and the sequence with the best score is selected as the template.


Initial VH–VL orientations

The initial VL–VH orientation is selected in much the same way, with the exception that ten VL–VH templates are selected rather than a single one [18]. Starting from the list of all possible templates ordered by bit score, the best match is selected as the first template. To diversify the initial VL–VH orientations, all templates with similar VL–VH orientations (0.5 OCD, see Marze & Gray [18]) to this template are pruned from the list. The best match remaining in the list is selected as the second template, and candidate templates similar to the second template are now removed from the list. This winnowing is repeated to create ten distinct templates. One grafted model will be created from each of these ten initial VL–VH orientations.
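The winnowing procedure described above is a greedy selection, which can be sketched as follows. This is an illustrative Python sketch rather than the RosettaAntibody implementation: the `distance` callable is a hypothetical stand-in for the OCD metric, and treating "similar" as "distance below the 0.5 cutoff" is an assumption.

```python
def select_diverse_templates(candidates, distance, n_templates=10, cutoff=0.5):
    """Greedily pick high-scoring templates with distinct VL-VH orientations.

    candidates: list of (template_id, bit_score) pairs.
    distance:   callable (id_a, id_b) -> orientational distance between two
                templates (hypothetical stand-in for the OCD metric).
    cutoff:     templates closer than this to a chosen template are pruned
                (assumed to correspond to the 0.5 OCD threshold in the text).
    """
    # Order candidates from best to worst bit score.
    remaining = sorted(candidates, key=lambda c: c[1], reverse=True)
    chosen = []
    while remaining and len(chosen) < n_templates:
        best_id, _ = remaining.pop(0)
        chosen.append(best_id)
        # Prune every candidate whose orientation is too close to the pick.
        remaining = [c for c in remaining if distance(best_id, c[0]) >= cutoff]
    return chosen
```

If fewer than `n_templates` sufficiently distinct candidates exist, the sketch simply returns a shorter list; how Rosetta handles that case is not stated in the text.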

Grafting CDR templates

Once the initial VL–VH orientations are set, the CDR templates are grafted onto each framework region by superposing the two overlapping residues on either side of the loop with their corresponding residues on the framework regions. The graft points are then adjusted using Cyclic Coordinate Descent (CCD) [19,20] to prevent unphysical bond lengths and angles from being incorporated into the model. Finally, the structure is relaxed [21,22] via iterations of side-chain optimization and gradient-based minimization while constraining the backbone and side-chain heavy atoms to find a native-like conformation at a local energy minimum in Rosetta’s score function.

All-atom refinement of CDR H3 and the VL–VH orientation

The grafted models are crude and must be refined, particularly in the CDR H3 loop and the VL–VH orientation. The H3 loop is first completely remodeled in the context of the antibody framework using the next-generation KIC (NGK) loop modeling protocol [16]. For speed, the H3 loop side chains are each reduced to a single low-resolution pseudo-atom, and to ensure sampling of the C-terminal kink conformation, atomic constraints are applied to the governing score function [23]. For subsequent high-resolution refinement, the all-atom CDR H3 side chains are recovered, all CDR side chains are repacked, and the CDR side chains and backbones are minimized. The VL and the VH domains are re-docked with a rigid-backbone RosettaDock protocol [24,25] to remove any clashes created by the new H3 conformation, and the antibody side chains are again repacked. Using NGK, H3 is refined again in the context of the updated VL–VH orientation. The CDRs are packed and minimized again, and the model is saved as a candidate structure, or decoy. The first grafted model is used as the starting point for 1,000 refined models, or decoys, and the other nine grafted models are each used as the starting point for 200 decoys, for a total of 2,800 decoys (1,000 + 9 × 200). The decoys are sorted by Rosetta score, and the lowest-scoring ones are given as the final models.

Protocol overview: antibody–antigen docking (steps 11–16)

Computational docking can be used to generate models of Ab–Ag complexes. In general, docking entails (1) roughly identifying (within 8 Å) the interacting interface through either experiment or global docking and (2) refining the initial model through local docking. Below we describe local docking with SnugDock in detail.

Generating the starting model

SnugDock requires, as an input, a putative Ab–Ag complex that contains a reasonable interface [26]. The complex can be composed of single structures or sets of structures (ensembles, see Box 1). The interface defines the local search, between the antibody CDRs and the antigen. Initial models are often based on experimental results that identify interacting residues at the Ab–Ag interface, such as mutagenesis or chemical crosslinking assays. In the absence of experimental results, a global docking approach such as ZDOCK/iPatch [10] or PIPER/ADARS [8] can generate putative complexes for refinement. Global docking can also be achieved with SnugDock, albeit at a higher computational expense.

Antigen or antibody structures that have not been generated by a Rosetta protocol need to be refined before being placed in contact. Refinement, commonly referred to as the Relax protocol [21,22], entails iterations of side-chain optimization and gradient-based minimization in Rosetta’s score function. The Relax protocol samples local conformational space around the starting structure to identify an energetic minimum in the score function. Through this process, Rosetta-identified non-idealities (such as van der Waals bumps) are abated. Once the partners have been refined, a putative complex can be assembled and prepacked. Prepacking optimizes side-chain conformations to prevent biasing toward the input complex model’s side-chain conformations, ensuring uniform scoring of all potential bound complex states.

Performing docking

SnugDock iteratively performs multi-body docking of both the Ab–Ag and VL–VH orientations and remodeling of the H2 and H3 CDR loops. Prior to docking, the prepacked starting Ab–Ag complex is subject to three rigid-body perturbations: (1) a randomized rotation about the Ab–Ag primary axis, (2) a small-magnitude random translation, and (3) a small-magnitude random rotation. Docking operates in two phases: low-resolution mode, where side chains are represented by a single pseudoatom located at the centroid of the side-chain heavy atoms, and high-resolution mode, where all protein atoms are explicit. Low-resolution mode consists of two types of interspersed Monte Carlo moves: rigid-body Ab–Ag translation and rotation, and backbone ensemble conformer swaps. Additionally, at the end of low-resolution mode, the H2 and H3 loops are refined. High-resolution mode consists of a 50-step Monte Carlo trajectory where each move is selected from a set of five possible moves: rigid-body Ab–Ag docking (40%), rigid-body VL–VH docking (40%), CDR minimization (10%), H2 loop refinement (5%), and H3 loop refinement (5%), where the percentages indicate the probabilities of selecting each move. Each trajectory results in one decoy. Typically, SnugDock is used to generate a total of 1,000 decoys, with the low-scoring decoys most likely to be near the native conformation.
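To make the move-selection scheme of the high-resolution phase concrete, the sketch below draws a 50-step schedule using the probabilities listed above. It is a toy illustration in Python, not SnugDock code: the move names are hypothetical labels, and the sketch only chooses which move type to attempt at each step, without performing the moves or any Monte Carlo acceptance test.

```python
import random

# Selection probabilities for the five high-resolution moves, as given in
# the text. The dictionary keys are illustrative labels, not Rosetta names.
MOVES = {
    "rigid_body_ab_ag_docking": 0.40,
    "rigid_body_vl_vh_docking": 0.40,
    "cdr_minimization": 0.10,
    "h2_loop_refinement": 0.05,
    "h3_loop_refinement": 0.05,
}


def sample_move_schedule(n_steps=50, seed=None):
    """Draw the sequence of move types for one high-resolution trajectory."""
    rng = random.Random(seed)
    names = list(MOVES)
    weights = [MOVES[name] for name in names]
    return rng.choices(names, weights=weights, k=n_steps)
```

On average, a 50-step trajectory drawn this way contains about 20 moves of each rigid-body type, 5 CDR minimizations, and 2 to 3 refinements of each loop.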


Incorporating experimental data into the simulation

Two main types of experimental data that inform the Ab–Ag binding mode can be incorporated into SnugDock. First, knowledge about specific residues or pairs of residues that interact across the interface can be used to guide docking. This information could, for example, be derived from alanine scanning or other mutagenesis experiments. Second, knowledge about the epitope and the overall Ab–Ag orientation can be incorporated. Binding patch data may be derived from different experiments, including hydrogen/deuterium exchange or chemical crosslinking of the binding partners with subsequent analysis by mass spectrometry. Other methods for epitope mapping may also be suitable.

Depending on the type of experimental data available, there are different ways of incorporating it into the docking simulation. High-confidence residue–residue interactions can be preserved with the use of atom pair constraints. Less-specific and poorly-characterized interactions (hydrophobic pockets, ambiguous H-bonds) can be loosely constrained with ambiguous and site constraints. Predicted epitopes and binding patches can be sampled by properly placing the SnugDock input structure and adjusting the size of the initial starting move. For further information on incorporating experimental constraints, see the Rosetta documentation [27].

Caveats, challenges and pitfalls

There are several caveats associated with computational modeling of antibodies and docking of antibodies and antigens. Keeping these caveats in mind, the user should critically assess each prediction (see Box 2). RosettaAntibody is a homology modeling approach and can be hampered by template availability. For example, challenging targets include heavily engineered antibodies or antibodies derived from a species that diversifies its antibodies through gene conversion, such as chickens or rabbits. Errors in the FR and CDR L1–3, H1, H2 loops are typically small (no greater than 1 Å backbone RMSD to native) [28]. The VL–VH orientation is correctly captured by RosettaAntibody in 43 of 46 benchmark antibody targets [18]. The CDR H3 loop, on the other hand, is modeled de novo, and loop model quality decreases with loop length. In the KIC loop benchmark [16,29], loops of 12–17 residues are modeled to near 1 Å backbone RMSD relative to the native structure; the average human CDR H3 falls within that range with an average length of 15 residues (IMGT definition) [30]. However, the benchmark is measured by modeling loops on crystallographic frameworks, whereas in a blind context, CDR H3 loops are modeled on homology frameworks, which introduces uncertainty in the loop environment. Nevertheless, in a recent assessment [23] RosettaAntibody produced models with CDR H3 loops within 1.59 Å backbone RMSD to native and sub-angstrom accuracy in all other regions.

While SnugDock explicitly samples the CDR H2 or H3 loop conformation and VL–VH orientation to account for model uncertainty introduced during homology modeling and to sample the regions most likely to undergo conformational changes upon antigen binding, it does not explicitly sample backbone degrees of freedom of the antigen or of regions of the antibody other than CDR H2 and H3. Thus, if the unbound and bound conformations differ substantially or if the homology models are poor, it could be difficult or impossible to model the docked complex accurately [31]. Despite this complication, SnugDock has successfully predicted Ab–Ag complexes from homology models [5].

Availability

RosettaAntibody and SnugDock can be run via a public web server (http://rosie.rosettacommons.org), Python bindings (PyRosetta, http://www.pyrosetta.org) and through local installations of Rosetta. Rosetta is distributed as source code and licenses are available from the RosettaCommons (http://www.rosettacommons.org) free of charge for academic and non-profit users. Rosetta can be installed on UNIX-like operating systems (including Mac OS X).

MATERIALS

EQUIPMENT

Homology modeling data

• Primary amino-acid sequence of the variable domain of the light and heavy chains.

Docking data

• PDB-formatted file of the antigen structure.
• PDB-formatted file of the antibody structure, from the homology modeling output.
• Both of these can be single structures or an ensemble of structures.

Software for running simulations via ROSIE web server

• Modern web browser

Hardware for running simulations manually (optional)

• Workstation with multi-core CPU(s) running a POSIX compliant operating system (e.g., GNU/Linux, OS X)

OR

• A Linux-based cluster. Several public facilities are available. For example, the U.S. National Science Foundation provides clusters like Stampede through the Extreme Science and Engineering Discovery Environment (XSEDE, www.xsede.org). In Europe, the Partnership for Advanced Computing in Europe (PRACE, www.prace-ri.eu) provides access to clusters like JUQUEEN. Resources like the Norwegian Metacenter for Computational Science (Notur, www.notur.no) or Japan’s supercomputer facilities of the National Institute of Genetics (sc.ddbj.nig.ac.jp) and of the Human Genome Center at the University of Tokyo (hgc.jp) are also suitable.


Software for running simulations locally (optional)

• The Rosetta software suite, available at www.rosettacommons.org/software
  o Compilation instructions available at www.rosettacommons.org/build. ?TROUBLESHOOTING
  o Support for any issues encountered that are not covered in this manuscript can be addressed on the Rosetta user forums: www.rosettacommons.org/forum
• BLAST+ (version 2.2.28 or later), available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
• Text editor (e.g., vim, emacs, nano)
• Optional: Python (www.python.org) or R (www.r-project.org) for analyzing results
• Optional: A molecular visualization package for viewing results and customizing starting structures for docking. Recommended packages include PyMOL (www.pymol.org) [32], UCSF Chimera (www.cgl.ucsf.edu/chimera) [33], and Kinemage (kinemage.biochem.duke.edu) [34]

PROCEDURE

The simplest way to create antibody and antibody-antigen complex structures is through the use of the ROSIE web server (rosie.rosettacommons.org) [35]. On ROSIE, the Antibody app uses the input antibody sequence to generate a homology model, and the SnugDock app uses the antibody model(s) and an antigen structure for docking. Both operations are entirely automated with a minimum of user input.

For greater control of the operation, we describe below the steps to run the protocols manually, including the key points for checking intermediate data and intervening with alternate choices. Users with structures of the unbound antibody and antigen can skip to the docking stage (step 11).

I. Antibody Homology Modeling

Construction of a grafted Fv model TIMING 75-90 minutes

1. Set up your terminal. After installing BLAST+ and Rosetta (see Materials), launch an interactive terminal (e.g., Terminal on Mac or xterm on Linux) and set path variables to the executable programs needed as follows (bash syntax):

export ROSETTA=~/Rosetta
export ROSETTA3_DB=$ROSETTA/main/database
export ROSETTA_BIN=$ROSETTA/main/source/bin
export PATH=$PATH:$ROSETTA_BIN

In the first line above, replace “~” with the parent directory where you installed Rosetta on your machine. Similarly, be sure the PATH variable includes the blastp program (e.g., export PATH=$PATH:/path/to/blastp, where /path/to/blastp is replaced with the directory containing the blastp executable). These path settings may be added to a configuration file such as .bashrc so they are automatically set each time a terminal is opened (logged into).
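Before starting a long run, it can be worth confirming that the environment is configured as described in step 1. The following Python sketch is one hypothetical way to do so; the function name and message strings are illustrative, and only the variable names (ROSETTA, ROSETTA3_DB, ROSETTA_BIN) and the blastp program come from the text.

```python
import os
import shutil


def check_setup(env=None,
                required_vars=("ROSETTA", "ROSETTA3_DB", "ROSETTA_BIN"),
                required_programs=("blastp",)):
    """Return a list of human-readable problems; an empty list means OK.

    env: mapping of environment variables (defaults to os.environ).
    """
    env = os.environ if env is None else env
    problems = []
    for var in required_vars:
        if not env.get(var):
            problems.append(f"environment variable {var} is not set")
    for prog in required_programs:
        # Search the PATH from `env`; if `env` has no PATH entry,
        # shutil.which falls back to the PATH of the current process.
        if shutil.which(prog, path=env.get("PATH")) is None:
            problems.append(f"{prog} not found on PATH")
    return problems
```

Running `check_setup()` with no arguments inspects the current shell environment and reports anything that still needs to be exported.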

2. Create a working directory and navigate to it:

mkdir /path/to/my_dir
cd /path/to/my_dir

3. Obtain the amino acid sequences for the variable domain of your antibody (light chain and heavy chain) and save them in FASTA format in your working directory (e.g., as antibody_chains.fasta, the file name used in step 4), with the heavy and light chains noted in the comment lines, as follows:

> heavy
VKLEESGGGLVQPGGSMKLSCATSGFRFADYWMDWVRQSPEKGLEWVAEIRNKANNHATYYAESVKGRFTISRDDSKRRVYLQMNTLRAEDTGIYYCTLIAYBYPWFAYWGQGTLVTVS
> light
DVVMTQTPLSLPVSLGNQASISCRSSQSLVHSNGNTYLHWYLQKPGQSPKLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDLGVYFCSQSTHVPFTFGSGTKLEIKR
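A quick sanity check of the FASTA input can catch formatting problems before grafting is run. The sketch below is a hypothetical helper in Python, not part of Rosetta; it assumes the two records are labeled "heavy" and "light" as in step 3 and flags any character outside the 20 standard one-letter amino-acid codes.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Parse FASTA text into {header: sequence}; '>' and spaces are stripped."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def check_antibody_fasta(text):
    """Return a list of problems found in an antibody FASTA string."""
    records = read_fasta(text)
    problems = [f"missing '{chain}' record"
                for chain in ("heavy", "light") if chain not in records]
    for name, seq in records.items():
        odd = sorted(set(seq.upper()) - STANDARD_AA)
        if odd:
            problems.append(f"{name}: non-standard residue codes {odd}")
    return problems
```

Applied to the example sequences above, such a check would flag the "B" in the heavy-chain sequence, which is not one of the 20 standard one-letter codes; whether the grafting application tolerates it is not stated in the text, so the input is worth verifying against the original sequence.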

4. Use Rosetta’s grafting application to find suitable templates and graft them together to obtain a crude model of the antibody. Execute the application with the line below (the .macosclangrelease suffix corresponds to a Mac build; on other platforms the suffix of the compiled executable differs).

antibody.macosclangrelease \
    -fasta antibody_chains.fasta | tee grafting.log

The application will output a directory called grafting. The PDB-formatted files named model-0.relaxed.pdb, model-1.relaxed.pdb, …, model-9.relaxed.pdb will be your input for the H3 modeling. The “| tee grafting.log” part of the command records all the program output in the file grafting.log for later review. The “\” permits the command to be spread across multiple lines rather than just one.

?TROUBLESHOOTING

(Optional) Check grafted template structures TIMING: 10 minutes–2 hours

5. Assign the CDR loops in your models to the CDR loop clusters described by North et al. [36] and check whether the chosen templates are suitable.

Run the cluster identification application as follows:

identify_cdr_clusters.macosclangrelease \
    -s grafting/model-*.relaxed.pdb \
    -out:file:score_only north_clusters.log

North et al. clustered all CDR loop structures by their backbone dihedral angles and named them by CDR type, loop length and cluster size (e.g. “H1-13-10” is the 10th most common conformation for 13-residue H1 loops). Occasionally, Rosetta chooses templates that are rare or inconsistent with the sequence preferences observed by North et al. For example, if Rosetta recommends the H1-13-10 cluster, the user might also consider the H1-13-1 cluster. Tables 3-7 of North et al. present consensus sequences for each cluster that can inform this decision. Loops and clusters with proline residues are also worth a manual examination. Several clusters of North et al. are contingent on the presence of prolines in particular locations (e.g. L3-9-cis7-1 has a cis-proline at position 7). Because RosettaAntibody relies on BLAST to choose loop templates, occasionally a loop from an uncommon non-cis-proline cluster (e.g. L3-9-2) is chosen. In such cases it is best to manually select a loop template from the well-populated cis-proline cluster.
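The cluster naming convention described above lends itself to a simple automated check. The following Python sketch parses names of the form shown in the examples (H1-13-10, L3-9-cis7-1) and flags clusters that are not the most populated one for their loop type and length. The regular expression, the function names, and the "rank greater than 1" review rule are assumptions made for illustration; they are not taken from Rosetta or from North et al.

```python
import re

# <CDR>-<length>-[cis<positions>-]<rank>, e.g. "H1-13-10" or "L3-9-cis7-1".
CLUSTER_NAME = re.compile(
    r"^(?P<cdr>[LH][123])-(?P<length>\d+)-"
    r"(?:cis(?P<cis>\d+(?:,\d+)*)-)?(?P<rank>\d+)$"
)


def parse_cluster(name):
    """Split a cluster name into CDR, loop length, cis positions and rank."""
    match = CLUSTER_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized cluster name: {name!r}")
    cis = match.group("cis")
    return {
        "cdr": match.group("cdr"),
        "length": int(match.group("length")),
        "cis_positions": [int(p) for p in cis.split(",")] if cis else [],
        "rank": int(match.group("rank")),
    }


def needs_review(name, max_rank=1):
    """Flag clusters ranked below the most common ones for manual review."""
    return parse_cluster(name)["rank"] > max_rank
```

For instance, `needs_review("H1-13-10")` is true while `needs_review("H1-13-1")` is false, mirroring the example in the text; the final decision should still rest on the consensus sequences tabulated by North et al.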

6. If desired, rerun grafting to replace a template with one from a manually-specified source structure. Use the antibody command line as above with an extra flag to specify a template. Follow the example below: to force Rosetta to use the CDR H1 loop from the PDB 1RZI as the template in the model, add the flag -antibody:h1_template 1rzi. Select templates for other regions accordingly:

antibody.macosclangrelease \
    -fasta antibody_chains.fasta \
    -antibody:h1_template 1rzi | tee graft.log

Flag                                  Region
-antibody:l1_template                 Light chain CDR loops
-antibody:l2_template
-antibody:l3_template
-antibody:h1_template                 Heavy chain CDR loops
-antibody:h2_template
-antibody:h3_template
-antibody:light_heavy_template        VL–VH orientation
-antibody:n_multi_templates 1
-antibody:frl_template                Framework region of the light or heavy chain
-antibody:frh_template

H3 modeling TIMING 1 hour to 4 days

7. Copy the set of standard H3 modeling flags to your working directory and create a directory for the H3 modeling output:

cp $ROSETTA/tools/antibody/abH3.flags .
mkdir H3_modeling

8. Run Rosetta’s antibody_H3 application on the 10 models generated during grafting. This step requires 2,500 CPU hours and is often performed in parallel on a computer cluster (see Box 3). For a Mac workstation, use the following command line:

$ROSETTA_BIN/antibody_H3.macosclangrelease \
    @abH3.flags \
    -s grafting/model-0.relaxed.pdb \
    -nstruct 1000 \
    -antibody:auto_generate_kink_constraint \
    -antibody:all_atom_mode_kink_constraint \
    -multiple_processes_writing_to_one_directory \
    -out:file:scorefile H3_modeling_scores.fasc \
    -out:path:pdb H3_modeling > h3_modeling-0.log 2>&1 &

• -sspecifiestheinputfile(oneofthegraftedmodelsgeneratedinstep2).• -nstruct specifies the number of structures generated, which should be 1000 for

model.0.pdband200eachforallothergraftedmodels.

The expected output is the specified number of PDB files as well as a score file named H3_modeling_scores.fasc. All these files will appear in an output directory named H3_modeling/.

To run in parallel, execute the above command repeatedly (changing the input model, the number of structures, and the output log as needed). Each time the command is executed, an antibody_H3 process is run in the background.

! Caution: Generating the 2,800 antibody structures takes approximately 2,500 CPU hours. Running 24 processes in parallel on a modern 24-CPU workstation, expect ~4 days of run time. Distributing the work over nodes on a supercomputer can reduce this time to hours (see Materials).
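The sampling plan in step 8 can be tallied before any jobs are launched. The following is a minimal Python sketch and not part of the Rosetta distribution. The grafted-model filename pattern is an assumption (the protocol mentions both model-0.relaxed.pdb and model.0.pdb), and the wall-clock estimate assumes ideal parallel scaling.

```python
def h3_job_plan(n_models=10, first_nstruct=1000, other_nstruct=200,
                pattern="grafting/model-{i}.relaxed.pdb"):
    """Return (input_file, nstruct) pairs, one per grafted model.

    The filename pattern is a placeholder; match it to your grafting output.
    """
    return [(pattern.format(i=i), first_nstruct if i == 0 else other_nstruct)
            for i in range(n_models)]


def wall_clock_days(cpu_hours, n_processes):
    """Ideal wall-clock time in days, assuming perfect parallel scaling."""
    return cpu_hours / n_processes / 24.0


plan = h3_job_plan()
total_models = sum(nstruct for _, nstruct in plan)
print(total_models)                          # 2800
print(round(wall_clock_days(2500, 24), 1))   # 4.3
```

Each (input_file, nstruct) pair corresponds to one invocation of the antibody_H3 command line shown in step 8, with -s and -nstruct changed accordingly.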

(Optional) Check VL–VH orientation TIMING 5 min

9. Check whether the VL–VH orientations of the antibody models are close to the orientations observed in antibody crystal structures found in the PDB. Run the Python script plot_LHOC.py using the following command line:

python $ROSETTA/main/source/scripts/python/public/plot_VL_VH_orientational_coordinates/plot_LHOC.py

This script will create a subfolder (lhoc_analyis) with separate plots for each of the four LHOC metrics. Each plot shows the native distribution of VL–VH orientations (grey), the orientations sampled by Rosetta (black line), the top 10 models (labeled diamonds), and the 10 different template structures generated during step 2 (dots). Antibody models that are outside the native distributions are unlikely to be correct.

Choose final antibody models TIMING 10 min

10. Choose 10 of the antibody models as an ensemble for docking. Because docking with ensembles aims to increase conformational diversity and sampling, the following criteria may be useful:

a. Select models with the lowest total score; these are purportedly native-like.
b. Select models with natural VL–VH orientation, falling within the observed distribution (grey).
c. Select models derived from different templates to maintain diversity.


If all ten top-scoring models are outside the native distribution, consider returning to step 6 and manually selecting new templates for the relative orientation of the VL and VH chains by using the -antibody:light_heavy_template flag (e.g., antibody.macosclangrelease -antibody:light_heavy_template 1ABC).
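Criterion (a) in step 10 amounts to sorting the score file. The sketch below is a hedged illustration rather than a Rosetta utility: it assumes the usual whitespace-delimited score file layout, in which data lines start with SCORE:, the first such line names the columns, and the columns include total_score and description. Check these names against your own H3_modeling_scores.fasc.

```python
def read_scorefile(lines):
    """Parse Rosetta-style SCORE: lines into a list of dicts (values kept as strings)."""
    header, rows = None, []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields              # first SCORE: line names the columns
            continue
        if fields == header or len(fields) != len(header):
            continue                     # skip repeated headers and truncated lines
        rows.append(dict(zip(header, fields)))
    return rows


def lowest_scoring(rows, n=10, column="total_score"):
    """Return the n rows with the lowest (most favorable) score."""
    return sorted(rows, key=lambda row: float(row[column]))[:n]


# Synthetic example data, not real Rosetta output.
example = """SEQUENCE:
SCORE: total_score description
SCORE: -310.5 model_0001
SCORE: -325.1 model_0002
SCORE: -298.7 model_0003
""".splitlines()

best = lowest_scoring(read_scorefile(example), n=2)
print([row["description"] for row in best])   # ['model_0002', 'model_0001']
```

Score alone does not address criteria (b) and (c); the VL–VH orientation plots and the template of origin still need to be checked for each selected model.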

II. Antibody–Antigen Docking TIMING 1 hour

11. Prepare the antigen and antibody for docking. Format your antigen (and antibody, if you are not using a homology model produced by RosettaAntibody) PDB file so it can be read by Rosetta. Run the following script:

$ROSETTA/tools/protein_tools/scripts/clean_pdb.py antigen.pdb C

where antigen.pdb is a PDB file of your antigen and C is the one-letter chain identifier(s) for the antigen chain(s) in the PDB file.

(Optional) Refine antibody in Rosetta’s score function TIMING 10 min

12. If you are not using an antibody model produced by Rosetta, you must refine the antibody structure by running the relax application. The command line is:

relax.macosclangrelease \
    -s antibody.pdb \
    -relax:constrain_relax_to_start_coords \
    -relax:ramp_constraints false \
    -ex1 \
    -ex2 \
    -use_input_sc \
    -flip_HNQ \
    -no_optH false

You may also wish to generate an ensemble of antibody structures; see Box 1.

Prepacking TIMING 10 min

13. Generate a PDB file that contains both your antibody and your antigen in the following order: light chain of your antibody (L), heavy chain of your antibody (H), and antigen (A). There are several ways to create and modify a PDB file. For example, with PyMOL:

a. Load the antibody in a PyMOL session.
   i. If it is a model from RosettaAntibody, the chains will already be labeled as H and L. Otherwise, use the alter command to change the chain ID of a selection:

      alter chain A, chain='H'
      alter chain B, chain='L'

b. Load the antigen into the same PyMOL session. Change the antigen chain ID in a similar fashion.
   ! Warning: if antigen chains share an ID with the antibody, you will have to be more specific with your selections (e.g., alter chain H and antigen, chain='A').
c. Reorient the antibody and antigen using the RotO, MovO, and MvOZ editing commands. Alternatively, use the translate command (i.e., translate [x,y,z], selection). If you know an approximate binding location, adjust the orientation accordingly.

d. Save both objects in the same PDB file:

save antibody_antigen_start.pdb, chain L+H+A
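The chain relabeling and ordering in step 13 can also be done without PyMOL. The following Python sketch is an illustration under stated assumptions, not part of the protocol: it relies only on the fixed-column PDB format (the chain ID is column 22 of ATOM, HETATM, and TER records), the function names are hypothetical, and the example input chain IDs (A, B, C) are made up.

```python
def relabel_chains(pdb_lines, mapping):
    """Return PDB lines with chain IDs (column 22) replaced according to `mapping`."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM", "TER")) and len(line) > 21:
            old = line[21]
            line = line[:21] + mapping.get(old, old) + line[22:]
        out.append(line)
    return out


def order_chains(pdb_lines, order="LHA"):
    """Keep ATOM/HETATM records only, sorted stably into the given chain order.

    Raises ValueError if a record carries a chain ID that is not in `order`.
    """
    atoms = [l for l in pdb_lines if l.startswith(("ATOM", "HETATM"))]
    return sorted(atoms, key=lambda l: order.index(l[21]))


def atom_line(serial, chain):
    """Build a truncated ATOM record for demonstration (chain ID lands in column 22)."""
    return f"ATOM  {serial:>5}  CA  GLY {chain}{serial:>4}"


# Hypothetical input: heavy chain labeled A, light chain B, antigen C.
lines = [atom_line(1, "A"), atom_line(2, "B"), atom_line(3, "C")]
relabeled = relabel_chains(lines, {"A": "H", "B": "L", "C": "A"})
ordered = order_chains(relabeled)
print([l[21] for l in ordered])   # ['L', 'H', 'A']
```

Because every record is looked up once in the mapping, relabeling the antigen to A does not collide with an input chain that was already named A, which is the situation the warning in step 13b describes.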

14. To ensure low-energy starting side-chain conformations, prepack the monomers:

docking_prepack_protocol.macosclangrelease \
    -in:file:s antibody_antigen_start.pdb \
    -ex1 \
    -ex2 \
    -partners LH_A \
    -ensemble1 antibody_ensemble.list \
    -ensemble2 antigen_ensemble.list \
    -docking:dock_rtmin

antibody_ensemble.list is a text file that contains filenames with absolute paths to the ten antibody models selected after antibody modeling. If you have a single crystal structure, you can omit the -ensemble1 flag.
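A list file of this kind can be written with a few lines of Python. This is a minimal sketch: the helper name and the model filenames are hypothetical, and the one-path-per-line layout is an assumption to verify against the Rosetta ensemble docking documentation.

```python
import os
import tempfile


def write_ensemble_list(pdb_paths, list_path):
    """Write one absolute path per line to `list_path` and return `list_path`."""
    with open(list_path, "w") as handle:
        for path in pdb_paths:
            handle.write(os.path.abspath(path) + "\n")
    return list_path


# Demonstration in a temporary directory; the model names are placeholders.
with tempfile.TemporaryDirectory() as workdir:
    models = [os.path.join(workdir, f"model_{i}.pdb") for i in range(10)]
    listing = write_ensemble_list(models, os.path.join(workdir, "antibody_ensemble.list"))
    with open(listing) as handle:
        entries = handle.read().splitlines()

print(len(entries))   # 10
```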

If antigen flexibility is expected, a family of structures can be created with other Rosetta applications (see Box 1). The text file antigen_ensemble.list will contain the filenames of your antigen structures (using absolute paths). NMR starting structures must be split (i.e., each model should be in its own PDB file). To use a single antigen structure, omit the -ensemble2 flag.
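Splitting an NMR entry into one model per file relies on the MODEL and ENDMDL records of the PDB format. The sketch below shows only the grouping step and is an illustration, not a Rosetta tool; the function name is hypothetical, records outside MODEL/ENDMDL blocks are dropped, and writing each group to its own file is left to the reader.

```python
def split_models(pdb_lines):
    """Group the lines of a multi-model PDB file into one list per MODEL block."""
    models, current = [], None
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = []                  # start collecting a new model
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)
            current = None
        elif current is not None:
            current.append(line)
    return models


# Synthetic, truncated records for demonstration only.
nmr_lines = [
    "HEADER    EXAMPLE",
    "MODEL        1",
    "ATOM      1  CA  GLY A   1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  CA  GLY A   1",
    "ENDMDL",
]
models = split_models(nmr_lines)
print(len(models))   # 2
```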

Docking TIMING 15 min

15. Dock the antibody to the antigen. As in step 8, this is a computationally expensive step, and you have the option of running a single process, multiple processes on one machine, or splitting the job across processors on a supercomputer (see Box 3). Using the executable for an MPI-based computing cluster as an example, the command line for docking is:

snugdock.mpi.linuxgccrelease \
    -s prepack/antibody_antigen_start.prepack.pdb \
    -ensemble1 antibody_ensemble.list \
    -ensemble2 antigen_ensemble.list \
    -antibody:auto_generate_kink_constraint \
    -antibody:all_atom_mode_kink_constraint \
    -nstruct 1000

? TROUBLESHOOTING


TIMING

Here we report the time to generate a single, docked model from antibody sequence and antigen crystal structure. Typically, however, thousands of models are generated, so we indicate in parentheses the timing for the full, recommended simulations. These time estimates were computed on a 2 x 2.4 GHz Quad-Core Intel Xeon processor; timing will vary for other computer configurations.

Step                                                Human Time   CPU Time per Model   Total CPU Time
(1–4) Construction of grafted Fv models             5 min        20 min               200 min
(5) Check grafted models                            10 min       1 min                10 min
(7–8) H3 modeling                                   5 min        50 min               2500 hrs
(9) Check VL–VH orientation                         5 min        1 min                10 min
(10) Choose models                                  10 min       5 min                5 min
(11) Prepare antibody and antigen for docking       5 min        15 min               15 min
(12) Refine antibody in Rosetta’s score function    5 min        20 min               20 min
(13–14) Prepacking                                  5 min        10 min               10 min
(15) Docking                                        5 min        15 min               250 hrs
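The totals in the table follow from the per-model times and the number of models generated. The short Python check below uses the model counts stated in the protocol (10 grafted models, 2,800 H3 models, and -nstruct 1000 for docking); the function name is only illustrative.

```python
def total_cpu_minutes(n_models, minutes_per_model):
    """Total CPU time in minutes for n_models at a fixed per-model cost."""
    return n_models * minutes_per_model


graft_min = total_cpu_minutes(10, 20)            # 10 grafted models
h3_hours = total_cpu_minutes(2800, 50) / 60      # 2,800 H3 models
dock_hours = total_cpu_minutes(1000, 15) / 60    # 1,000 docked models
print(graft_min, round(h3_hours), round(dock_hours))   # 200 2333 250
```

The grafting and docking totals match the table exactly. The H3 product comes to roughly 2,300 CPU hours, so the 2,500 hrs listed is presumably a rounded estimate.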


TROUBLESHOOTING

Step 0
Problem: Rosetta does not compile.
Possible reason: Likely to be related to the specific computer operating system and configuration.
Solution: Seek help on the Rosetta forums, www.rosettacommons.org/forum

Step 4
Problem: RosettaAntibody encounters the error “sh: blastp: command not found”.
Possible reason: The blastp executable is not installed or not in your $PATH.
Solution: On the command line, try ‘which blastp’ to check if your system has it installed. If needed, download and install BLAST and/or add blastp to your PATH (export PATH=$PATH:/path/to/blastp/). You can also specify the path using the command line flag -antibody:blastp /my/path

Step 4
Problem: RosettaAntibody encounters “BLAST Database error”.
Possible reason: The blastp database is not specified, and RosettaAntibody is not finding it in the default location ($ROSETTA/tools/antibody/blast_database/).
Solution: Specify the grafting database location with -antibody:grafting_database /database/location

Step 4
Problem: RosettaAntibody produces BLAST output (e.g., grafting/orientation.align) but does not produce structural models (e.g., model.0.pdb).
Possible reason: Your version of BLAST+ may be out of date.
Solution: Download a compatible version of BLAST+ (version 2.2.28 or later). See Materials section.

Step 4
Problem: Regular expression failure for CDR identification.
Possible reason: Mutations in regions of the chain that Rosetta expects to be conserved prevent the sequence from being split into structural segments correctly.
Solution: Check your antibody sequence against other known antibody sequences, focusing on the region given in the error message. Seek help on the Rosetta forums, www.rosettacommons.org/forum.

Step 15
Problem: SnugDock reports “ERROR: Could not find disulfide partner for residue 23”.
Possible reason: A disulfide bond was disrupted during docking.
Solution: You can disable disulfide bond detection with the flag -detect_disulf false

Step 15
Problem: snugdock reports the error “chains are not named correctly or are not in the expected order”.
Possible reason: Input PDB does not contain chains in the correct order (light, heavy, then antigen) or chain IDs are not L, H, and A.
Solution: Adjust chain order in the input PDB or specify chain IDs with the -partners AB_C flag, where A, B and C are the light, heavy, and antigen chain IDs, respectively.

Steps 2–15
Problem: Unknown Rosetta error.
Solution: Seek help on the Rosetta forums, www.rosettacommons.org/forum

Steps 2–15
Common fixes:
• Check for misspellings
• Check paths are correct
• Check FASTA formatting
• Check PDB formatting
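The ‘which blastp’ check for step 4 can also be scripted before a long run is started. This is a small Python sketch using the standard library; the helper name is hypothetical, and the -antibody:blastp flag in the message is the one quoted in the troubleshooting entry above.

```python
import shutil


def find_executable(name="blastp"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


path = find_executable("blastp")
if path is None:
    print("blastp not found on PATH; install BLAST+ or pass -antibody:blastp /my/path")
else:
    print("blastp found at", path)
```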

ANTICIPATED RESULTS

The antibody structure prediction and docking methods described in this paper each produce a set of structural models that have been evaluated by a score function. In the case of antibody structure prediction, we have found through benchmarking and participation in the AMA that the accuracy of frameworks and non-H3 CDR loops can typically be expected to be within 1.0 Å RMSD of the coordinates in a crystal structure. When the model deviates more than 1.0 Å in RMSD from crystallographic coordinates, it is usually because there is not a suitable known template in the PDB. These situations should become increasingly rare as more structures are deposited into the PDB, although heavily engineered antibodies should always be modeled with care.

The H3 loop accuracy is variable and depends both on length and VL–VH orientation. Loop length is an important factor in the accuracy of de novo loop modeling methods because the search space increases exponentially with each additional residue in the loop. We expect accurate models of CDR H3 loops of length 14 or less (ref. 23), but the top-scoring model may not be the most accurate. We therefore recommend using all ten models for downstream analysis. In AMA-II, we found that non-native VL–VH orientations can lead to explicit interactions between the light chain and the CDR H3 loop that are indistinguishable from native interactions (ref. 28). Using multiple VL–VH orientation templates (ref. 18) allows broader exploration of conformational space, sampling more low-scoring wells. Models generated from at least three different templates should be used to maximize the chance of capturing the native VL–VH orientation.

Through benchmarking Ab–Ag docking, we have found that the accuracy of a complex model depends on the starting configuration of the partners and the accuracy of the models for each partner. SnugDock samples local conformation space, thus a good starting structure (within 8 Å) generally results in sampling a near-native conformation. Equally important is the quality of the initial unbound models; near-native models enable increased docking performance (see Table 1: B-B rigid-body docking vs. U-U rigid-body docking). We have found that docking a homology modeled antibody to the crystal structure of the unbound antigen typically results in at least one model of acceptable quality in the ten top-scoring models (Table 1).

AUTHOR CONTRIBUTIONS

BDW, NM, SL, DK, JRJ, and JJG developed the current version of RosettaAntibody. SL developed ROSIE and implemented the RosettaAntibody and SnugDock server apps. BDW implemented SnugDock in Rosetta 3, and JRJ benchmarked SnugDock’s performance. RF and NB wrote the procedure, codified the manual intervention steps developed by BDW, NM, and DK, and recorded timing information. BDW, NM, SL, JRJ, DK, RF, NB and JJG wrote the manuscript.

ACKNOWLEDGMENTS

The authors wish to thank Arvind Sivasubramanian, Aroop Sircar, and Sidhartha Chaudhury for their development of the original RosettaAntibody, SnugDock, and EnsembleDock methods. Jianqing Xu refactored the antibody code. We also thank the members of the RosettaCommons for the continued development of the Rosetta Software Suite. ROSIE simulations are carried out, in part, within the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575. BW, NM, JRJ and JJG are supported by National Institutes of Health Grant R01GM078221. SL is supported by National Institutes of Health Grant R01GM73151. DK is supported by the DARPA Antibody Technology Program (HR-0011-10-1-0052) and the Japan Society for the Promotion of Science (grant number 15H06606). RF is supported by the South-Eastern Norway Regional Health Authority (grant number 850703-6051-39788).

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests. All revenue generated by licensing Rosetta to for-profit entities is invested into the continued development of the software.

REFERENCES

1. Georgiou, G., Ippolito, G. C., Beausang, J., Busse, C. E., Wardemann, H. & Quake, S. R. The promise and challenge of high-throughput sequencing of the antibody repertoire. Nat. Biotechnol. 32, 158–168 (2014).

2. Reichert, J. M. Antibodies to watch in 2016. MAbs 8, 197–204 (2016).

3. Correia, B. E., Bates, J. T., Loomis, R. J., Baneyx, G., Carrico, C., Jardine, J. G., Rupert, P., Correnti, C., Kalyuzhniy, O., Vittal, V., et al. Proof of principle for epitope-focused vaccine design. Nature 507, 201–6 (2014).

4. Al-Lazikani, B., Lesk, A. M. & Chothia, C. Standard conformations for the canonical structures of immunoglobulins. J. Mol. Biol. 273, 927–948 (1997).

5. Sircar, A. & Gray, J. J. SnugDock: Paratope structural optimization during antibody-antigen docking compensates for errors in antibody homology models. PLoS Comput. Biol. 6, e1000644 (2010).

6. Ponomarenko, J. V & Bourne, P. E. Antibody-protein interactions: benchmark datasets and prediction tools evaluation. BMC Struct. Biol. 7, 64 (2007).

7. Kozakov, D., Brenke, R., Comeau, S. R. & Vajda, S. PIPER: An FFT-based protein docking program with pairwise potentials. Proteins Struct. Funct. Genet. 65, 392–406 (2006).

8. Brenke, R., Hall, D. R., Chuang, G. Y., Comeau, S. R., Bohnuud, T., Beglov, D., Schueler-Furman, O., Vajda, S. & Kozakov, D. Application of asymmetric statistical potentials to antibody-protein docking. Bioinformatics 28, 2608–2614 (2012).

9. Chen, R., Li, L. & Weng, Z. ZDOCK: An initial-stage protein-docking algorithm. Proteins Struct. Funct. Genet. 52, 80–87 (2003).

10. Krawczyk, K., Baker, T., Shi, J. & Deane, C. M. Antibody i-Patch prediction of the antibody binding site improves rigid local antibody-antigen docking. Protein Eng. Des. Sel. 26, 621–629 (2013).

11. Sircar, A., Chaudhury, S., Kilambi, K. P., Berrondo, M. & Gray, J. J. A generalized approach to sampling backbone conformations with RosettaDock for CAPRI rounds 13-19. Proteins Struct. Funct. Bioinforma. 78, 3115–3123 (2010).

12. Méndez, R., Leplae, R., Lensink, M. F. & Wodak, S. J. Assessment of CAPRI predictions in rounds 3-5 shows progress in docking procedures. Proteins Struct. Funct. Bioinforma. 60, 150–169 (2005).

13. O’Meara, M. J., Leaver-Fay, A., Tyka, M. D., Stein, A., Houlihan, K., Dimaio, F., Bradley, P., Kortemme, T., Baker, D., Snoeyink, J., et al. Combined covalent-electrostatic model of hydrogen bonding improves structure prediction with Rosetta. J. Chem. Theory Comput. 11, 609–622 (2015).

14. Coutsias, E. A., Seok, C., Jacobson, M. P. & Dill, K. A. A kinematic view of loop closure. J. Comput. Chem. 25, 510–528 (2004).

15. Mandell, D. J., Coutsias, E. A. & Kortemme, T. Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Nat. Methods 6, 551–552 (2009).

16. Stein, A. & Kortemme, T. Improvements to Robotics-Inspired Conformational Sampling in Rosetta. PLoS One 8, e63090 (2013).

17. Johnson, G. & Wu, T. T. Kabat database and its applications: 30 years after the first variability plot. Nucleic Acids Res. 28, 214–8 (2000).

18. Marze, N. A. & Gray, J. J. Improved prediction of antibody VL–VH orientation. Protein Eng. Des. Sel. Advanced Access, gzw013 (2016).

19. Canutescu, A. A. & Dunbrack, R. L. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Protein Sci. 12, 963–72 (2003).

20. Wang, C., Bradley, P. & Baker, D. Protein–Protein Docking with Backbone Flexibility. J. Mol. Biol. 373, 503–519 (2007).

21. Bradley, P., Misura, K. M. S. & Baker, D. Toward High-Resolution de Novo Structure Prediction for Small Proteins. Science 309, 1868–1871 (2005).

22. Misura, K. M. S. & Baker, D. Progress and challenges in high-resolution refinement of protein structure models. Proteins Struct. Funct. Genet. 59, 15–29 (2005).

23. Weitzner, B. D. & Gray, J. J. Accurate structure prediction of CDR H3 loops enabled by a novel structure-based C-terminal ‘kink’ constraint. In Review, (2016).

24. Gray, J. J., Moughon, S., Wang, C., Schueler-Furman, O., Kuhlman, B., Rohl, C. A. & Baker, D. Protein–Protein Docking with Simultaneous Optimization of Rigid-body Displacement and Side-chain Conformations. J. Mol. Biol. 331, 281–299 (2003).

25. Chaudhury, S. & Gray, J. J. Conformer Selection and Induced Fit in Flexible Backbone Protein–Protein Docking Using Computational and NMR Ensembles. J. Mol. Biol. 381, 1068–1087 (2008).

26. Kuroda, D. & Gray, J. J. Shape complementarity and hydrogen bond preferences in protein-protein interfaces: implications for antibody modeling and protein-protein docking. Bioinformatics btw197 (2016). doi:10.1093/bioinformatics/btw197

27. Rosetta Constraints Documentation. https://www.rosettacommons.org/docs/wiki/rosetta_basics/Incorporating-Experimental-Data

28. Weitzner, B. D., Kuroda, D., Marze, N., Xu, J. & Gray, J. J. Blind prediction performance of RosettaAntibody 3.0: Grafting, relaxation, kinematic loop modeling, and full CDR optimization. Proteins Struct. Funct. Bioinforma. 82, 1611–1623 (2014).

29. Ó Conchúir, S., Barlow, K. A., Pache, R. A., Ollikainen, N., Kundert, K., O’Meara, M. J., Smith, C. A. & Kortemme, T. A Web resource for standardized benchmark datasets, metrics, and rosetta protocols for macromolecular modeling and design. PLoS One 10, e0130433 (2015).

30. Zemlin, M., Klinger, M., Link, J., Zemlin, C., Bauer, K., Engler, J. A., Schroeder, H. W. & Kirkham, P. M. Expressed Murine and Human CDR-H3 Intervals of Equal Length Exhibit Distinct Repertoires that Differ in their Amino Acid Composition and Predicted Range of Structures. J. Mol. Biol. 334, 733–749 (2003).

31. Kuroda, D. & Gray, J. J. Pushing the backbone in protein-protein docking. In Review, (2016).

32. Schrödinger, L. The PyMOL Molecular Graphics System, Version 1.8. (2015).


33. Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C. & Ferrin, T. E. UCSF Chimera--a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).

34. Chen, V. B., Davis, I. W. & Richardson, D. C. KiNG (Kinemage, Next Generation): A versatile interactive molecular and scientific visualization program. Protein Sci. 18, 2403–2409 (2009).

35. Lyskov, S., Chou, F.-C., Conchúir, S. Ó., Der, B. S., Drew, K., Kuroda, D., Xu, J., Weitzner, B. D., Renfrew, P. D., Sripakdeevong, P., et al. Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE). PLoS One 8, e63906 (2013).

36. North, B., Lehmann, A. & Dunbrack, R. L. A new clustering of antibody CDR loop conformations. J. Mol. Biol. 406, 228–256 (2011).

37. Rosetta Prepack Protocol Documentation. https://www.rosettacommons.org/docs/wiki/application_documentation/docking/docking-prepack-protocol


Table 1. Local Ab–Ag docking benchmark results. Co-crystal PDB IDs indicate the native complex. PDB IDs listed under the “Type” column also indicate the use of unbound (U) or bound (B) component structures (as available). Model quality is defined by the CAPRI ranking criteria represented by a number of stars or a zero (0). Three, two, and one star(s) indicate high, medium, and acceptable quality, respectively, and a zero indicates incorrect models. For the “Rigid-Body” and “SnugDock” columns, the quality of the lowest-scoring model, by interface energy, is reported (an “f” indicates a strong energy funnel defined as five or more of the ten lowest-scoring models being medium quality or better). Ensemble SnugDock simulations were run with multi-template grafting and a CDR H3 kink constraint. CAPRI Summary lines summarize model quality for all targets. CAPRI Summary Top 10 takes the highest-quality model from the ten lowest-scoring models.

Co-crystal (PDB ID)   Type (Ab-Ag)      CDR H3 Length   Rigid-Body Dock Xtal   Rigid-Body Dock Model   SnugDock   Ensemble SnugDock
1mlc                  U(1mlb)-U(1lza)   7               0                      0                       **         *
1ahw                  U(1fgn)-U(1boy)   8               0                      *                       *          **
1jps                  U(1jpt)-U(1tfh)   8               **                     *                       *          **
1wej                  U(1qbl)-U(1hrc)   8               0                      0                       0          *
1vfb                  U(1vfa)-U(8lyz)   8               *                      *                       0          *
1bql                  B-U(1dkj)         7               ***f                   0                       *          0
1k4c                  B-U(1jvm)         9               **f                    0                       *          **
2jel                  B-U(1poh)         9               ***f                   0                       **         *
1jhl                  B-U(1ghl)         9               ***f                   0                       0          **
1nca                  B-U(7nn9)         11              ***f                   0                       0          *
2bdn                  B-B               8               ***f                   *                       *          *
1ynt                  B-B               9               ***f                   ***f                    **f        ***f
2aep                  B-B               9               ***f                   **                      *          0
2b2x                  B-B               10              ***f                   *                       0          **
1ztx                  B-B               10              ***f                   *                       **f        0

Summary                        Rigid-Body Dock Xtal   Rigid-Body Dock Model   SnugDock       Ensemble SnugDock
CAPRI Summary Top Decoy        9***/2**/1*            1***/1**/6*             0***/4**/6*    1***/5**/6*
CAPRI Summary Top 10 Decoys    9***/4**/2*            1***/6**/8*             1***/10**/4*   2***/11**/2*
No. of Funnels                 10                     1                       2              1
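The “CAPRI Summary Top Decoy” and “No. of Funnels” rows can be recomputed from the per-target entries. The Python sketch below transcribes the four docking columns of Table 1 in target order and tallies stars and funnels; it is a consistency check on the table, not part of the benchmark code, and the Top 10 row cannot be rechecked this way because the table reports only the lowest-scoring model per target.

```python
from collections import Counter

# Lowest-scoring model quality per target, transcribed from Table 1 (15 targets each).
TABLE1 = {
    "Rigid-Body Dock Xtal":  "0 0 ** 0 * ***f **f ***f ***f ***f ***f ***f ***f ***f ***f",
    "Rigid-Body Dock Model": "0 * * 0 * 0 0 0 0 0 * ***f ** * *",
    "SnugDock":              "** * * 0 0 * * ** 0 0 * **f * 0 **f",
    "Ensemble SnugDock":     "* ** ** * * 0 ** * ** * * ***f 0 ** 0",
}


def summarize(entries):
    """Return (CAPRI summary string, number of funnels) for one column."""
    stars = Counter(entry.count("*") for entry in entries)
    funnels = sum(entry.endswith("f") for entry in entries)
    return f"{stars[3]}***/{stars[2]}**/{stars[1]}*", funnels


for name, column in TABLE1.items():
    print(name, *summarize(column.split()))
```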


Figure 1. A schematic of the modeling protocols. The structure on the left shows the Fv antibody domains predicted by homology modeling (heavy chain in dark blue with CDR H1 and H2 loops in orange and CDR H3 loop in red; light chain in yellow with its CDR loops in light blue). The structure on the right depicts an antibody–antigen structure output by docking (antigen in green).

[Figure 1 labels: ANTIBODY SEQUENCES (e.g., >H|antibody EVQLLESDGGLV..., >L|antibody TDIMQSPSSLAV...); HOMOLOGY MODELLING; CRYSTAL STRUCTURE OF THE ANTIBODY; STRUCTURE OF THE ANTIGEN; ANTIBODY–ANTIGEN DOCKING; FINAL DOCKED MODELS; CDR loops H1, H2, H3, L1, L2, L3.]


Figure 2: Example output of plot_LHOC.py. The two plots show distributions of the Heavy Opening Angle (ref. 18) as obtained by plot_LHOC.py for two different antibodies. The 10 distinct light-heavy orientation templates are represented by the circles. The ten top-scoring models after H3 loop modeling are represented by the diamonds with the fill color corresponding to the starting template; in the legend, these points are ordered from smallest to largest metric value. For Antibody_1, the angles sampled by Rosetta overlap with the angles observed in antibody crystal structures. The ten top-scoring models are close to the center of the distribution. In Antibody_2, most of the angles sampled are found rarely or not at all in antibody crystal structures. The ten top-scoring models are also shifted to larger angles than typically found in antibodies. For Antibody_2, the user might consider trying alternate light-heavy orientation templates (Step 10).


Box 1 | Increasing sampling during docking by incorporating backbone structural ensembles.

In Rosetta, an ensemble is a set of discrete conformations of a protein structure. SnugDock uses ensembles to approximate backbone conformational flexibility by sampling conformations from the ensemble during docking. Through this approach, not only does the protocol explore more conformational space than standard docking, but it can also compensate for model error, for example by using an ensemble of models produced by a modeling approach in a previous step such as RosettaAntibody.

Rosetta ensembles can be converted directly from NMR ensembles, or they can be generated using any method that induces structural diversity, such as molecular dynamics or various Rosetta refinement protocols. The ensembles typically span small structural variations of 1-2 Å backbone RMSD (ref. 25). Rosetta's relax protocol (unconstrained) (refs. 21, 22) or KIC protocol (refs. 14–16) are suggested to generate docking ensembles for antigens. In addition, RosettaAntibody creates ensembles of antibodies by default. More on generating and docking ensembles can be found in Chaudhury and Gray (ref. 25) and in Rosetta’s documentation (ref. 37).


Box 2 | Assessing antibody modeling and antibody–antigen docking results

The user must critically analyze computational models. Models output by Rosetta should be ranked according to score. In most simulations, approximately 90% of the models will be non-native-like, and they will occupy a bulk score range of 30-50 Rosetta Energy Units (REU). Only about 1-5% of models will be native-like, and they will have scores ranging from within the bulk score range to a well of 5-10 REU below the bulk score range. If a cluster of structurally related models lies in the well below the bulk score range, these are likely native-like, with deeper scoring and more populated wells providing higher confidence.
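One way to put a number on the “well below the bulk” described above is to measure how far the best score lies below the edge of the bulk. The Python sketch below is only an illustration: treating the bulk as the highest-scoring 90% of models is an assumption of this sketch (the 90% figure comes from the text, the operational definition does not), the function name is hypothetical, and the scores are synthetic.

```python
def well_depth(scores, bulk_fraction=0.90):
    """REU between the lower edge of the bulk and the lowest-scoring model.

    The bulk is taken to be the highest-scoring `bulk_fraction` of the models.
    """
    ordered = sorted(scores)
    n_bulk = max(1, int(round(bulk_fraction * len(ordered))))
    bulk = ordered[-n_bulk:]           # least favorable models
    return bulk[0] - ordered[0]


# Synthetic scores: 18 bulk models from -310 to -293 REU plus two low outliers.
scores = [-320.0, -318.5] + [-310.0 + i for i in range(18)]
print(well_depth(scores))   # 10.0
```

A depth in the 5-10 REU range is consistent with the well described in the text, but score depth alone is not sufficient: the low-scoring models should also be structurally related to one another.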

Antibody model assessment.

First, assess the physical feasibility of the lowest-scoring models by eye in a molecular visualization package such as PyMOL. It is important to check for obvious flaws that can occur in rare circumstances, such as polypeptide chain breaks or backbone clashes, particularly within the CDRs and at their graft points. The accuracy of the non-H3 CDR loops should be further assessed by comparing the CDR cluster of the grafted loop with the cluster of the input sequence as identified by North et al. (ref. 36) (see step 5). Next, ensure the components of the VL–VH orientation lie within nature’s distribution (step 9), as models with outlying orientations are not likely to be native-like; an exception to this rule can be made if the VL–VH orientation grafting templates and Rosetta sampling all lie far toward the edge of nature’s distribution.

Ab–Ag docking model assessment.

Make sure that the lowest-scoring models make good contacts between the antigen and the antibody paratope. Higher confidence can be assigned to models with large (~1200 Å²), complementary interfaces (ref. 26), as well as those in which the H3 CDR loop makes several specific contacts. If experimental data suggest an antigen binding site, ensure the paratope contacts the antigen at this site.


Box 3 | Using Rosetta on different platforms and running in parallel.

Rosetta on different platforms. Throughout this protocol, executables are suffixed by the platform and mode for which they were compiled (e.g., antibody.macosclangrelease indicates that the antibody executable was compiled on a Mac OS operating system using the Clang compiler in release mode). The suffix is highlighted in orange throughout (.macosclangrelease). On other platforms, replace this string with your operating system and compiler (for example, GNU/Linux platforms with gcc as the compiler will default to .linuxgccrelease). Additionally, the suffix is prefixed by .mpi (.mpi.linuxgccrelease) when the executable is built for the message passing interface (MPI) by an MPI compiler. MPI-compatible executables can communicate with one another for parallel processing, and some Rosetta executables use MPI non-trivially. However, most standard Rosetta applications are trivially parallelizable (“embarrassingly parallel”) and thus capable of running on both MPI and non-MPI systems.

Running in parallel. An example of how to locally run a non-MPI executable in parallel is given in step 8. In general, add the -multiple_processes_writing_to_one_directory flag to your command line, and then execute multiple instances of the process. This procedure works on a single desktop computer with multiple CPUs or remotely on a supercomputer cluster. However, running a Rosetta executable on a cluster strongly depends on the hardware configuration and available software (e.g., workload management software).

For example, to run a non-MPI executable via HTCondor: (1) save the standard command line as an executable bash script, (2) write a submit description file specifying the executable bash script and the number of processes to execute, and (3) use the condor_submit command with the description file as an argument to submit your jobs to the cluster.
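Item (2) of the HTCondor example can be sketched as follows. This Python snippet only assembles the text of a minimal submit description file; the script name run_h3.sh, the log file names, and the process count are placeholders, and a real cluster may require additional settings (for example resource requests) that are not shown here.

```python
def condor_submit_description(script="run_h3.sh", n_processes=24):
    """Return the text of a minimal HTCondor submit description file."""
    lines = [
        f"executable = {script}",          # the bash script wrapping the Rosetta command line
        "output = h3_$(Process).out",      # one stdout file per queued process
        "error = h3_$(Process).err",
        "log = h3.log",
        f"queue {n_processes}",            # number of identical processes to start
    ]
    return "\n".join(lines) + "\n"


description = condor_submit_description()
print(description)
```

Writing this text to a file and passing that file to condor_submit corresponds to item (3) in the example above.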

On the other hand, MPI executables can be run in parallel locally by prepending the command line with the mpirun -n XX command, where XX is the number of processes to run, if your machine is configured to use the OpenMPI library. Again, the exact commands depend on the specific cluster configuration. For example, to run an MPI executable on Stampede via the slurm workload manager: (1) save the standard command line as an executable bash script, (2) write a slurm batch script specifying the executable bash script and the number of tasks, and (3) use the sbatch command with the batch script as an argument to submit your jobs to the cluster.
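Prepending mpirun to an existing command line can be sketched in Python as below. The helper name is hypothetical, the flags shown are a subset of the docking command in step 15, and the snippet only builds the command string; it does not launch anything.

```python
import shlex


def mpirun_command(executable, flags, n_processes):
    """Return a shell-safe 'mpirun -n N executable flags...' command string."""
    args = ["mpirun", "-n", str(n_processes), executable] + list(flags)
    return " ".join(shlex.quote(arg) for arg in args)


cmd = mpirun_command(
    "snugdock.mpi.linuxgccrelease",
    ["-s", "prepack/antibody_antigen_start.prepack.pdb", "-nstruct", "1000"],
    24,
)
print(cmd)
# mpirun -n 24 snugdock.mpi.linuxgccrelease -s prepack/antibody_antigen_start.prepack.pdb -nstruct 1000
```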
