HathiTrust report on a plan for backing up computer data in the event of a disaster.


  • 7/27/2019 HathiTrust report on a plan for backing up computer data in the event of a disaster.

    HathiTrust is a Solution:
    The Foundations of a Disaster Recovery Plan for the Shared Digital Repository

    This report serves as recommendations made by
    Michael J. Shallcross,
    2009 Digital Preservation Intern
    University of Michigan
    School of Information


    Executive Summary

    This report seeks to establish the framework of a Disaster Recovery Plan for the HathiTrust Digital Library. While professional best practices and institutional needs have provided a clear mandate for HathiTrust's Disaster Recovery Program, common parlance has often obscured two prominent features of such initiatives. First, a Disaster Recovery Plan is actually composed of a suite of documents which detail a range of issues, from crisis communications and the continuity of administrative activities to the restoration of hardware and data. Second, there is no conclusion to the planning process; it is instead a continuous cycle of observation, analysis, solution design, implementation, training, testing, and maintenance.

    The primary goal of the present document is to provide a foundation on which future planning efforts may build. To that end, it examines the strategies by which HathiTrust has anticipated and mitigated the risks posed by ten common scenarios which could precipitate a disaster:

    o Hardware failure and data loss
    o Network configuration errors
    o External attacks
    o Format obsolescence
    o Core utility or building failure
    o Software failure
    o Operator error
    o Physical security breach
    o Media degradation
    o Man-made as well as natural disasters.

    As this list reveals, a disaster within the digital repository refers not merely to data loss, the destruction of equipment, or damage to its environment, but to any event which has the potential to cause an extended service outage. For each scenario, the report discusses possible threats, summarizes the potential severity of related events, and then details solutions HathiTrust has enacted through direct quotations from the HathiTrust Web site and TRAC self-assessment, Service Level Agreements, and literature from service providers and vendors. Attached appendices provide relevant information and include contacts for important HathiTrust resources, an annotated guide to Disaster Recovery Planning references, and an overview of key steps in the Disaster Recovery Planning process.

    The concluding section of the report provides recommendations and action items for HathiTrust as it proceeds with its Disaster Recovery Initiative. These are divided into Short (0-6 mos.), Intermediate (6-12 mos.), and Long-Term (12+ mos.) objectives and are arranged in a suggested order of accomplishment.

    o Short-term goals include:
        - Describing the nature and extent of HathiTrust's insurance coverage
        - Testing and validation of current tape backup procedures
        - Improved physical and intellectual control over system hardware
        - Establishment, distribution, and maintenance of phone trees
        - Increased documentation of institutional knowledge
        - Identification of Disaster Recovery measures in place at the Indianapolis site.

    o Intermediate-term objectives focus on:
        - Creation of a Disaster Recovery Planning Committee


        - Initiation of the data collection and analysis essential to the creation of recovery strategies (This section provides a high-level breakdown of various tasks and includes the coordination of activities between the Ann Arbor and Indianapolis sites as well as with service providers and vendors.)

    o Long-term action items deal with:
        - Completion and implementation of the suite of Disaster Recovery documents
        - Initiation of staff training and tests of organizational compliance
        - Storage of an additional copy of backup tapes at a remote third location
        - Investigation of an alternate hot site in Ann Arbor in the event a disaster renders the MACC unusable
        - Consideration of a third instance of the repository
        - Avoidance of vendor lock-in if a key supplier should go out of business.

    This report demonstrates that various risk management strategies, design elements, operating procedures, and support contracts have endowed HathiTrust with the ability to preserve its digital content and continue essential repository functions in the event of a disaster. The establishment of the Indianapolis mirror site, the performance of nightly tape backups to a remote location, and the redundant power and environmental systems of the MACC reflect professional best practices and will enable HathiTrust to weather a wide range of foreseeable events. Unfortunately, disasters often result from the unknown and the unexpected; while the aforementioned strategies are crucial components of a Disaster Recovery Plan, they must be supplemented with additional policies and procedures to ensure that, come what may, HathiTrust will be able to carry on as both an organization and a dedicated service provider.


    Acknowledgements

    The author would like to thank Shannon Zachary for her encouragement and guidance; Cory Snavely and Jeremy York for their generous expenditure of time, energy, and knowledge; and Nancy McGovern and Lance Stuchell for access to their outstanding Disaster Recovery Planning resources. The following individuals have also been invaluable sources of advice, support, and information: John Wilkin, Bob Campe, Cyndi Mesa, Ann Thomas, John Weise, Larry Wentzel, Lara Unger Syrigos, Bill Hall, Emily Campbell, Sebastien Korner, Jessica Feeman, Phil Farber, Chris Powell, Cameron Hanover, Stephen Hipkiss, Tim Prettyman, Rene Gobeyn, and Krystal Hall. Thanks also to Dr. Elizabeth Yakel, Magia Krause, and Veronica and Cora Fambrough. The work in this report was made possible by an IMLS Grant.


    Table of Contents

    Executive Summary
    Acknowledgements
    Introduction
        o Goals for HathiTrust's Disaster Recovery Program
        o The Mandate for Disaster Recovery Planning in Digital Preservation
        o Disaster Preparedness in the Design and Operation of HathiTrust
        o Essential HathiTrust Business Functions
    HathiTrust's Disaster Recovery Strategies
        o Basic Requirements for Disaster Recovery
        o Disaster Recovery Strategy #1: Redundancy between the Ann Arbor and Indianapolis Sites
        o Disaster Recovery Strategy #2: Nightly Automated Tape Backups
    Scenario 1: Hardware Failure or Obsolescence and Data Loss
        o Review: Risks Involving Hardware Failure or Obsolescence and Data Loss
        o HathiTrust's Solutions for Hardware Failure and Data Loss
        o Redundant Components and Single Points of Failure in the HathiTrust Infrastructure
        o Key Features of HathiTrust's Isilon IQ Clustered Storage
        o Hardware Support and Service
        o Equipment Tracking
        o Hardware Replacement Schedule
        o Timeline for Emergency Replacement of HathiTrust Infrastructure
        o HathiTrust and Insurance Coverage at the University of Michigan
    Scenario 2: Network Configuration Errors
        o Review: Risks Involving Network Configuration Errors
        o HathiTrust's Solutions for Network Configuration Errors
        o Extent of ITCom Support
        o ITCom Responsibilities
        o ITCom Services in Response to Outages or Degradation Impacting the Network
        o HathiTrust Responsibilities
    Scenario 3: Network Security and External Attacks
        o Review: Risks Involving Network Security and External Attacks
        o HathiTrust's Solutions for Network Security
    Scenario 4: Format Obsolescence
        o Review: Risks Involving Format Obsolescence
        o HathiTrust's Solutions for Format Obsolescence
        o Selection of File Formats
        o Format Migration Policies and Activities
    Scenario 5: Core Utility and/or Building Failure
        o Review: Risks Involving Core Utility or Building Failure
        o HathiTrust's Solutions for Utility or Building Failure
        o General Maintenance and Repairs in University of Michigan Facilities
        o The Michigan Academic Computing Center (MACC)
        o Arbor Lakes Data Facility (ALDF)


    Scenario 6: Software Failure or Obsolescence
        o Review: Risks Involving Software Failure or Obsolescence
        o HathiTrust's Solutions for Software Issues
    Scenario 7: Operator Error
        o Review: Risks Involving Operator Error
        o HathiTrust's Solutions for Operator Error
        o Ingest
        o Archival Storage
        o Dissemination
        o Data Management
    Scenario 8: Physical Security Breach
        o Review: Risks Involving a Physical Security Breach
        o HathiTrust's Solutions for Physical Security
        o Security at the MACC
        o Security at the ALDF
    Scenario 9: Natural or Man-made Disaster
        o Review: Risks Involving a Natural or Man-made Disaster
        o HathiTrust's Solutions for Natural or Man-made Catastrophic Events
        o Basic Disaster Recovery Strategies
    Scenario 10: Media Failure or Obsolescence
        o Review: Risks Involving Media Failure or Obsolescence
        o HathiTrust's Solutions for Media Failure
        o Remaining Vulnerabilities
    Conclusions and Action Items
        o Conclusions
        o Short-Term Action Items
        o Intermediate-Term Action Items
        o Long-Term Action Items
    APPENDIX A: Contact Information for Important HathiTrust Resources
    APPENDIX B: HathiTrust Outages from March 2008 through April 2009
    APPENDIX C: Washtenaw County Hazard Ranking List
    APPENDIX D: Annotated Guide to Disaster Recovery Planning References
    APPENDIX E: Overview of the Disaster Recovery Planning Process
    APPENDIX F: TSM Backup Service Standard Service Level Agreement (2008)
    APPENDIX G: ITCS/ITCom Customer Network Infrastructure Maintenance Standard Service Agreement (2006)
    APPENDIX H: MACC Server Hosting Service Level Agreement (Draft, 2009)
    APPENDIX I: Michigan Academic Computing Center Operating Agreement (2006)

    **Appendices F-I are embedded PDF files.**


    20090824 1

    Introduction

    In the realm of print libraries, a disaster is a fairly unambiguous event: it is a fire, a broken pipe, an infestation of pests; in short, anything which threatens the continued use and existence of texts or the environment in which they are stored. This basic definition may also be applied to the digital library, in which a disaster refers not merely to the loss of content or corruption of data, the destruction of equipment or damage to its environment, but to any event which has the potential to cause an extended service outage. This last part proves to be the greatest difference between the print and digital worlds because there are a great many threats which can leave data intact but incapacitate the primary functions of a digital library. The daily operation of an institution such as HathiTrust involves the anticipation and resolution of a variety of problems (crashed servers, software bugs, networking errors, etc.) which only rise to the level of a disaster when they exceed the capacity of normal operating procedures and/or the maximum allowable outage periods. Disaster Recovery Planning thus prompts us to develop robust strategies to mitigate and limit the effects of common problems and at the same time forces us to think the unthinkable. Nevertheless, confronting worst-case scenarios is a vital activity; the belief that an event will never happen simply because it has never happened is an invitation to the very disaster we seek to avoid. Herein lies a conundrum, in that the creation of detailed plans for every eventuality is nearly impossible and also impractical, since the results of such an endeavor would be needlessly complex as well as expensive. At its basis, then, Disaster Recovery Planning demands an astute assessment of risk so that we may weigh the costs of preparations and solutions against the costs of a potential event.

    So where to begin? When the subject of Disaster Recovery Planning arises, common parlance often obscures two prominent features of such initiatives. First, a Disaster Recovery Plan is actually composed of a suite of documents which detail a variety of related issues, from crisis communications and the continuity of administrative activities to the recovery of hardware and data and the restoration of core functions. Second, there is no conclusion to the planning process or a point at which a plan is "done"; there is instead a continuous cycle of observation, analysis, solution design, implementation, training, testing, and maintenance. The essential first step is therefore a thorough knowledge of the organization, its goals, and its mandate for a Disaster Recovery Program so that later efforts can focus on the articulation of policies and the development of solutions. As a preliminary step in this effort, this report looks to establish a basic foundation from which future planning efforts may grow.

    Goals for HathiTrust's Disaster Recovery Program

    While a more formal statement of HathiTrust's goals and requirements for its Disaster Recovery Program must be elucidated, the repository's mission statement provides a good indication of its main objective in the formation of a Disaster Recovery Plan. As part of its aim "to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge," HathiTrust seeks "to help preserve these important human records by creating reliable and accessible electronic representations."[1] This statement clearly joins the twin imperatives of preservation and access with an additional requirement: reliability. The development and implementation of a Disaster Recovery Plan will ensure that digital objects will retain their authenticity and integrity over the long term and that partner libraries and designated users may rely on HathiTrust services (or their timely resumption) and content in the face of catastrophic events.

    [1] HathiTrust. Mission & Goals (2009) retrieved from http://www.hathitrust.org/mission_goals on 8 July 2009.


    The Mandate for Disaster Recovery Planning in Digital Preservation

    HathiTrust's mandate for a comprehensive and proactive Disaster Recovery Plan stems from a number of significant sources, among which we may include its mission and goals. The Institutional Data Resource Management Policy (2008) of the University of Michigan's Standard Practice Guide also provides an impetus for the creation of a Disaster Recovery Program. While not necessarily inclusive of the Michigan Digitization Project materials stored in HathiTrust, this document underscores how important it is that data resources be safeguarded [and] protected and contingency plans [...] be developed and implemented.[2] In its discussion of the latter point, the policy specifies that:

        Disaster Recovery/Business Continuity plans and other methods of responding to an emergency or other occurrences of damage to systems containing institutional data [...] will be developed, implemented, and maintained. These contingency plans shall include, but are not limited to, data backup, Disaster Recovery, and emergency mode operations procedures. These plans will also address testing of and revision to disaster recovery/business continuity procedures and a criticality analysis.[3]

    While data backup procedures and a host of risk management practices are already an integral part of HathiTrust's operation, the repository now looks to formalize the other strategies suggested by the Institutional Data Resource Management Policy. Beyond the example laid out by this document, HathiTrust's mandate for Disaster Recovery derives from the professional literature detailing best practices in the field of digital preservation. The Reference Model for an Open Archival Information System (OAIS) identifies Disaster Recovery as an essential component of its Archival Storage function and highlights the importance of such plans in achieving the goal of long-term preservation of a digital archive's holdings. As outlined in the OAIS document, the Disaster Recovery function "provides a mechanism for duplicating the digital contents of the archive collection and storing the duplicate in a physically separate facility."[4] HathiTrust has successfully met this requirement by performing nightly tape backups and establishing a mirror site at Indiana University in Indianapolis. The Trustworthy Repositories Audit & Certification: Criteria and Checklist (2007) is even more explicit in its requirement that repositories document their policies and procedures with "suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an off-site copy of the recovery plan(s)."[5] Professional best practices as well as internal needs and goals thus provide the mandate which underlies HathiTrust's development of a formal Disaster Recovery Plan.

    Disaster Preparedness in the Design and Operation of HathiTrust

    One of the primary goals of HathiTrust is to provide transparency in all of its operations, including its work to comply with digital preservation standards and review processes.[6] Nowhere is this commitment more clear than in its efforts to anticipate and mitigate risks which could threaten the contents and functions of the Shared Digital Repository. As a first step in addressing the disaster preparedness requirement in section C3.4 of the TRAC Criteria and Checklist,[7] this document serves two purposes. First, it provides an overview of the policies, procedures, resources and contracts that enable HathiTrust to address the challenges and threats endemic to the field of digital preservation. Material is therefore cited directly from the HathiTrust Web site (http://www.hathitrust.org), the most recent version of HathiTrust's review of its compliance with the minimum required elements of the TRAC Criteria and Checklist,[8] and relevant literature provided by key vendors and service providers.[9] Second, this report examines HathiTrust's current level of disaster preparedness and defines current and forthcoming efforts in its development of a dynamic and proactive Disaster Recovery Program. Per the recommendations of the TRAC Criteria and Checklist, this document records the measures and precautions already in place with regard to specific types of disasters that could befall HathiTrust. These events include hardware failure, data loss, network configuration errors, external attacks, core utility failure, format obsolescence, software failure, physical security breach, and man-made as well as natural disasters. While a formal, written plan detailing individual roles and responsibilities in the repository's response to each of these scenarios is still forthcoming, the evidence gathered in this report reveals that crucial elements of a Disaster Recovery Plan are already in place within HathiTrust.[10]

    [2] University of Michigan. Institutional Data Resource Management Policy (2008) Standard Practice Guide, retrieved from http://spg.umich.edu/ on 8 July 2009.
    [3] Ibid.
    [4] Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (2002) p. 48.
    [5] OCLC and CRL. Section C3.4, Trustworthy Repositories Audit & Certification: Criteria and Checklist (2007) p. 49.
    [6] HathiTrust. Accountability (2009) retrieved from http://www.hathitrust.org/accountability on 25 June 2009.

    Essential HathiTrust Business Functions

    As the development of the Disaster Recovery Plan proceeds, it is important to bear in mind that its goal is not merely the restoration of hardware and data but also the recovery and continuity of essential repository functions. The following list represents core functions that need to be addressed by HathiTrust's Disaster Recovery Plan and as such should not be considered a comprehensive representation of the repository's functions. By directing planning efforts toward specific functions (rather than the organization's activities as a whole), HathiTrust may prioritize and focus its recovery responses and resources to ensure that the most essential functions go back online first. Subsequent discussion of Disaster Recovery strategies and risk management solutions in this report is presented under the assumption that the continuity of these functions is a primary objective. The prioritization of these functions remains to be determined by an appropriate authority.[11]

    [7] "Repository has suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an off-site copy of the recovery plan(s). The repository must have a written plan with some approval process for what happens in specific types of disaster (fire, flood, system compromise, etc.) and for who has responsibility for actions. The level of detail in a disaster plan and the specific risks addressed need to be appropriate to the repository's location and service expectations. Fire is an almost universal concern, but earthquakes may not require specific planning at all locations. The disaster plan must, however, deal with unspecified situations that would have specific consequences, such as lack of access to a building." OCLC and CRL. Trustworthy Repositories Audit & Certification: Criteria and Checklist (2007) p. 49.
    [8] HathiTrust Digital Library Review of Compliance with Trustworthy Repositories Audit & Certification: Criteria and Checklist Minimum Required Elements, revised May 20, 2009. Available at http://hathitrust.org/documents/trac.pdf
    [9] Contact information for relevant University of Michigan departments and service providers as well as for external vendors may be found in Appendix A.
    [10] A list of resources related to disaster recovery and the planning process may be found in Appendix D (Annotated List of Disaster Recovery Planning Resources).
    [11] This list of essential HathiTrust business functions was developed in conjunction with Jeremy York.


    o Ingest
        - Ingest digital objects (SIPs) via GRIN, the Google Return Interface (or a modified ingest portal for local content)
        - Validate ingested content with GROOVE, the Google Return Object-Oriented Validation Environment (or a modified version for localized ingest)

    o Archival Storage
        - Preserve indefinitely digital objects and metadata (AIPs) in the Shared Digital Repository (includes ensuring the integrity and authenticity of materials). This function addresses the needs of partner libraries as well as individual users.
        - Record changes to and actions on items while they are in the repository
        - Maintain a persistent object address for items within the repository

    o Dissemination
        - Provide access to digital objects for users
        - Allow for text searches through a variety of fields
        - Enable large-scale full-text searches
        - Permit the creation of public and private content collections
        - Disseminate digital objects (DIPs) to users (via the page-turner access system and data API)
        - Distribute data sets and HathiTrust APIs to developers
        - Research and develop additional applications and resources for HathiTrust

    o Administration
        - Provide transparent and up-to-date information to users and the general public via http://www.hathitrust.org/
        - Communicate information and coordinate activities amongst partner libraries and HathiTrust boards and committees.

    o Data Management
        - Update and manage the Rights and GeoIP databases
        - Build and maintain Collection Builder and Large-Scale Search Solr indexes
        - Determine appropriate user access to texts via database queries
        - Sync content with the Indianapolis site and back up content to tape


    HathiTrust's Disaster Recovery Strategies

    Basic Requirements for Disaster Recovery

    Roy Tennant has identified three requisite components of a digital Disaster Recovery Plan: (1) the use of an effective data protection system (e.g., RAID), (2) redundant power and environmental systems, and (3) regular backup of information to tape and, ideally, to a remote mirrored site.[12] HathiTrust has incorporated all these elements into its design and operation. Its Isilon IQ storage cluster provides a high degree of data redundancy with its N+3 parity protection; the Michigan Academic Computing Center provides fully redundant power and environmental systems for HathiTrust infrastructure; and nightly tape backups and the replication of data to a fully operational mirror site located at Indiana University in Indianapolis, with the same levels of power and environmental conditioning, provide multiple copies as well as geographic distribution of content.

    o "HathiTrust is intended to provide persistent and high availability storage for deposited files. In order to facilitate this, the initiative's technology concentrates on creating a minimum of two synchronized versions of high-availability clustered storage with wide geographic separation (the first two instances of storage will be located in Ann Arbor, MI and Indianapolis, IN), as well as an encrypted tape backup (written to and stored in a separate Ann Arbor facility).

      Each of these storage or tape instances is physically secure (e.g., in a locked cage in a machine room) and only accessible to specified personnel. Each separate storage system is also equipped with mechanisms to provide mirrored management and access functionality, and employ 100% data redundancy in an effort to prevent data loss."[13]

    Details on parity protection and the HathiTrust server environment are available below (see Scenario 1 and Scenario 5, respectively).

    Disaster Recovery Strategy #1: Redundancy between the Ann Arbor and Indianapolis Sites

    HathiTrust's first line of defense in the event of a disaster is its hot mirror site in Indianapolis. While ingest of material is restricted to the Ann Arbor location, both sites possess two web servers, a MySQL database server, and an Isilon IQ storage cluster (currently composed of 21 nodes, servers composed of Central Processing Units as well as storage). During normal operations, this arrangement allows HathiTrust to balance a high volume of web traffic across both sites such that individual user requests may be handled by either site in a transparent manner. Should the tolerances for failure be exceeded at a site (as in a disaster situation), the failover capability built into the HathiTrust architecture enables the remaining site to provide access to the designated community without noticeable service disruptions. As noted in the May 2009 HathiTrust Update, with the full operation of both locations, "We are now ensuring that users do not feel the effects of single-site outages, such as routine maintenance, by taking advantage of site redundancy."[14] However, because ingest takes place only in Ann Arbor, the loss of key components there would inhibit the repository's ability to acquire new content.

    [12] Tennant, Roy. Digital Libraries: Coping with Disasters. Library Journal, 15 November 2001. Retrieved from http://www.libraryjournal.com/article/CA180529.html on 13 July 2009.
    [13] HathiTrust. Technology retrieved from http://www.hathitrust.org/technology on 15 June 2009.
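The two-site arrangement described under Disaster Recovery Strategy #1 can be illustrated with a minimal Python sketch. It is not HathiTrust's actual load balancer: the site labels, the round-robin rule, and the function names `route_request` and `can_ingest` are all assumptions introduced here purely for illustration. The sketch shows only the two properties the text states: either site can serve user requests, and ingest depends on Ann Arbor alone.

```python
# Hypothetical site labels; the report names the locations but not any identifiers.
SITES = ("ann_arbor", "indianapolis")
INGEST_SITE = "ann_arbor"  # per the report, ingest is restricted to Ann Arbor


def route_request(request_id: int, healthy: set[str]) -> str:
    """Pick a site for a user request, skipping any site that is down.

    Round-robin over the healthy sites is an assumed policy, used only
    to illustrate balancing during normal operation and failover otherwise.
    """
    candidates = [site for site in SITES if site in healthy]
    if not candidates:
        raise RuntimeError("no site is available to serve requests")
    return candidates[request_id % len(candidates)]


def can_ingest(healthy: set[str]) -> bool:
    """Ingest is possible only while the Ann Arbor site is up."""
    return INGEST_SITE in healthy


both = {"ann_arbor", "indianapolis"}
print(route_request(0, both), route_request(1, both))   # traffic spread over both sites
print(route_request(1, {"indianapolis"}))               # failover to the remaining site
print(can_ingest({"indianapolis"}))                     # access continues, ingest does not
```

With only Indianapolis healthy, every request is routed there while `can_ingest` reports that new content cannot be acquired, which mirrors the asymmetry noted in the text.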

    HathiTrust utilizes Isilon Systems' SyncIQ Application Software to synchronize data at the Indianapolis site with newly ingested or updated material from the Ann Arbor site. The sync to Indianapolis is divided into 24 separate subsets of the data, and a new subset starts every 2 hours, with the exception of Sundays. In other words, subset 1 runs at midnight on Monday, subset 2 runs at 2 a.m., and so on. A full pass over all 24 subsets thus takes 48 hours, and a pass interrupted by a Sunday takes a day longer. The maximum time for data to be replicated from Ann Arbor to Indianapolis would therefore be three days plus the run time of the sync process (which tends to take less than three hours).[15]
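The three-day figure above can be checked with a short Python sketch. It rests on assumptions not stated in the report: that the two-hour slots run continuously from Monday 00:00 through Saturday 22:00, and that the 24 subsets are assigned to successive slots in strict rotation. The names `slot_times` and `max_gap_hours` are hypothetical.

```python
from datetime import datetime, timedelta

SUBSETS = 24                 # number of data subsets, from the report
SLOT = timedelta(hours=2)    # one subset starts every two hours


def slot_times(start: datetime, weeks: int = 2):
    """Yield sync start times every two hours, skipping Sundays."""
    t = start
    end = start + timedelta(weeks=weeks)
    while t < end:
        if t.weekday() != 6:  # weekday() == 6 is Sunday
            yield t
        t += SLOT


def max_gap_hours(start: datetime, weeks: int = 2) -> float:
    """Longest wait between two consecutive runs of the same subset."""
    last_run = {}
    worst = timedelta(0)
    for i, t in enumerate(slot_times(start, weeks)):
        subset = i % SUBSETS  # assumed strict rotation through the subsets
        if subset in last_run:
            worst = max(worst, t - last_run[subset])
        last_run[subset] = t
    return worst.total_seconds() / 3600


# 13 July 2009 was a Monday; the schedule is assumed to start at midnight.
print(max_gap_hours(datetime(2009, 7, 13)))
```

Under these assumptions the longest gap between two runs of the same subset is 72 hours, to which the run time of the sync job itself must still be added.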

    o "SyncIQ is an asynchronous replication application that fully leverages the unique architecture of Isilon IQ storage to efficiently copy data from a primary cluster to one located at a secondary location."[16]

    o "All nodes [in both the source and target Isilon IQ clusters] concurrently send and receive data during replication jobs in real time, without impacting users reading and writing to the system."[17]

    o "A robust wizard-driven web-based interface is fully integrated into [Isilon's proprietary] OneFS management tool to control all the functionality, including scheduling, policy settings, monitoring and logging of data transferred and bandwidth utilization."[18]

    o "Only files that have changed will be replicated to the target clusters. This will optimize transfer times and minimize bandwidth used."[19]

    o "In the event the secondary system is not available due to a system or network interruption, the replication job will be able to roll back and restart at the last successful copy operation."[20]

    o "Upon a critical failure or loss of network connection, an alert will be sent to all recipients configured to receive critical alerts."[21]
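Two of the properties quoted above, replicating only changed files and resuming after an interruption, can be illustrated generically in Python. This is not SyncIQ's implementation or interface: the in-memory dictionaries standing in for clusters, the SHA-256 comparison, and the names `plan_replication` and `replicate` are assumptions made for the sake of a small runnable example.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to decide whether a file has changed."""
    return hashlib.sha256(data).hexdigest()


def plan_replication(source: dict[str, bytes], target: dict[str, bytes]) -> list[str]:
    """Return the paths that are missing or different on the target."""
    return sorted(
        path
        for path, data in source.items()
        if path not in target or digest(target[path]) != digest(data)
    )


def replicate(source, target, fail_after=None):
    """Copy changed files one at a time.

    `fail_after` simulates an interruption after that many copies.
    Returns the paths still pending (an empty list on success).
    """
    pending = plan_replication(source, target)
    for done, path in enumerate(pending):
        if fail_after is not None and done == fail_after:
            return pending[done:]          # interrupted: report what is left
        target[path] = source[path]        # one successful copy operation
    return []


primary = {"a.txt": b"one", "b.txt": b"two", "c.txt": b"three"}
mirror = {"a.txt": b"one"}                          # already in sync for a.txt
print(replicate(primary, mirror, fail_after=1))     # interrupted after one copy
print(replicate(primary, mirror))                   # rerun copies only what remains
```

Because the plan is recomputed from the current state of the target, a rerun after an interruption skips everything already copied, which is the effect the quoted passages describe.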

    Disaster Recovery Strategy #2: Nightly Automated Tape Backups

    HathiTrust's ability to recover from a disaster is also ensured by the nightly automated tape backups performed by the Tivoli Storage Manager (TSM) client application installed on the ingest servers connected to the HathiTrust storage cluster and managed by Michigan's ITCS TSM Group. The TSM Backup Service Standard Service Level Agreement[22] outlines the obligations and responsibilities of both the service provider and HathiTrust:

    [14] HathiTrust. Update on May 2009 Activities (2009) retrieved from http://www.hathitrust.org/updates_may2009 on 2 July 2009.
    [15] Snavely, Cory (Head, UM Library IT Core Services). Personal email on 13 July 2009.
    [16] Backup and Recovery With Isilon IQ Clustered Storage, 2007 p. 11
    [17] Ibid.
    [18] Ibid.
    [19] Ibid.
    [20] Ibid.
    [21] Ibid.
    [22] Please refer to Appendix F (TSM Backup Service Standard Service Level Agreement).


    o "The progressive incremental methodology used by Tivoli Storage Manager only backs up new or changed versions of files, thereby greatly reducing data redundancy, network bandwidth and storage pool consumption as compared to traditional methodologies based on periodic full backups."[23]

    o "ITCS is responsible for all of the central server hardware, tape hardware, networking hardware, and related components. ITCS is also responsible for hardware maintenance as well as software maintenance, administration, and security audits on the central (non-client) TSM servers." (TSM Backup Service SLA, sec. 4.1)

    o "ITCS provides 7x24 on-call monitoring and support, and strives to keep the servers up in production at all times. The target uptime is 99.9% of the time. The TSM hardware design is modular and should allow us to take pieces out of service without affecting customers. Whenever possible, system maintenance will be performed during standard weekend maintenance windows as defined by ITCS." (sec. 4.2)

    o "In an emergency, [email protected] (this will go to the on-call staff's pager in real time)." (sec. 4.6)

    o "ITCS is responsible for physical security. Machine access audits, OS security, and network security on the TSM server end are also the responsibility of ITCS." (sec. 4.9)

    o "The service [...] includes data compression, data encryptions, and data replication." (sec. 1.0)

    o "ITCS will maintain at least two TSM sites and will mirror data between the sites to provide redundancy in the event of a disaster. Currently those sites are the Arbor Lakes Data Facility (ALDF) at 4251 Plymouth Rd. and the Michigan Academic Computing Center (MACC) located at 1000 Oakbrook Dr." (sec. 4.10)

    o "Both facilities are secure, climate-controlled sites designed and built for high available production services."[24]

    o "In the event of a customer disaster with large-scale (a full server or more) data loss, ITCS will work with the customer to optimize the restore time to best of our ability. We will only be able to devote resources to the extent that other customers are not affected. Restoring large file servers (multiple Terabytes) can take several days. If customers want to minimize this amount of time to restore, we can purchase additional resources for this purpose. Contact us directly, and we'll work out a scenario with costing information. In the event of a MAJOR campus outage affecting a large number of customers, ITCS management will work with customers to determine how to prioritize customer restores." (sec. 4.11)

    o "Disaster Recovery planning is the responsibility of the customer unit." (sec. 5.8)

    Having established the main Disaster Recovery strategies employed by HathiTrust, we may now proceed to investigate the means by which it anticipates and mitigates the most common threats facing digital repositories.
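To put the 99.9% uptime target quoted from the TSM Backup Service SLA into concrete terms, the following Python sketch converts an uptime percentage into a downtime allowance. The function name `downtime_budget_hours` and the choice of a 365-day year and a 30-day month are assumptions made for illustration; the SLA itself does not state the measurement period.

```python
def downtime_budget_hours(uptime_target: float, period_hours: float) -> float:
    """Hours of outage permitted in a period at a given uptime fraction."""
    return (1.0 - uptime_target) * period_hours


HOURS_PER_YEAR = 365 * 24    # assumed measurement period: a non-leap year
HOURS_PER_MONTH = 30 * 24    # assumed measurement period: a 30-day month

yearly = downtime_budget_hours(0.999, HOURS_PER_YEAR)
monthly = downtime_budget_hours(0.999, HOURS_PER_MONTH)
print(f"{yearly:.2f} hours per year, {monthly * 60:.1f} minutes per month")
```

Under these assumptions a 99.9% target leaves roughly 8.76 hours of downtime per year, or about 43 minutes per 30-day month, which is useful context when comparing the backup service's availability with the maximum allowable outage periods a Disaster Recovery Plan must define.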

    [23] IBM. IBM Tivoli Storage Manager: Features and Benefits (2009) retrieved from http://www-01.ibm.com/software/tivoli/products/storagemgr/features.html?S_CMP=rnav on 16 June 2009.
    [24] Information Technology Central Services at the University of Michigan. Frequently Asked Questions about the TSM Backup Service (2009) retrieved from http://www.itcs.umich.edu/tsm/questions.php on 16 June 2009.


    Scenario 1: Hardware Failure or Obsolescence and Data Loss

    Review: Risks Involving Hardware Failure or Obsolescence and Data Loss

    The following table highlights the various events which pose a risk to the hardware and data of HathiTrust. These threats may stem from flaws or malfunctions in the equipment itself or as a result of external events that include physical security breaches and natural or man-made disasters. The arrangement of these potential risks reflects the relative severity of their respective consequences.

    HathiTrustsSolutionsforHardwareFailureandDataLoss ThethreatsfacedbyHathiTrustshardware(andassociatedapplicationsaswellasthedata

    storedtherein)arecomprisedofthefailureofredundantfeatures,failurethatexceedscomponents

    toleranceforredundancy,andsinglepointsoffailure.Whilethefailureofredundantcomponentsmay

    happenmorefrequently(i.e.,thelossofanindividualdrivewithintheIsilonIQcluster),suchlossesdo

    nothavealargeimpactontherepository;eventswhichcompromisesinglepointsoffailurewillhave

    muchgreaterconsequencesforthecontinuityofHathiTrustoperations.Atthesametime,whilea

    componentmayhaveredundancyononelevel(forexample,therearefiveserversdedicatedtoingest),

    thatcomponentsimultaneouslymaybeconsideredatahigherleveltobeasinglepointoffailure(i.e.,

    becausetheingestserversarehousedinasinglechassis,theentireunitisvulnerabletoaneventsuch

    asafire).Thisdualityhighlightstheneedforvigilanceandforesightinmanagingtherepositorys

    infrastructure.

    BecauseHathiTrustreliesheavilyuponhardwaretofulfillitsmissionanddeliverservicestoits

    designatedcommunityofusers,theselectionofequipmentanddevelopmentofsystemarchitecture

    Severity Event

    Highimpact Lossatasinglepointoffailure

    Anadditionalfailurepasttoleranceswhenonlyonesiteisoperational Serviceisunavailableandcannotberestoreduntilcomponentisrepaired/restored

    ModerateImpact Failureofacomponentpastredundancytolerance

    Systemnolongerhasredundancy:additionallossorfailureofcomponentswillresultinlossofsystem.Thisisaparticularproblemifonesiteisalreadydown.

    Lossofdbserver(homeofRightsdb)orofbothWebserversatasitewillrenderthatlocationinaccessible LossoffourdrivesornodesineitherIsilonstorageclusterwillresultinthelossof

    thatinstance.Theclusterwillbeofflineandunabletohandlereadorwrite

    requests;alltrafficwouldhavetobehandledbytheremainingsite.

    LossofUMArborLakessitewouldpreventperformanceoftapebackups. LossofUMMACCsitewoulddepriveIUsiteofdataredundancy Lossofingestserverswouldpreventnewcontentfromenteringrepository

    LowImpact Failureofredundantsystemcomponents

    IncludesredundantcomponentswithineachsiteaswellasgeneralredundancybetweentheIUandUMsites

    o HTinfrastructurehasbeendesignedtoavoidsinglepointsoffailureandtoensuredataandequipmentredundancy

    o Servicecontinuesinanuninterruptedandtransparentmanner

  • 7/27/2019 Rapport d'HathiTrust sur un plan de sauvegarde des donnes informatiques en cas de sinistre.

    15/61

    20090824 9

    hasaimedatminimizingthedangersposedbysinglepointsoffailurethroughtheintroductionof

    strategicredundancies.Thebasicmeansforavoidingthedisastrouseffectsofhardwarefailureordata

    losshavebeentheestablishmentoftheIndianapolismirrorsiteandthenightlybackupofcontentto

    tape.(Formoredetail,pleaserefertotheprecedingsection).Whilethesestrategiesaccountfor

    extraordinaryevents,HathiTrustsserverreplacementscheduleallowstherepositorytoanticipatethe

    resultsofnormalequipmentuseanddepreciation.Stepstosafeguardthelongtermfunctionalityof

    HathiTrusthavethereforebeencomplementedbyaconsiderationofbestpracticesfordisaster

    preparedness.

    Redundant Components and Single Points of Failure in the HathiTrust Infrastructure

    The following sections provide a general outline of HathiTrust's redundant components and single points of failure. Given the complexity of the repository's infrastructure, unknown or unanticipated scenarios may exist; future Disaster Recovery Planning will thus involve a periodic review of key features and vulnerabilities.

    o Site Redundancy: The establishment of the mirror site in Indiana provides HathiTrust with a fully redundant operation. Because both instances provide full access to content in addition to other repository functions, users will not experience a loss or degradation of service in the event that service is lost from one site. Key exceptions to HathiTrust's site redundancy are noted below.

    o Redundant Components at Each Site: The following components provide each site with a tolerance under which limited failures will not disrupt major HathiTrust functions and user services.
        - Web servers: each site has two servers so that if one fails, the other may continue to handle traffic. These also host the GeoIP database.
        - Isilon IQ clusters: the current configuration of 21 nodes features N+3 parity protection; this data redundancy permits the simultaneous failure of 3 drives on separate nodes or the loss of three entire nodes without service degradation.
        - Ingest servers: the Ann Arbor site possesses five servers so that ingest may continue (albeit at a slower rate) in the event of any failures.
        - Large Scale Search (LSS) Solr index: currently housed on the web servers, but will soon be maintained on five new servers in Ann Arbor.

    o Single Points of Failure:[25] These are components of a system which, if lost, will prevent the entire system from functioning. Even those components with wholly redundant peer devices (such as the web or ingest servers) may be considered single points of failure if they have exceeded their capacity to sustain losses (i.e., if one web server at a site has already been lost).
        - Single Points of Failure at the Component Level: Because only one of these components exists at each HathiTrust site, a loss will result in system failure.
            - MySQL database server: houses the rights database, ingest tracking database, and the Collection Builder Solr index
            - Server network switches
            - Outbound network switches
        - Single Points of Failure at the System Level: While any given component may have various degrees of internal redundancy (such as multiple power supplies or multiple drives) it might still fail as a whole and thus result in the loss of a particular instance of HathiTrust. The following are components located at each site which, while possessed of internal redundancies, are still subject to complete loss (as in the event of a fire) and may thus render a site inoperable.
            - Isilon IQ storage cluster: the entire cluster could be lost in a large-scale event. Additionally, the loss of a fourth drive or node will exceed the cluster's failure tolerance and result in a service disruption.
            - Web servers: should one fail, the remaining server will be a single point of failure.
            - Blade server chassis: since web, ingest, and database servers are housed in one chassis, the entire unit could potentially fail.
            - LSS index: in the near future, the servers in Ann Arbor will be the sole instance of the Large Scale Search index.
            - Mirlyn database and Mirlyn2 Solr index[26]: these are currently key components of the UM Library infrastructure; should these be unavailable, access to and use of HathiTrust will be compromised.

    [25] Content in this section is courtesy of Cory Snavely (personal email from 13 July 2009).
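    The distinction drawn above, between a component that is still redundant and one that has become an effective single point of failure, can be sketched in a few lines of Python. The unit counts come from the text; the Component class, the status function, and the tolerance values assigned to each entry are illustrative assumptions and do not describe any HathiTrust tooling.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical model of one class of equipment at a single site."""
    name: str
    total: int        # units installed at the site
    tolerance: int    # failures the site can absorb without losing the function


def status(component: Component, failed: int) -> str:
    """Classify a component given the number of units currently failed."""
    if failed > component.tolerance:
        return "site function lost"
    if failed == component.tolerance:
        # No spare capacity remains: the next failure takes the function down.
        return "single point of failure"
    return "redundant"


# Unit counts are taken from the report; tolerances are inferred from it.
web = Component("web servers", total=2, tolerance=1)
isilon = Component("Isilon IQ nodes", total=21, tolerance=3)
mysql = Component("MySQL database server", total=1, tolerance=0)

print(status(web, failed=0))      # redundant
print(status(web, failed=1))      # single point of failure
print(status(isilon, failed=4))   # site function lost
print(status(mysql, failed=0))    # single point of failure
```

    Modelling tolerance as a single integer per component class is a simplification; it ignores system-level events (such as the loss of the blade chassis) that remove several component classes at once.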

    Key Features of HathiTrust's Isilon IQ Clustered Storage

    The Isilon IQ storage cluster stores and provides digital objects for HathiTrust's partner libraries and members of its designated community. The cluster provides a high degree of inherent redundancy, which gives both HathiTrust sites a considerable degree of tolerance in regards to the failure of various aspects of the storage units. As one example, Isilon's proprietary OneFS operating system permits the individual storage nodes (the individual servers that are the building blocks of the cluster) to function as coherent peers so that any one node knows everything contained on the other units in the cluster.

    o Isilon's OneFS operating system [...] intelligently stripes data across all nodes in a cluster to create a single, shared pool of storage.[27]

    o Because all files are striped across multiple nodes within a cluster, no single node stores 100% of a file; if a node fails, all other nodes in the cluster can deliver 100% of the files within that cluster.[28]

    o A distributed clustered architecture by definition is highly available since each node is a coherent peer to the other. If any node or component fails, the data is still accessible through any other node, and there is no single point of failure as the file system state is maintained across the entire cluster.[29]

    [26] Mirlyn is the name of the University of Michigan's current Online Public Access Catalog, which is supported by the Aleph integrated library system. Mirlyn2 is a beta version of UM's recently implemented next-generation catalog, based on the VuFind platform, which will become the main library catalog on August 3, 2009.

    [27] Isilon Systems, Inc. Isilon IQ OneFS Operating System (2009), retrieved from http://www.isilon.com/products/OneFS.php on 17 June 2009.

    [28] Isilon Systems. Uncompromising Reliability through Clustered Storage: Delivering Highly Available Clustered Storage Systems (2008) p. 7. In computer data storage, data striping is the technique of segmenting logically sequential data, such as a single file, so that segments can be assigned to multiple physical devices. [...] if one drive fails and the system crashes, the data can be restored by using the other drives in the array. (http://en.wikipedia.org/wiki/Data_striping, retrieved on 16 August, 2009).

    [29] Isilon Systems. Breaking the Bottleneck: Solving the Storage Challenges of Next Generation Data Centers (2008) p. 8


    HathiTrust's Isilon IQ clusters ensure a high degree of data redundancy with their N+3 parity protection. N+3 provides triple simultaneous failure protection so that up to three drives on separate Isilon IQ nodes, or three entire nodes, can fail at the same time and all data will still be fully available.

    o Traditional RAID-5 parity protection results in data loss if multiple components fail prior to the completion of a rebuild. FlexProtect, in contrast, automatically distributes all data and error correction information across the entire Isilon cluster and with its robust error correction techniques efficiently and reliably ensures that all data remains intact and fully accessible even in the unlikely event of simultaneous component failures.[30]

    o Each file is striped across multiple nodes within a cluster, with [three] parity stripes for each data block.[31]
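    The N+3 tolerance described above amounts to a simple counting rule, sketched below in Python. The function name and the decision to add drive and node failures into one total are assumptions made for illustration; the report only states the limits for three drives or three nodes, and notes elsewhere that a fourth drive or node failure exceeds the tolerance.

```python
PARITY_LEVEL = 3  # the "+3" in N+3, as described in the report


def cluster_available(failed_drives: int, failed_nodes: int,
                      parity_level: int = PARITY_LEVEL) -> bool:
    """Return True while simultaneous failures stay within the parity level.

    Assumption: each failed drive (on a separate node) and each failed node
    counts as one failure, and no rebuild has completed in the meantime.
    """
    if failed_drives < 0 or failed_nodes < 0:
        raise ValueError("failure counts cannot be negative")
    return failed_drives + failed_nodes <= parity_level


print(cluster_available(failed_drives=3, failed_nodes=0))  # True
print(cluster_available(failed_drives=0, failed_nodes=3))  # True
print(cluster_available(failed_drives=4, failed_nodes=0))  # False
```

    In practice a completed rebuild restores tolerance, so the count that matters is the number of failures outstanding at one time, not the number over the life of the cluster.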

    The file system may also perform a Dynamic Sector Repair (DSR) at the time of any file writing. If it encounters a bad disk sector, the file system will use parity information elsewhere in the system to rebuild the necessary information and rewrite a new block elsewhere on the drive. The bad sector will be remapped by the drive so that it is never used again and the write operation will be completed.

    The Isilon restriper is a meta-process/infrastructure that has four primary phases to help manage and protect data in the event that components of the cluster sustain a partial failure or malfunction. The processes run as background operations and do not require system downtime.[32][33]

    o FlexProtect repairs data (i.e., in the event of a drive loss) using parity.
        - Isilon OneFS with FlexProtect can boast the industry-leading Mean Time to Data Loss (MTTDL) for petabyte clusters.[34]
        - FlexProtect introduces state-of-the-art functionality, which rebuilds failed disks in a fraction of the time, harnesses free storage space across the entire cluster to further insure against data loss, and proactively monitors and preemptively migrates data off of at-risk components.[35]

    o AutoBalance rebalances the data in a cluster according to business rules, in real time, non-disruptively.[36]
        - As soon as the [new or repaired] node is turned on and network cables are connected, AutoBalance immediately begins to migrate content from the existing storage nodes to the newly added node across the cluster interconnect back-end switch, rebalancing all of the content across all nodes in the cluster and maximizing utilization.[37]

    [30] Isilon Systems, Inc. Isilon IQ OneFS Operating System (2009), retrieved from http://www.isilon.com/products/OneFS.php on 30 June 2009.

    [31] Isilon Systems. Uncompromising Reliability through Clustered Storage: Delivering Highly Available Clustered Storage Systems (2008) p. 7

    [32] Isilon X-Series Specifications (product brochure)

    [33] Information on the Isilon restriper comes from a personal email sent by Kip Cranford of Isilon Systems, Inc. on 1 June 2009.

    [34] Isilon Systems. Data Protection for Isilon Scale-Out NAS (2009) p. 4

    [35] Isilon Systems, Inc. Isilon IQ OneFS Operating System (2009), retrieved from http://www.isilon.com/products/OneFS.php on 15 June 2009.

    [36] McFarland, Anne. Isilon Accelerates Delivery of Digital Content. The Clipper Group Navigator (2003).

    [37] Isilon Systems. The Clustered Storage Revolution (2008) p. 13


    o Collect cleans up orphaned nodes and data blocks to prevent fragmentation of data.

    o MediaScan verifies disk sectors.
        - The function of MediaScan is to scan every block in the file system looking for bad disk sectors. If it encounters a bad sector, it will perform a Dynamic Sector Repair (DSR) and use parity information elsewhere in the system to rebuild the necessary information and rewrite a new block somewhere else on the drive.
        - MediaScan periodically reviews data blocks and disk sectors that may not have been accessed, from a file level, in months or years and thereby helps to keep the drives as healthy as possible.

    o As of the OneFS 5.0 release, all file system metadata can be checked by the IntegrityScan restriper phase. This process will allow HathiTrust to completely check file data and metadata via associated checksums.
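    The idea of rebuilding a damaged block from parity, which underlies both FlexProtect and Dynamic Sector Repair as described above, can be illustrated with a deliberately simplified sketch. OneFS's actual error-correction scheme is proprietary and tolerates three simultaneous failures; the Python below substitutes plain single-parity XOR (the RAID-5 style technique) purely to show the principle. The block contents and function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors and the parity."""
    return xor_blocks(surviving + [parity])


stripe = [b"page", b"scan", b"data"]   # three hypothetical data blocks
parity = xor_blocks(stripe)            # parity written alongside the stripe

# Simulate a bad sector that destroys the second block, then repair it.
recovered = rebuild_block([stripe[0], stripe[2]], parity)
assert recovered == stripe[1]
```

    Single parity can recover only one lost block per stripe; surviving three simultaneous losses, as N+3 does, requires a stronger erasure code than the XOR shown here.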

    Other instances of inherent redundancy include non-volatile RAM, a fully journaled file system, and software applications that manage client connections in the event of a node's failure.

    o OneFS is a fully journaled file system with large amounts of battery-backed non-volatile random-access memory (NVRAM) within each node, which ensures the integrity of the file system in the event of unexpected failures during any write operation.[38]

    o The Isilon SmartConnect module [ensures] that when a node failure occurs, all in-flight reads and writes are handed off to another node in the cluster to finish its operation without any user or application interruption. [...] If a node is brought down for any reason, including a failure, the virtual IP addresses on the clients will seamlessly fail over across all other nodes in the cluster. When the offline node is brought back online, SmartConnect automatically fails back and rebalances the NFS clients across the entire cluster to ensure maximum storage and performance utilization.[39]

    Hardware Support and Service

    HathiTrust equipment is covered by support and service agreements with its various vendors (Sun Microsystems, Dell, CDW-G, etc.). A good example of one such agreement is found in the Platinum support provided by Isilon Systems and which includes:

    o Extended 24x7x365 Telephone & Online Hardware and Software Support
    o 24x7 Proactive Monitoring & Alerts Email Home (for Hardware and Software)
    o Return Parts to Factory for Repair and 4-hour Replacement Parts Delivery
    o SupportIQ (Enhanced Serviceability Diagnostics) and System Event Tracking
    o Onsite Troubleshooting
    o Isilon Hardware Installation
    o Software Product Documentation, Release Notes, and access to Product Technical Notes
    o Remote Diagnosis (Provided User Grants Access)
    o Maintenance & Patch Releases
    o Minor and Major Upgrade Releases (Includes Performance Improvements, New Features, Serviceability Improvements).[40]

    [38] Isilon Systems. Uncompromising Reliability through Clustered Storage: Delivering Highly Available Clustered Storage Systems (2008) p. 9

    [39] Isilon Systems. Data Protection for Isilon Scale-Out NAS (2009) p. 6

    Equipment Tracking

    LIT Core Services (CS) maintains an inventory of servers on a wiki page accessible to its staff. Details include each server's name, location, online and retire dates, upgrades, notes on storage, and its primary service. Additional information is provided related to specifications, support contracts, and key contact information. The CS server inventory is currently out of date.

    Hardware Replacement Schedule

    o HathiTrust replaces storage regularly, approximately every 3-4 years or as the usable life of storage equipment dictates (HT TRAC C1.7)

    o HathiTrust staff upgrade hardware on a regular basis (i.e., every three or four years), and to help detect more rapid growth in demands, the web server and storage infrastructures have their own performance monitoring that indicate overload conditions. (HT TRAC C1.10)

    Timeline for Emergency Replacement of HathiTrust Infrastructure

    Should a serious event require the replacement of part (or all) of the HathiTrust technical infrastructure, the following timeline provides a general estimate of the time required to order, ship, and install new equipment. A cursory review of the time necessary for HathiTrust to recover from a major disaster at the main Ann Arbor or Indianapolis data centers suggests that a large event could idle an instance of the repository for at least a month and a half. In addition to the servers and switches mentioned above, critical components include four 30A power distribution units (PDUs) per rack and four racks per data center as of this writing.

    o Submission of Purchase Orders:
        - For orders under $5,000, the M-Pathways application allows the University Library's business manager to send purchase orders directly to vendors.
        - For orders over $5,000, Procurement Services normally takes one to two business days to approve the purchase, but the process may take up to a week if questions arise or additional purchase information is needed.

    o Delivery of Equipment:
        - Products the vendor has in stock and available for immediate shipment take 1-3 days to be delivered.
        - Items that need to be configured (such as servers) usually take 1-2 weeks.
        - Isilon storage will take 3 weeks to be delivered in a worst-case scenario.

    o Installation:
        - 3 days FTE for Isilon IQ cluster in addition to the time required for other servers, switches, PDUs and rack units.

    [40] Isilon Systems. Support Advantage Offerings (2009), retrieved from http://www.isilon.com/support/?page=plans on 30 June 2009.


    o Data Restoration: about 0.5 TB/hour (15 days, as of June 2009)[41]
        - While HT has about 110 TB of data in its storage, the backup tapes maintained by the TSM Group contain roughly 176 TB of information due to the data encryption used to protect the intellectual rights of the material (as of 06/2009).
        - The length of time required for a bare-metal restoration will be influenced by tape mounts, network speed, restoring to the NFS shares, decryption, et cetera.
        - If the library/HT were to purchase an additional tape drive (at roughly $20,000), the process could be sped up, perhaps to about 1 TB/hour.
        - In the event of a large-scale disaster in which multiple campus units require extensive data restoration, the TSM Backup Service SLA states that ITCS management will work with customers to determine how to prioritize customer restores. (sec. 4.11) This determination will reflect the University of Michigan's organizational priorities[42]:
            Priority 1: Health and safety of faculty, staff, students, hospital patients, contractors, renters, and any other people on University premises.
            Priority 2: Delivery of health care and hospital patient services
            Priority 3: Continuation and maintenance of research specimens, animals, biomedical specimens, research archives.
            Priority 4: Delivery of teaching/learning processes and services
            Priority 5: Security and preservation of University facilities/equipment.
            Priority 6: Maintenance of community/University partnerships.

    o Fractional restores would, for the most part, run at comparable speeds unless there was a need to restore a large number of random files, in which case there would be a decrease in speed due to tape seek and mount times.

    o Delays in recovery could be increased dramatically if the MACC data center or its infrastructure has sustained damage and needs repair.
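    The figures in the timeline above can be checked with a short calculation. The Python sketch below reproduces the 15-day restoration estimate from the tape volume and transfer rate, then adds the worst-case procurement, delivery, and installation stages. Treating the stages as strictly sequential and counting "up to a week" as seven calendar days are assumptions; the names used are illustrative only.

```python
BACKUP_SIZE_TB = 176            # data held on tape, per the report (June 2009)
RESTORE_RATE_TB_PER_HOUR = 0.5  # current restore rate quoted in the report


def restore_days(size_tb: float, rate_tb_per_hour: float) -> float:
    """Days of continuous restoration at a constant transfer rate."""
    return size_tb / rate_tb_per_hour / 24


current = restore_days(BACKUP_SIZE_TB, RESTORE_RATE_TB_PER_HOUR)
with_second_drive = restore_days(BACKUP_SIZE_TB, 1.0)

# Worst-case stages quoted in the report, in calendar days. Running them
# back to back, with approval counted as 7 days, is an assumption.
stages = {
    "purchase approval": 7,
    "Isilon delivery": 21,
    "Isilon installation": 3,
    "data restoration": current,
}

print(f"restore at 0.5 TB/hour: {current:.1f} days")
print(f"restore at 1 TB/hour:   {with_second_drive:.1f} days")
print(f"total worst case:       {sum(stages.values()):.1f} days")
```

    Under these assumptions the restore takes about 14.7 days (7.3 days with a second tape drive) and the full sequence about 45.7 days, which is consistent with the report's estimate of at least a month and a half. The installation figure covers the Isilon cluster only, so the true total would be somewhat longer.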

    HathiTrust and Insurance Coverage at the University of Michigan

    The Office of Financial Operations reviews and adds financial assets greater than $5,000 to the asset management system of the University of Michigan. The Property Control Office is then responsible for tagging financial assets with unique University of Michigan identifiers and tracking them. Risk Management Services administers the University's property insurance and will provide the reimbursement of replacement costs for items self-insured by Michigan. As of July 2009, the nature and extent of the University of Michigan's insurance coverage for HathiTrust hardware remained under review. The main contact with Risk Management Services in this matter has been Cyndi Mesa, Head of UM Library Finance.

    [41] Hanover, Cameron (ITCS TSM Group Storage Engineer). Personal email on 23 June 2009.

    [42] University of Michigan Administrative Information Services. Emergency Management, Business Continuity, and Disaster Recovery Planning (2007), retrieved from http://www.mais.umich.edu/projects/drbc_methodology.html on 6 July 2009.


    Scenario 2: Network Configuration Errors

    Review: Risks Involving Network Configuration Errors

    The following table summarizes the risks facing HathiTrust as the result of network configuration errors. Consideration is given to network connections within UM data centers as well as at UM's Hatcher Graduate Library (site of key administrative and development activities). The arrangement of these events reflects the relative severity of their respective consequences.

    Severity / Event

    High impact:
        - Loss of server network switch or outbound network switch
        - Loss of access to UMnet Backbone

    Moderate impact:
        - Extended loss of power at Hatcher Library could lead to loss of local servers and disruption of administrative and operational activities.

    Low impact:
        - Loss of power that threatens ability to connect to Local Area Network (LAN)/Backbone
            o The library remains (for now) a priority recipient of electricity from the UM power plant
            o Campus data centers have UPSs and redundant backup power
        - Failure of local/server-side connections
            o Should problems arise with connections to individual nodes, the clustered architecture of the Isilon system will allow read/write requests to be handled by alternate nodes.
            o If connections fail at one HT site, traffic can be handled by remaining site.

    HathiTrust's Solutions for Network Configuration Errors

    HathiTrust's continued access to the Internet via the UMnet Backbone is essential for its continued provision of service. The repository receives network infrastructure maintenance through UM's ITCS/ITCom; with its robust disaster planning in addition to the lessons learned from the Midwest blackout of 2003, ITCom guarantees continued network access in all but the most catastrophic scenarios. In the event of a widespread power outage, HathiTrust would be able to maintain access to the UMnet Backbone since data centers are equipped with redundant power supplies and the Hatcher Graduate Library is currently categorized as a priority recipient of power from the university. ITCS also has 17 generators which can be used to maintain power to network switches in the event of a blackout. The responsibilities and obligations of both parties are outlined in the Customer Network Infrastructure Maintenance Service Agreement.[43]

    Extent of ITCom Support

    o ITCom agrees to provide the Unit Network Infrastructure Maintenance to include data switches, routers, access points, hubs, uninterruptible power supplies (UPSs), firewalls, and other identified and agreed upon components. (ITCS sec. 1.0)

    [43] Please refer to Appendix G (ITCS/ITCom Customer Network Infrastructure Maintenance Service Agreement).


    ITCom Responsibilities

    o Provide and maintain the necessary materials and electronic components to operate the Unit Network Infrastructure. (sec. 5.2)

    o Provide configuration and Network Infrastructure Administration support necessary to repair and maintain the Unit Network Infrastructure hardware and software covered by this agreement. (sec. 5.3)

    o Monitor 24 hours/day and 365 days/year (24x365), supported protocols to the backbone interface of the Unit's network up to and including the extension to the first hub or switch. (sec. 5.6)

    o Monitor 24 hours/day and 365 days/year (24x365), network interfaces on uninterruptible power supplies (UPS) that support the Unit network switches. Provide notification in the event that a UPS is activated (input power is lost or degraded and system switches to battery power), deactivated (input power is restored), or unreachable. Provide notification to the Unit Network Administrator when batteries degrade to the point of needing replacement. (sec. 5.7)

    o Provide maintenance on the station cabling as installed by ITCom, or an approved UM vendor which met ITCom installation specifications. (sec. 5.8)

    o Provide Preventative Maintenance (clean & vacuum) on each Customer Unit switch covered in this agreement yearly. (sec. 5.9)

    ITCom Services in Response to Outages or Degradation Impacting the Network

    o A response within 30 minutes of the ITCom NOC notification or the Unit's call, to provide information to the Unit on specific steps that have been/will be taken to resolve the problem. (sec. 7.2.1)

    o An onsite visit, if necessary, within two (2) hours of the response (i.e., the maximum onsite response time will be two and a half (2 1/2) hours). An update will be provided to the Unit Network Administrator if onsite and a best-guess ETR will be provided based on available facts. ITCom will continue to provide the Unit with updates every two hours during an outage. (sec. 7.2.1)

    o If an outage is identified within the agreement service hours ITCom will resolve the outage even if the repair time extends beyond the service agreement hours. (sec. 7.2.1) (Repairs outside of the agreement hours result in additional labor expenses.)

    o Conduct monitoring via SNMP POLLING at one-minute intervals. (sec. 7.2.1)

    HathiTrust Responsibilities

    ITCom's responsibilities end at the first network switch; from there to its servers, HathiTrust is responsible for maintaining network connectivity and security. The repository uses Internet2 for communication and synchronization between the Ann Arbor and Indianapolis sites. Each Isilon node has dual 10 Gb InfiniBand ports for internal (i.e., intra-cluster) communication and dual 1 Gb Ethernet for external communication.

    Scenario 3: Network Security and External Attacks


    Review: Risks Involving Network Security and External Attacks

    The following table gives a general overview of the basic threat an external attack or network security breach poses to HathiTrust; entries are arranged by severity. The list, however, is not exhaustive and no attempt has been made to publicize potential vulnerabilities.

    Severity / Events

    High impact:
        - Unauthorized access to HathiTrust content leads to the infringement of copyrights.
        - Loss of data or functionality for an extended period of time as a result of malicious activity.

    Moderate impact:
        - HathiTrust services are temporarily unavailable as a result of malicious activity.

    Low impact:
        - The delivery of HathiTrust services slows as the result of malicious activity.
        - A security weakness exists within the system but remains unexploited.

    HathiTrust's Solutions for Network Security

    Malicious activity against HathiTrust could involve unauthorized access to a system or data, denial of service, or unauthorized changes to the system, software, or data. As an academic entity, the repository is seen as less of a target for such actions than commercial or governmental targets; despite this perceived lower risk, HathiTrust has not been lulled into a false sense of security. The repository takes seriously the potential for violations of its network and operating system security and therefore has instituted a program of periodic software updates in addition to the maintenance of an ITCom-supported firewall, authentication-required access, and other measures (such as throttling software to deter denial of service attacks). Because content is currently accepted from trusted sources (namely, Google and legacy digital collections from HathiTrust partners) the GROOVE process does not include a virus detection phase. As digital objects are ingested from a greater number of sources, additional security measures should be considered.

    o HathiTrust staff apply security updates to the operating system and to networking devices as soon as they become available in order to minimize system vulnerability. As with new software releases, security updates are tested in a development environment before being released to production. Software packages that present a lower security risk and that have a greater potential to affect application behavior (web servers, language interpreters, etc.) are generally installed, configured and tested manually to allow for greater control in managing updates. Software updates are not applied automatically; moreover, updates that present a potential for having an impact on system behavior are applied and tested first in the development environment. If no impacts are seen, HathiTrust staff apply these updates in production after a testing period of at least one week. (HT TRAC C1.10)


    Scenario 4: Format Obsolescence

    Review: Risks Involving Format Obsolescence

    The following table outlines the threats posed by format obsolescence and arranges them according to their potential severity.

    Severity / Events

    High impact:
        - Applications and hardware are no longer able to read or display digital objects.
        - Errors in translating and reading files are not understood or acknowledged by repository users.

    Moderate impact:
        - Problems with the translation of file formats result in DIPs that do not faithfully reflect the original digital objects.

    Low impact:
        - Formats and associated applications change but retain compatibility with older versions of the file formats.

    HathiTrust's Solutions for Format Obsolescence

    An awareness and acknowledgement of the dangers of format obsolescence has led HathiTrust to implement proactive policies and procedures to ensure long-term access to the repository's content. The repository only accepts specific formats that meet rigorous specifications and, through the prior experience of University of Michigan personnel, has developed protocols for the successful migration of content from one format to another. In addressing the threat of format obsolescence, the preservation of the integrity and authenticity of deposited content has been an overarching concern.

    Selection of File Formats

    o HathiTrust is committed to preserving the intellectual content and in many cases the exact appearance and layout of materials digitized for deposit. HathiTrust stores and preserves metadata detailing the sequence of files for the digital object. HathiTrust has extensive specifications on file formats, preservation metadata, and quality control methods, included in the University of Michigan digitization specifications, dated May 1, 2007.[44] (HT TRAC B1.1)

    o HathiTrust currently ingests only documented acceptable preservation formats, including TIFF ITU G4 files stored at 600 dpi, JPEG or JPEG2000 files stored at several resolutions ranging from 200 dpi to 400 dpi, and XML files with an accompanying DTD (typically METS). HathiTrust supports these formats because of their broad acceptance as preservation formats and because the formats are documented, open and standards-based, giving HathiTrust an effective means to migrate its contents to successive preservation formats over time, as necessary. The Repository Administrators have undertaken such transformations in the past; moreover, HathiTrust offers end-user services that routinely transform digital objects stored in HathiTrust to presentation formats using many of the widely available software tools associated with HathiTrust's preservation formats. HathiTrust gives attention to data integrity (e.g., through checksum validation) as part of format choice and migration.[45]

    o Each format conforms to a well-documented and registered standard (e.g., ITU TIFF and JPEG2000) and, where possible, is also non-proprietary (e.g., XML). (HT TRAC B4.2)

    [44] Specifications are available at http://www.lib.umich.edu/lit/dlps/dcs/UMichDigitizationSpecifications20070501.pdf
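    The acceptance criteria listed above lend themselves to a simple pre-ingest check. The Python sketch below encodes the formats and stored resolutions named in the report; the dictionary layout, the format keys, and the is_acceptable function are hypothetical, and the report's wording ("including") suggests the real list of accepted formats may be longer.

```python
from typing import Optional

# Accepted formats and stored-resolution ranges (dpi) as stated in the report.
# The keys and structure are illustrative, not HathiTrust's own vocabulary.
ACCEPTED_FORMATS = {
    "tiff-itu-g4": (600, 600),
    "jpeg": (200, 400),
    "jpeg2000": (200, 400),
    "xml": None,   # text format: no resolution requirement
}


def is_acceptable(file_format: str, dpi: Optional[int] = None) -> bool:
    """Check a file's declared format and resolution against the accepted list."""
    key = file_format.lower()
    if key not in ACCEPTED_FORMATS:
        return False
    dpi_range = ACCEPTED_FORMATS[key]
    if dpi_range is None:
        return True
    if dpi is None:
        return False   # image formats must declare a resolution
    low, high = dpi_range
    return low <= dpi <= high


print(is_acceptable("tiff-itu-g4", 600))  # True
print(is_acceptable("jpeg", 150))         # False
print(is_acceptable("xml"))               # True
```

    A check of this kind covers only declared properties; confirming that a file actually conforms to its format standard requires a dedicated validation tool.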

    Format Migration Policies and Activities

    o HathiTrust is committed to migrating the formats of materials created according to [its] specifications as technology, standards, and best practices in the digital library community change. (HT TRAC B1.1)

    o HathiTrust staff members conduct migrations from one storage medium to another using tools that validate checksums internally. (Digital objects are stored both online and on tape, and the online storage system conducts regular scans to detect and correct data integrity problems.) A total file count is done following a large data transfer, and regularly scheduled integrity checks follow. (HT TRAC C1.7)

    o [HathiTrust] has migrated large SGML-encoded collections to XML, and Latin-1 character encodings to UTF-8 Unicode. Our success in migrating from older formats to newer formats demonstrates our commitment to our collections and our ability to keep materials in our repository viable. All migrations are documented in change logs. (HT TRAC B4.2)
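    The post-transfer checks described above (a total file count followed by checksum validation) can be illustrated with a short sketch. This is not HathiTrust's tooling, which the report does not describe in detail; the function names, the report keys, and the choice of MD5 for the comparison are assumptions made only for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its MD5 digest."""
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_transfer(source: Path, target: Path) -> dict:
    """Compare file counts and checksums between two directory trees."""
    src, dst = manifest(source), manifest(target)
    return {
        "source_count": len(src),
        "target_count": len(dst),
        # Files present in the source but absent from the target.
        "missing": sorted(set(src) - set(dst)),
        # Files present in the target but absent from the source.
        "unexpected": sorted(set(dst) - set(src)),
        # Files present on both sides whose digests differ.
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

    A transfer would be considered clean under this sketch only when the two counts agree and the three lists are empty; any other result would call for a re-copy of the listed files.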

    45 HathiTrust. Preservation (2009) retrieved from http://www.hathitrust.org/preservation on 16 June 2009.


    Scenario 5: Core Utility and/or Building Failure

    Review: Risks Involving Core Utility or Building Failure

    The following table summarizes the dangers a utility or building failure poses to HathiTrust and ranks events by their potential severity.

    HathiTrust's Solutions for Utility or Building Failure

    The continued delivery of HathiTrust's services depends upon the maintenance of power, environmental control, and security in its server environment at the Michigan Academic Computing Center (MACC) and other locations that host components of the repository. In this respect, HathiTrust is heavily reliant upon the infrastructure of the MACC as well as that of the Arbor Lakes Data Facility, home to one instance of the TSM Group's backup tape library. Both locations provide closely monitored and highly redundant environments that help ensure that HathiTrust's infrastructure remains secure and operable. At the same time, administrative and data management functions critical to the development and maintenance of the repository take place in the University of Michigan's Hatcher Graduate Library. The service and cooperation of Michigan's Plant Operations Division are therefore critical for the continued access to and use of this structure in the operation of HathiTrust.

    General Maintenance and Repairs in University of Michigan Facilities

    Facilities and maintenance issues on the University of Michigan campus are reported to the Plant Operations Division, the Department of Public Safety (DPS), and Occupational Safety and Environmental Health (OSEH) in addition to the impacted facility's manager. Repair work is coordinated by the University Library facilities manager in conjunction with administrators and workers from Plant Operations.

    The Michigan Academic Computing Center (MACC)

    The MACC hosts many of the key components of Michigan's University Library system as well as the technical infrastructure of HathiTrust. The University of Michigan does not own the building in which the data center is located but instead operates the MACC in conjunction with the Michigan Information Technology Center (MITC) Foundation and other partners. The MACC Server Hosting Service

    Severity          Events

    High Impact       o Extensive structural damage renders the MACC (or key elements of its infrastructure) unusable and necessitates the establishment of a hot site to recover and continue operations.
                      o Additional failure past tolerance in backup cooling or power infrastructure.

    Moderate Impact   o Failure of backup power past redundancy tolerance (failure of 2 generators).
                          - Data center coordinator may initiate load shed and shut down half of the MACC (but library racks will remain operational).
                      o Structural damage renders facility temporarily unsafe and/or unusable.

    Low Impact        o Loss of power.
                      o Loss of environmental control units within redundancy.


    Level Agreement46 lists the responsibilities of the data center as well as the repository; of particular significance are the MACC's agreements to:

    o Provide a controlled physical environment to support servers [with] room average temperature of between 65 and 75 degrees and 35-50% relative humidity [and] monitored environmentals (temperature, humidity, smoke, water, electrical). (sec. 4.1)

    o Provide adequate, conditioned, 60-cycle electrical service with adequate backup electrical capacity to support circuits, service, and outlets [and also to] provide Uninterruptible Power Supply (UPS) and generator backup. (sec. 4.2)

    o Provide 7x24 telephone contact for emergencies and for emergency access to facility. (sec. 4.4)

    In addition to features such as redundant electrical and environmental systems, the MACC maintains a full-time coordinator and staff who provide 24x7 responses to failures or malfunctions in the server environment. Alerts prompted by issues with the environmental systems or power are sent to the University of Michigan Network Operations Center (NOC) during non-business hours.

    o Overview: The MACC's redundancy is designed to ensure the safety and security of the data housed within. It consists of:

        - A dual power path from the property line to the power distribution units
        - Diesel-powered generators for electrical backup
        - Flywheels (not batteries) to provide power while the generators come on
        - State-of-the-art generators and flywheels for backup power
        - Three extra computer room air conditioners
        - Two extra dry coolers
        - Glycol loop for cooling with two parallel pathways with crossover valves at regular intervals.47

        - A state-of-the-art monitoring system keeps track of 1,700 different parameters and automatically notifies staff of any irregularity.48

    o Environmental Controls and Monitoring

        - The MACC has 18 Computer Room Air Conditioning units (CRACs). At any given time, only 15 are necessary to maintain the required temperature and humidity. [Thus, the computer room has N5+1 redundancy in its cooling ability.] It also is equipped with a number of portable coolers to address specific cooling needs. The heat from the room is transferred to an under-floor glycol loop that releases the heat to the outdoors.49

    46 Please refer to Appendix H (MACC Server Hosting Service Level Agreement).

    47 Michigan Academic Computing Center. Vital Statistics (2009) retrieved from http://macc.umich.edu/about/vitalstatistics.php on 16 June 2009.

    48 Michigan Academic Computing Center (2009) retrieved from http://macc.umich.edu/index.php on 16 June 2009.

    49 Vital Statistics (2009) retrieved from http://macc.umich.edu/about/vitalstatistics.php on 16 June 2009.


        - The layout of the facility allows the fronts of the computer racks to face the cold aisles. These aisles have perforated floor tiles through which the cool air is pumped directly to the computers located there. Heat is discharged from the backs of the computers, which creates the hot aisles. This alternating arrangement facilitates the cooling process, as the hot air produced by the computers can be siphoned off before it mingles too much with the cooler air of the facility.50

        - Two separate smoke detection and fire alarm systems protect the MACC. One is for the building; the other is for the MACC itself. The two systems work together to activate alarm systems and notify the fire department and key personnel. In the event of an actual fire, the fire suppression system pipes will not fill with water unless there is a pressure drop caused by melting of one or more of the sprinkler heads.51

    o Backup Power

        - Three generators, each roughly the size of a rail car, provide backup power. Only two of the three are required to run the facility in the event of a power outage.52

        - The MACC uses environmentally responsible flywheels instead of batteries for power backup while the generators come online. The combination of generators and flywheels provides the facility with a fully redundant uninterruptible power system (UPS).53

        - The MACC has a contract with the UM Plant Operations Division for the delivery of diesel fuel for its generators in the event of an extended blackout.54

        - In the event that a backup generator is disabled, the MACC coordinator will initiate load shed, in which one half of the MACC will be shut down so that the other half (and requisite environmental systems) may continue to operate. The HathiTrust and UM Library racks are among those which will retain power should this response prove necessary.55
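    The redundancy figures quoted above (18 CRAC units with 15 required, and three generators with two required) lend themselves to a small worked example. The sketch below is illustrative only: the class and function names are hypothetical, and it assumes that load shed becomes necessary once fewer than two generators remain available, which follows the severity table's "failure of 2 generators" rather than any documented MACC procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundantPool:
    """A group of interchangeable units, some of which are spares."""
    name: str
    installed: int
    required: int

    def spare(self, failed: int = 0) -> int:
        """Units left in reserve after `failed` units are lost (negative means a shortfall)."""
        return self.installed - failed - self.required

    def within_tolerance(self, failed: int) -> bool:
        """True while the surviving units can still carry the full load."""
        return self.spare(failed) >= 0


# Figures taken from the report; the pool names are illustrative.
CRACS = RedundantPool("CRAC units", installed=18, required=15)
GENERATORS = RedundantPool("generators", installed=3, required=2)


def load_shed_needed(failed_generators: int) -> bool:
    """Assumed rule: shed half the facility once generator redundancy is exhausted."""
    return not GENERATORS.within_tolerance(failed_generators)
```

    Under these assumptions the cooling plant tolerates the loss of three CRAC units and the power plant tolerates the loss of one generator before the coordinator's load-shed response would come into play.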

    Arbor Lakes Data Facility (ALDF)

    The ALDF houses the TSM Group's infrastructure and one instance of the backup tape library that forms an integral part of HathiTrust's Disaster Recovery strategy. As the home of critical components of the UMnet Backbone, the ALDF provides a safe and secure location for one set of the repository's backup tapes. In the interest of security, this report will omit further information on the exact nature of the facility's power and environmental systems.

    50 Ibid.

    51 Ibid.

    52 Michigan Academic Computing Center (2009) retrieved from http://macc.umich.edu/index.php on 16 June 2009.

    53 Ibid.

    54 Gobeyn, Rene (MACC Data Center Coordinator). Personal interview on 23 June 2009.

    55 Ibid.


    Scenario 6: Software Failure or Obsolescence

    Review: Risks Involving Software Failure or Obsolescence

    The following table details various risks inherent to software failure or obsolescence and ranks them according to their severity.

    HathiTrust's Solutions for Software Issues

    The development and use of HathiTrust's tools and resources depends on highly functional software applications. Repository policies have therefore been crafted to ensure that these applications are thoroughly tested and regularly updated to minimize the threat of service outages as a result of software failure or obsolescence. HathiTrust furthermore employs open source applications that are well supported and enjoy widespread use and development within the digital library community.

    o Changes in software releases of all components of the system (from ingest to access) are developed and tested in an isolated development environment to prepare for release to production. When ready for release, developers record the changes made and increment version numbers of system components as appropriate using a version control system. New versions of software are released using automated mechanisms (in order to prevent manual errors). Major changes and upgrades in hardware architecture are recorded in monthly reports of unit activity, and thus are traceable to that level of detail. (HT TRAC C1.8)

    o Additionally, subsets of production data are available in the development environment to allow developers to ensure proper system behavior before releasing changes to production. (HT TRAC C1.9)

    o In order to design, build and modify software for the designated end user community, HathiTrust conducts an active usability program and seeks input from the Strategic Advisory Board of HathiTrust. Similarly, with regard to software development in support of the archiving needs of the Participating Libraries, HathiTrust focuses on the development of highly functional ingest and validation mechanisms. HathiTrust also seeks and responds to guidance from the Strategic Advisory Board with regard to archiving services. (HT TRAC C2.2)

    Severity          Events

    High Impact       o Software bug escapes detection in development environment and results in crash of application.

    Moderate Impact   o Software bug escapes detection in development environment and prevents full access to digital objects.
                      o Improper version of software is introduced to system (could have a greater or lesser impact depending on results of error and repository's ability to detect it).

    Low Impact        o Software bug escapes detection in development environment and prevents full use of system capabilities (i.e., rotation of images or additional functionality).


    Scenario 7: Operator Error

    Review: Risks Involving Operator Error

    The following table summarizes risks to HathiTrust posed by operator error; events are ranked according to their potential severity.

    HathiTrust's Solutions for Operator Error

    In any human enterprise, occasional operator error is unavoidable; HathiTrust strives to ensure that any such events are detected and resolved in a timely fashion.56 To help avoid occurrences and mitigate their potential impact, HathiTrust has automated many procedures and also relies upon application assertions, which can notify administrators when processes are not operating correctly. Even if an error is introduced to the file system and then backed up, the TSM client saves up to seven versions of a file for up to six months so that an earlier version can be retrieved.
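    To show why the retention window matters for recovering from an operator error, the sketch below models the stated limits (up to seven versions, up to six months). It is not the TSM policy engine, whose actual rules the report does not spell out; the function names are hypothetical and "six months" is approximated as 183 days.

```python
from datetime import datetime, timedelta

MAX_VERSIONS = 7                 # "up to seven versions"
MAX_AGE = timedelta(days=183)    # "up to six months", approximated


def retained_versions(backup_times, now):
    """Return the backup timestamps still held, newest first."""
    newest_first = sorted(backup_times, reverse=True)
    return [t for t in newest_first[:MAX_VERSIONS] if now - t <= MAX_AGE]


def latest_version_before(backup_times, error_time, now):
    """Return the newest retained backup taken before the error, or None."""
    for taken in retained_versions(backup_times, now):
        if taken < error_time:
            return taken
    return None
```

    The second function returns None when the error predates every retained version, which is the case in which an undetected mistake can no longer be undone from backup alone.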

    Ingest: The Google Return (Object-Oriented) Validation Environment (GROOVE) process is entirely automated to avoid the introduction of operator error to the process; steps include:

    o Identification of material for ingest
    o Decryption and unzipping of files
    o Format verification and validation with JHOVE
    o Lun barcode and MD5 checksum validation
    o Creation of HathiTrust METS documents
    o Establishment of HathiTrust handles (persistent URLs)
    o Extension of the pairtree file directory (as new material enters the system)

    Archival Storage: Files stored within the repository are not accessed directly or manipulated by staff so that neither the zipped image and OCR files nor the METS document may be accidentally altered or deleted.

    Dissemination: The page-turner application references the stored image and then creates a .png (for TIFFs) or .jpg (for JPEG 2000s) file for display to the viewer.

    Data Management: New versions of software are released using automated mechanisms (in order to prevent manual errors). (HT TRAC C1.8)
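    Two of the automated ingest steps named above, MD5 checksum validation and extension of the pairtree directory, can be sketched briefly. The code follows the publicly available Pairtree specification's identifier-cleaning and two-character splitting rules as an assumption; it is not the GROOVE implementation, and the function names are hypothetical.

```python
import hashlib

# Characters the Pairtree specification hex-encodes as ^hh.
_HEX_ENCODED = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
_SINGLE_CHAR_MAP = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_identifier(identifier: str) -> str:
    """Make an identifier safe for use in file system paths."""
    encoded = []
    for byte in identifier.encode("utf-8"):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7E or char in _HEX_ENCODED:
            encoded.append(f"^{byte:02x}")
        else:
            encoded.append(char)
    return "".join(encoded).translate(_SINGLE_CHAR_MAP)


def pairtree_path(identifier: str) -> str:
    """Split the cleaned identifier into two-character directory names."""
    cleaned = clean_identifier(identifier)
    return "/".join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


def md5_matches(payload: bytes, expected_hex: str) -> bool:
    """Check a payload against the MD5 digest supplied with it."""
    return hashlib.md5(payload).hexdigest() == expected_hex.lower()
```

    Because both functions are deterministic, they can run unattended on every incoming object, which is the property the automated ingest process relies on to keep operator error out of the pipeline.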

    56 Please refer to Appendix B (HathiTrust Outages from March 2008 through April 2009).

    Severity          Events

    High Impact       o Operator error results in the irreparable loss of data or damage to equipment.
                      o Operator error results in loss of key repository functions (ingest, storage, dissemination, etc.) for an extended period of time.

    Moderate Impact   o Operator error remains undetected and causes persistent problems in the system but has no long-term consequences.

    Low Impact        o Operator error is detected by normal procedures or via an activity log and can be readily corrected.


    Scenario 8: Physical Security Breach

    Review: Risks Involving a Physical Security Breach

    Maintaining the physical security of the HathiTrust infrastructure is yet another crucial element in the repository's efforts to manage risks and thereby lessen the chance that a disaster-type event occurs. Risks involve the damage and destruction of equipment and could even extend to unauthorized system access. Multiple levels of security exist at both the Michigan Academic Computing Center (MACC) and the Arbor Lakes Data Facility (ALDF) to protect HathiTrust from acts of vandalism, destruction or malicious tampering. Details on the potential impacts of a physical security breach are covered in Scenario 1: Hardware Failure and Scenario 3: Network Security.

    HathiTrust's Solutions for Physical Security

    o Each of [the HathiTrust] storage or tape instances is physically secure (e.g., in a locked cage in a machine room) and only accessible to specified personnel.57

    Security at the MACC

    The MACC Server Hosting SLA states the data center staff will:

    o Provide services necessary to maintain a safe, secure, and orderly environment for all tenants of the MACC. (sec. 4.7)

    o Provide access control via HID card and biometric readers for those listed on the Tenant Staff Authorized for Access list. (sec. 4.5)

    The MACC Web site and the Michigan Academic Computing Center Operating Agreement58 provide additional details concerning the resources and procedures that help protect HathiTrust's equipment at the MACC. The MACC Data Center Coordinator personally oversees the enforcement of security protocols and conducts regular audits of security logs and, when necessary, reviews surveillance video footage.

    o Security Systems

        - State-of-the-art security devices such as iris scanners, cameras, closed circuit television and on-call staff keep the data and machines housed in the MACC safe.59

        - Access to the data center will be by two-factor authentication (access card and iris scan) or escorted, supervised access. Access to the building will be by access card. (MACC OA, sec. 5.3.1)

        - Cameras throughout the corridor, security trap, and facility will be monitored and maintained by the Data Center Coordinator. (sec. 5.2.1)

    57 HathiTrust. Technology (2009) retrieved from http://www.hathitrust.org/technology on 15 June 2009.

    58 Please refer to Appendix I (Michigan Academic Computing Center Operating Agreement).

    59 Michigan Academic Computing Center. Vital Statistics (2009) retrieved from http://macc.umich.edu/about/vitalstatistics.php on 17 June 2009.

    o Security Procedures


        - The Operations Advisory Committee will establish procedures for granting access cards to the facility to those whose jobs require hands-on access to systems. All requests for access cards will be vetted and approved by the Operations Advisory Committee at their next meeting. (sec. 5.3.2)

        - Everyone on the access list for the data center will be required to attend a training session before working in the data center and sign an access agreement stating policies they must observe while in the data center. (sec. 5.3.8)

    Security at the ALDF

    As noted in the TSM Backup Service SLA, the University of Michigan's ITCS is responsible for physical security at the ALDF. (sec. 4.9) While this document will not detail specific features of the ALDF's operation, multiple levels of security and oversight are employed.


    Scenario 9: Natural or Manmade Disaster

    Review: Risks Involving a Natural or Manmade Disaster

    The following table details the risks to HathiTrust posed by a natural or manmade disaster; events are ranked by order of their severity. Due to possible overlap between this scenario and Scenario 1 (Hardware Failure), readers are encouraged to consult that earlier section.

    HathiTrust's Solutions for Natural or Manmade Catastrophic Events

    The University of Michigan Ann Arbor Campus Emergency Procedures (revised January 2008) has set procedures to address building evacuations (in the event of fire), tornadoes, severe weather, flooding, chemical/biological/radioactive spills, as well as bomb threats, civil disturbances, and acts of violence or terrorism.60 In all cases, staff will follow the directions of Public Safety and not re-enter buildings or resume work until advised to do so by DPS or OSEH or someone from on-site incident command.

    In the event of a severe natural or manmade disaster, the repair and restoration of the physical locations of HathiTrust infrastructure would need to be coordinated between the repository and the appropriate facility managers. Such activity would rely upon the disaster recovery plans in place at the MITC Building (home of the MACC) and University of Michigan (which includes the Hatcher Graduate Library and the ALDF). It must be noted that an event which causes significant damage to an important structure or to a building's infrastructure could result in the loss of an instance of the repository for an extended period of time. In such a case, HathiTrust would need to set up an alternate hot site until structural restoration is complete (or a new facility has been found).

    60 Please see Appendix C (Washtenaw County Hazard Ranking List).

    Severity          Events

    High Impact       o Widespread damage to a data center and/or its infrastructure that forces an instance of the repository to find a new hot site with sufficient power supply, environmental controls, and security.
                      o Damage to work areas forces staff to relocate to a new center of operations.
                      o Extensive loss or damage to hardware requires large-scale replacement.
                      o With the extended loss of one site, HathiTrust loses redundancy (and possibly some functionality: i.e. the ability to ingest new material in Ann Arbor) and thus a central component of its disaster recovery and backup plans.
                      o An act of violence or terrorism occurs at or near HathiTrust facilities.

    Moderate Impact   o An event results in an extended outage at one site that exceeds the recovery time objective.
                      o Hardware sustains some damage and site is able to continue operation in a reduced capacity.
                      o An actual or threatened act of violence or terrorism forces the temporary evacuation or quarantine of HathiTrust facilities.

    Low Impact        o Local conditions result in a temporary outage at a HathiTrust site.


    Basic Disaster Recovery Strategies

    In the immediate aftermath of a large-scale manmade or natural disaster, the repository's immediate recovery will be enabled by its basic system architecture:

    o the initiative's technology concentrates on creating a minimum of two synchronized versions of high-availability clustered storage with wide geographic separation (the first two instances of storage are located in Ann Arbor, MI and Indianapolis, IN), as well as an encrypted tape backup (written to and stored in a separate facility outside of Ann Arbor).61

    The establishment of the mirror site in Indianapolis and the retention of multiple backup tapes at two locations in Ann Arbor ensure that a serious event at either location will not impede the continued functioning of the repository at the other. Consideration must be given as to how data at the Indianapolis site will be backed up and how key repository functions (such as ingest) will proceed if the Ann Arbor instance is offline for an extended period of time. Likewise, a long-term outage at the IU location would require HathiTrust to establish a third site for data backup (i.e., a location where additional copies of backup tapes could be stored).

    61 HathiTrust. Technology retrieved from http://www.hathitrust.org/technology on 15 June 2009.


    Scenario 10: Media Failure or Obsolescence

    Review: Risks Involving Media Failure or Obsolescence

    The following table summarizes risks to HathiTrust posed by the failure of the media used for its data backups. While the risks from this are limited (both copies of the tape backups would have to be impacted for data to be unavailable), the issue should nonetheless be addressed with regular test restorations and/or inspections of the media.

    HathiTrust's Solutions for Media Failure

    Given the nature of HathiTrust's storage system, this scenario is only a concern in regard to the digital magnetic tapes used by the TSM Group for backups.

    o Two tape copies of all backup data are made and these are stored in separate climate-controlled conditions in tape libraries at the MACC and the ALDF.

    o Content is transferred to new tape during data defragmentation (which occurs when existing tapes are 80% full).

    o If a degraded or otherwise bad section of tape is detected during a backup procedure, that tape is immediately marked as read-only.

        - Data is thenceforth written to a different tape; existing data on the bad tape will be copied to properly functioning media.

        - If data cannot be reclaimed from bad tape, the TSM Group would contact HathiTrust so that the backup of content can be properly completed.
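    The tape-handling rules listed above can be restated as a small model. The sketch is a simplification built on stated assumptions: the Tape class and function names are hypothetical, fullness is measured by file count rather than bytes, and nothing here reflects the TSM Group's actual software.

```python
from dataclasses import dataclass, field

DEFRAG_THRESHOLD = 0.80  # defragmentation occurs when a tape is 80% full


@dataclass
class Tape:
    label: str
    capacity: int                                          # slots, a stand-in for bytes
    files: dict[str, bool] = field(default_factory=dict)   # file name -> still readable?
    read_only: bool = False

    @property
    def fill_ratio(self) -> float:
        return len(self.files) / self.capacity


def needs_defragmentation(tape: Tape) -> bool:
    """True once the tape reaches the 80% threshold."""
    return tape.fill_ratio >= DEFRAG_THRESHOLD


def handle_bad_tape(bad: Tape, spare: Tape) -> list[str]:
    """Mark the tape read-only, copy what is readable, and list what is not.

    The returned names are the files that would need a fresh backup,
    i.e. the cases in which the repository would be contacted.
    """
    bad.read_only = True
    unrecoverable = []
    for name, readable in bad.files.items():
        if readable:
            spare.files[name] = True
        else:
            unrecoverable.append(name)
    return unrecoverable
```

    An empty return value means every file was reclaimed onto good media; a non-empty one corresponds to the situation in which HathiTrust is asked to supply the content again.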

    Remaining Vulnerabilities

    There is some reason for concern in this area because the TSM Group does not have a regular program to monitor its media for physical degradation or impairment after data defragmentation. While the tapes are reported to be highly dependable, problems such as sticky-shed (the hydrolysis of the tape's binder) could become an issue with older tapes. A regular program of tape validation or test restorations would provide an opportunity to check on the physical condition and data integrity of the tapes. Likewise, the creation of a schedule for the replacement of older tapes could avoid future problems with media degradation.

    Severity          Events

    High Impact       o Physical degradation (i.e. in tape binder, substrate, or magnetic content) affects both copies of older backup tapes.

    Moderate Impact   o Because backup tapes are not regularly tested or audited, the physical substrate of tapes may degrade over time.

    Low Impact        o Bad tape is detected during a tape backup.


    Conclusions and Action Items

    Conclusions

    As this report demonstrates, a variety of risk management strategies in addition to design elements, operating procedures, and service and support contracts endow HathiTrust with the ability to preserve its digital content and continue essential repository functions in the event of a range of disasters. The establishment of the Indianapolis mirror site, the performance of nightly tape backups, and the redundant power and environmental systems of the MACC reflect professional best practices and will enable HathiTrust to weather a wide range of foreseeable events. As it is, disasters often result from the unknown and the unexpected; while the aforementioned strategies are crucial components of a Disaster Recovery Plan, they must be supplemented with additional policies and procedures to ensure that, come what may, HathiTrust will be able to carry on as both an organization and a dedicated service provider.

    In the effort to secure HathiTrust's long-term continuity, the present document stands merely as a preliminary step in the establishment of a legitimate Disaster Recovery Plan. The data on HathiTrust's policies, procedures, and contracts consolidated herein should facilitate the data collection requisite to the initial phases of the planning process, but the core activities of formulating technical and administrative response strategies and delegating roles and responsibilities remain to be undertaken. The following section outlines recommendations and action items derived from research into the repository as well as from discussions with Cory Snavely and other HathiTrust staff members. Items have been separated into an approximate timeline of activity ranging from Short-Term through Long-Term and the arrangement within each category repres