NOC Processes Roles - English


NETWORK OPERATING CENTER

CHECK LIST AND PROCEDURES

NETWORK OPERATING CENTER

PROCESSES & ROLES

DOCUMENT AUTHOR

Camille Bertrand KITE – [email protected] – tel 00 237 79 50 01 55

DOCUMENT IDENTIFICATION

Word file: NOC process & roles - English.doc

DOCUMENT STATUS

- Being drafted
- Being corrected
- Being modified
- Being validated
- Validated
- Obsolete

Corrections are minor changes to the document (new release). Modifications are major changes to the document (new version).

DOCUMENT CONFIDENTIALITY

- Public document
- Private document (internal company use)
- Sensitive document (operations team)
- Highly sensitive document (recipients only)

12/03/2008


DOCUMENT CONTROL

DOCUMENT VERIFICATION

Role      Name           Date  Signature
Reviewer  Bernard Fanga        BF
Approver

DISTRIBUTION LIST

Recipients  Contact details

DOCUMENT HISTORY

Version  Date        Operation and comment
1.0      11/10/2010  Update


CONTENTS

1. ROLES
1.1 Monitor
1.2 Manage
1.3 Troubleshoot

2. FUNCTIONS
2.1 Performance monitoring
2.2 Status monitoring
2.3 Alert management
2.4 Policy monitoring
2.5 Quality assurance
2.6 Reporting
2.7 Schedule
2.8 Documentation

3. ESCALATION LEVEL BY CARRIER
3.1 MTN
3.2 Orange
3.3 CAMTEL
3.4 ITC GLOBAL

4. ESCALATION LEVEL - AES SONEL

5. ESCALATION PROCESS – NOC AES SONEL

6. INCIDENT MANAGEMENT


1. ROLES

The AES SONEL NOC is a team with several important roles within the IT infrastructure. In general, the team has to:

- Monitor

- Manage

- Troubleshoot

1.1 MONITOR

The NOC has to monitor the entire network: routers, switches, UPS units, servers and so on, so that it can manage the WAN, MAN and LAN across all AES-SONEL facilities. Each piece of equipment has resources to watch, such as CPU, memory, NICs, storage/flash, power supply, battery charge level, temperature, network interfaces and links. We have to check this information and collect events, because they help the team to be proactive.

1.2 MANAGE

Some parameters on this equipment have to be optimized. Before anything is changed on the equipment, a form has to be filled in and approved by a committee. The NOC can also adapt its reports to the needs of management, which should help in taking the right decisions.

1.3 TROUBLESHOOT

When an incident happens and nobody knows its source, the NOC has to investigate and find the problem. Once the problem is found, the NOC fixes it if it has the authority to do so, or transfers it to the service or person in charge of that task.


2. FUNCTIONS

The NOC covers a large share of the functions around the IT system. Its main functions are:

Level 1:

- Performance monitoring
- Status monitoring
- Policy monitoring
- Incident management
- Open, update, and close trouble tickets.
- Periodic activity reporting
- Documentation
- Other duties as assigned

Level 2:

- Monitor data communications networks to ensure that networks are available to all system users.
- Monitor datacenter infrastructure.
- Resolve and document data communications problems.
- Develop and follow troubleshooting procedures in an effort to resolve problems.
- Contact users to correct and maintain network operations.
- Escalate problems as needed to engineering staff.
- Record daily network statistics.
- Open, update, and close trouble tickets.
- Update documentation to record new equipment installed, new sites, and changes to configurations.
- Coordinate installation of communications equipment.
- Install communications equipment.
- Schedule operations on IT facilities.
- Quality assurance
- Other duties as assigned


Given these functions, the NOC works as a back-end team that solves problems affecting a site, a building or the whole network. End-user problems are handled by the HelpDesk.

2.1 PERFORMANCE MONITORING

Network equipment has to be monitored continuously. The NOC has to check how resources are used over the course of the day. If alarms are raised (high CPU or memory consumption, etc.), the NOC team has to report the problem.
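
The alarm check described here can be sketched in a few lines of Python. This is only an illustration, not the NOC's actual tooling: the device names, the sample readings and the threshold values in `THRESHOLDS` are assumptions made for the example, since the document names CPU and memory but gives no limits.

```python
# Hypothetical alarm thresholds, in percent used. The document names CPU and
# memory as examples but gives no values; these numbers are assumptions.
THRESHOLDS = {"cpu": 80.0, "memory": 90.0}


def find_alarms(readings, thresholds=THRESHOLDS):
    """Return (device, metric, value) for every reading above its threshold.

    `readings` maps a device name to a dict of metric name -> percent used.
    Metrics without a configured threshold are ignored.
    """
    alarms = []
    for device, metrics in sorted(readings.items()):
        for metric, value in sorted(metrics.items()):
            limit = thresholds.get(metric)
            if limit is not None and value > limit:
                alarms.append((device, metric, value))
    return alarms


# Invented sample readings, for illustration only.
sample = {
    "router-dla-01": {"cpu": 93.5, "memory": 41.0},
    "switch-yde-02": {"cpu": 12.0, "memory": 95.2},
}
for device, metric, value in find_alarms(sample):
    print(f"ALARM {device}: {metric} at {value:.1f}%")
```

Keeping the limits in one dictionary means a threshold can be tuned without touching the checking logic.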

2.2 STATUS MONITORING

The NOC team has to track the status of every piece of equipment installed on the network. If a device is down or unreachable, the team has to report the incident as soon as possible, following the escalation procedure.

2.3 ALERT MANAGEMENT

The NOC team has to raise an alert when an incident occurs. If it cannot solve the problem itself, it has to escalate as soon as possible: to the supplier, according to the contract, or to the Network team if the incident is internal.

2.4 POLICY MONITORING

To be defined in the next version

2.5 QUALITY ASSURANCE

To be defined in the next version

2.6 REPORTING

The NOC team has to produce weekly and monthly reports on the performance and status monitoring of network equipment, as well as monthly KPI reports.

In summary:

- Weekly reports:
  o Availability of the network
- Monthly reports:
  o KPI report of the network
  o Availability of the entire network and of each MAN/WAN operator link
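
The document does not say how availability is calculated. A common definition, assumed here, is the share of the reporting period during which the link or network was up. A minimal Python sketch under that assumption (the function name and the sample outages are invented for the example):

```python
from datetime import timedelta


def availability_percent(period, downtimes):
    """Availability = (period - total downtime) / period, as a percentage."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    total_down = sum(downtimes, timedelta(0))
    # Downtime cannot exceed the reporting period itself.
    total_down = min(total_down, period)
    return 100.0 * ((period - total_down) / period)


# One week with two invented outages: 2 hours and 48 minutes.
week = timedelta(days=7)
outages = [timedelta(hours=2), timedelta(minutes=48)]
print(f"Weekly availability: {availability_percent(week, outages):.2f}%")
```

The same function can be applied per operator link by passing only that link's outages.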

2.7 SCHEDULE

To cover the NOC team's tasks, we have to put in place a NOC schedule based on 02 teams working in rotation (round robin).

The working hours are:

- Group 01: from 07H00 to 15H30, with a break from 12H00 to 13H00
- Group 02: from 10H00 to 18H30, with a break from 13H00 to 14H00
- During the month, the two groups alternate:
  o Group 01: 1st and 3rd weeks
  o Group 02: 2nd and 4th weeks
- On Saturday, the group which starts at 07H00 works from 08H00 to 14H00

2.8 DOCUMENTATION

Each incident has to be documented. Every action carried out by a supplier or an employee must have an intervention report.

2.9 PREREQUISITES:

LEVEL 1:

Network Operations Center:

- Knowledge of network operations.
- Knowledge of layer data communications protocols.
- Previous experience with tools used in monitoring the network.
- Ability to troubleshoot network problems effectively in a network operations environment.
- Maintain a broad knowledge of all products, services and NOC procedures.
- Strong interpersonal, verbal, and written communication skills.
- Excellent organizational, multitasking, prioritizing, and teamwork skills.
- Ability to work independently with little supervision.
- Ability to qualify for security clearance.

LEVEL 2:

Network Operations Center:

- Knowledge of CISCO routers and switches, and of VSAT network operations.
- Knowledge of layer data communications protocols.
- Ability to verify that switches and routers, as well as their configured network services and protocols, operate as intended within a given network specification.
- Previous experience with tools used in monitoring the network, including datacenter management tools.
- Ability to troubleshoot network problems effectively in a network operations environment.
- Maintain a broad knowledge of all products, services and NOC procedures.
- Strong interpersonal, verbal, and written communication skills.
- Excellent organizational, multitasking, prioritizing, and teamwork skills.
- Ability to work independently with little supervision.
- Ability to qualify for security clearance.


3. ESCALATION LEVEL BY CARRIER

Escalation follows the contracts currently in force between the two parties. Escalation continues until the incident is closed.

3.1 MTN

HOTLINE number to call for incidents: 7126

Level  Delay        Contact
1      Immediately  NOC MTN
2      ½ hour       NOC Coordinator
3      1 hour       Solution & Service Support Manager; Account Manager
4      2 hours      Senior Operations Manager; Senior Manager Corporate Sales
5      4 hours      Chief Technical Officer

The people currently on the escalation list are:

Contact                               Person                     Cellular     E-mail
NOC MTN                               Network Operations Center  79 00 92 13  [email protected]
NOC Coordinator                       Armand Pichele             77 55 04 61  [email protected]
Solution and Service Support Manager  Samuel PII                 77 55 02 58  [email protected]
Account Manager                       Augustin MIAFFO            77 55 03 51  [email protected]
Senior Operations Manager             Pierre Paul BISSOMBI       77 55 10 97  [email protected]
Senior Manager Corporate Sales        Alain MORE                 77 55 05 13  [email protected]
Chief Technical Officer               Gilbert NGONO              77 55 10 01  [email protected]
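
The MTN escalation delays listed in this section can be turned into a simple lookup that says which level should have been reached for a given incident age. The sketch below is illustrative only: the levels, delays and contact roles come from the MTN table, while the function name `current_level` and the data layout are assumptions.

```python
from datetime import timedelta

# (level, delay after the incident is opened, contacts), from the MTN table.
# The list must stay sorted by increasing delay.
MTN_ESCALATION = [
    (1, timedelta(0), ["NOC MTN"]),
    (2, timedelta(minutes=30), ["NOC Coordinator"]),
    (3, timedelta(hours=1), ["Solution & Service Support Manager", "Account Manager"]),
    (4, timedelta(hours=2), ["Senior Operations Manager", "Senior Manager Corporate Sales"]),
    (5, timedelta(hours=4), ["Chief Technical Officer"]),
]


def current_level(elapsed, table=MTN_ESCALATION):
    """Return (level, contacts) for the highest level whose delay has passed."""
    level, contacts = table[0][0], table[0][2]
    for lvl, delay, who in table:
        if elapsed >= delay:
            level, contacts = lvl, who
    return level, contacts


level, contacts = current_level(timedelta(minutes=45))
print(f"Level {level}: contact {', '.join(contacts)}")
```

The same structure could hold the delays of the other carriers, provided each table is entered from its own contract.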


3.2 ORANGE

HOTLINE number to call for incidents: 96 40 04 00

Level  Delay                                   Contact
1      Immediately                             NOC ORANGE
2      4 hours                                 Infrastructure Manager; Director of Operations
3      More than 4 hours / critical incident   Senior Project Manager; OCMS Director; Deputy General Manager, Technical and Administrative Affairs

The people currently on the escalation list are:

Contact                                                        Person                     Cellular     E-mail
NOC ORANGE                                                     Network Operations Center  96 40 04 00  [email protected]
Infrastructure Manager                                         Martin BIYICK              99 94 98 81  [email protected]
Director of Operations                                         Serge NAFTEUR              99 94 28 38  [email protected]
Senior Project Manager                                         Daniel Parfait NLEND       99 94 12 20  [email protected]
OCMS Director                                                  Jean Michel CANTO          99 94 01 04  [email protected]
Deputy General Manager, Technical and Administrative Affairs   Alain MARQUIS              99 94 08 08  [email protected]


3.3 CAMTEL

HOTLINE number to call for incidents: ?

Level  Delay        Contact
1a     Immediately  SAT3 Operations Room
1b     Immediately  Internet Service, Douala
1c     Immediately  Internet Service, Yaoundé
2      1 hour       Technical Manager, Littoral
3      2 hours      After-Sales Service Division
4      4 hours      AES Account Manager
5      6 hours      Sales Manager

The people currently on the escalation list are:

Contact                       Person               Cellular     E-mail
SAT3 Operations Room          DONFACK LOWE Bertin  33 41 07 55  [email protected]
Internet Service, Douala      AKOUA Anicet         22 02 04 53
Internet Service, Yaoundé     KAMA Bienvenu        22 00 73 33
Technical Manager, Littoral   EYOUM                33 02 13 55
After-Sales Service Division  KOUADIO Chantal      22 02 01 12
AES Account Manager           NJI AWA Mathias      33 00 30 03
Sales Manager                                      22 00 12 91


3.4 VSAT NETWORK (TO BE DETERMINED)

...


4. ESCALATION LEVEL - AES SONEL

NOC number: 5777

Level  Delay        Contact
1      Immediately  NOC AES SONEL
2      ½ hour       Network Team; Infrastructures Team; Application Team
3      3/4 hour     Division Head; Deputy Director of Infrastructure; Head of the Infrastructure and Networks Division
4      1 hour       DSI (IT Director)

The people currently on the escalation list are:

Contact: NOC AES SONEL
  Persons: NOC; Sylvain BITHE; Theodore BELL; Nicolas TONGO MOUSSONGO
  E-mail: [email protected]; [email protected]; [email protected]; [email protected]

Contact: Network Team
  Persons: Gabriel OYONO; Felix NGOH; Daniel Claude MOFEN; Sidonie MBWANG
  E-mail: [email protected]; [email protected]

Contact: Infrastructures Team
  Persons: Camille KITE; Christian N. AWOMO E.; Aimé Claude TAMPO’O
  E-mail: [email protected]; [email protected]; [email protected]

Contact: Head of ITSC Service
  Persons: DOOH EDIBE EWANDJE Joseph Moïse
  E-mail: [email protected]

Contact: Application Team Leader
  Persons: Christian NOLA ZE
  E-mail: [email protected]

Contact: Network and Infra Team Leader
  Persons: Bernard FANGA
  E-mail: [email protected]

Contact: Deputy Director
  Persons: Jean Louis
  E-mail: [email protected]

Contact: DSI (IT Director)


5. ESCALATION PROCESS – NOC AES SONEL

WHO WHEN HOW

NOC

When the WAN/MAN link is unreachable

½ hour after the incident occurs

Every hour the incident is not solved

Stop the alert


6. INCIDENT MANAGEMENT

An incident is opened every time an event occurs, for example when a link goes down. The information needed to manage an incident is:

- Incident number identifier

- Date and hour the incident occurs

- Type of incident

- Description of the incident

- Multiple interventions to solve the incident (TSP & ISP / AES SONEL)

- Date and hour the incident is closed

- Time passed from the incident detection till the resolution

- Optionally, the copy of the work permit for intervention

- Reporting of the troubleshooting operations carried out

At the end of each week and each month, we can produce a report on how many incidents were opened and closed. These reports are useful for statistics and for analysing how the carrier and AES SONEL react when an event occurs. They can also be used to apply penalties to payments.

The following reports can be useful:

- Opened incidents, closed incidents, percentage of incidents solved
- Incidents opened and closed by provider, with min, max and average times to solve the incident
- Incidents opened and closed by level of escalation
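
The incident record and two of the reports listed in this section can be sketched as follows. This is a minimal illustration, not an existing AES SONEL tool: the class and field names, the incident numbers and the sample data are all assumptions, and only some of the fields listed in this section are modelled.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    number: str                        # incident number identifier
    opened_at: datetime                # date and hour the incident occurs
    incident_type: str
    description: str
    provider: str                      # a carrier or AES SONEL
    closed_at: Optional[datetime] = None

    def hours_to_resolve(self):
        """Hours from detection to resolution, or None while still open."""
        if self.closed_at is None:
            return None
        return (self.closed_at - self.opened_at).total_seconds() / 3600


def percent_solved(incidents):
    """Percentage of incidents that have been closed."""
    if not incidents:
        return 0.0
    closed = sum(1 for i in incidents if i.closed_at is not None)
    return 100.0 * closed / len(incidents)


def resolution_stats_by_provider(incidents):
    """Map provider -> (min, max, average) resolution time in hours."""
    times = {}
    for i in incidents:
        hours = i.hours_to_resolve()
        if hours is not None:
            times.setdefault(i.provider, []).append(hours)
    return {p: (min(t), max(t), mean(t)) for p, t in times.items()}


# Invented sample data, for illustration only.
incidents = [
    Incident("INC-001", datetime(2010, 10, 4, 8, 0), "link down",
             "WAN link unreachable", "MTN", datetime(2010, 10, 4, 10, 0)),
    Incident("INC-002", datetime(2010, 10, 5, 9, 0), "link down",
             "MAN link unreachable", "MTN", datetime(2010, 10, 5, 13, 0)),
    Incident("INC-003", datetime(2010, 10, 6, 14, 0), "link down",
             "WAN link unreachable", "Orange"),
]
print(f"{percent_solved(incidents):.1f}% of incidents solved")
print(resolution_stats_by_provider(incidents))
```

Open incidents are excluded from the min, max and average figures because they have no resolution time yet.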

7. NETWORK TROUBLESHOOTING OVERVIEW

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.


7.1.1 About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

- Loss of connectivity - When users cannot access areas of your network, your organization's effectiveness is impaired. Immediately correct any connectivity breaks.

- Intermittent connectivity - Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.

- Timeout problems - Timeouts cause loss of connectivity, but are often associated with poor network performance.

7.1.2 About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users' productivity.

7.1.3 Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as "Ping", and network devices, such as "NMS", to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

So you can:

- Baseline the network's normal status to use as a basis for comparison when the network operates abnormally
- Precisely monitor network events
- Be notified immediately of critical problems on your network, such as a device losing connectivity
- Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
- Resolve problems by disabling ports or reconfiguring devices
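
One way to connect the baseline and the alert thresholds mentioned in this list is to derive the threshold from the baseline itself. The sketch below assumes a simple statistical rule (baseline mean plus three standard deviations); that rule, the function names and the sample values are assumptions chosen for illustration, not something the document prescribes.

```python
from statistics import mean, stdev


def baseline_threshold(samples, k=3.0):
    """Alert threshold = baseline mean + k sample standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    return mean(samples) + k * stdev(samples)


def is_abnormal(value, samples, k=3.0):
    """True when `value` is above the threshold derived from the baseline."""
    return value > baseline_threshold(samples, k)


# Invented link utilization readings (percent) taken during normal operation.
normal_utilization = [20, 22, 19, 21, 23, 20, 22]
print(f"Alert above {baseline_threshold(normal_utilization):.1f}% utilization")
print(is_abnormal(45, normal_utilization))
```

A larger `k` gives fewer alerts at the cost of reacting later, so the value would need tuning against real traffic.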

8. TROUBLESHOOTING STRATEGY

If you notice changes on your network, ask the following questions:

- Is the change expected or unusual?
- Has this event ever occurred before?
- Does the change involve a device or network path for which you already have a backup solution in place?
- Does the change interfere with vital network operations?
- Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, it is possible to approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

- Recognition of Symptoms
- Understanding the Problem
- Identifying and Testing the Cause of the Problem
- Solving the Problem

8.1 RECOGNITION OF SYMPTOMS

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.

8.1.1.1 User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

- They cannot print.
- They cannot access the application server.
- It takes them much longer to copy files across the network than it usually does.
- They cannot log on to a remote server.
- When they send e-mail to another site, they get a routing error message.
- Their system freezes whenever they try to Telnet.

8.1.1.2 Network Management Software Alerts

Network management software, as described in "Your Network Troubleshooting Toolbox", can alert you to areas of your network that need attention. For example:

- The application displays red (Warning) icons.
- Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
- You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.

8.1.1.3 Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

- To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
- On what subnetwork is the user located?
- Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
- Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
- Are many users reporting network logon failures?
- Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

8.2 UNDERSTANDING THE PROBLEM

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

- The physical connection breaks (that is, a cable is unplugged or broken).
- A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

- Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?

- Restricted access - Is an end station supposed to be able to connect with a specific device or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?

- Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices.
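
The "correct configuration" check can be partly automated. The sketch below uses Python's standard `ipaddress` module to test whether an IP address, subnet mask, gateway and broadcast address are consistent with each other. The function name `check_ip_config` and the example addresses are assumptions, and the checks shown are a small subset of what could go wrong.

```python
import ipaddress


def check_ip_config(address, netmask, gateway, broadcast=None):
    """Return a list of problems found; an empty list means none were found."""
    try:
        iface = ipaddress.ip_interface(f"{address}/{netmask}")
    except ValueError as exc:
        return [f"invalid address or subnet mask: {exc}"]
    try:
        gw = ipaddress.ip_address(gateway)
    except ValueError:
        return [f"invalid gateway address: {gateway}"]

    network = iface.network
    problems = []
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if gw == iface.ip:
        problems.append("gateway is the same as the host address")
    # Network and broadcast addresses are not usable by hosts
    # (point-to-point /31 and /32 prefixes are left out of this check).
    if network.prefixlen < 31 and iface.ip in (
        network.network_address,
        network.broadcast_address,
    ):
        problems.append(f"{iface.ip} is a reserved address in {network}")
    if broadcast is not None:
        if ipaddress.ip_address(broadcast) != network.broadcast_address:
            problems.append(
                f"broadcast should be {network.broadcast_address}, not {broadcast}"
            )
    return problems


# Example with invented addresses: the gateway sits in the wrong subnet.
for problem in check_ip_config("192.168.10.25", "255.255.255.0", "192.168.20.1"):
    print(problem)
```

An empty result only means these particular checks passed; it does not prove the device is reachable.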

8.3 IDENTIFYING AND TESTING THE CAUSE OF THE PROBLEM

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

- If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
- If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.


After you test your theory, either fix the problem as described in "Solving the Problem" or develop another theory.

8.3.1.1 Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

- What you know - In this case, the user's workstation cannot communicate with the mail server.
- What you do not know and need to test -
  o Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a "Ping" or by connecting to other devices.
  o Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  o If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

- If no, then go to step 2.
- If yes, determine if only the server is unreachable.
  o If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
  o If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

- If no, then most likely it is a server problem. Go to step 3.
- If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

- If no, then the problem is likely a network problem.
- If yes, the problem is likely a server problem.
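
The three steps can be read as a small decision procedure. The Python sketch below is one possible reading, not the only one: it treats step 1 as the entry point and classifies the fault from the answers to steps 2 and 3. The function and parameter names are assumptions.

```python
def locate_problem(others_reach_server, others_reach_devices):
    """Classify the fault as 'workstation', 'server' or 'network'.

    others_reach_server  -- step 2: can other workstations reach the server?
    others_reach_devices -- step 3: can other workstations reach other devices?
    """
    if others_reach_server:
        # Step 2, "yes": the server and the network work for everyone else,
        # so the fault is with this workstation or its own connection.
        return "workstation"
    if others_reach_devices:
        # Step 3, "yes": the network carries traffic, only the server fails.
        return "server"
    # Step 3, "no": other workstations cannot reach other devices either.
    return "network"


# Other workstations cannot reach the server but can reach other devices.
print(locate_problem(others_reach_server=False, others_reach_devices=True))
```

The result only says where to look next; the follow-up checks for each case still have to be done by hand.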

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem, as follows:


- For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.

- For a problem with the subnetwork - Examine any device on the path between the users and the server.

- For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

8.3.1.2 Equipment for Testing

To help identify and test the cause of problems, have available:

- A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

- A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

- A single port probe to insert in the network if you are having a problem where you do not have management capability.

- Console cables for each type of connector, labeled and stored in a secure place.

8.3.2 Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

- Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

- Balancing your network load by analyzing:
  o What users communicate with which servers
  o What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.

- Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

- Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

- Spare hardware equipment (such as modules and power supplies), especially for your critical devices
- A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

9. SERVER MONITORING POLICY

OVERVIEW

This server monitoring policy is an internal IT policy. It defines how servers in the organization are monitored for both security and performance issues.

PURPOSE

This policy is designed to protect the organization against loss of service by setting minimum requirements for monitoring servers. It provides for monitoring servers for file space and performance issues to prevent system failure or loss of service.

SCOPE

This policy applies to all production servers and infrastructure support servers, including but not limited to the following types of servers:

1. File servers
2. Database servers
3. Mail servers
4. Web servers
5. Application servers
6. Domain controllers
7. FTP servers
8. DNS servers

DAILY CHECKING

All servers shall be checked manually on a daily basis. The following items shall be checked and recorded:

1. The amount of free space on each drive shall be recorded in a server log.
2. Services shall be checked to determine whether any services have failed.
3. The status of backup of files or system information for the server shall be checked daily.
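
The policy asks for a manual daily check. As an aid for item 1, the free space on each drive can be read with Python's standard `shutil.disk_usage` and written as one log line per drive. This is a sketch under assumptions: the log line format, the function names and the log file path are invented for the example.

```python
import shutil
from datetime import datetime


def free_space_entry(path, now=None):
    """Build one server-log line with the free space of the drive at `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    now = now or datetime.now()
    free_gib = usage.free / 2**30
    percent_free = 100.0 * usage.free / usage.total if usage.total else 0.0
    return f"{now:%Y-%m-%d %H:%M} {path} free={free_gib:.1f}GiB ({percent_free:.0f}%)"


def append_to_log(log_path, paths):
    """Append one free-space line per drive to the server log file."""
    with open(log_path, "a", encoding="utf-8") as log:
        for path in paths:
            log.write(free_space_entry(path) + "\n")


# Print the entry for the drive that holds the current directory.
print(free_space_entry("."))
```

For example, `append_to_log("server.log", ["/", "/var"])` would record two drives on a Unix-like server; the paths depend on the server being checked.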

EXTERNAL CHECKS

Essential servers shall be checked using either a separate computer from the ones being monitored or a server monitoring service. The external monitoring service shall have the ability to notify multiple IT personnel when a service is found to have failed. Servers to be monitored externally include:

1. The mail server
2. The web server
3. External DNS servers
4. Externally used application servers
5. Database or file servers supporting externally used application servers or web servers
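
An external check run from a separate computer can be as simple as trying to open a TCP connection to each service. The sketch below shows that idea with Python's standard `socket` module. It is an assumption-laden illustration: the function names are invented, a successful TCP connection does not prove the service is healthy, and notifying IT personnel is left out.

```python
import socket


def service_is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def failed_services(services, timeout=3.0):
    """Return the names of the services that did not accept a connection.

    `services` maps a service name to a (host, port) pair.
    """
    return [
        name
        for name, (host, port) in services.items()
        if not service_is_up(host, port, timeout)
    ]
```

A call such as `failed_services({"mail": ("mail.example.org", 25), "web": ("www.example.org", 80)})` returns the names to report; the host names here are placeholders, not real AES SONEL servers.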


CATEGORY  ACTION  OBSERVATIONS  STATUS

Network links

- Record the statistics from the NMS dashboards
- Review the results of the latest site testing
- Check and update the list of incidents being resolved
- Produce the availability report from Nagios for MTN, OCMS, Camtel, AES
- Communicate the information to the related teams

Infrastructures

- Check the status of the UPS units via Nagios or Centreon
- Check the status of the generators
- Check the status of the servers
- Check the status of the cooling systems
- Check the integrity of the network equipment (router, switch, firewall, …)

Services

- SAGE 1000
- BSA
- ORACLE DB
- xSQL DB
- CITRIX
- INTERNET WEB ACCESS
- Outlook Web Access
- AES SONEL CONTACT
- All other internal portals
- DHCP
- DNS