
Using Load Tests to Automatically Compare the Subsystems of a Large Enterprise System

Haroon Malik, Bram Adams & Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL), Queen’s University, Kingston, Canada

Parminder Flora & Gilbert Hamann
Performance Engineering, Research In Motion, Waterloo, Canada

Today's large-scale systems (LSS) are composed of many underlying subsystems.

These LSS grow rapidly in size to handle growing traffic, complex services, and business-critical functionality.

Performance analysts face the challenge of dealing with performance bugs, as processing is spread across thousands of subsystems and millions of hardware nodes.

LOAD TESTING

CURRENT PRACTICE

[Figure: typical load-test setup. Load Generator-1 and Load Generator-2 drive the System under test; a Monitoring Tool records performance counter logs into a Performance Repository]

1. Environment setup
2. Load test execution
3. Load test analysis
4. Report generation

CHALLENGES…

LARGE NUMBER OF PERFORMANCE COUNTERS

LIMITED TIME

RISK OF ERROR (2 + 2 = 5)

An automated methodology is required.

METHODOLOGY

[Figure: raw logs from performance counters PC-1, PC-2, and PC-3 amount to a lot of data; our methodology condenses this data into a compact performance signature]

[Figure: an LSS with Database, Mail, and Web subsystems, each monitored through performance counters such as Commits/Sec, Writes/Sec, CPU Utilization, and Database Cache % Hit]

For every subsystem, the baseline signature is compared against the Load Test 1 signature, producing a per-subsystem deviation/match score (e.g., 0.59, 1, 0.99).
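The deck reports match scores without defining the comparison. As a minimal sketch, assuming each signature is a vector of counter importance weights, the match could be scored as the cosine similarity of the baseline and test vectors (a hypothetical scoring, not necessarily the authors' measure):

    import numpy as np

    def match_score(baseline_sig, test_sig):
        # Cosine similarity between two signature vectors:
        # 1.0 = identical shape, lower = larger performance deviation.
        a = np.asarray(baseline_sig, dtype=float)
        b = np.asarray(test_sig, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Hypothetical weights for Commits/Sec, Writes/Sec, CPU Utilization,
    # and Database Cache % Hit on one subsystem:
    baseline = [0.9, 0.8, 0.7, 0.6]
    load_test = [0.5, 0.9, 0.2, 0.6]
    print(round(match_score(baseline, load_test), 2))  # 0.91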

METHODOLOGY STEPS

1. Data Preparation
2. Counter Normalization
3. Dimension Reduction
4. Crafting Performance Signatures
5. Extracting Performance Deviations
6. Report Generation
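A minimal end-to-end sketch of these six steps, assuming a PCA-style dimension reduction over a matrix of observations x counters; the function names, the 85% variance cut-off, and the deviation threshold are illustrative assumptions, not the authors' implementation:

    import numpy as np
    from sklearn.decomposition import PCA

    def performance_signature(counter_matrix, variance=0.85):
        # Steps 1-2: data preparation and counter normalization
        # (z-score each counter so scales are comparable).
        X = np.asarray(counter_matrix, dtype=float)
        X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
        # Step 3: dimension reduction - keep enough principal
        # components to explain `variance` of the data.
        pca = PCA(n_components=variance).fit(X)
        # Step 4: craft the signature as one importance weight per
        # counter, from its loadings on the retained components.
        weights = np.abs(pca.components_.T) @ pca.explained_variance_ratio_
        return weights / weights.max()

    def performance_deviations(base_sig, test_sig, threshold=0.1):
        # Step 5: flag counters whose importance shifted noticeably.
        return np.flatnonzero(np.abs(base_sig - test_sig) > threshold)

    # Step 6 (report generation) would list the flagged counters,
    # grouped by subsystem, for the performance analyst.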

CASE STUDY

MEASURING THE PERFORMANCE

[Figure: baseline and Test 1 timelines divided into periods t1–t6, with predicted and occurred deviations marked]

Deviations Predicted (P)

Deviations Occurred (O)

Precision = |P ∩ O| / |P| = 1/4 = 0.25

Recall = |P ∩ O| / |O| = 1/3 = 0.33
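Here P is the set of deviations the methodology predicts and O the set that actually occurred; both metrics are plain set-overlap ratios. A small sketch reproducing the numbers above, with a hypothetical assignment of deviations to periods t1–t6:

    def precision_recall(predicted, occurred):
        # Precision = |P ∩ O| / |P|, Recall = |P ∩ O| / |O|
        hits = predicted & occurred
        return len(hits) / len(predicted), len(hits) / len(occurred)

    P = {"t1", "t2", "t4", "t6"}  # predicted deviations (hypothetical)
    O = {"t2", "t3", "t5"}        # occurred deviations (hypothetical)
    print(precision_recall(P, O))  # (0.25, 0.333...)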

RESEARCH QUESTIONS

Can our methodology identify the subsystems of an LSS that show performance deviations relative to prior tests?

Can we save time by cutting short unnecessary load tests, through early identification of performance deviations across the different subsystems of an LSS?

How is the performance of our methodology affected by different sampling intervals?

Can our methodology identify the subsystems of an LSS that show performance deviations relative to prior tests?

RQ-1

APPROACH

4 load tests, 8 hours each
~700 performance counters per test
Monitoring interval: 15 sec (1,922 instances)

Baseline test: 85% data reduction
Test 1: baseline test reproduction
Test 2: synthetic fault injection via mutation
Test 3: increased workload intensity (8X)

[Figure: importance of the top 11 performance counters (0.8–1.0) under the Base Line Test, Test-A, Synthesized Test, and 8X-Load]

[Figure: importance of the top 18 performance counters (0.8–1.0) for each subsystem: Web Server-A, Application System, Web Server-B, and Database]

FINDINGS

Our methodology helps performance analysts identify the subsystems with performance deviations relative to prior tests.

Subsystem       Test-A   Synthesized   8X Load
Database        0.997    0.732         0.826
Web Server-A    1.000    0.701         0.795
Web Server-B    1.000    0.700         0.790
Application     1.000    0.623         0.681

Can we save time by cutting short unnecessary load tests, through early identification of performance deviations across the different subsystems of an LSS?

RQ-2

[Figure: % CPU Utilization (35–80%) across the full run of ~990 observations, with a zoomed view of the first 41 observations]

APPROACH

[Figure: % CPU Utilization (38–88%) over 120 minutes for the baseline and the load test; the injected CPU stress shows as a spike in the load test]

Two load tests, 2 hours each
Monitoring rate: 15 sec
CPU stress on the database server at the 60th minute, for 15 sec
Test comparison: removed 12% of the samples (10 min; 6% + 6%)
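The deck does not name the fault-injection tooling; as a stand-in, a 15-second CPU stress like the one injected on the database server can be approximated with one busy-looping process per core:

    import multiprocessing as mp
    import time

    def burn(seconds):
        # Spin the CPU until `seconds` have elapsed.
        end = time.time() + seconds
        while time.time() < end:
            pass

    if __name__ == "__main__":
        # One busy-looping worker per core, for 15 seconds.
        workers = [mp.Process(target=burn, args=(15,))
                   for _ in range(mp.cpu_count())]
        for w in workers:
            w.start()
        for w in workers:
            w.join()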


[Figure: counter importance (0.8–1.0) for the Database subsystem, Base-Line Test vs. Load Test, when computed over 30-, 15-, 10-, and 5-minute windows]

FINDINGS

Time (Observations)   Database
30 mins (120)         1
15 mins (60)          1
10 mins (40)          0.9893
5 mins (20)           0.8255

Early identification of deviations within 10 minutes (40 observations)
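A sketch of how such early identification could work, reusing the hypothetical performance_signature and match_score sketches above: rebuild the test signature after every monitoring window and stop the load test as soon as the match against the baseline drops below a tolerance (the 0.95 tolerance is an assumed value):

    import numpy as np

    def run_with_early_stop(windows, baseline_sig, make_signature,
                            score, tol=0.95):
        # `windows` yields one (observations x counters) block per
        # monitoring interval; stop at the first deviating window.
        seen = []
        for i, window in enumerate(windows, start=1):
            seen.append(window)
            test_sig = make_signature(np.vstack(seen))
            if score(baseline_sig, test_sig) < tol:
                return i  # deviation flagged after i windows
        return None       # test completed with no deviation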

How is the performance of our methodology affected by different sampling intervals?

RQ-3

APPROACH

[Figure: baseline and Load Test 1 timelines split into 30-min and 15-min sampling windows for comparison]

Two load tests, 2 hours each
Monitoring rate: 15 sec
Fault: stopped the load generators 10 times, for 15 sec each
Measured the performance of the methodology at different time intervals

30 min – 4 samples; 15 min – 8 samples

Small samples yield high RECALL

FINDINGS

Test Run            Database      Web Server-1   Web Server-2   Application System   Average
Min  Obs  Samples   Recall  Prec  Recall  Prec   Recall  Prec   Recall  Prec         Recall  Prec
30   120  4         0.50    1.00  0.50    1.00   0.30    1.00   0.25    1.00         0.325   1.000
15   60   8         0.62    1.00  0.62    1.00   0.62    1.00   0.50    1.00         0.590   1.000
10   40   12        1.00    0.90  1.00    0.90   1.00    0.90   0.90    0.69         0.975   0.847
5    20   24        1.00    0.70  1.00    0.70   1.00    0.80   1.00    0.66         1.000   0.715
All  -    -         0.78    0.90  0.78    0.90   0.73    0.92   0.66    0.83         0.738   0.890

Large samples yield high PRECISION

The methodology performs best at a 10-minute time interval, with a good balance between recall and precision.