Transcript of Cdiscount presentation

Bigger Hotter Faster Stronger
Get lucky with VHA and MSE

Cdiscount in numbers

• 1M average unique visitors per day
• 152K parcels shipped in one day (x3 for Christmas, x5 on the first day of the January sales)
• 5,000 HTTP requests per second
• 1,605M€ total turnover in 2014
• 13M products online
• 23M active marketplace offers

History of caching

• 2008: CDN (50K objects)
• 2011: Load-balancer (500K objects)
• 2013: Varnish (10M objects)
• 2015: VHA & MSE (35M objects)

Why did we implement Varnish ?

• The SEO team needed to scale to millions of objects; the sales team needed the cache to be compatible with A/B testing
• The SEO team wanted long TTLs; the sales team wanted short TTLs
• The SEO team required amazing response times and hit ratio; the sales team… didn't really care (though they should…)
• The SEO team wanted greater flexibility; the sales team preferred not to use a cache at all !

What did we do first ?

• Deployed 3 servers per DC with 128GB RAM each
  • Using malloc → ~300GB of available cache in each DC
  • URL-based sharding
• Included the device type and the A/B test cookie in the cache key
  • Thanks to the regsub() function
• Developed a warmup tool
  • Uses "hash_always_miss" in VCL to refresh the cache
  • Crawls 8M pages every night
• Implemented specific TTLs for bots
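The two VCL tricks on this slide can be sketched roughly as follows. This is a minimal illustration, not Cdiscount's actual configuration: the cookie name ("abtest"), the warmup header ("X-Warmup") and the device-detection regexes are all assumptions.

```vcl
vcl 4.0;

sub vcl_recv {
    # Classify the device so PC / tablet / phone variants
    # get distinct cache entries.
    if (req.http.User-Agent ~ "(?i)ipad|tablet") {
        set req.http.X-Device = "tablet";
    } else if (req.http.User-Agent ~ "(?i)mobile|iphone|android") {
        set req.http.X-Device = "phone";
    } else {
        set req.http.X-Device = "pc";
    }

    # The warmup crawler marks its requests with a private header
    # (name assumed here); hash_always_miss forces a fresh backend
    # fetch that replaces the cached object instead of serving it.
    if (req.http.X-Warmup == "1") {
        set req.hash_always_miss = true;
    }
}

sub vcl_hash {
    hash_data(req.url);
    hash_data(req.http.X-Device);
    # Extract the A/B test cookie value with regsub() -- fragile,
    # as the next slides show.
    hash_data(regsub(req.http.Cookie, "^.*?abtest=([^;]*).*$", "\1"));
    return (lookup);
}
```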

[Architecture diagram for "What did we do first ?": in each DC (DC1 and DC2), traffic from PC, tablet and phone clients goes through a load-balancer, which shards requests across the Varnish servers in front of the backend.]

Faster, but…

✓ Response times divided by 2 for Googlebot
✓ 6M cache objects on each server

✗ 2 distinct caches (DC1 and DC2)
✗ Big impact of losing one cache server
  • 1/3 of all cache objects lost
  • Long time to fill up the cache when it's back online
✗ Bad cache keys due to regexp matching of cookies
✗ Duplicate objects created by hash_always_miss
✗ We needed to scale to more than 30M objects
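To make the "bad cache keys" and "duplicate objects" points concrete, here is a hedged illustration (the cookie names are invented, not Cdiscount's):

```vcl
# Why regexp matching of cookies produced bad cache keys:
#
#   Cookie: visited=1; myabtest=B; abtest=A
#
# The pattern below also matches "abtest=" inside "myabtest=", so
# the key is built from "B" instead of "A" -- users in the same
# A/B group can end up hashed to different objects:
#
#   regsub(req.http.Cookie, "^.*?abtest=([^;]*).*$", "\1")   # -> "B"
#
# Likewise, a warmup request with hash_always_miss always inserts a
# brand-new object, even when an identical fresh one is already in
# cache -- which is where the duplicate objects came from.
```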

So what did we do next ?

[Architecture diagram: in each DC (DC1 and DC2), PC, tablet and phone traffic goes through a load-balancer, which shards requests across the Varnish Edge (L1) servers; these fetch from the Varnish Core (L2) servers, kept in sync across DCs by VHA, in front of the backend.]

So what did we do next ?

We introduced 2 levels of cache:
• Level 1: 4 servers with 512GB RAM in each DC
  • malloc → 1.4TB of cache available
• Level 2: 2 servers with 2TB SSD in each DC
  • Massive Storage Engine (MSE) → 2TB of cache
  • Varnish High Availability (VHA)
• Switched to vmod_cookie for better cookie handling
  • And added two cookies to the cache key (it was too easy…)
• Removed hash_always_miss, reduced TTLs and increased grace
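A rough sketch of what these changes look like in VCL. The cookie names and the TTL/grace values are illustrative assumptions; the point is that vmod_cookie parses the Cookie header properly instead of regexp-matching it, and that grace replaces the old forced-refresh warmup.

```vcl
vcl 4.0;

import cookie;

sub vcl_recv {
    # Parse the Cookie header once, up front.
    cookie.parse(req.http.Cookie);
}

sub vcl_hash {
    hash_data(req.url);
    # Exact cookie lookups: "myabtest=B" can no longer shadow
    # "abtest=A" the way a regsub() pattern could.
    hash_data(cookie.get("abtest"));
    hash_data(cookie.get("newfeature"));
    return (lookup);
}

sub vcl_backend_response {
    # Shorter TTLs, much longer grace: stale objects keep being
    # served while a background fetch refreshes them, replacing the
    # old hash_always_miss trick. Values are illustrative only.
    set beresp.ttl = 1h;
    set beresp.grace = 24h;
}
```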

But it was not easy !

• Our warmup could kill the web servers
  • Varnish is too fast and efficient when in grace !
• MSE runs just fine, but be careful with memory usage
  • Millions of objects in cache means GBytes of RAM for metadata…
• Cookies here, crashes there…
  • vmod_cookie crashed due to big cookies
  • VHA synced incomplete objects due to… big cookies
  • Varnish crashed due to corrupt headers, coming from… guess what ? ☺
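One possible mitigation for the big-cookie problems above is to strip every cookie the cache does not actually need before the request goes any further. A hedged sketch, with assumed cookie names:

```vcl
vcl 4.0;

import cookie;

sub vcl_recv {
    cookie.parse(req.http.Cookie);
    # Keep only the cookies used in the cache key; everything else
    # (including oversized tracking cookies) is dropped.
    cookie.filter_except("abtest,newfeature");
    set req.http.Cookie = cookie.get_string();
}
```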

Then came the Devops from Varnish Software !

• Fixed vmod_cookie
• Enhanced VHA according to MY needs !
  • 1-to-many replication
  • Ignore VHA's own requests to prevent looping
  • Unified configuration file and generated VCL
  • And more to come !!
• Helped us troubleshoot the crashes and tune Varnish
  • Be aware of the default vsl_reclen / vsl_buffer limits

Now it's bigger, hotter, stronger, and even faster !

• One large global cache synchronized across the DCs
• Up to 40M objects in cache
• The loss of one (or several) L1 servers has no impact
• The loss of one L2 server has no impact
• Warmup time divided by 2 (keeping the same efficiency)
• Response times divided by 2 again for Googlebot
• And Guillaume is my new best friend !! ☺ (but he might think the opposite…)


Thanks for your attention

?