As an afterthought, someone decided at the last minute that maybe
the architect (me) should be on the architectural review of a product.
Normally for social networking web development, I allow for a little
short-term inconsistency. This is because only one user has access to
modify a thing, and that user isn’t likely to do two things at the same
time. Because of this, concurrency is almost never a problem and, even
if the data gets clobbered, the database at least is consistent and
your cached objects are quickly fixed.
The problem with this particular project is that a paid good is
involved and many users will race to the same data store, so
inconsistencies can occur, and they’d be more harmful than a goto
statement. The proposed solution was to build a Java service to keep
these eight pieces of data consistent. There was also a release plan to
estimate the resources the new service would need under live site load.
Though late to the meeting, I opened my mouth and said, “You don’t
need a Java service to do this. You can do it all in PHP and memcache.”
Why they didn’t think it was possible
Long before I joined the company, there was a system to prevent
stampeding by having a lock key in memcache. This didn’t work out so
well.
Stampeding is what occurs when you have 100,000 concurrent users and a
memcache key for a fairly popular piece of data (say the block list for
your web application, or the ad unit for the banner ads) expires,
because of a version increment or an expiration. Tons of concurrent
processes see that the data is missing and stampede the database with
the same request. Databases are slow (which is why we have memcache in
the first place), and your site experiences a very nasty hiccup every
time this happens.
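To make the stampede concrete, here is a minimal sketch, assuming a hypothetical in-process `FakeCache` in place of a real memcache server and using PHP generators to stand in for interleaved PHP processes. Each request does the usual read-through (get, and on a miss query the database and set). Because every request performs its get() before any of them reaches set(), every single one ends up querying the database. The names `read_through()` and `simulate_stampede()` are illustrative only.

```php
<?php
// Hypothetical in-process stand-in for memcache: get() returns false on a miss.
final class FakeCache
{
    private array $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// One web request doing the usual read-through: check the cache and, on a miss,
// query the database and refill the key. The yield marks the point where the
// OS could switch to another PHP process.
function read_through(FakeCache $cache, string $key, callable $queryDb): Generator
{
    $value = $cache->get($key);
    yield;
    if ($value === false) {
        $value = $queryDb();
        $cache->set($key, $value);
    }
    return $value;
}

// Simulate $n requests arriving right after $key expired: every request runs
// its get() before any of them runs set(). Returns the number of DB queries.
function simulate_stampede(int $n): int
{
    $cache = new FakeCache();
    $dbQueries = 0;
    $queryDb = function () use (&$dbQueries) {
        ++$dbQueries;
        return ['blocked_user_1', 'blocked_user_2']; // hypothetical block list
    };

    $requests = [];
    for ($i = 0; $i < $n; ++$i) {
        $requests[] = read_through($cache, 'blocklist', $queryDb);
    }
    foreach ($requests as $request) {
        $request->current(); // run up to the yield: every get() misses
    }
    foreach ($requests as $request) {
        $request->next();    // finish: every miss goes to the database
    }
    return $dbQueries;
}

echo simulate_stampede(1000), " database queries for one expired key\n";
```

Run sequentially instead (each request finishing before the next starts), the same read-through would query the database once; the damage comes entirely from the overlap.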
One problem was that it was buggy. It used a nonce/semaphore lock,
written by someone who had just learned object-oriented programming;
the locking code was then added by someone else who had just read a
design patterns book, and the bugs were closed by yet another person so
lazy that they preferred to patch problems in the user interface layer.
It looked that way because it was written that way.
I always say that the sites the founders worked on before this had a
business plan that was the internet equivalent of …
Our codebase reflected that attitude.
So we didn’t get burned by concurrency issues in memcache because
memcache locking is fundamentally broken. We got burned because the
code was a mess, and it was easier to rewrite it than to pretend that
jenga-ing this stuff with a patch on a patch on a patch was going to
magically make it more stable.
And the reason I removed it in the rewrite wasn’t that it’s that
difficult to write, but rather that we were using this locking for
every single object that we stored in memcache, even the less busy
ones. Anything built in would be abused similarly. Better not to give
the developers any rope they can hang the site with.
(In the case of stampedes, I figured a key that popular is very likely
to be too “hot” for memcache anyway. Instead, I built a system that
makes it easy to store hot keys in the web server’s shared memory
cache, so reading them doesn’t need a network call to memcache.)
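The post doesn’t say how that hot-key store was built. As a rough sketch, assuming the APCu extension (`apcu_fetch()`/`apcu_store()`, the modern successor to APC) serves as the web server’s shared memory cache, a hot read checks local memory first and only calls a loader (memcache or the database) on a local miss. `hot_key_get()` and its parameters are hypothetical; without APCu enabled it simply calls the loader every time.

```php
<?php
// Hypothetical two-tier read for "hot" keys: web server shared memory (APCu,
// assumed installed) first, then the loader (memcache or the database).
function hot_key_get(string $key, callable $loader, int $localTtl = 10)
{
    $haveApcu = function_exists('apcu_enabled') && apcu_enabled();

    if ($haveApcu) {
        $value = apcu_fetch($key, $found);
        if ($found) {
            return $value; // served from local shared memory, no network call
        }
    }

    $value = $loader();
    if ($haveApcu) {
        // Each web server keeps its own copy, so staleness is bounded by $localTtl.
        apcu_store($key, $value, $localTtl);
    }
    return $value;
}

$adUnit = hot_key_get('ad_unit:banner', function () {
    return 'ad-42'; // hypothetical loader standing in for memcache/DB
});
```

The trade-off is that every web server holds its own copy and only picks up changes when the short local TTL runs out, which suits a block list or an ad unit but not the paid-good data above.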
What was the bug?
Typically, nobody else understands any of this. The reason is simply
that the average engineer has been working here 1/20th as long as I
have. And those who have been here half as long (pretty much everyone
else) only know that I pulled the locking code out.
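The bug in our code (and the above link) is that there is a race gap
between the memcache::get() and the memcache::set(): two processes can
both get() the lock key, both see that it is missing, and both set()
it, each believing it now holds the lock.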
This is perfectly fine if all you want to do is prevent a stampede,
since only a few processes, even on something as slow as PHP, would
make it into the winner’s circle of that race. It is really bad in the
case above, where a paid good is involved and any inconsistency is
harmful.
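Here is a minimal sketch of that race gap, again assuming a hypothetical in-process `FakeCache` and using generators to pause each simulated process between its get() and its set(). With two processes arriving together, both see the lock key missing and both go on to set it, so both believe they hold the lock. `naive_lock()` and `count_lock_holders()` are illustrative names, not code from our codebase.

```php
<?php
// Hypothetical in-process stand-in for memcache (get/set only).
final class FakeCache
{
    private array $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Check-then-set lock: look for the lock key and, if it is absent, set it.
// The yield is the race gap between get() and set().
function naive_lock(FakeCache $cache, string $lock): Generator
{
    $free = ($cache->get($lock) === false);
    yield;
    if ($free) {
        $cache->set($lock, 1);
    }
    return $free; // true means "I believe I hold the lock"
}

// $processes arrive at the gap together; count how many think they won.
function count_lock_holders(int $processes): int
{
    $cache = new FakeCache();
    $attempts = [];
    for ($i = 0; $i < $processes; ++$i) {
        $attempts[] = naive_lock($cache, 'lock:user:42:balance');
    }
    foreach ($attempts as $attempt) {
        $attempt->current(); // everyone runs get() and sees no lock
    }
    $holders = 0;
    foreach ($attempts as $attempt) {
        $attempt->next();    // everyone runs set()
        $holders += $attempt->getReturn() ? 1 : 0;
    }
    return $holders;
}
```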
So what is the solution? Use a memcache command that is atomic. The one
that fits the bill is memcache::add(), which stores the key only if it
doesn’t already exist and otherwise returns a failure condition (false).
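By contrast, add() folds the check and the store into one server-side command, so there is no gap to slip into. The sketch below models that with a hypothetical in-process `FakeCache::add()`; a real memcached server does the check-and-store atomically, while this PHP array only behaves that way because a single process runs one call at a time. Expiry is not modeled. However many processes try, exactly one gets the lock until it is deleted. With the PECL Memcache client the real call looks like `$memcache->add($lock, 1, 0, $expiryTime)`.

```php
<?php
// Hypothetical stand-in for memcache's add(): store only if the key is absent.
// (No expiry handling; a real lock key would also carry an expiration time.)
final class FakeCache
{
    private array $items = [];

    public function add(string $key, $value): bool
    {
        if (array_key_exists($key, $this->items)) {
            return false; // someone else already holds it
        }
        $this->items[$key] = $value;
        return true;
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

// Every contender tries add() on the same lock key; count the winners.
function count_add_winners(FakeCache $cache, string $lock, int $processes): int
{
    $winners = 0;
    for ($i = 0; $i < $processes; ++$i) {
        if ($cache->add($lock, 1)) {
            ++$winners;
        }
    }
    return $winners;
}
```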
The codez
For you script kiddies, here is the code. I don’t know if it works since
I wrote it without any unit testing.
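```php
<?php
/**
 * Read-modify-write a memcache key while holding a lock key, using the PECL
 * Memcache client API (add/get/set/delete).
 *
 * $updateFunction receives the current value (false if the key is missing)
 * and must RETURN the new value: call_user_func() passes arguments by value,
 * so changes made to its parameter would be silently lost.
 *
 * $maxTries <= 0 means wait for the lock as long as it takes.
 */
function locked_memcache_update($memcache, $key, $updateFunction, $expiryTime = 3, $waitUtime = 101, $maxTries = 100000)
{
    $lock = 'lock:' . $key;

    // Get the lock. add() only succeeds if the lock key does not exist yet,
    // so at most one process holds it. It expires after $expiryTime seconds
    // in case the holder dies before releasing it.
    if ($maxTries > 0) {
        for ($tries = 0; $tries < $maxTries; ++$tries) {
            if ($memcache->add($lock, 1, 0, $expiryTime)) {
                break;
            }
            usleep($waitUtime); // back off before trying again
        }
        if ($tries === $maxTries) {
            // Handle the failure case (throw an exception if you need to be nice).
            trigger_error(sprintf('Lock failed for key: %s', $key), E_USER_NOTICE);
            return false;
        }
    } else {
        while (!$memcache->add($lock, 1, 0, $expiryTime)) {
            usleep($waitUtime);
        }
    }

    try {
        // Modify the data in the cache while holding the lock.
        $flag = 0;
        $data = $memcache->get($key, $flag);
        $data = call_user_func($updateFunction, $data);
        // Carry over only the user-settable compression bit; the client
        // manages any other bits in $flag itself.
        return $memcache->set($key, $data, $flag & MEMCACHE_COMPRESSED);
    } finally {
        // Clear the lock, even if $updateFunction throws.
        $memcache->delete($lock, 0);
    }
}
```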
(Yes, this much commenting is typical when I code. I hope I could say
the same for you.) The reason it’s a function is so that an engineer
has to do work to use it. If they got it for free, they’d abuse it, or
at least that was the worry.
If you need to do locking on the database, then you would have the
$updateFunction nest something that handles a database update. You
might want to up the $expiryTime too, but you probably won’t need to: I
just chose 3 because Touge did in his original post.
from: http://terrychay.com/article/keeping-memcache-consistent.shtml