Welcome Board Archive - The recent latency issues

From LegendMUD
( 5) From: Rufus         Title: The recent latency issues                
                         Posted On: Friday, June 08 2018, 07:45PM
---------------------------------------------------------------------------
The recent 'lag' problems

I'd rather not call them 'lag' ... the proper term is 'chunking.' 
It's actually the game stopping processing input/output while it 
works on something else. 

Legend runs on a pulse-based system: it runs the main game loop 
once every 250ms (4 times/second). When I added the 'save people 
often' code, I judged it viable because we spent, on average, 216ms 
of each pulse in 'sleep'. That means the mud was simply waiting out 
the rest of the pulse to keep pace (for those in the know, we don't 
use 'sleep()' because of its somewhat unpredictable nature; we have 
a more reliable algorithm).
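
For readers who have not seen a pulse loop before, here is a minimal 
sketch of the idea in Python. It is not Legend's code: the post does 
not describe the actual pacing algorithm, so the fixed-deadline 
approach, the function name run_pulses, and its parameters are all 
assumptions made for illustration. The wait step is pluggable, and 
time.sleep is only a stand-in default.

```python
import time

PULSE_SECONDS = 0.250  # one pulse every 250ms, as described in the post


def run_pulses(do_pulse, pulses, clock=time.monotonic, wait=time.sleep,
               pulse_seconds=PULSE_SECONDS):
    """Run do_pulse() `pulses` times, pacing against fixed deadlines.

    Hypothetical helper, not LegendMUD's algorithm. Returns the idle
    time (in seconds) left over after each pulse's work.
    """
    idle_times = []
    deadline = clock() + pulse_seconds
    for _ in range(pulses):
        do_pulse()                      # all game work for this pulse
        idle = deadline - clock()       # time left before the next pulse
        if idle > 0:
            wait(idle)                  # normal case: work finished early
            deadline += pulse_seconds
        else:
            idle = 0.0
            # The work overran the pulse ("chunking"). Restart the
            # schedule from now instead of trying to catch up.
            deadline = clock() + pulse_seconds
        idle_times.append(idle)
    return idle_times
```

With 34ms of work per pulse, this loop idles for 216ms, which matches 
the average quoted above. When a pulse's work runs past 250ms there 
is no idle time at all, and players see the stall.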

When we added that code, our character list update -- which, just so 
you know, runs through 12,000+ players/mobs, 2500+ act updates, etc 
every quarter of a second -- was taking up no more than about 30ms 
on average. We had plenty of time, and disk is fast now. Even your 
friendly local druid with a ton of herbs saves in a matter of a few 
milliseconds, even though their pfiles are huge.

When someone is removed from the game for being link-dead, the 
update takes a bigger hit, but until this recent spate of chunking 
it was rarely noticeable, taking < 45ms, and usually far, far less.

People started complaining about lag when typing 'save'. I optimized 
some of that code, but the cause was something else that had 
previously not really been a problem and suddenly became one. There 
was also an obvious potential slowdown of little value, which I 
removed.

The chunking during the update cycle, though, is perplexing. 

We did change the compiler, but we'd not changed any code in the 
update or save paths in a while. One of the potential fixes (that 
didn't work) was reverting the compiler. 

I added a number of options to our internal profiler to help track 
this down. Thank you for your patience with the reboots! 
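
As a rough illustration of the kind of measurement involved, the 
sketch below tracks average and peak wall time per named section of 
a loop. It is an assumption-laden stand-in, not our internal 
profiler: the class SectionProfiler, its methods, and the section 
names are all hypothetical.

```python
import time
from contextlib import contextmanager


class SectionProfiler:
    """Track call count, total and peak wall time per named section.

    Hypothetical example; not the mud's internal profiler.
    """

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.stats = {}  # name -> [calls, total_seconds, peak_seconds]

    @contextmanager
    def section(self, name):
        """Time the body of a `with` block and record it under `name`."""
        start = self.clock()
        try:
            yield
        finally:
            self.record(name, self.clock() - start)

    def record(self, name, elapsed):
        entry = self.stats.setdefault(name, [0, 0.0, 0.0])
        entry[0] += 1
        entry[1] += elapsed
        entry[2] = max(entry[2], elapsed)

    def average_ms(self, name):
        calls, total, _peak = self.stats[name]
        return 1000.0 * total / calls

    def peak_ms(self, name):
        return 1000.0 * self.stats[name][2]
```

Usage would look like `with profiler.section("char_update"): ...` 
around each part of the main loop. Keeping the peak alongside the 
average matters here, because a handful of very long cycles can hide 
behind an average that still looks tolerable.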

There are many places where we could do some optimization, but the 
amount of work and risk is substantial. Consider major alterations 
to this part of the code akin to open heart surgery. 

Personally, I run a virtual machine whose specs mirror, pretty closely,
the VPS we have the mud on. Even with the pulse rate ramped up to 20/
second (50ms vs 250ms), the average cycle through the character list 
is 13ms. On the main mud it's 108ms. The peak on my testmud is 85ms; 
on the main mud right now, it's in excess of 6s.
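
To put those numbers in terms of the pulse budget, here is a small 
worked calculation. It uses only the figures quoted above and assumes 
the testmud average was measured at the 50ms pulse; the helper name 
budget_share is made up for the example.

```python
def budget_share(cycle_ms, pulse_ms):
    """Fraction of one pulse consumed by a cycle of the given length."""
    return cycle_ms / pulse_ms


# Average character-list cycle as a share of the pulse it runs in.
testmud_avg_share = budget_share(13, 50)     # testmud at 20 pulses/second
mainmud_avg_share = budget_share(108, 250)   # main mud at 4 pulses/second

# A 6s peak on the main mud, counted in 250ms pulses.
pulses_in_peak = 6000 / 250

print(f"testmud average: {testmud_avg_share:.0%} of a pulse")
print(f"main mud average: {mainmud_avg_share:.0%} of a pulse")
print(f"main mud peak: at least {pulses_in_peak:.0f} pulses")
```

So the testmud spends about a quarter of a much tighter pulse on the 
character list, the main mud spends over 40% of a pulse five times as 
long, and a single 6s peak covers the time of at least 24 pulses.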

Everything I've looked into points to environmental issues. We're 
going to go down that road first to see if we can resolve this, 
as resolving it in code is a 4-6 month (full-time) project.

Thanks for sticking with us. Sorry about the issues!