trying to improve OpenSim performance under heavy load

Dr Scofield
those of you who are following the SVN commit mails on opensim-dev will have noticed that i just checked in a bunch of patches.

we recently ran three stress tests using (a) a bunch of modified pCampBots (modified so that they request every texture in sight), (b) a group of volunteers (30-40 clients), and (c) an even greedier version of (a).

stress test (a) looked fairly good (http://xyzzyxyzzy.net/2009/06/09/81/), so we optimistically scheduled (b).

stress test (b) was a big disappointment (http://xyzzyxyzzy.net/2009/06/19/21/), so we went back to the drawing board, reworked the bots to behave more like real avatars, and started looking at how we handle the traffic with the client.

the reworked bots allowed us to recreate the scenario where we'd essentially render the system unusable once we got past the 20 avatar limit.

we've concentrated on the following areas (so far):
  1. texture sending
  2. LLPacketHandler/LLUDPServer
  3. XEngine
that last one, XEngine, was an easy fix: the advertised behaviour in OpenSim.ini.example for idle timeout does not describe what's inside the can, to quote alan webb:
The XEngine section of the example ini says that the
timeout for an idle thread is in seconds, and an example value
of 60 is specified. In fact, this actually results in a 60 ms
idle timeout, which is not normally enough for a smart thread
to survive. I have added a multiplier to the XEngine constructor
so that the number now matches the published behavior.
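
as a minimal illustration of that fix (not the actual XEngine code, which is C#): the configured value is in seconds, the thread pool wants milliseconds, so the constructor has to multiply by 1000. the function name below is hypothetical.

```python
def idle_timeout_ms(configured_seconds: float) -> int:
    """Convert an idle timeout given in seconds (as documented in the
    example ini) into the milliseconds a thread pool expects.

    Hypothetical helper for illustration only; not an OpenSim API.
    """
    return int(configured_seconds * 1000)


# Without the multiplier a configured 60 was treated as 60 ms;
# with it, 60 means 60 seconds.
print(idle_timeout_ms(60))  # 60000
```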
texture sending was a bit more tricky, again to quote alan:
This change moves texture send processing out of the main packet processing loop and moves it to a timer based processing cycle.

Texture packets are sent to the client consistently over time. The timer is discontinued whenever there are no textures to transmit.

The behavior of the texture sending mechanism is controlled by three variables in the LLClient section of the config file:

 [1] TextureRequestRate (ms) is the interval between texture send processing cycles. The default is 100 ms, i.e. 10 cycles per second.
 [2] TextureSendLimit determines how many different textures will be considered on each cycle. Textures are selected by priority. The old mechanism specified a value of 10 for this parameter, so this is the default.
 [3] TextureDataLimit determines how many packets will be sent for each of the selected textures. The old mechanism specified a value of 5, so this is the default.

So the net effect is that up to TextureSendLimit*TextureDataLimit packets will be sent every TextureRequestRate ms.

Once we have gotten a reasonable feeling for how these parameters affect overall processing, it would be nice to automagically manage these values using information about the current status of the region and network.
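
to make the packet budget concrete, here is a rough python sketch (OpenSim itself is C#, and this is not alan's implementation). the function names, the dict layout, and the assumption that a higher number means higher priority are all made up for illustration; only the three parameter names and their defaults come from the description above.

```python
import heapq


def texture_packets_per_second(request_rate_ms=100, send_limit=10, data_limit=5):
    """Upper bound on texture packets per second for one client,
    i.e. TextureSendLimit * TextureDataLimit per TextureRequestRate ms."""
    cycles_per_second = 1000 / request_rate_ms
    return send_limit * data_limit * cycles_per_second


def run_cycle(pending, send_limit=10, data_limit=5):
    """Simulate one timer tick.

    pending maps a texture id to (priority, packets_remaining).
    Assumption: a larger priority value is served first.
    Returns a list of (texture_id, packets_sent) for this tick and
    removes textures from pending once they are fully sent.
    """
    chosen = heapq.nlargest(send_limit, pending, key=lambda tid: pending[tid][0])
    sent = []
    for tid in chosen:
        priority, remaining = pending[tid]
        n = min(data_limit, remaining)
        if remaining - n == 0:
            del pending[tid]  # nothing left to transmit for this texture
        else:
            pending[tid] = (priority, remaining - n)
        sent.append((tid, n))
    return sent


print(texture_packets_per_second())  # 500.0 with the defaults
```

with the defaults this works out to at most 10 * 5 = 50 packets per 100 ms tick, or 500 packets per second per client.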
finally, LLPacketHandler and LLUDPServer: while looking (staring?) at the traces we had collected we realized that we might have an issue with the incoming UDP buffer. checking the specs of both .NET and mono we found that .NET uses a default socket buffer size of 8K, while mono uses whatever the underlying linux OS provides, which for ubuntu 8.04 and SLES11 turned out to be 111K. neither buffer size is really that much (.NET's is a bit of a joke, IMHO), and checking the network stats we did see that the OS was dropping a considerable number of UDP datagrams.

so, we added an option to set the receive socket buffer size for LLUDPServer (and also changed the max receive buffer size allowed by the OS via sysctl). setting it to a sufficiently large value got rid of the dropped UDP datagrams.
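
for readers who want to try the receive buffer setting outside OpenSim, here is a minimal python sketch of the same socket option (SO_RCVBUF). it is not the LLUDPServer code; the function name and the 256K value are just examples. on linux the kernel caps the request at net.core.rmem_max and reports roughly double the granted value to account for bookkeeping overhead, which is why the effective size is read back instead of assumed.

```python
import socket


def open_udp_socket(recv_buffer_bytes):
    """Create a UDP socket and request a larger receive buffer.

    Returns the socket and the buffer size the OS actually reports,
    which may differ from the requested value.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, recv_buffer_bytes)
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, effective


sock, effective = open_udp_socket(256 * 1024)  # example value only
print("effective receive buffer:", effective, "bytes")
sock.close()
```

if the reported size stays well below what was requested, the OS limit is the likely culprit and needs to be changed as well.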

looking at LLPacketHandler we realized that while we were re-using packets (via the packet pool) we were still doing a LOT of new List<LLQueItem>(existing List<LLQueItem>) instantiations and object copying for stuff like ack sending/resending, dropping packets, etc. it turns out that we can in all cases accomplish the task without those List instantiations and copy operations.
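
the general idea, sketched in python and not taken from the actual patch: instead of copying the queue into a fresh list so it can be modified while looping, compact it in place in a single pass. the queue items here are plain dicts with a "seq" key standing in for LLQueItem, and the function name is hypothetical.

```python
def drop_acked_in_place(queue, acked):
    """Remove every item whose sequence number is in acked, without
    allocating a second list.

    Survivors are shifted towards the front (order preserved) and the
    tail is truncated. Returns the number of items dropped.
    """
    write = 0
    for read in range(len(queue)):
        item = queue[read]
        if item["seq"] not in acked:
            queue[write] = item  # write <= read, so nothing unread is overwritten
            write += 1
    dropped = len(queue) - write
    del queue[write:]
    return dropped


pending = [{"seq": 1}, {"seq": 2}, {"seq": 3}, {"seq": 4}]
print(drop_acked_in_place(pending, {2, 4}))  # 2
print([item["seq"] for item in pending])     # [1, 3]
```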

going back to our reworked, greedy bots we now got way beyond the 20 avatar mark (repeatedly running 30-40 greedy bots) and, though the system got laggy and slow, we could still move around with our observer avatars and use our in-world tools.

so, the changes to texture sending, LLUDPServer, and the streamlining of all those foreach(...) loops seem to be going in the right direction.

    cheers,
    DrS/dirk

-- 
dr dirk husemann ---- virtual worlds research ---- ibm zurich research lab
SL: dr scofield ---- [hidden email] ---- http://xyzzyxyzzy.net/
RL: [hidden email] - +41 44 724 8573 - http://www.zurich.ibm.com/~hud/

_______________________________________________
Opensim-users mailing list
[hidden email]
https://lists.berlios.de/mailman/listinfo/opensim-users