Finding NT Memory Leaks
When a system is out of memory, any program that tries to do work runs extremely slowly, because it has to use the swap file on disk in place of memory. For example, beta test versions of
Microsoft's IIS have often leaked memory; eventually the system would slow to a crawl, and every program would have trouble running and would use an excessive amount of CPU time. In almost all
cases, when Lyris uses all the CPU power of the machine, it is because some other program has been leaking memory and the NT system has no memory left.
A few people have asked us about memory leaks and long-term stability on NT, so we thought we'd share what we've found.
One thing we've found is that NT is not very good at reporting which applications are leaking memory. Prior to Lyris 2.5 beta 2, Lyris slowly leaked memory over time, but none of NT's
process-watching tools reported the "lyris" process as growing. As of Lyris 2.5 beta 2, Lyris no longer leaks memory.
What we found out is that the important number on Windows NT is the "committed bytes" that Performance Monitor reports. This is the total amount of memory being used on your system by all programs,
including (we think) the operating system.
When Lyris was leaking memory, we found that the "committed bytes" would rise steadily until NT stopped functioning reliably. Once the "committed bytes" became very large, NT might report a "quota
limit" error, new programs could be kept from starting, and new Lyris threads could be kept from starting. IIS might start reporting CGI and permission errors.
Since then, we've found that various other programs on NT do leak memory, and that the only technique we've found for detecting this is to watch the "committed bytes". The leaked memory builds up over
time, until NT runs out of memory and the system stops running reliably.
Our recommendation, then, for anyone who experiences trouble with NT stability over time, is to do this:
* run "perfmon.exe"
* create a chart graphing "memory/committed bytes"
* slow the charting rate down to 5000 seconds, so that trends are visible
* keep perfmon running in a corner of the screen
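If you would rather judge the trend from numbers than by eye, the same check can be sketched in a few lines of Python. This is only an illustration, not part of Lyris or perfmon: it assumes you have already collected (elapsed seconds, committed bytes) samples yourself, for example by noting the perfmon readings, and the function names and the 1 MB-per-day threshold are hypothetical choices.

```python
from typing import Sequence, Tuple


def commit_growth_per_day(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of committed bytes over time, in bytes per day.

    samples: (elapsed_seconds, committed_bytes) pairs, in any order.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    # Slope is in bytes per second; scale to bytes per day.
    return cov / var_t * 86400.0


def looks_like_leak(samples: Sequence[Tuple[float, float]],
                    min_growth_per_day: float = 1_000_000.0) -> bool:
    """True if committed bytes grow by at least the (assumed) threshold per day."""
    return commit_growth_per_day(samples) >= min_growth_per_day
```

A fitted slope is used rather than a simple first-versus-last comparison so that a single busy moment at either end of the log does not look like a leak. Several days of samples are still needed before the result means much.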
If your "committed bytes" rises over time (over several days) and doesn't go down, speed the charting rate back up to once per second, then shut down, one by one, each process or service running on your
system. When the program causing the problem is terminated, you should see a big drop in your "committed bytes".
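The one-by-one elimination above can also be recorded and compared numerically. The following Python sketch assumes you have written down the "committed bytes" reading just before and just after stopping each process or service; the function name, the service names in the usage example, and the 1 MB minimum drop are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def biggest_drop(readings: Dict[str, Tuple[int, int]],
                 min_drop: int = 1_000_000) -> Optional[str]:
    """Return the name whose shutdown freed the most committed bytes.

    readings maps a process or service name to
    (committed bytes before stopping it, committed bytes after).
    Returns None if no shutdown freed at least min_drop bytes.
    """
    best_name = None
    best_drop = 0
    for name, (before, after) in readings.items():
        drop = before - after
        if drop >= min_drop and drop > best_drop:
            best_name, best_drop = name, drop
    return best_name
```

For example, with readings such as `{"service_a": (300_000_000, 299_500_000), "service_b": (299_500_000, 120_000_000)}`, the function returns `"service_b"`, since stopping it freed far more memory than stopping the other. A result of `None` corresponds to the case where nothing you stop makes the number go down.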
If you cannot make the "committed bytes" go down, your problem might be with IIS, or with some other OS-integrated service that won't free its memory even when stopped. Try upgrading the service,
applying a service pack, or changing things around (try a different web server for a few days, for example). When the "committed bytes" stops going up over time, it's likely that you'll have fixed the problem.
In a chart of perfmon.exe running on clio.lyris.net (our test server), with 3 days' worth of data on-screen, we charted both the "committed bytes" of the system and the "private bytes" of Lyris.exe. In
this chart Lyris averages about 6 MB of RAM, occasionally spiking to 11 MB during heavily loaded times. The "committed bytes", the total amount of memory used on the system, stays around 40 MB.