Last week we discussed the Tech Support tools (/blogs/2012/02/03/netscaler-diagnostics-tech-support-tools/), which help whenever you report an issue with NetScaler. Continuing along similar lines, let us look at memory usage on NetScaler and how you can use the Diagnostics page to troubleshoot memory-related issues.

Memory is expensive in any context 🙂 and NetScaler is no exception. It becomes even more important as we enable more memory-bound features that buffer data on NetScaler, perform special attack detection, and so on. Memory is also critical for connection-layer scalability, so every time we see high memory utilization on NetScaler we get concerned. The truth is that not every high-memory-utilization symptom is an issue (some may be); often it is simply an indication that the system is running out of resources, in other words that you need to upgrade to a platform with more resources 🙂

To help you understand these details, there is a “Memory Usage” tool in the “Troubleshooting Data” section of the Diagnostics page. This tool gives you a lot of detail about the overall memory usage footprint of NetScaler. Let us walk through each part of its output:

It begins with:

TotalMEM:  (7098012052/13344178176)     Allocated:  1958819924(14.68%)   ActualInUse: 1262898384(9.46%)    Free:  11385358252  

This summarizes how much memory the appliance has, how much is allocated, how much is actually in use, and how much is free. In the nCore model we do not allocate the entire memory in one shot; the packet engines boot with the minimum memory required to operate and then allocate more as needed. After the short sketch below, the output drills into each individual memory pool and shows its allocation.
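As a quick illustration of how these numbers relate, here is a minimal Python sketch (not anything shipped with NetScaler; the regular expressions simply match the sample line above) that recomputes the two percentages and checks that Free is TotalMEM minus Allocated:

import re

# Sample summary line from the Memory Usage tool (copied from above).
line = ("TotalMEM:  (7098012052/13344178176)     Allocated:  1958819924(14.68%)   "
        "ActualInUse: 1262898384(9.46%)    Free:  11385358252")

# Pull the raw byte counts out of the line (format assumed from the sample).
total     = int(re.search(r"TotalMEM:\s*\(\d+/(\d+)\)", line).group(1))
allocated = int(re.search(r"Allocated:\s*(\d+)", line).group(1))
in_use    = int(re.search(r"ActualInUse:\s*(\d+)", line).group(1))
free      = int(re.search(r"Free:\s*(\d+)", line).group(1))

# The percentages shown are each value over TotalMEM,
# and Free is simply TotalMEM minus Allocated.
print(f"Allocated:   {allocated / total:6.2%}")      # ~14.68%
print(f"ActualInUse: {in_use / total:6.2%}")         # ~9.46%
print(f"Free check:  {free == total - allocated}")   # True for this sample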

MEMPOOL            MaxAllowd     CurAlloc Bytes (Own%  Overall%)    ErrLmtFailed  ErrAllocFailed  ErrFreeFailed
----------------------------------------------------------------------------------------------------------------
MEM_PE             146800640        3775030 (2.57%  0.03%)                     0               0              0
MEM_LB_SERVER    30064771065       16757580 (0.06%  0.13%)                     0               0              0
MEM_LB_SESSION     954204160        3670464 (0.38%  0.03%)                     0               0              0
MEM_LB_SERVICE   30064771065           1792 (0.00%  0.00%)                     0               0              0
MEM_CSWMEM         176160768          20160 (0.01%  0.00%)                     0               0              0
MEM_IOH             36700160              0 (0.00%  0.00%)                     0               0              0
MEM_LOGGING      30064771065       16777215 (0.06%  0.13%)                     0               0              0
MEM_CONN         30064771065      524302848 (1.74%  3.93%)                     0               0              0
MEM_SNMP         30064771065         118336 (0.00%  0.00%)                     0               0              0
MEM_DEBUG            1835008           9632 (0.52%  0.00%)                     0               0              0
MEM_MISC         30064771065       15598450 (0.05%  0.12%)                     0               0              0
……

This is interesting because it shows the memory allocation details for each internal pool, and the list of pools is long. The important thing to note is that every pool has a cap (“MaxAllowd”), so there is little chance of a single pool affecting overall system memory usage. “CurAlloc” additionally shows the percentage of memory used relative to the pool's own cap and relative to the overall system. From a debugging and troubleshooting perspective, the columns you are most interested in are “ErrLmtFailed”, “ErrAllocFailed” and “ErrFreeFailed”. “ErrLmtFailed” tells you the number of times the pool limit was exceeded, “ErrAllocFailed” the number of times a memory allocation failed for the pool, and “ErrFreeFailed” the number of times a memory free operation failed for the pool. These three counters can tell you whether there is some kind of memory crunch or other issue in the system. If a configuration command returns a memory failure, these are the counters to check for the specific pool.
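To show how the percentages in the “CurAlloc” column are derived, and how you might flag a pool with non-zero error counters, here is a small illustrative Python sketch (the function name and the hard-coded TOTAL_MEM value are my own; the two sample rows come from the output above):

# Total system memory, taken from the TotalMEM line earlier.
TOTAL_MEM = 13344178176

def describe_pool(name, max_allowed, cur_alloc,
                  err_lmt_failed, err_alloc_failed, err_free_failed):
    """Recompute the two percentages shown next to CurAlloc and flag error counters."""
    own_pct     = cur_alloc / max_allowed * 100   # share of the pool's own cap
    overall_pct = cur_alloc / TOTAL_MEM * 100     # share of total system memory
    errors = err_lmt_failed + err_alloc_failed + err_free_failed
    flag = "  <-- investigate this pool" if errors else ""
    print(f"{name:<16}{own_pct:6.2f}% of cap, {overall_pct:5.2f}% of system{flag}")

# Two rows taken from the sample output above.
describe_pool("MEM_PE",   146800640,   3775030,   0, 0, 0)   # 2.57% / 0.03%
describe_pool("MEM_CONN", 30064771065, 524302848, 0, 0, 0)   # 1.74% / 3.93%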

Next you see the big picture of the shared memory pool, along with details on AllocFailed and FreeFailed events (a small headroom check follows the table):

SHARED MEMORY POOL

MaxAllowd      CurAllocd      ErrAllocFailed   ErrFreeFailed
-------------------------------------------------------------
381681664      108294128                   0               0
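If you want a quick way to read this table, a tiny sketch along the same lines (values hard-coded from the sample output above) reports the remaining headroom and calls out any allocation or free failures:

# Shared memory pool figures from the table above.
max_allowed, cur_alloc = 381681664, 108294128
err_alloc_failed, err_free_failed = 0, 0

headroom = max_allowed - cur_alloc
print(f"Shared pool headroom: {headroom} bytes "
      f"({cur_alloc / max_allowed:.1%} of the cap is allocated)")
if err_alloc_failed or err_free_failed:
    print("Shared pool allocation/free failures seen -- worth a closer look")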

The following are the main buffer structures, which play a critical role in connection and session establishment.

CONN_POOL_MEMBERS:

Name       CurAllocd    CurFree     PgAllocd      PgAllocFailed
----------------------------------------------------------------
NSB           194666     147036    141 (2.2%)                 0
PCB            83384      83196     28 (0.4%)                 0
NATPCB         57344      57344      7 (0.1%)                 0
B64           229376     229015      7 (0.1%)                 0
B128          114688     114681      7 (0.1%)                 0
B256           32768      32768      4 (0.1%)                 0
SPCB               0          0      0 (0.0%)                 0

These are core data structures used for the different types of connections and sessions. “CurAllocd” can keep growing based on runtime requirements until it hits the maximum. “CurFree” shows how many of these buffers are sitting in the free pool, ready to be used as more connections arrive; “CurAllocd” is raised further only when “CurFree” drops substantially and more buffers are needed for ongoing traffic. “PgAllocd” shows the number of memory pages allocated; the system allocates more pages as “CurFree” drops. “PgAllocFailed” counts memory page allocation failures, which directly impact current and new transactions. For example, if you hit allocation failures for NSB and there is no free structure left, NetScaler will not be able to pick up new connections or packets. Similarly, failures for the SPCB block will impact SSL transactions and you will notice failures.
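To put that into practice, here is a small, hypothetical Python check (the 10% free-buffer threshold is an arbitrary assumption, not a NetScaler recommendation) that flags pools with page allocation failures or with very few free buffers left:

# (name, CurAllocd, CurFree, PgAllocFailed) -- rows from the table above.
conn_pools = [
    ("NSB",    194666, 147036, 0),
    ("PCB",     83384,  83196, 0),
    ("NATPCB",  57344,  57344, 0),
    ("B64",    229376, 229015, 0),
    ("B128",   114688, 114681, 0),
    ("B256",    32768,  32768, 0),
    ("SPCB",        0,      0, 0),
]

# Flag pools that have page allocation failures, or whose free buffers
# are a small fraction of what has been allocated (threshold is arbitrary).
LOW_FREE_RATIO = 0.10

for name, cur_alloc, cur_free, pg_alloc_failed in conn_pools:
    if pg_alloc_failed:
        print(f"{name}: {pg_alloc_failed} page allocation failures -- "
              "new connections/packets may be dropped")
    elif cur_alloc and cur_free / cur_alloc < LOW_FREE_RATIO:
        print(f"{name}: only {cur_free} of {cur_alloc} buffers free -- "
              "expect more page allocations soon")

# With the healthy sample above, nothing is flagged and nothing is printed.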

Beyond this you see detailed information for several other specific pools. All of this is structured to give you a single place to run through system-level memory usage details. The coolest part is that you get all this information without ever touching the NetScaler CLI or shell… GUI rocks 😉