    % vmstat 10
     procs     memory            page            disk          faults      cpu
     r b w   swap  free  re mf pi po fr de sr dd f0 s0 --   in   sy  cs us sy id
     ...                                      Ignore first line of output
     0 0 34 667928 295816 0  0  0  0  0  0  0  1  0  0  0  174  126  73  0  1 99

The last three columns show where the CPU cycles are expended. If the server is CPU bound, the idle time decreases to zero. When nfsd threads are waiting for disk operations to complete, and there is no other system activity, the CPU is idle, not accumulating cycles in system mode. The system column shows the amount of time spent executing system code, exclusive of time waiting for disks or other devices. If the NFS server has very little (less than 10%) CPU idle time, consider adding CPUs, upgrading to a faster server, or moving some CPU-bound processes off of the NFS server.

The "pureness" of NFS service provided by a machine and the type of other work done by the CPU determine how much of an impact CPU loading has on its NFS response time. A machine used for print spooling, as a hardwired terminal server, or for modem line connections, for example, is forced to handle large numbers of high-priority interrupts from the serial line controllers. If there is a sufficient level of high-priority activity, the server may miss incoming network traffic. Use iostat, vmstat, or similar tools to watch for large numbers of interrupts. Every interrupt requires CPU time to service it, and takes away from the CPU availability for NFS. If an NFS server must be used as a home for terminals, consider using a networked terminal server instead of hardwired terminals.[46] The largest advantage of terminal servers is that they can accept terminal output in large buffers. Instead of writing a screenful of output a character at a time over a serial line, a host writing to a terminal on a terminal server sends it one or two packets containing all of the output. Moving the terminal traffic onto the network, however, places an additional load on the server's network interface and on the network itself. These factors must be considered when planning or expanding the base of terminal service.
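To see whether interrupt load is a factor, sample the per-CPU statistics. The following is a minimal sketch for a Solaris server; the exact column set varies by release, and the ten-second interval is arbitrary:

    % mpstat 10          Per-CPU statistics; the intr and ithr columns count
                         interrupts taken, and usr/sys/idl show where the
                         CPU time is going.
    % vmstat -i          Cumulative interrupt counts and rates per device.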
[46] A terminal server has RS-232 ports for terminal connections and runs a simple ROM monitor that connects terminal ports to servers over telnet sessions. Terminal servers vary significantly: some use RS-232 DB-25 connectors, while others have RJ-11 phone jacks with a variable number of ports.

Along these lines, NFS servers do not necessarily make the best gateway hosts. Each fraction of the server's network bandwidth that is devoted to forwarding packets or converting protocols is taken away from NFS service. If an NFS server is used as a router between two or more networks, it is possible that the non-NFS traffic crowds out the NFS packets. The actual performance effects, if any, will be determined by the bandwidth of the server's network interfaces and other CPU loading factors.
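If a multihomed NFS server should not be spending cycles routing packets, you can turn off IP forwarding. A sketch for Solaris follows; the running change takes effect immediately, and creating /etc/notrouter keeps the startup scripts from re-enabling forwarding at the next boot:

    # ndd /dev/ip ip_forwarding              Display the current setting (1 = forwarding)
    # ndd -set /dev/ip ip_forwarding 0       Stop forwarding on the running system
    # touch /etc/notrouter                   Keep forwarding off across reboots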
    /usr/lib/nfs/nfsd -a 16

This example starts 16 kernel nfsd threads. In Solaris, the nfsd daemon creates multiple kernel threads that perform the actual filesystem operations. It exists as a user-level process in order to establish new connections to clients, allowing a server to accept more NFS requests while other nfsd threads are waiting for a disk operation to complete. Increasing the number of server-side threads improves NFS performance by allowing the server to grab incoming requests more quickly. Increasing nfsd threads without bound can adversely affect other system resources by dedicating excessive compute resources to NFS, making the optimal choice an exercise in observation and tuning.
    # /usr/lib/nfs/nfsd -a 16

The -a directive indicates that the daemon should listen on all available transports. In this example the daemon allows a maximum of 16 NFS requests to be serviced concurrently. The nfsd threads are created on demand, so you are only setting a high water mark, not the actual number of threads. If you configure too many threads, the unused threads will not be created. You can also throttle NFS server usage by limiting the maximum number of nfsd threads, allowing the server to concentrate on performing other tasks. It is hard to come up with a magic formula to compute the ideal number of nfsd threads, since hardware and NFS implementations vary considerably between vendors. For example, at the time of this writing, Sun recommends[47] sizing the thread count as the maximum of several rules of thumb based on the server's configuration and expected client load.
[47]Refer to the Solaris 8 NFS Server Performance and Tuning Guide for Sun Hardware (February 2000).
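If you decide to change the ceiling, edit the script that invokes the daemon and restart the NFS server processes. A hypothetical example for a stock Solaris 8 system, raising the maximum to 64 threads (the new value is purely illustrative):

    # vi /etc/init.d/nfs.server              Change "/usr/lib/nfs/nfsd -a 16" to "-a 64"
    # /etc/init.d/nfs.server stop
    # /etc/init.d/nfs.server start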
[48] In Solaris, SunOS 4.x, and SVR4, the buffer cache stores only UFS metadata. This is in contrast to the "traditional" buffer cache used by other Unix systems, where file data is also stored in the buffer cache. The Solaris buffer cache consists of disk blocks full of inodes, indirect blocks, and cylinder group information only.

In Solaris, you can view the buffer cache statistics by using sar -b. This will show you the number of data transfers per second between system buffers and disk (bread/s & bwrit/s), the number of accesses to the system buffers (logical reads and writes identified by lread/s & lwrit/s), the cache hit ratios (%rcache & %wcache), and the number of physical reads and writes using the raw device mechanism (pread/s & pwrit/s):

    # sar -b 20 5

    SunOS bunker 5.8 Generic sun4u    12/06/2000

    10:39:01 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
    10:39:22      19     252      93      34     103      67       0       0
    10:39:43      21     612      97      46     314      85       0       0
    10:40:03      20     430      95      35     219      84       0       0
    10:40:24      35     737      95      49     323      85       0       0
    10:40:45      21     701      97      60     389      85       0       0

    Average       23     546      96      45     270      83       0       0
In practice, a cache hit ratio of 100% is hard to achieve due to lack of access locality by the NFS clients; consequently, a cache hit ratio of around 90% is considered acceptable. Solaris grows the dynamically sized buffer cache as needed, until it reaches the high watermark specified by the bufhwm kernel parameter. By default, Solaris limits this value to 2% of physical memory in the system. In most cases, this 2%[49] ceiling is more than enough, since the buffer cache is only used to cache inode and metadata information. You can use the sysdef command to view its value:

    # sysdef
    ...
    *
    * Tunable Parameters
    *
    41385984   maximum memory allowed in buffer cache (bufhwm)
    ...
[49] 2% of total memory can be too much buffer cache for some systems, such as the Sun SPARCcenter 2000 with very large memory configurations. You may need to reduce the size of the buffer cache to avoid starving the kernel of memory resources, since the kernel address space is limited on SuperSPARC-based systems. The newer UltraSPARC-based systems do not suffer from this limitation.
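If you find yourself in that situation, bufhwm can be capped in /etc/system; the parameter is expressed in kilobytes. A hypothetical entry limiting the buffer cache to 8 MB (the value is only an example):

    *
    * Cap the buffer cache at 8 MB (bufhwm is in Kbytes)
    *
    set bufhwm=8000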
If you need to modify the default value of bufhwm, set its new value in /etc/system, or use adb as described in Chapter 15, "Debugging Network Problems". The actual file contents are cached in the page cache, and by default the filesystem will cache as many pages as possible. There is no high watermark, potentially causing the page cache to grow and consume all available memory. This means that all process memory that has not been used recently by local applications may be reclaimed for use by the filesystem page cache, possibly causing local processes to page excessively. If the server is used for non-NFS purposes, enable priority paging to ensure that it has enough memory to run all of its processes without paging. Priority paging prevents the filesystem from consuming excessive memory by limiting the file cache so that filesystem I/O does not cause unnecessary paging of applications. The filesystem can still grow to use free memory, but cannot take memory from other applications on the system. Enable priority paging by adding the following line to /etc/system and rebooting:

    *
    * Enable Priority Paging
    *
    set priority_paging=1
Priority paging can also be enabled on a live system. Refer to the excellent Solaris Internals book written by Mauro and McDougall and published by Sun Microsystems Press for an in-depth explanation of Priority Paging and File System Caching in Solaris. The following procedure for enabling priority paging on a live 64-bit system originally appeared in their book:

    # adb -kw /dev/ksyms /dev/mem
    physmem 3ac8
    lotsfree/E
    lotsfree:
    lotsfree:       234             /* value of lotsfree is printed */
    cachefree/Z 0t468               /* set to twice the value of lotsfree */
    cachefree:      ea      =       1d4
    dyncachefree/Z 0t468            /* set to twice the value of lotsfree */
    dyncachefree:   ea      =       1d4
    cachefree/E
    cachefree:
    cachefree:      468
    dyncachefree/E
    dyncachefree:
    dyncachefree:   468
Setting priority_paging=1 in /etc/system causes a new memory tunable, cachefree, to be set to twice the old paging high watermark, lotsfree, when the system boots. The previous adb procedure does the equivalent work on a live system. cachefree scales proportionally to other memory parameters used by the Solaris Virtual Memory System. Again, refer to the Solaris Internals book for an in-depth explanation. The same adb procedure can be performed on a 32-bit system by replacing the /E directives with /D to print the value of a 32-bit quantity, and /Z with /W to set the value of a 32-bit quantity.
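As a sketch, the same live procedure on a 32-bit kernel, reusing the lotsfree value of 234 pages from the example above, would look like this:

    # adb -kw /dev/ksyms /dev/mem
    physmem 3ac8
    lotsfree/D                      /* value of lotsfree is printed */
    cachefree/W 0t468               /* set to twice the value of lotsfree */
    dyncachefree/W 0t468            /* set to twice the value of lotsfree */
    cachefree/D                     /* verify the new values */
    dyncachefree/D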
[50]RAID stands for Redundant Array of Inexpensive Disks. Researchers at Berkeley defined different types of RAID configurations, where lots of small disks are used in place of a very large disk. The various configurations provide the means of combining disks to distribute data among many disks (striping), provide higher data availability (mirroring), and provide partial data loss recovery (with parity computation).
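For example, the md devices shown in the iostat output below look like Solstice DiskSuite metadevices; with DiskSuite, a four-way stripe could be built with something like the following, where the disk names and the 64 KB interlace are purely illustrative:

    # metainit d10 1 4 c1t0d0s2 c1t1d0s2 c1t2d0s2 c1t3d0s2 -i 64k
    # newfs /dev/md/rdsk/d10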
    % iostat -D 5
        md10          md11          md12          md13
    rps wps util  rps wps util  rps wps util  rps wps util
     17  45 33.7    5   4 10.5    3   3  7.5    5   5 11.6
      1   5  6.1   17  20 43.7    1   1  2.0    1   0  1.1
      2   7 10.4   14  22 42.0    0   0  0.7    0   1  2.3

If the disk queues are grossly uneven, consider shuffling data on the filesystems to spread the load across more disks. Most medium to large servers take advantage of their disk storage array volume managers to provide some flavor of RAID to stripe data among multiple disks. If all of your disks are more than 75-80% utilized, you are disk bound and either need faster disks, more disks, or an environment that makes fewer disk requests. Tuning kernel and client configurations usually helps to reduce the number of disk requests made by NFS clients.
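To check the 75-80% utilization threshold, the extended iostat display is often more convenient because it reports a percent-busy figure for every device. A sketch, assuming a Solaris release recent enough to support the -n name format (older releases accept -x alone):

    % iostat -xn 30          Extended per-device statistics every 30 seconds;
                             the %b column shows how busy each disk is.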
    % vmstat -s
    ...Page and swap info...
    621833654 total name lookups (cache hits 96%)
    ...CPU info...

If you are hitting the cache less than 90% of the time, increase ncsize on the NFS server. The ncsize kernel tunable specifies the number of entries cached by the DNLC. In Solaris, every file currently opened holds an inode cache entry active, making the inode readily available without the need to access the disk. To improve performance, inodes for files recently opened are kept in this cache, anticipating that they may be accessed again in the not too distant future. Furthermore, inodes of files recently closed are maintained in an inactive inode cache, in anticipation that the same files may be reopened again soon. Since NFS does not define an open operation, NFS clients accessing files on the server will not hold the file open during access, causing the inodes for these files to be cached only in the inactive inode cache. This caching greatly improves future accesses by NFS clients, allowing them to benefit from the cached inode information instead of having to go to disk to satisfy the operation. The size of the inactive inode table is determined by the ufs_ninode kernel tunable and is set to the value of ncsize during boot. If you update ncsize during runtime, make sure to also update the value of ufs_ninode accordingly. The default value for ncsize is (maxusers * 68) + 360. maxusers can be defined as the number of simultaneous users, plus some margin for daemons, and is set to about one user per megabyte of RAM in the system, with a default limit of 4096 in Solaris.
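Both tunables are set in /etc/system. A hypothetical example, using the number the default formula yields for maxusers=512, that is (512 * 68) + 360 = 35176:

    *
    * Enlarge the DNLC and the inactive inode cache together
    *
    set ncsize=35176
    set ufs_ninode=35176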
[51] There are no adverse effects of using the background option, so you can use it for all your NFS-mounted filesystems.

This deadlock problem goes away when your NFS clients use the automounter in place of hard mounts. Most systems today rely heavily on the automounter to administer NFS mounts. Also note that the bg mount option is for use by the mount command only; it is not needed when the mounts are administered with the automounter.
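When you do administer such a mount by hand, bg simply joins the other options. A hypothetical Solaris example (the server name and paths are made up), first as a command and then as the equivalent /etc/vfstab entry:

    # mount -F nfs -o bg,hard,intr bigserver:/export/home /export/home

    bigserver:/export/home  -  /export/home  nfs  -  yes  bg,hard,intr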
For example, consider a server boris with four network interfaces and the following host table entries:

    138.1.148.1     boris-bb4
    138.1.147.1     boris-bb3
    138.1.146.1     boris-bb2
    138.1.145.1     boris-bb1    boris

Hosts on network 138.1.148.0 are able to "see" boris because boris forwards packets from any one of its network interfaces to the others. Hosts on the 138.1.148.0 network may mount filesystems from either hostname:
    boris:/export/boris
    boris-bb4:/export/boris
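For instance, a client on the 138.1.148.0 network could issue either of the following; the local mount point is hypothetical:

    # mount -F nfs boris-bb4:/export/boris /export/boris
    # mount -F nfs boris:/export/boris /export/boris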