Successful systems administration relies on a clear and thorough understanding of how a system uses its resources (its current hardware and software components) and of how those resources interact with, and depend on, user workflows and running services. This chapter discusses the tools and knowledge required to evaluate several key aspects of an existing infrastructure, including current utilization of bandwidth, services, hardware, and storage. This evaluation enables you to plan and implement changes to an existing infrastructure that minimize interruption to the system’s operation and maximize users’ productivity.

Determining Current Utilization

It’s important to monitor utilization both after a new system has been set up and after an existing system has been serving users for a while. For new systems, this information confirms that they are running with the best performance. For existing systems, changes in usage (such as a group of video users who have switched to high-definition files) should prompt you to reevaluate the setup to make sure that the system still adequately meets their needs.

Chapter 1, “Planning Systems,” defines utilization as the ratio of usage to capacity. In that chapter, you planned for future capacity; here, your concern is watching a current, running system. Fortunately, Mac OS X includes many utilities that can help you determine system utilization.

Computing Network Bandwidth Utilization

How do you determine utilization—ratio of usage to capacity—for network bandwidth? Chapter 1, “Planning Systems,” explains the concept, but you need a way to obtain the values used in the computation. No single command neatly lays this out, but a series of commands enables you to assemble all the information.

The first piece of information comes from the netstat command, which displays network status information, including total bytes received and transmitted by a particular interface. To interrogate a network interface (represented here as en0), use the following command:

# netstat -I en0 -b
Name Mtu Network Address Ipkts Ierrs Ibytes Opkts Oerrs Obytes Coll
en0 1500 00:1f:5b:e9:87:1e 2852330 0 2908877372 1726539 0 606872778 0

The -I switch specifies the interface, and the -b switch asks netstat to display bytes in and bytes out of that interface. These counters are cumulative totals since the system booted.

You can also figure out how long the system has been running since boot time, with the uptime command:

$ uptime
8:16 up 16:41, 10 users, load averages: 0.08 0.15 0.20

That’s great output for a person, but not great for a computer. A kernel variable stores the boot time in seconds since the UNIX epoch (00:00:00 UTC on January 1, 1970), and you can read it with sysctl:

$ sysctl kern.boottime
kern.boottime: { sec = 1177954679, usec = 0 } Mon Apr 30 10:37:59 2007

sysctl enables you to get or set kernel variables. To obtain a full list of variables, use the -A switch.

You can also retrieve the current date in terms of seconds with the date command:

$ date +%s

That gives you enough information to compute the average utilization of a given interface. Because you’ll want to assess this value from time to time, you can automate this entire routine.

This script is pretty straightforward math, with basic definitions of bits, bytes, and megabytes (automation and scripting will be introduced in Chapter 9, “Automating Systems”). The script uses line numbers for easier reference:

01: #!/usr/bin/env bash
03: # Defs
04: iface_name="en0"
05: iface_Mbps=1000
07: # Get boot time, clean up output to something useful
08: boottime=$(sysctl kern.boottime | sed 's/,//g' | awk '{print $5}')
10: # Determine interface activity
11: in_bytes=$(netstat -I "$iface_name" -b | tail -1 | awk '{print $7}')
12: out_bytes=$(netstat -I "$iface_name" -b | tail -1 | awk '{print $10}')
13: in_bits=$(($in_bytes * 8))
14: out_bits=$(($out_bytes * 8))
15: in_mbits=$(($in_bits / 1000000))
16: out_mbits=$(($out_bits / 1000000))
18: # Get the current time
19: currenttime=$(date +%s)
21: # Determine total uptime
22: upt=$(($currenttime - $boottime))
24: # Gather bandwidth stats in bps
25: in_band_bps=$(($in_bits / $upt))
26: out_band_bps=$(($out_bits / $upt))
27: in_band_mbps=$(echo "scale=5; $in_band_bps / 1000000" | bc)
28: out_band_mbps=$(echo "scale=5; $out_band_bps / 1000000" | bc)
30: iface_in_util=$(echo "scale=5; $in_band_mbps / $iface_Mbps" | bc)
31: iface_out_util=$(echo "scale=5; $out_band_mbps / $iface_Mbps" | bc)
33: printf "$iface_name average inbound bits/s: $in_band_bps\n"
34: printf "$iface_name average outbound bits/s: $out_band_bps\n"
35: printf "$iface_name average inbound Mbit/s: $in_band_mbps\n"
36: printf "$iface_name average outbound Mbit/s: $out_band_mbps\n"
37: printf "$iface_name average inbound utilization: $iface_in_util\n"
38: printf "$iface_name average outbound utilization: $iface_out_util\n"

The definitions on lines 4 and 5 are hard-coded into this script; update as necessary. Line 8 performs the same sysctl call presented previously, but then cleans up the output to retrieve only the boot time timestamp. Similarly, lines 11 and 12 reduce the output of netstat to only the “bytes in” and “bytes out” of an interface. Also of note in the script is the use of bc to perform floating-point calculations, which the Bash shell cannot do alone (lines 27 through 31). The math here is rudimentary:

  • Read an interface’s activity in bytes (lines 11 and 12).
  • Convert the results to bits by multiplying by 8—8 bits to a byte, remember? (lines 13 and 14).
  • Convert the totals to megabits (Mbit); the results are unused in this script, but it is a good exercise (lines 15 and 16).
  • Gather total seconds of uptime by subtracting boot time in seconds since the UNIX epoch from current date in seconds from the UNIX epoch (line 22).
  • Compute average bandwidth in bits per second by dividing total bits on an interface by seconds of uptime (lines 25 and 26).
  • Convert bit/s to Mbit/s by dividing by 1,000,000 (10^6) (lines 27 and 28).
  • Compute utilization by dividing used bandwidth per second by the interface’s capacity (lines 30 and 31).

For this script to work properly, you need to set the appropriate definitions (interface name and capacity) at the top of the script. Chapter 9, “Automating Systems,” shows ways to refine this script.
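As a quick check on the arithmetic, the same calculation can be sketched as a single function that uses awk for the floating-point division instead of bc. This is only a minimal sketch and not part of the script above; the function name avg_utilization and its three arguments (byte count, seconds of uptime, and interface capacity in Mbit/s) are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: average utilization of an interface since boot.
# Arguments: total bytes, seconds of uptime, interface capacity in Mbit/s.
avg_utilization() {
    awk -v bytes="$1" -v secs="$2" -v cap="$3" 'BEGIN {
        bps  = (bytes * 8) / secs      # average bits per second
        mbps = bps / 1000000           # convert bit/s to Mbit/s
        printf "%.4f\n", mbps / cap    # ratio of usage to capacity
    }'
}

# Example: 125,000,000 bytes in 10 seconds on a 1000 Mbit/s interface
avg_utilization 125000000 10 1000
```

The example call prints 0.1000: the interface averaged 100 Mbit/s, or 10 percent of its capacity. On a live system you would pass in the byte count from netstat and the uptime computed from kern.boottime and date.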

To single out current utilization statistics—network throughput and currently connected users—for the Apple Filing Protocol (AFP) and Server Message Block (SMB) file-sharing services on Mac OS X Server, you can use the serveradmin command with the fullstatus verb. Each service displays its current throughput in bytes per second. For example, to display the statistics for AFP, use the following command with root-level access:

# serveradmin fullstatus afp
afp:setStateVersion = 2
afp:servicePortsAreRestricted = "NO"
afp:logging = "NO"
afp:currentConnections = 8
afp:state = "RUNNING"
afp:startedTime = ""
afp:logPaths:accessLog = "/Library/Logs/AppleFileService/AppleFileServiceAccess.log"
afp:logPaths:errorLog = "/Library/Logs/AppleFileService/AppleFileServiceError.log"
afp:readWriteSettingsVersion = 1
afp:failoverState = "NIFailoverNotConfigured"
afp:guestAccess = "YES"
afp:servicePortsRestrictionInfo = _empty_array
afp:currentThroughput = 87

The afp:currentThroughput key contains the value of current AFP throughput. To single out throughput, pass the output through the grep command. For example, to single out the current throughput for the SMB service, use the following command:

# serveradmin fullstatus smb | grep Throughput
smb:currentThroughput = 39

The current throughput for smb is also given in bytes per second.
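Because serveradmin reports throughput in bytes per second while interface capacity is normally quoted in bits per second, it can help to convert the value as you extract it. The following is a minimal sketch that assumes the key = value layout shown above; throughput_bps is a hypothetical helper name, not a serveradmin feature.

```shell
#!/bin/sh
# Hypothetical helper: read "serveradmin fullstatus" output on stdin and
# print the currentThroughput value converted from bytes/s to bits/s.
throughput_bps() {
    awk -F' = ' '/:currentThroughput = / { print $2 * 8 }'
}

# Sample line copied from the AFP output shown earlier
printf '%s\n' 'afp:currentThroughput = 87' | throughput_bps
```

The sample prints 696 (87 bytes per second times 8). On Mac OS X Server you would pipe the real output into the function, for example: serveradmin fullstatus smb | throughput_bps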

To list currently connected users and information on each user, serveradmin allows commands to be specified. The command for AFP and SMB is getConnectedUsers. For example, on a server with one user connected via SMB, the command and output would look like this:

# serveradmin command smb:command = getConnectedUsers
smb:state = "RUNNING"
smb:usersArray:_array_index:0:loginElapsedTime = -27950
smb:usersArray:_array_index:0:service = "alicew"
smb:usersArray:_array_index:0:connectAt = "Mon May 19 16:45:58 2008"
smb:usersArray:_array_index:0:name = "alicew"
smb:usersArray:_array_index:0:ipAddress = ""
smb:usersArray:_array_index:0:sessionID = 11148

To gather information on currently connected AFP users, use the corresponding afp command: afp:command = getConnectedUsers.
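When many users are connected, the full listing becomes long, and you may want only the account names. The sketch below assumes the output format shown above, where each user has a line ending in :name = "..."; connected_users is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read getConnectedUsers output on stdin and print
# one connected user name per line, with the surrounding quotes removed.
connected_users() {
    awk -F' = ' '/:name = / { gsub(/"/, "", $2); print $2 }'
}

# Sample lines copied from the SMB output shown earlier
printf '%s\n' \
    'smb:usersArray:_array_index:0:service = "alicew"' \
    'smb:usersArray:_array_index:0:name = "alicew"' \
    'smb:usersArray:_array_index:0:sessionID = 11148' | connected_users
```

The sample prints alicew. Piping the result through wc -l gives a count of current connections.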

Determining Services and Hardware Utilization

It’s important for an administrator to understand the resources that individual programs consume on a given piece of hardware, as the two are intrinsically linked. Running services use hardware resources. Is the use of resources effective? Overwhelming? Can certain services be paired with other services? Service utilization refers to the impact of a single service, and hardware utilization refers to considering the hardware as a whole (for example, looking at memory utilization).

Each running process demands CPU time. Mac OS X contains several tools to monitor CPU load and each running process.

The Server Admin framework is specific to Mac OS X Server and can report on unique information. The GUI-based Server Admin application can display graphs of CPU utilization over an adjustable range of time. To view these graphs, launch Server Admin, authenticate when prompted, select the server in question, and click the Graphs button in the toolbar.

The default view displays CPU usage for the past hour. You can expand the time range to the past seven days using the pop-up menu in the bottom right corner of the window.

Server Admin can also list services being provided by a server. Currently running services are indicated by a green ball next to their name in the servers list at the left of the Server Admin window. Additionally, services configured and running appear on the Overview page of Server Admin, along with high-level graphs of system utilization for CPU percentage, network bandwidth, and disk storage.

The information provided by the Server Admin framework is valuable, but may not tell a full story. It does not report service status for installed third-party software. Also, the Server Admin tools are specific to Mac OS X Server; there needs to be a way to assess workstation usage as well.

The most straightforward tool is ps, or process status. Typically, executing ps on its own, with no switches, is of little value. By itself, ps simply shows running processes that are owned by the calling ID and attached to a terminal. Of more interest is a list of all processes, owned by any user, with or without a controlling terminal. You can easily achieve such a list with the following command, run with an admin-level account:

# ps ax
1 ?? Ss 0:13.40 /sbin/launchd
10 ?? Ss 0:00.64 /usr/libexec/kextd
11 ?? Ss 0:09.48 /usr/sbin/notifyd
... (output removed for space considerations)
25539 ?? Ss 0:00.09 /usr/sbin/racoon -x
25729 ?? Ss 0:00.09 /usr/sbin/cupsd -l

The a switch, when combined with the x switch, causes ps to display all processes, from any user, with or without a controlling terminal. However, this does not tell the entire story. Each process in that ps list uses resources—but how much?
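One way to get a first answer from ps itself is to request a CPU column with the -o switch (for example, ps -A -o pid,pcpu,comm) and sort on it. The following is a minimal sketch, assuming that three-column layout with a single header line; top_cpu is a hypothetical helper name, and the percentages in the sample are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -A -o pid,pcpu,comm" output on stdin and
# print the N processes with the highest CPU percentage (default 5).
top_cpu() {
    # Drop the header line, then sort numerically on column 2, descending.
    awk 'NR > 1' | LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}

# Sample input: PIDs echo the listing above; the percentages are made up
printf '%s\n' \
    '  PID %CPU COMM' \
    '    1  0.3 /sbin/launchd' \
    '   11  4.2 /usr/sbin/notifyd' \
    '25729  0.0 /usr/sbin/cupsd' | top_cpu 2
```

With the sample input, the notifyd line is printed first, followed by launchd. On a live system: ps -A -o pid,pcpu,comm | top_cpu 5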

You can determine CPU percentage, load, and idle percentage with the top command, which is covered extensively in “top, CPU%, and Load Averages” in Chapter 8, “Monitoring Systems.” You can also find the load average statistic in other places. The uptime command displays load average along with the machine uptime:

$ uptime
8:06 up 2 days, 16:30, 10 users, load averages: 0.55 0.83 0.53

Additionally, you can fetch the load average directly from a sysctl variable, vm.loadavg:

$ sysctl vm.loadavg
vm.loadavg: { 0.54 0.68 0.51 }
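If you want to log or compare these values, the sysctl output is easy to split into its three figures. This sketch assumes the brace-delimited format shown above and labels the values 1m, 5m, and 15m in the same order that uptime reports them; load_averages is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "sysctl vm.loadavg" output on stdin and print
# the three load averages with labels.
load_averages() {
    awk '{ printf "1m=%s 5m=%s 15m=%s\n", $3, $4, $5 }'
}

# Sample line copied from the output shown above
printf '%s\n' 'vm.loadavg: { 0.54 0.68 0.51 }' | load_averages
```

The sample prints 1m=0.54 5m=0.68 15m=0.51.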

You can also find CPU percentage and load averages with the iostat command, covered in the next section, “Determining Storage Utilization.”

Each process places load on the CPU by asking it to do work, in the form of making system call requests and placing an instruction to execute in the run queue. To determine which process is currently making the most system call requests, the DTrace and Instruments utilities are also very helpful. Both utilities are covered in “Instruments and DTrace” in Chapter 8, “Monitoring Systems.”

You can also find virtual memory statistics with the top command, and view them in more detail using vm_stat. Most of the vm_stat columns are the same columns that you can view with the top command: free, active, inac (inactive), wire (wired), pageins, and pageout. If you do not specify an interval, vm_stat prints only a total and exits. If you add a numeric value after vm_stat and run it, it prints statistics repeatedly at the interval specified in seconds (to stop the listing, press Control-C):

$ vm_stat 1
Mach Virtual Memory Statistics: (page size of 4096 bytes, cache hits 27%)
free active inac wire faults copy zerofill reactive pageins pageout
174238 408613 301961 162294 193952562 6445537 116503302 44713 309110 60934
174320 408603 301961 162294 186 1 57 0 0 0
174384 408615 301961 162294 184 3 66 0 0 0
174450 408619 301961 162294 977 114 158 0 0 0
174350 408628 301961 162294 1016 0 520 0 0 0
174387 408626 301961 162294 154 0 33 0 0 0

Unlike netstat, which required a script to turn since-boot totals into a rate, vm_stat can report either totals gathered since bootup or per-interval figures on its own. When you run vm_stat with a repeat interval, the first row printed under each banner is a lifetime-accumulated total (since bootup); in the rows that follow, the event counters (faults through pageout) cover only the preceding interval.

The columns have the following significance:

  • faults: Number of times the memory manager faults a page.
  • copy: Pages copied due to copy-on-write (COW). COW is a memory-management technique that lets multiple applications point to the same page in memory as long as they only read it. If one of those applications needs to write to that page, it cannot do so in place without changing the contents seen by every other application that shares it. Instead, the writing application gets its own copy, and the original is left intact. The pages-copied statistic shows how many times a page was copied this way. It’s an interesting statistic for administrators, but they can do little about it short of choosing not to run the applications that cause the behavior.
  • zerofill: Number of times a page has been zero-fill faulted on demand; that is, a previously unused page marked “zero fill on demand” was touched for the first time. Again, there’s not much an administrator can do about this particular value.
  • reactive: Not what it sounds like. This is the number of times a page has been reactivated (that is, moved from the inactive list to the active list).

See the vm_stat man page for further information.
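vm_stat reports its memory columns in pages, so turning a figure into megabytes takes one multiplication by the page size given in the banner (4096 bytes in the listing above). A minimal sketch of that conversion follows; pages_to_mb is a hypothetical helper name, and the default page size is an assumption taken from the sample output.

```shell
#!/bin/sh
# Hypothetical helper: convert a page count to megabytes (1 MB = 1048576 bytes).
# Arguments: number of pages, page size in bytes (defaults to 4096).
pages_to_mb() {
    awk -v pages="$1" -v size="${2:-4096}" 'BEGIN {
        printf "%.1f\n", pages * size / 1048576
    }'
}

# The "free" value from the first interval row of the listing above
pages_to_mb 174238
```

The example prints 680.6, meaning roughly 680 MB of memory was free at that sample.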

Determining Storage Utilization

Storage, like memory and the network, has limits on both bandwidth and capacity. Systems administrators have several tools to determine input/output activity (iostat), disk capacity use (df and system_profiler), and disk usage for a given part of the disk hierarchy (du), as well as to pinpoint details about file and disk activity (Instruments and dtrace).

iostat displays I/O statistics for terminals and storage devices (disks). Similar to vm_stat, iostat can report on total statistics since bootup, or at a given interval. Running iostat solely with an interval is useful for displaying disk transactions, CPU statistics, and load average; to stop the listing, press Control-C. The -w switch specifies the wait interval between refreshing statistics:

$ iostat -w 2
disk0 disk1 cpu load average
KB/t tps MB/s KB/t tps MB/s us sy id 1m 5m 15m
21.51 19 0.40 19.48 13 0.25 8 5 87 0.24 0.25 0.24
4.00 0 0.00 4.00 0 0.00 3 4 94 0.22 0.25 0.24
4.00 1 0.00 4.00 0 0.00 3 4 94 0.20 0.24 0.24
12.00 0 0.01 12.00 0 0.01 2 4 94 0.20 0.24 0.24
4.00 0 0.00 4.00 0 0.00 3 4 93 0.19 0.24 0.24
12.51 36 0.45 11.50 6 0.07 2 5 93 0.19 0.24 0.24

Often, the reason to use iostat is to focus solely on the disk statistics. To drop the CPU and load information—the same information available from the top utility—use the -d switch. To further focus on a specific disk or disks, you can add the device node name or names to the command:

$ iostat -dw 2 disk0 disk1
disk0 disk1
KB/t tps MB/s KB/t tps MB/s
21.51 19 0.40 19.48 13 0.25
0.00 0 0.00 0.00 0 0.00
4.00 1 0.00 4.00 0 0.00
11.30 15 0.17 22.50 1 0.02
6.42 9 0.06 4.71 3 0.02
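To reduce a run of interval samples to one figure, you can average a column. The sketch below assumes single-disk output with two header lines, followed by a first data row that is averaged over the system uptime and is therefore skipped; mean_mbps is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read single-disk "iostat -dw N diskX" output on stdin
# and print the mean of the MB/s column over the interval samples.
# Assumes two header lines plus one since-boot row, all of which are skipped.
mean_mbps() {
    awk 'NR > 3 { sum += $3; n++ }
         END    { if (n) printf "%.4f\n", sum / n }'
}

# disk0 columns copied from the listing above
printf '%s\n' \
    'disk0' \
    'KB/t tps MB/s' \
    '21.51 19 0.40' \
    '0.00 0 0.00' \
    '4.00 1 0.00' \
    '11.30 15 0.17' \
    '6.42 9 0.06' | mean_mbps
```

The sample prints 0.0575, the mean of the four interval rows. Because iostat with -w runs until interrupted, capture a fixed number of samples first (for example, with the -c count switch) before piping the output into the function.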

iostat can also display output in two alternate formats that can complete the I/O story. The -o switch causes iostat to display sectors per second, transfers per second, and milliseconds per seek:

$ iostat -od disk0
sps tps msps
794 18 0.0

The -I switch displays total statistics over the time of running iostat, rather than average statistics for each second during that time period:

$ iostat -Id disk0
KB/t xfrs MB
21.51 6736974 141497.11

You can also quickly summarize disk capacity with the df (“disk free”) command. Simply type df at a command prompt to display useful information about all mounted volumes:

$ df
Filesystem 512-blocks Used Avail Capacity Mounted on
/dev/disk4 489955072 118939584 370503488 24% /
devfs 233 233 0 100% /dev
fdesc 2 2 0 100% /dev
1024 1024 0 100% /.vol
automount -nsl [212] 0 0 0 100% /Network
automount -fstab [218] 0 0 0 100% /automount/Servers
automount -static [218] 0 0 0 100% /automount/static
/dev/disk10 1953584128 1936325520 17258608 99% /Volumes/Data0
/dev/disk5 361619840 323948976 37670864 90% /Volumes/Data1

This output displays capacities in 512-byte blocks, and lists a percentage-full statistic. You can use two switches to refine this output to make it easier to read:

$ df -T hfs -h
Filesystem Size Used Avail Capacity Mounted on
/dev/disk4 234G 57G 177G 24% /
/dev/disk10 932G 923G 8.2G 99% /Volumes/Data0
/dev/disk5 172G 154G 18G 90% /Volumes/Data1

The -T switch limits the display to file systems of a certain type, in this case, hierarchical file system (HFS) (which also implies HFS Plus, the default file system for Mac OS X Leopard). The -h switch causes df to display capacities in “human-readable” format (output uses byte, kilobyte, megabyte, gigabyte, terabyte, and petabyte suffixes, as necessary, rather than blocks).
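A common follow-up is to flag only the volumes that are close to full. The following is a minimal sketch that assumes the six-column df layout shown above and mount points that contain no spaces; nearly_full is a hypothetical helper name, and the threshold is whatever percentage you pass in.

```shell
#!/bin/sh
# Hypothetical helper: read df output on stdin and print each mount point
# whose Capacity column is at or above the given percentage.
# Assumes the layout: Filesystem Size Used Avail Capacity Mounted-on.
nearly_full() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                       # "99%" becomes "99"
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}

# Sample rows copied from the df -T hfs -h output shown above
printf '%s\n' \
    'Filesystem Size Used Avail Capacity Mounted on' \
    '/dev/disk4 234G 57G 177G 24% /' \
    '/dev/disk10 932G 923G 8.2G 99% /Volumes/Data0' \
    '/dev/disk5 172G 154G 18G 90% /Volumes/Data1' | nearly_full 90
```

With a threshold of 90, the sample prints the Data0 and Data1 volumes and omits the boot volume. On a live system: df -T hfs -h | nearly_full 90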

system_profiler is a versatile Mac OS X–specific utility. It excels at querying Macintosh hardware. Along with the other command-line utilities presented here, system_profiler can also report on the total capacity and available space on a storage device. For example, to display detailed information on all disks connected through Serial Advanced Technology Attachment (Serial ATA), pass the SPSerialATADataType data type as an argument:

$ system_profiler SPSerialATADataType

Intel ICH8-M AHCI:

Vendor: Intel
Product: ICH8-M AHCI
Speed: 1.5 Gigabit
Description: AHCI Version 1.10 Supported

Hitachi HTS542525K9SA00:

Capacity: 232.89 GB
...output removed for space considerations...
Capacity: 199.88 GB
Available: 124.72 GB
Writable: Yes
File System: Journaled HFS+
BSD Name: disk0s2
Mount Point: /
Capacity: 199.88 GB
Available: 124.72 GB
Writable: Yes
File System: Journaled HFS+

While df is perfect for quickly determining the overall use of a mounted storage device, you often need more detail. The du (“disk usage”) command answers the questions “Where is storage being allocated on a given file system?” and “Where is all the space going?”

Running du with no options lists every directory at and below the current directory, along with the number of blocks that each one occupies. Files are counted in their directory’s total but are not listed individually unless you add the -a switch:

# du
0 ./.TemporaryItems/folders.1026
0 ./.TemporaryItems/folders.1027
0 ./.TemporaryItems/folders.1029
(output removed for space considerations)
0 ./xavier/Public
24 ./xavier/Sites/images
0 ./xavier/Sites/Streaming
40 ./xavier/Sites
1411736 ./xavier/untitled folder
10270512 ./xavier
255297560 .

The final entry, the dot, represents the total for the current directory. As with df, you can use several switches to tailor the output for easier reading:

# du -h -d 1 -c /Users
0B ./.TemporaryItems
1.5M ./andy
333M ./arthur
202M ./ashley
(output removed for space considerations)
6.7G ./mike
1.5M ./paul
15G ./tiffany
1.6M ./thomas
3.8M ./william
4.9G ./xavier
122G .
122G total

The -h switch generates “human-readable” output, as seen in the df command described previously in this section. The -d switch limits du to entries down to the given depth, with the starting directory being 0, its immediate subdirectories being 1, and so on. Use the -c switch to print a final, grand-total line. Also, instead of simply summing up the current directory, you can name a directory path, in this case /Users.
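To answer “Where is all the space going?” quickly, you can sort du output so that the largest entries come first. Human-readable sizes do not sort numerically, so this sketch assumes plain numeric sizes such as those produced by du -k; largest_dirs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read du output (size, tab, path) on stdin and print
# the N largest entries, biggest first (default 10).
largest_dirs() {
    LC_ALL=C sort -rn | head -n "${1:-10}"
}

# Sample entries copied from the du listing shown earlier
printf '%s\t%s\n' \
    24 './xavier/Sites/images' \
    40 './xavier/Sites' \
    1411736 './xavier/untitled folder' | largest_dirs 2
```

With the sample input, the “untitled folder” entry is printed first, followed by ./xavier/Sites. On a live system: du -k -d 1 /Users | largest_dirs 5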

Finally, Instruments is an ideal way to examine file activity and its impact on a storage system for one or more processes. If df and du do not provide the information that you need, Instruments, with its capability to finely detail file and disk activity, and dtrace offer the necessary power and depth to provide that information. For more information on Instruments and dtrace, see the section “Instruments and DTrace” in Chapter 8, “Monitoring Systems.”

This chapter is from the book
Apple Training Series: Mac OS X Advanced System Administration v10.5