The script is part of, and available in, our existing benchmarking package (dbt2-0.37.50.10) developed by Mikael Ronstrom.
On Top:
If you have ever done benchmarking on Linux, or simply wondered "where did all my resources go?", top is your best friend. Since this post is not about Linux, you can google "Linux top explained" for more details.
On Performance counters:
To learn about Windows PerfCounters, please refer to my previous blog entries in this series. I will refer to the System.Diagnostics namespace simply as Diagnostics.
On Powershell:
For ages, Windows users looked at bash wondering why they did not have anything similar. After much trial and error, Microsoft delivered Powershell. In my humble opinion, Powershell is simply great! As for the differences between Powershell and bash, I would mention just one: the Powershell pipe passes objects, while the bash pipe passes plain text.
On script itself:
Type "perfmon deleting files" into Google and you'll see why I made this script ;-) Joking aside, we have a mature testing/benchmarking framework written in bash and wanted the same look and feel on Windows. The top script is just the latest piece of that effort.
I undertook this work because I firmly believe in native tools when dealing with performance issues. If I were just after measuring performance deltas between versions, a generic tool written for another platform, in Perl or similar, would have been good enough. But, IMO, it would not have been fair to the non-native OS.
Also, studying native tools is an integral part of studying the OS itself, without which you cannot tackle performance issues.
The script will evolve naturally to cover the information we need in our everyday work.
Using Windows performance counters through PowerShell CIM classes, it is possible to gather stats on computer performance. The script works like this:
o The main code starts two background jobs: one for collecting details for the header table ("Top_Header_Job") and one for the processes table ("Top_Processes_Job").
o Each job then collects stats and writes them to a file (the "header" job to TempDir\headerres.txt, the "tasks" job to TempDir\thrprocstats.txt), which is, in turn, read by the main code. The main code uses TempDir\topcfg.txt to pass info to the tasks job (currently, the field to sort the table by). All of the files are overwritten each time, so there is not much data in them.
o After the files are read by the main code, the data is displayed. To position the output properly on screen, I used various [console] functions that are not available in PowerShell_ISE.
Requirements:
The script cannot be run in PowerShell_ISE; use the Powershell console.
The script requires PS 3+.
The script requires at least 50 x 80 console window.
The script might work with .NET FW older than 4 but it was tested only with .NET 4.5.x.
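A start-up check along these lines can enforce the requirements above before anything is drawn. This is a minimal sketch, not the script's own Check region: the function name Test-TopRequirements and its parameters are hypothetical, and reading "50 x 80" as 50 rows by 80 columns is an assumption.

```powershell
# Hypothetical helper mirroring the requirements above; each parameter
# defaults to the live value, so it can also be called without arguments.
function Test-TopRequirements {
    param(
        [int]$PSMajor     = $PSVersionTable.PSVersion.Major,
        [string]$HostName = $Host.Name,
        [int]$Height      = $Host.UI.RawUI.WindowSize.Height,
        [int]$Width       = $Host.UI.RawUI.WindowSize.Width
    )
    $problems = @()
    if ($PSMajor -lt 3) {
        $problems += "PowerShell 3 or newer is required (found $PSMajor)."
    }
    # The ISE host lacks the [console] cursor functions the display relies on.
    if ($HostName -eq 'Windows PowerShell ISE Host') {
        $problems += 'Run the script from the PowerShell console, not PowerShell_ISE.'
    }
    # Assumption: "50 x 80" means 50 rows by 80 columns.
    if ($Height -lt 50 -or $Width -lt 80) {
        $problems += "Console window must be at least 50 x 80 (found $Height x $Width)."
    }
    # Leading comma keeps an empty or one-element array intact for the caller.
    , $problems
}

# Usage: stop early if anything is missing.
# $issues = Test-TopRequirements
# if ($issues.Count) { $issues | Write-Warning; return }
```

The .NET Framework version is left out on purpose; the text only says the script was tested with 4.5.x, not that older versions fail.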
Getting started:
1) Put the script somewhere.
2) Start PowerShell (NOT PS_ISE)
3) cd to "somewhere" directory
4) .\top-script.ps1
a) Get-Help .\top-script.ps1
b) Get-Help .\top-script.ps1 -Examples
5) While script is running you can use (single key) shortcuts:
q - Quit
m - Sort by process WS (occupied RAM) DESC, CPUpt DESC
p - Sort by process CPU utilization DESC, WS(MB) DESC; Non-Normalized/Normalized.
Note: The script starts with per-process CPU utilization non-normalized (i.e. on multi-CPU boxes, this value can be well over 100%). To display normalized values (i.e. non-normalized value / # of CPUs), just press "p" again. If the non-normalized value is the data source, the column title is '_CPUpt__'. The normalized CPU utilization value (CPUpt / # of CPUs) displays as '_CPUpt_N'.
n - Sort by process Name ASC
r - Sort by process PID ASC, CPUpt DESC
+ - Display individual CPUs. Comma-separated list of values (e.g. 0,1,2).
1 - Display individual Processors (Sockets). Comma-separated list of values (e.g. 0,1,2).
Note: Script displays either Socket load or CPU load.
- - Cancel displaying individual CPUs/Sockets.
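The "+" and "1" shortcuts take free text such as 0,1,2, and the selected CPUs are later shown two per line. A minimal sketch of both steps, with hypothetical function names; it assumes invalid entries are silently dropped rather than reported.

```powershell
# Hypothetical helpers for the "+" / "1" shortcuts.
function ConvertTo-IndexList {
    param([string]$Text, [int]$Count)
    # Keep whole numbers that address an existing CPU/socket (0..Count-1),
    # preserving the order typed and dropping duplicates.
    @($Text -split ',' |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -match '^\d{1,4}$' -and [int]$_ -lt $Count } |
        ForEach-Object { [int]$_ } |
        Select-Object -Unique)
}

function Join-InPairs {
    param([string[]]$Items)
    # Two entries per output line, as in the per-CPU display.
    for ($i = 0; $i -lt $Items.Count; $i += 2) {
        if ($i + 1 -lt $Items.Count) { '{0,-38}{1}' -f $Items[$i], $Items[$i + 1] }
        else { $Items[$i] }
    }
}

# Usage ($totalCores is a hypothetical variable holding the logical CPU count):
# $sel = ConvertTo-IndexList -Text (Read-Host 'CPUs') -Count $totalCores
```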
The output:
09:21:17, Uptime:00d:00h:36m, Users: 1, # thds ready but queued for CPU: 0
-------------------------------------------------------------------------------
|      RUNNING       |              CPU              |        RAM[MB]         |
-------------------------------------------------------------------------------
| services: 104      | Sys: 50.78%(P 4.30%/U 46.48%) | Installed: 8192        |
| processes: 126     | Idle:49.22%                   | HW reserv: 320.77      |
| threads: 1483      | HWint: 7087/0.38%             | Visible: 7871.23       |
| handles: 32696     | SWint: 278/0.38%              | Available: 5136        |
| CoSw/s: 6351       | High-prio thd exec: 50.38%    | Modified: 117.07       |
|                    | Total # of cores: 4           | Standby: 3936.44       |
|                    |                               | PagesIn/Read ps: 2     |
-------------------------------------------------------------------------------
_PID_ PPID PrioB Name                   CPUpt Thds Hndl WS(MB) VM(MB) PM(MB)
----- ---- ----- ----                   ----- ---- ---- ------ ------ ------
 3916  852     8 mcshield               10.14   53  490     46    225    100
 1836 5452     8 powershell#2            1.09   15  377     67    616     46
 7436 5452     8 powershell#1            0.72   17  556     85    624     67
 3984  156     8 WmiPrvSE                0.72    9  303     14     56      9
 7024  156     8 WmiPrvSE#2              0.72    7  201     10     52      7
 5452 5148     8 powershell              0.36   18  454     90    628     79
 2292  852     8 FireSvc                 0.36   28  539     10    156     36
 7864 5148     8 thunderbird             0.00   52  630    293    656    264
 2880 5148     8 powershell_ise          0.00   12  427    181    869    164
 5148 3860     8 explorer                0.00   25  810     81    267     53
 7028 6648     8 googledrivesync#1       0.00   29  712     76    193     64
 1124  852     8 svchost#5               0.00   45 1560     46    187     31
 6508 5148     8 sidebar                 0.00   20  433     39    195     20
  816  796    13 csrss#1                 0.00   10  765     35    126      3
 6592 5148     8 iCloudServices          0.00   16  442     32    167     18
 6868 6764     8 pcee4                   0.00    7  202     32    612     32
 1976 1052    13 dwm                     0.00    5  135     31    140     26
 3400  156     8 WmiPrvSE#1              0.00   12  297     28     91     22
 1088  852     8 svchost#4               0.00   21  584     27    126     14
 3648  852     8 dataserv                0.00   10  510     24    224     21
  980  852     8 svchost#2               0.00   25  575     23    119     26
 2404  852     8 PresentationFontCache   0.00    6  149     21    506     28
...
HEADER data
Current time, uptime, # of active users, # of threads per CPU that are ready for execution but can't get CPU cycles (obviously, you want to keep this as low as possible, <= 2).
RUNNING section
services: # of services in Started state.
processes: # of user processes.
threads: # of threads spawned.
handles: # of open handles.
CoSw/s: # of context switches per second.
CPU section
Sys: % of CPU used (% used by privileged instructions / % used by user instructions).
Idle: % of CPU consumed by the Idle process.
HWint: # of HW interrupts per second / % of CPU used to service HW interrupts.
SWint: # of SW interrupts queued for servicing per second / % of CPU used to service SW interrupts.
High-prio thd exec: % of CPU consumed by high-priority thread execution.
Total # of cores: # of physical and virtual cores. Here, 4 is a dual-core CPU with HT enabled.
RAM[MB] section
Installed: installed RAM.
HW reserv: RAM reserved by Windows for HW.
Visible: amount of RAM the user actually sees.
Available: amount of RAM available to user processes.
Modified: amount of RAM marked as "Modified".
Standby: amount of RAM marked as "Standby" (cached).
PagesIn/Read ps: ratio between Memory\Pages Input/sec and Memory\Page Reads/sec, i.e. the number of pages per disk read. Should be kept below 5.
TABLE data
_PID_   Unique identifier of the process.
PPID    Unique identifier of the process that started this one.
PrioB   Base priority.
Name    Name of the process.
CPUpt   % of CPU used by the process.
Thds    # of threads spawned by the process.
Hndl    # of handles opened by the process.
WS(MB)  Total RAM used by the process. The Working Set is, basically, the set of memory pages recently touched by the threads belonging to the process.
VM(MB)  Size of the virtual address space in use by the process.
PM(MB)  The current amount of VM that this process has reserved for use in the paging files.
Longer explanation of the values:
HEADER:
Foreword: Since Windows is not a "process"-based OS (like Linux), it is impossible to calculate the "System load". The next best thing is the CPU queue length (see below).
Uptime: Diagnostics.PerformanceCounter("System", "System Up Time")
Users: query user /server:localhost, using the query.exe tool, which should be part of your Windows installation. Number of users currently logged in. If query.exe is not present, the value is -1.
# thds ready but queued for CPU: Diagnostics.PerformanceCounter("System", "Processor Queue Length")
How many threads are in the processor queue, ready to be executed but not currently able to use cycles. Windows has a single queue length counter, so the value displayed is the counter value divided by the number of CPUs.
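The arithmetic behind this header line is small enough to show in isolation. A sketch with hypothetical function names: only the counter names and the -1 convention come from the text, and the parsing assumes `query user` prints one column-header line followed by one line per session.

```powershell
# "System Up Time" is reported in seconds; render it like "00d:00h:36m".
function Format-Uptime {
    param([double]$Seconds)
    $ts = [TimeSpan]::FromSeconds($Seconds)
    '{0:D2}d:{1:D2}h:{2:D2}m' -f $ts.Days, $ts.Hours, $ts.Minutes
}

# "Processor Queue Length" is one system-wide value; divide it by the number
# of logical CPUs to get the per-CPU figure shown in the header.
function Get-QueuePerCpu {
    param([double]$QueueLength, [int]$CpuCount)
    if ($CpuCount -le 0) { return $QueueLength }
    [math]::Round($QueueLength / $CpuCount, 2)
}

# Count logged-on users from `query user` output (first line = column header).
# Returns -1 when query.exe was not found, as the script does.
function Get-LoggedOnUserCount {
    param([string[]]$QueryOutput, [bool]$ToolFound)
    if (-not $ToolFound) { return -1 }
    @($QueryOutput | Select-Object -Skip 1 | Where-Object { $_ -and $_.Trim() }).Count
}

# Live usage (Windows only):
# $found = [bool](Get-Command query.exe -ErrorAction SilentlyContinue)
# $lines = if ($found) { query user /server:localhost 2>$null }
# Get-LoggedOnUserCount -QueryOutput $lines -ToolFound $found
```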
RUNNING section:
services: (Get-Service | Where-Object {$_.Status -ne 'Stopped'} | Measure-Object).Count
Total # of services actually running.
processes: Diagnostics.PerformanceCounter("System", "Processes")
Total number of user processes running.
threads: Diagnostics.PerformanceCounter("System", "Threads")
Total # of threads spawned.
handles: Diagnostics.PerformanceCounter("Process", "Handle Count", "_Total")
Total # of open handles.
CoSw/s: Diagnostics.PerformanceCounter("System", "Context Switches/sec")
Context switching happens when a higher-priority thread pre-empts a lower-priority thread that is currently running, or when a high-priority thread blocks. High levels of context switching can occur when many threads share the same priority level. This often indicates that there are too many threads competing for the processors on the system. If you do not see much processor utilization and you see very low levels of context switching, it could indicate that threads are blocked.
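Reading these counters through Diagnostics.PerformanceCounter has one catch worth showing: for rate counters such as Context Switches/sec, the first NextValue() call has no earlier sample to compare with and returns 0, so the counter has to be read once up front. This is one reason counter initialization takes time at start-up. A sketch, assuming a Windows box where System.Diagnostics.PerformanceCounter is available; both function names are hypothetical.

```powershell
# Create a counter and "prime" it with a throw-away first read.
function New-PrimedCounter {
    param([string]$Category, [string]$Counter, [string]$Instance = '')
    $pc = New-Object System.Diagnostics.PerformanceCounter($Category, $Counter, $Instance)
    $null = $pc.NextValue()   # first read of a rate counter returns 0
    $pc
}

# The services figure needs no counter: count everything not in Stopped state.
function Get-RunningServiceCount {
    param([object[]]$Services)
    @($Services | Where-Object { $_.Status -ne 'Stopped' }).Count
}

# Live usage (Windows only):
# $cosw    = New-PrimedCounter 'System' 'Context Switches/sec'
# $handles = New-PrimedCounter 'Process' 'Handle Count' '_Total'
# Start-Sleep -Seconds 1
# '{0:N0} CoSw/s, {1:N0} handles' -f $cosw.NextValue(), $handles.NextValue()
# Get-RunningServiceCount -Services (Get-Service)
```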
CPU section:
Foreword: Windows has a special thread called "Idle" that consumes free CPU cycles, so these counters return values relative to it. Also, Windows is not "process" based but rather "thread" based, so all of these numbers are approximations. This is even more important in the TABLE, which shows CPU utilization per process (see the explanation there). Most of these counters are multi-instance, so the instance name is '_Total' (i.e. CPU utilization in total, as opposed to per NUMA node, core, CPU...).
Sys: nn.nn%(P mm.mm%/U zz.zz%): Diagnostics.PerformanceCounter("Processor Information","% Processor Time"), Diagnostics.PerformanceCounter("Processor Information","% Privileged Time"), Diagnostics.PerformanceCounter("Processor Information","% User Time")
The first number shows, effectively, the % of cycles the CPU(s) did not spend running the Idle thread. The second number is the time the CPU(s) spent executing privileged instructions, while the third is the time the CPU(s) spent executing user-mode instructions. For example, when your application calls operating system functions (say, to perform file or network I/O or to allocate memory), these operating system functions are executed in privileged mode.
Idle: Diagnostics.PerformanceCounter("Processor Information", "% Idle Time")
HWint: Diagnostics.PerformanceCounter("Processor Information","Interrupts/sec"), Diagnostics.PerformanceCounter("Processor Information","% Interrupt Time")
Rate of hardware interrupts per second and the percent of CPU time this takes.
SWint: Diagnostics.PerformanceCounter("Processor Information","DPCs Queued/sec"), Diagnostics.PerformanceCounter("Processor Information","% DPC Time")
Rate at which software interrupts are queued for execution and the % of CPU time this takes.
High-prio thd exec: Diagnostics.PerformanceCounter("Processor Information","% Priority Time")
CPU utilization by high-priority threads. (I couldn't find any documentation for this counter on MSDN.)
Total # of cores: (Get-CimInstance Win32_ComputerSystem).NumberOfLogicalProcessors
Number of physical and virtual cores present.
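Once the values are sampled, building the CPU section is plain string formatting. A sketch of a hypothetical Format-CpuSection helper: the property names assume the usual counter-to-CIM mapping of Win32_PerfFormattedData_Counters_ProcessorInformation (e.g. "% Privileged Time" becomes PercentPrivilegedTime), so verify them on your box; any object exposing those properties will do.

```powershell
function ConvertTo-F2 {
    param($Value)
    # Invariant culture keeps "50.78" rather than "50,78" on non-English systems.
    ([double]$Value).ToString('F2', [Globalization.CultureInfo]::InvariantCulture)
}

# Hypothetical helper: one "_Total" sample in, the CPU-section lines out.
function Format-CpuSection {
    param([Parameter(Mandatory = $true)] $Sample)
    $s = $Sample
    @(
        'Sys: {0}%(P {1}%/U {2}%)' -f (ConvertTo-F2 $s.PercentProcessorTime), (ConvertTo-F2 $s.PercentPrivilegedTime), (ConvertTo-F2 $s.PercentUserTime)
        'Idle:{0}%' -f (ConvertTo-F2 $s.PercentIdleTime)
        'HWint: {0}/{1}%' -f $s.InterruptsPersec, (ConvertTo-F2 $s.PercentInterruptTime)
        'SWint: {0}/{1}%' -f $s.DPCsQueuedPersec, (ConvertTo-F2 $s.PercentDPCTime)
        'High-prio thd exec: {0}%' -f (ConvertTo-F2 $s.PercentPriorityTime)
    )
}

# Live usage (Windows only):
# $s = Get-CimInstance Win32_PerfFormattedData_Counters_ProcessorInformation -Filter "Name='_Total'"
# Format-CpuSection -Sample $s
```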
RAM[MB] section:
Installed: (GCIM -class "cim_physicalmemory" | Measure-Object Capacity -Sum).Sum/1024/1024
HW reserv: Installed - Visible ;-)
Visible: (Get-CimInstance win32_operatingsystem).TotalVisibleMemorySize
Available: Diagnostics.PerformanceCounter("Memory","Available MBytes")
Modified: Diagnostics.PerformanceCounter("Memory","Modified Page List Bytes")
Standby: Diagnostics.PerformanceCounter("Memory","Standby Cache Core Bytes") + Diagnostics.PerformanceCounter("Memory", "Standby Cache Normal Priority Bytes") + Diagnostics.PerformanceCounter("Memory","Standby Cache Reserve Bytes")
Basically, cache memory.
PagesIn/Read ps: Diagnostics.PerformanceCounter("Memory","Pages Input/sec") / Diagnostics.PerformanceCounter("Memory","Page Reads/sec")
Ratio between Memory\Pages Input/sec and Memory\Page Reads/sec, which is the number of pages per disk read. Should be kept below 5.
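The only tricky part of the RAM section is keeping units straight. A sketch of the arithmetic under these unit assumptions (check them on your box): Capacity and the "... Bytes" counters are in bytes, TotalVisibleMemorySize is in KB, "Available MBytes" is already in MB. Get-RamSection is a hypothetical name; the sample numbers in the test come from the output shown earlier (8192 installed, 7871.23 visible, 320.77 reserved).

```powershell
function Get-RamSection {
    param(
        [double]$InstalledBytes,    # sum of cim_physicalmemory Capacity
        [double]$VisibleKB,         # Win32_OperatingSystem TotalVisibleMemorySize
        [double]$AvailableMB,       # Memory\Available MBytes
        [double]$ModifiedBytes,     # Memory\Modified Page List Bytes
        [double[]]$StandbyBytes,    # the three Memory\Standby Cache * Bytes values
        [double]$PagesInputPerSec,  # Memory\Pages Input/sec
        [double]$PageReadsPerSec    # Memory\Page Reads/sec
    )
    $installed = $InstalledBytes / 1MB
    $visible   = $VisibleKB / 1KB
    $standby   = ($StandbyBytes | Measure-Object -Sum).Sum / 1MB
    # No page reads during the tick would mean division by zero; show 0 instead.
    $ratio = 0
    if ($PageReadsPerSec -gt 0) { $ratio = [math]::Round($PagesInputPerSec / $PageReadsPerSec, 2) }
    [pscustomobject]@{
        Installed    = $installed
        HWReserv     = [math]::Round($installed - $visible, 2)   # "Installed - Visible"
        Visible      = [math]::Round($visible, 2)
        Available    = $AvailableMB
        Modified     = [math]::Round($ModifiedBytes / 1MB, 2)
        Standby      = [math]::Round($standby, 2)
        PagesPerRead = $ratio
    }
}
```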
TABLE:
Foreword: Windows is not a "process"-based OS (like Linux) but rather a "thread"-based one, so all of the numbers relating to CPU usage are approximations. I did make a "proper" CPU-per-process calculation by looping over the Thread counters (https://msdn.microsoft.com/en-us/library/aa394279%28v=vs.85%29.aspx) and summing them up by PID, but that proved too slow given that I have ~1 s to deal with everything. Computing CPU utilization from RAW counters with a 1 s delay between samples proved to produce a somewhat more reliable result than just reading Formatted counters but, again, it was too slow for my 1 s ticks (collecting a sample, waiting 1 s, collecting another sample and doing the math takes longer than 1 s). Thus I use PerfFormatted counters in version 0.9RC.
Classes: Win32_PerfRawData_PerfProc_Process; Win32_PerfFormattedData_PerfProc_Process
_PID_      Unique identifier of the process.
PPID       Unique identifier of the process that started this one.
PrioB      Base priority.
Name       Name of the process.
CPUpt_(N)  % of CPU used by the process. On machines with multiple CPUs, this number can be over 100% unless you see the _CPUpt_N caption, which means "Normalized" (i.e. CPU utilization / # of CPUs). Toggle between normal and normalized display by pressing the "p" key.
Thds       # of threads spawned by the process.
Hndl       # of handles opened by the process.
WS(MB)     Total RAM used by the process. The Working Set is, basically, the set of memory pages recently touched by the threads belonging to the process.
VM(MB)     Size of the virtual address space in use by the process.
PM(MB)     The current amount of VM that this process has reserved for use in the paging files.
Note that it is possible to display CPU/Socket data for chosen HW by pressing the + or 1 key, entering 0-based indexes and separating multiple values with commas:
User Priv Idle HWin SWIn               User Priv Idle HWin SWIn
-------------------------------------  -------------------------------------
%CPU 0: 47, 5, 47, 0, 0                %CPU 1: 0, 0, 100, 0, 0
%CPU 2: 35, 11, 52, 0, 0               %CPU 3: 5, 0, 94, 0, 0
The input here was 0,1,2,3, thus displaying data about the first 4 cores. The CPU/Socket data is displayed between the Header and the Table areas, reducing the number of visible processes. To remove this information from the screen, just press the "-" key.
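The RAW-counter method mentioned in the TABLE foreword (collect a sample, wait 1 s, collect a sample, do the math) comes down to a ratio of two deltas. A sketch, assuming the PERF_100NSEC_TIMER semantics of % Processor Time in Win32_PerfRawData_PerfProc_Process, where both PercentProcessorTime and Timestamp_Sys100NS count 100 ns units; the function name is hypothetical.

```powershell
# CPU% of one process from two raw samples taken about a second apart.
function Get-CpuPercentFromRaw {
    param(
        [uint64]$Cpu1, [uint64]$Time1,   # first sample: PercentProcessorTime, Timestamp_Sys100NS
        [uint64]$Cpu2, [uint64]$Time2    # second sample
    )
    $dt = [double]$Time2 - $Time1
    if ($dt -le 0) { return 0 }
    # Not divided by the CPU count, i.e. this is the "non-normalized" value.
    100 * ([double]$Cpu2 - $Cpu1) / $dt
}

# Live usage (Windows only), for a single process:
# $a = Get-CimInstance Win32_PerfRawData_PerfProc_Process -Filter "Name='explorer'"
# Start-Sleep -Seconds 1
# $b = Get-CimInstance Win32_PerfRawData_PerfProc_Process -Filter "Name='explorer'"
# Get-CpuPercentFromRaw $a.PercentProcessorTime $a.Timestamp_Sys100NS $b.PercentProcessorTime $b.Timestamp_Sys100NS
```

The sleep between the two queries is exactly the cost that made this approach too slow for 1 s ticks.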
INNER WORKINGS:
In general, the script output comprises a Header part and a Table part showing details on processes. In between the two, you can show Processor/Core info. Two background jobs are started to accomplish this: "Top_Header_Job" and "Top_Processes_Job". The data about individual processors/cores is calculated in the main script body.
The script starts with my usual checks, proceeds to the variable declaration part, where I initialize some of the performance counters (which takes time), and then starts the Header and Processes jobs. The jobs themselves follow the same logic, i.e. I first create the perfcounter instances (which takes time) and then loop through the values, passing them back in a file.
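A minimal sketch of that file hand-off, assuming one small text file per job: the writer overwrites the file on every tick (a StreamWriter with append set to $false), and the reader waits for the file to appear with a timeout (the script waits up to 20 s for the header file). Write-Snapshot and Wait-Snapshot are hypothetical names.

```powershell
function Write-Snapshot {
    param([string]$Path, [string[]]$Lines)
    # append = $false: every tick replaces the previous snapshot, so the file stays tiny.
    # Use a full path: .NET resolves relative paths against the process directory.
    $writer = New-Object System.IO.StreamWriter($Path, $false)
    try {
        foreach ($line in $Lines) { $writer.WriteLine($line) }
    }
    finally {
        $writer.Dispose()
    }
}

function Wait-Snapshot {
    param([string]$Path, [int]$TimeoutSeconds = 20)
    # Poll until the file appears or the timeout expires, whichever comes first.
    $clock = [System.Diagnostics.Stopwatch]::StartNew()
    while (-not (Test-Path -LiteralPath $Path)) {
        if ($clock.Elapsed.TotalSeconds -ge $TimeoutSeconds) { return $false }
        Start-Sleep -Milliseconds 100
    }
    return $true
}

# Inside a job's loop: Write-Snapshot -Path (Join-Path $env:TEMP 'headerres.txt') -Lines $headerLines
# In the main script:  if (Wait-Snapshot -Path (Join-Path $env:TEMP 'headerres.txt')) { ... }
```

Because the reader can occasionally open the file while the job is rewriting it, retrying a failed read on the next tick is a simple way to cope.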
The main script body collects the data from the files, refreshing the display. The main script is also in charge of displaying individual processor/core data as well as monitoring keyboard input. This means CTRL+C will NOT work, but you can still stop the script with CTRL+BREAK:
[console]::TreatControlCAsInput = $true
The regular way to exit is pressing the "q" key.
After you press the "q" key, the cleanup code is executed, stopping the background jobs and removing the temporary files used for communication. It's worth noting that the cleanup code does not throw any errors, because nothing bad can happen: the files are less than 1 kB in total, and the background jobs can be stopped either via the trick described below or simply by exiting the Powershell console.
Let's go deeper into the regions of code now. The first region is Check, which I described in the October 2015 blog, so there is no need to repeat myself. Next is the Variable Declarations region, where I gather one-time top-level data, mainly related to CPU topology, using tricks described in Blog 3 and Blog 4 and manipulating Instances as described in Blog 1 of this series. Executing this part takes a couple of seconds.
Next thing is to start the Header job. It takes an argument (total number of cores) from the call and proceeds with initializing various counters. As with all initializations, this also takes a couple of seconds. The main DO loop starts a timer to ensure samples are collected at 1 second intervals. It also checks whether you have the query.exe tool installed and determines the number of active users if the tool exists, or displays -1 if it doesn't. There are other ways of determining the number of logged-on users, but they are all too slow for a 1 s tick. After forming the resulting lines, I use [System.IO.StreamWriter] to record them to the $Env:TEMP\headerres.txt file. Control is then returned to the main script, which waits for the $Env:TEMP\headerres.txt file (or 20 s, whichever comes first).
The next step is to start the Tasks job, which collects data about running processes. As opposed to Task Manager, I show background processes (i.e. services) too. Worth noting is that, due to timing issues, I use Process (Win32_PerfFormattedData_PerfProc_Process) and not Thread (win32_PerfFormattedData_PerfProc_Thread) counters.
Since Windows is *thread* based (meaning a Process is just a container for Threads doing the work), this actually means sacrificing some accuracy (for example, CPU utilization data) in favour of faster and smoother execution:
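A sketch of such a Process-counter query, assuming the standard Win32_PerfFormattedData_PerfProc_Process property names (IDProcess, CreatingProcessID, PriorityBase, PercentProcessorTime, ThreadCount, HandleCount, WorkingSet, VirtualBytes, PageFileBytes). Get-TopTable is a hypothetical name, excluding the Idle instance is an assumption, and $sb only decides the sort order.

```powershell
function Get-TopTable {
    param(
        [object[]]$Rows,              # e.g. Get-CimInstance Win32_PerfFormattedData_PerfProc_Process
        [string]$SortBy = 'CPUpt',    # CPUpt | WS | Name | PID
        [int]$CpuCount = 1,
        [switch]$Normalize,
        [int]$First = 25
    )
    $div = 1
    if ($Normalize -and $CpuCount -gt 0) { $div = $CpuCount }

    # Script block used only to pick the sort order for the result set.
    $sb = {
        param($SortBy)
        switch ($SortBy) {
            'WS'    { @{ Expression = { $_.'WS(MB)' }; Descending = $true }, @{ Expression = 'CPUpt'; Descending = $true } }
            'Name'  { @{ Expression = 'Name'; Descending = $false } }
            'PID'   { @{ Expression = '_PID_'; Descending = $false }, @{ Expression = 'CPUpt'; Descending = $true } }
            default { @{ Expression = 'CPUpt'; Descending = $true }, @{ Expression = { $_.'WS(MB)' }; Descending = $true } }
        }
    }

    $table = $Rows |
        Where-Object { $_.Name -ne '_Total' -and $_.Name -ne 'Idle' } |
        Select-Object @{ N = '_PID_';  E = { $_.IDProcess } },
                      @{ N = 'PPID';   E = { $_.CreatingProcessID } },
                      @{ N = 'PrioB';  E = { $_.PriorityBase } },
                      Name,
                      @{ N = 'CPUpt';  E = { [math]::Round([double]$_.PercentProcessorTime / $div, 2) } },
                      @{ N = 'Thds';   E = { $_.ThreadCount } },
                      @{ N = 'Hndl';   E = { $_.HandleCount } },
                      @{ N = 'WS(MB)'; E = { [math]::Round([double]$_.WorkingSet / 1MB) } },
                      @{ N = 'VM(MB)'; E = { [math]::Round([double]$_.VirtualBytes / 1MB) } },
                      @{ N = 'PM(MB)'; E = { [math]::Round([double]$_.PageFileBytes / 1MB) } }
    $table | Sort-Object -Property (& $sb $SortBy) | Select-Object -First $First
}

# Live usage (Windows only):
# Get-TopTable -Rows (Get-CimInstance Win32_PerfFormattedData_PerfProc_Process) -SortBy WS | Format-Table -AutoSize
```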
Note: Script-block $sb is used just for sorting the resultset depending on keyboard input.
Note: "Name=" is the same as writing "Label=". Both can be abbreviated, so the expression becomes @{L="...";E={...}}.
There is one more way of doing this, and that is by expanding the Process object. I use this approach when checking for congestion at the thread level (see MSDN).
Note: If you check the value of the $Processes variable here, you will notice something like:
Id : 1996
...
Threads : {2000, 2012, 2016, 2040...}
...
meaning the Threads member is actually a collection of objects and can be expanded to show more data:
PS > $Processes.Threads
BasePriority : 8
CurrentPriority : 9
Id : 1972
IdealProcessor :
PriorityBoostEnabled :
PriorityLevel :
PrivilegedProcessorTime :
StartAddress : 2006300688
StartTime :
ThreadState : Wait
TotalProcessorTime :
UserProcessorTime :
WaitReason : UserRequest
ProcessorAffinity :
Site :
Container :
...
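The CSV mentioned next can be produced by expanding Threads for each process, one row per thread. A sketch with hypothetical function names and output path; some thread properties throw when read (WaitReason, for instance, is only valid for waiting threads, and Threads itself can be inaccessible for protected processes), so every value is read defensively.

```powershell
# Return $null instead of failing when a property getter throws.
function Get-SafeValue {
    param([scriptblock]$Read)
    try { & $Read } catch { $null }
}

function Export-ThreadSnapshot {
    param([System.Diagnostics.Process[]]$Process, [string]$Path)
    $rows = foreach ($p in $Process) {
        $threads = Get-SafeValue { $p.Threads }
        foreach ($t in $threads) {
            [pscustomobject]@{
                ProcessId    = $p.Id
                ProcessName  = $p.ProcessName
                ThreadId     = $t.Id
                BasePriority = Get-SafeValue { $t.BasePriority }
                ThreadState  = Get-SafeValue { [string]$t.ThreadState }
                WaitReason   = Get-SafeValue { [string]$t.WaitReason }
            }
        }
    }
    $rows | Export-Csv -LiteralPath $Path -NoTypeInformation
}

# Usage: snapshot every process, then group by process without Excel.
# Export-ThreadSnapshot -Process (Get-Process) -Path (Join-Path $env:TEMP 'threads.csv')
# Import-Csv (Join-Path $env:TEMP 'threads.csv') | Group-Object ProcessId | Sort-Object Count -Descending
```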
This leaves me with a neat little CSV file, which I then import into Excel and group by Process ID for further analysis.
Back to the main script: in region Main-start, I wait for the Processes job to start producing data before proceeding. If no data is generated, the script stops the jobs and exits.
Next is a neat trick to reduce flicker while clearing the screen:
[System.Console]::Clear()
and positioning the cursor at the top left corner:
$saveYH = [console]::CursorTop
$saveXH = [console]::CursorLeft
Also worth noting, in terms of reducing flicker, is hiding the cursor itself:
[console]::CursorVisible = $false
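A related way to avoid clearing the screen on every refresh is to overwrite it in place: pad (or cut) every line to the window width, so that writing the frame at (0,0) erases whatever the previous frame left behind. A sketch with hypothetical function names; only Show-Frame touches the console, so it works in the console host only.

```powershell
function Format-Frame {
    param([string[]]$Lines, [int]$Width, [int]$Height)
    $out = New-Object 'System.Collections.Generic.List[string]'
    foreach ($l in $Lines) {
        if ($out.Count -ge $Height) { break }
        $text = [string]$l
        if ($text.Length -gt $Width) { $text = $text.Substring(0, $Width) }
        $out.Add($text.PadRight($Width))
    }
    # Blank rows erase what a taller previous frame left on screen.
    while ($out.Count -lt $Height) { $out.Add(''.PadRight($Width)) }
    , $out.ToArray()
}

function Show-Frame {
    param([string[]]$Lines, [int]$Height = $Lines.Count)
    # One column less than the window avoids an automatic wrap at the edge.
    $frame = Format-Frame -Lines $Lines -Width ([console]::WindowWidth - 1) -Height $Height
    [console]::SetCursorPosition(0, 0)
    [console]::Write($frame -join [Environment]::NewLine)
}
```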
After that, you enter region Main-loop, which is the main code of the script. If there is fresh header data to display, I move the cursor to (0,0) and write it out. Otherwise, I skip this and check whether I should display Core/Socket data. The problem here is that the user can specify any number of cores/sockets to display data for, and I display two of them on each line. Thus I need an array where user input is mapped to the absolute index of the requested piece of HW in the perf counter. The array is created in the key-press handler. For the sake of performance, both core and socket counters were initialized at the start of the script:
#Just the individual CPUs (instance names such as "0,3"; \d+ also matches 10 and above).
$CPUdata = Get-CimInstance Win32_PerfFormattedData_Counters_ProcessorInformation | Where-Object {$_.Name -match '^(\d+),(\d+)$'}
#Just the individual Sockets (instance names such as "0,_Total").
$Socketdata = Get-CimInstance Win32_PerfFormattedData_Counters_ProcessorInformation | Where-Object {$_.Name -match '^(\d+),_Total$'}
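Mapping the user's 0-based CPU index to one of these instances needs a numeric order: sorted as text, "0,10" would come before "0,2". A sketch with a hypothetical function name, assuming instance names of the form "<socket>,<cpu>" as matched above.

```powershell
function Get-CpuInstanceMap {
    param([string[]]$InstanceNames)
    # Keep only per-CPU instances and sort by both numbers, not as text.
    $InstanceNames |
        Where-Object { $_ -match '^\d+,\d+$' } |
        Sort-Object { [int]($_.Split(',')[0]) }, { [int]($_.Split(',')[1]) }
}

# Usage: the user's index n selects the n-th CPU instance in numeric order.
# $map = @(Get-CpuInstanceMap -InstanceNames $CPUdata.Name)
# $CPUdata | Where-Object { $_.Name -eq $map[2] }   # third CPU
```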
Then, if there is fresh data provided by Top_Processes_Job, I display it.
Next comes the keyboard handling routine. First, check that there is something to handle:
if ($Host.UI.RawUI.KeyAvailable) {
If there is, put it into a variable:
$k = $Host.UI.RawUI.ReadKey("AllowCtrlC,IncludeKeyDown,IncludeKeyUp,NoEcho").Character
Then act on the key. For example, "p" writes the new sort field to $conf and, once the keypress is processed, the input buffer is cleared:
if ("p" -eq $k) {
    'CPUpt' > $conf    # new sort field for Top_Processes_Job
}
$HOST.UI.RawUI.FlushInputBuffer()
}    # closes if ($Host.UI.RawUI.KeyAvailable)
The "+" and "1" keys read the list of CPUs/Sockets to display data for, while the "-" key stops displaying that data.
Pressing the "c" key clears the screen in case it becomes garbled.
Pressing the "q" key moves you to region Cleanup, ending the script run.
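Keeping the key dispatch as a pure function over a small state object makes it testable without a console. A sketch: Update-TopState and the state fields are hypothetical, and reading "press p again" as "toggle normalization only when already sorted by CPU" is an interpretation of the shortcut list above.

```powershell
function Update-TopState {
    param([Parameter(Mandatory = $true)] $State, [string]$Key)
    switch -CaseSensitive ($Key) {
        'q' { $State.Quit = $true }
        'm' { $State.SortBy = 'WS' }
        'n' { $State.SortBy = 'Name' }
        'r' { $State.SortBy = 'PID' }
        'p' {
            # First "p" selects CPU sorting; "p" again flips the normalized view.
            if ($State.SortBy -eq 'CPUpt') { $State.Normalize = -not $State.Normalize }
            else { $State.SortBy = 'CPUpt' }
        }
        '-' { $State.ShowCpus = @() }
        'c' { $State.Redraw = $true }
        # '+' and '1' need a prompt for the CPU list and are left to the caller.
    }
    $State
}

# In the main loop:
# if ($Host.UI.RawUI.KeyAvailable) {
#     $k = $Host.UI.RawUI.ReadKey('NoEcho,IncludeKeyDown').Character
#     $state = Update-TopState -State $state -Key ([string]$k)
#     $Host.UI.RawUI.FlushInputBuffer()
# }
```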
TIPS & TRICKS
As opposed to Windows Task Manager, I show background processes too (i.e. "services").
In an effort to achieve a smoother display of data, I truncate CPU/Socket info to integer values. Also, I do not use Thread counters but rather Process ones. Due to the delay while displaying the data, there will always be some discrepancy between the figures displayed, i.e. total CPU utilization in the Header will rarely match the sum of CPU utilization by processes in the table. I can live with that.
The script starts in non-normalized CPU utilization mode, which means CPU utilization per process can go well over 100% on modern boxes. Let's say you have a quad-core box (8 logical CPUs) and a process taking 50% of Core0, 60% of Core1, 30% of Core2 and 20% of Core3; then the non-normalized CPU utilization for that process would be 160%, while the normalized CPU utilization would be 20% (160/8). I did it this way to confirm that a process actually uses more than one CPU. To toggle between the non-normalized and normalized views, use the "p" key.
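The example above as a calculation (Get-CpuUtilization is a hypothetical name): the non-normalized value is the sum of the per-CPU percentages, and the normalized value divides that sum by the number of logical CPUs.

```powershell
function Get-CpuUtilization {
    param([double[]]$PerCpuPercent, [int]$LogicalCpus)
    $sum = ($PerCpuPercent | Measure-Object -Sum).Sum
    [pscustomobject]@{
        NonNormalized = $sum                  # can exceed 100 on multi-CPU boxes
        Normalized    = $sum / $LogicalCpus   # share of the whole machine
    }
}

$u = Get-CpuUtilization -PerCpuPercent 50, 60, 30, 20 -LogicalCpus 8
'{0}% non-normalized, {1}% normalized' -f $u.NonNormalized, $u.Normalized
```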
If, for any reason, display becomes garbled, press the "c" key.
The number of processes to display is controlled by the $procToDisp variable, which is currently hard-coded to 25.
The initial sort order is defined by the $procSortBy variable. The default is CPU% ($procSortBy = 'CPUpt').
If, by any chance, the script does not terminate normally:
- First type Get-Job
- Check that Name has "Top_Header_Job" & "Top_Processes_Job". Remember the Ids (or use the Name parameter).
Say the Ids are 14 and 16.
- Type commands (text after # is just a comment):
[console]::CursorVisible = $true #reclaims the cursor
[console]::TreatControlCAsInput = $false #reverts CTRL+C processing to default value
receive-job -id 14
receive-job -id 16
stop-job -id 16
stop-job -id 14
remove-job -id 14
remove-job -id 16
or just exit the Powershell window.
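The same clean-up can be scripted by job name instead of by Id. A sketch with hypothetical function names; the job names are the ones used by the script, and the console reset is wrapped in try/catch because those setters can fail when there is no real console.

```powershell
function Stop-TopJobs {
    param([string[]]$Name = @('Top_Header_Job', 'Top_Processes_Job'))
    $jobs = @(Get-Job -Name $Name -ErrorAction SilentlyContinue)
    foreach ($j in $jobs) {
        Receive-Job -Job $j -ErrorAction SilentlyContinue   # show leftover output
        Stop-Job -Job $j
        Remove-Job -Job $j -Force
    }
}

function Reset-TopConsole {
    try {
        [console]::CursorVisible = $true            # reclaim the cursor
        [console]::TreatControlCAsInput = $false    # default CTRL+C handling
    } catch { }
}

# Usage after an abnormal exit:
# Reset-TopConsole; Stop-TopJobs
```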
Hope you'll find this script useful in your work!
This is all from me for this series. Next, I will start a new series of blogs describing the script used as the testing/benchmarking framework on Windows, which is also available in the package.