#1560 closed defect (worksforme)
CPU Usage and Load average peaked at 3am this morning
| Reported by: | HwyXingFrog | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | Backend | Version: | 8.0.2-RELEASE |
| Keywords: | | Cc: | |
Description
All I really have are the Reporting Charts.
I couldn't even load the web page to reboot; I had to use the power button on the box. Even after the reboot, CPU usage and system load are much higher than average, and file access is now very slow.
Is there any more information I can capture from logs or anything?
My FreeNAS system seems to do things like this after it has been running for a month or so, but I don't have any concrete conclusions.
Attachments (4)
Change History (8)
Changed 12 months ago by HwyXingFrog
Changed 12 months ago by HwyXingFrog
Changed 12 months ago by HwyXingFrog
Changed 12 months ago by HwyXingFrog
comment:1 Changed 12 months ago by jpaetzel
comment:2 Changed 12 months ago by HwyXingFrog
The system finally recovered after another reboot.
So, if this happens again, this is all the extra info that helps:
[root@freenas] ~# zpool status
  pool: Pool1
 state: ONLINE
 scrub: scrub completed after 2h30m with 0 errors on Sat Jun 2 03:25:13 2012
config:

        NAME                                            STATE     READ WRITE CKSUM
        Pool1                                           ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/2988f3da-a278-11e0-a287-001fd05b97a8  ONLINE       0     0     0
            gptid/29ea801d-a278-11e0-a287-001fd05b97a8  ONLINE       0     0     0
            gptid/2a4f3b89-a278-11e0-a287-001fd05b97a8  ONLINE       0     0     0
            gptid/2acb6c0f-a278-11e0-a287-001fd05b97a8  ONLINE       0     0     0
            gptid/2b41e1c3-a278-11e0-a287-001fd05b97a8  ONLINE       0     0     0
            gptid/2bc761c6-a278-11e0-a287-001fd05b97a8  ONLINE       0     0     0

errors: No known data errors
[root@freenas] ~# top
last pid: 10956; load averages: 0.05, 0.01, 0.00 up 0+11:06:02 12:00:31
45 processes: 1 running, 44 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 370M Active, 1232M Inact, 2034M Wired, 129M Cache, 198M Buf, 56M Free
Swap: 12G Total, 1184K Used, 12G Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
2487 root 1 44 0 47788K 7324K select 0 0:17 0.00% smbd
10355 root 1 44 0 47796K 7592K select 0 0:14 0.00% smbd
1847 root 7 44 0 65044K 7388K ucond 1 0:13 0.00% collectd
2215 www 1 44 0 19324K 3436K kqread 0 0:05 0.00% lighttpd
1750 root 6 44 0 126M 68324K uwait 0 0:04 0.00% python
1501 root 1 44 0 11776K 2236K select 0 0:01 0.00% ntpd
1347 root 1 44 0 46624K 6444K select 1 0:01 0.00% smbd
1343 root 1 44 0 38048K 4356K select 0 0:00 0.00% nmbd
2305 root 1 76 0 64096K 23068K ttyin 1 0:00 0.00% python
1962 root 1 54 0 7832K 1184K nanslp 0 0:00 0.00% cron
1088 root 1 44 0 6904K 1176K select 0 0:00 0.00% syslogd
1731 avahi 1 44 0 16932K 2236K select 1 0:00 0.00% avahi-daemon
10914 root 1 44 0 47796K 7036K select 0 0:00 0.00% smbd
10915 root 1 44 0 33300K 4056K select 0 0:00 0.00% sshd
2129 root 1 44 0 7836K 1320K select 0 0:00 0.00% rpcbind
10917 root 1 44 0 10172K 2600K pause 1 0:00 0.00% csh
10356 root 1 44 0 46632K 6796K select 0 0:00 0.00% smbd
1403 root 1 44 0 46624K 6400K select 0 0:00 0.00% smbd
2133 root 1 49 0 6772K 1180K select 1 0:00 0.00% mountd
2306 root 1 76 0 6772K 924K ttyin 0 0:00 0.00% getty
1940 root 1 44 0 24972K 3048K select 0 0:00 0.00% sshd
2307 root 1 76 0 6772K 924K ttyin 1 0:00 0.00% getty
2312 root 1 76 0 6772K 924K ttyin 0 0:00 0.00% getty
2310 root 1 76 0 6772K 924K ttyin 1 0:00 0.00% getty
2311 root 1 76 0 6772K 924K ttyin 1 0:00 0.00% getty
2309 root 1 76 0 6772K 924K ttyin 1 0:00 0.00% getty
2308 root 1 76 0 6772K 924K ttyin 1 0:00 0.00% getty
1725 messagebus 1 70 0 7980K 1576K select 1 0:00 0.00% dbus-daemon
641 root 1 76 0 5684K 1084K select 0 0:00 0.00% dhclient
10956 root 1 44 0 9224K 2052K CPU1 0 0:00 0.00% top
10930 root 1 44 0 46632K 6684K select 0 0:00 0.00% smbd
1691 root 1 76 0 5812K 1068K select 0 0:00 0.00% rsync
788 root 1 44 0 3200K 576K select 1 0:00 0.00% devd
662 _dhcp 1 44 0 5684K 1156K select 1 0:00 0.00% dhclient
Let me know so I know the info to gather if/when this happens again.
Thanks.
comment:3 Changed 12 months ago by william
- Resolution set to worksforme
- Status changed from new to closed
Taking a look at zpool status, it looks like it was a zpool scrub.
Scrub runs every 30 days in 8.0.x.
comment:4 Changed 12 months ago by HwyXingFrog
So the issue would be that the scrub pinned the CPU for 9+ hours when it was first initiated, but after a reboot it took only 2.5 hours (according to the zpool status output above).
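A rough cross-check of that timeline, using only the numbers in the zpool status and top output above. It assumes top was run on the same day the scrub finished (Sat Jun 2), which the top output itself does not state, and that "2h30m" is rounded to the minute. The helper names to_sec and fmt are made up for this sketch.

```sh
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight (awk reads "03" as decimal 3).
to_sec() { echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'; }

# Format seconds since midnight back to HH:MM:SS.
fmt() { printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 )); }

scrub_end=$(to_sec 03:25:13)    # "scrub completed ... 03:25:13" (zpool status)
scrub_len=$(to_sec 02:30:00)    # "after 2h30m" (rounded, so approximate)
top_clock=$(to_sec 12:00:31)    # wall-clock time in the top header
up_secs=$(to_sec 11:06:02)      # "up 0+11:06:02" in the top header

scrub_start=$(( scrub_end - scrub_len ))
boot_time=$(( top_clock - up_secs ))   # ASSUMPTION: top was run on Jun 2

echo "scrub started about $(fmt "$scrub_start")"
echo "system booted about $(fmt "$boot_time")"
```

Under those assumptions this gives a scrub start of about 00:55:13 and a boot time of about 00:54:29, so the 2.5 hour scrub appears to have started within roughly a minute of the reboot.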

There are tasks the underlying OS runs at 3am. It's also possible that the filesystem started doing a scrub. Can you paste the output of running zpool status from the CLI?
Perhaps a snapshot of the output of top as well.
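If it helps for next time, here is a minimal sketch of a way to save that output to a file from the CLI, so it survives even if the web page stops responding. The function name snapshot and the log path are made up for illustration; nothing here is a FreeNAS feature.

```sh
#!/bin/sh
# Hypothetical helper: append a timestamped snapshot of some diagnostic
# commands to a log file. Commands that are not installed are noted and skipped.
snapshot() {
    out="$1"
    shift
    {
        echo "=== snapshot taken $(date) ==="
        for cmd in "$@"; do
            echo "--- $cmd ---"
            prog=${cmd%% *}               # first word is the program name
            if command -v "$prog" >/dev/null 2>&1; then
                sh -c "$cmd" 2>&1         # capture stderr as well
            else
                echo "(skipped: $prog not found)"
            fi
        done
    } >> "$out"
}

# Example call (the path is an assumption; pick somewhere that persists):
#   snapshot /var/tmp/diag.log "zpool status" "top -b -d 1"
```

On FreeBSD, `top -b -d 1` should print a single batch-mode display and exit; other systems use different flags, so check the local top man page before relying on it.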