|
wviewweather.com wview and Weather Topics
|
THogland
Joined: 26 Sep 2005 Posts: 33
|
Posted: Sun Oct 09, 2005 7:15 pm Post subject: wvwunderd 'hangs' |
|
|
No idea why this is happening, but after a day or so wvwunderd stops sending data. There are no errors in the system logs (even on the verbose settings), and using the init.d script (or manually using kill -15) won't kill it - only kill -9 does... Killing it and restarting it doesn't upload any missing data, so the system thinks it's been working.
This is 1.81, fresh install. I've tried using init.d/wview stop/start, and also stopping and starting wvwunderd manually, but neither makes a difference.
Is there a good way to debug this, or is this a known issue? |
|
THogland
Joined: 26 Sep 2005 Posts: 33
|
Posted: Sun Oct 09, 2005 8:02 pm Post subject: Log info... |
|
|
Okay - I get security updates e-mailed to me from the server that's running wview, and checking them tonight I found a string of errors starting yesterday...
Code: |
radBufferGet failed: notify
|
They're listed as security errors, not system events; the standard "I'm doing x" messages are showing up in the events log - stuff like this:
Code: |
Oct 9 03:05:05 fred wviewd[28641]: <3574473730> : storing record for 10/09/2005 03:05
Oct 9 03:05:05 fred htmlgend[28645]: <3574474395> : Adding 5 minute sample...
Oct 9 03:05:10 fred wvcwopd[28653]: <3574478522> : CWOP-sending: CW1074>APRS,TCPXX*,qAX,CW1074:@091105z6137.09N/14922.07W_031/007g010t038P000h73b09939.wview181
|
... and so on, and so on...
|
|
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
|
Posted: Mon Oct 10, 2005 5:34 am Post subject: |
|
|
It is not a known issue; there are only a couple of vaguely described issues that I cannot reproduce. I don't let known issues that I can reproduce "remain" in the system - I fix them. One other guy reported that Wunderground submissions would just "stop" but gave no other details of value (you have).
You apparently have your system log messages of type "critical" being routed somewhere other than the normal log file; that is why they are not showing up in the /var/log/messages file. Check your /etc/syslog.conf file and look for where "*.crit" messages are being routed - add that to your "/var/log/messages" line so they are logged to that file as well.
That log message indicates that there are no system buffers available of the size required for the archive notify message, which should be the smallest or certainly the next smallest size. Very odd considering your CWOP submissions seem to be working and that process receives the same archive notify message just before the wunderground process should receive it...
Did you change the "sysBufferCounts" values in radsysdefs.c prior to building radlib? This is where the numbers of system buffers are defined.
If not, then there would appear to be a buffer leak, although my wview has been running for well over a week with no CWOP or Wunderground problems. Other folks' systems have been running even longer.
I will try to whip up a simple debug utility to attach to the wview system and dump out system buffer counts tonight and send it to you so we can get a better idea what is going on... |
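To illustrate the syslog.conf check suggested above, here is a minimal Python sketch that lists which rules in a classic syslog.conf would receive "crit" messages. The sample configuration is hypothetical (it is not THogland's actual file), and the matching is simplified: it applies the usual "this priority and higher" rule and the sysklogd-style "=" exact-match prefix, but ignores facilities, "none" and "!" negation.

```python
# syslog priorities from least to most severe
PRIORITIES = ["debug", "info", "notice", "warning", "err", "crit", "alert", "emerg"]
ALIASES = {"warn": "warning", "error": "err", "panic": "emerg"}

# Hypothetical sample, not taken from the thread
SAMPLE = """\
*.=info;*.=notice;*.=warn   /var/log/messages
*.crit                      /var/log/critical
*.emerg                     *
"""

def selector_catches_crit(selector):
    """True if one 'facility.priority' selector would accept a crit message."""
    _, _, priority = selector.partition(".")
    if priority == "*":
        return True
    exact = priority.startswith("=")
    name = priority.lstrip("=")
    name = ALIASES.get(name, name)
    if name not in PRIORITIES:  # 'none' and '!' forms are ignored in this sketch
        return False
    if exact:
        return name == "crit"
    # a plain priority means "this level and anything more severe"
    return PRIORITIES.index(name) <= PRIORITIES.index("crit")

def crit_destinations(conf_text):
    """Return the action field of every rule that can receive crit messages."""
    destinations = []
    # join backslash-continued lines before splitting
    for line in conf_text.replace("\\\n", "").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        selectors, action = parts
        if any(selector_catches_crit(s) for s in selectors.split(";")):
            destinations.append(action.strip())
    return destinations

print(crit_destinations(SAMPLE))  # ['/var/log/critical']
```

In this sample the /var/log/messages rule uses exact-match selectors, so crit messages never reach it; adding "*.crit" to that rule's selector list would be the kind of change described above.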
|
THogland
Joined: 26 Sep 2005 Posts: 33
|
Posted: Mon Oct 10, 2005 8:59 am Post subject: |
|
|
I dug through syslog.conf and there's no *.crit in there... Added that line and restarted - we'll see what it does...
I didn't change anything in my 1.81 source files except the name of the mesonet output file, and I'm not getting any FTP daemon errors. My original 1.80 (pre-mesonet) ran quite stably for more than a day at a time, so I've no idea what might have happened.
This morning nothing was running - I *thought* it was running last night, but the logs show otherwise. So, I started it a few minutes ago and will see what happens. |
|
THogland
Joined: 26 Sep 2005 Posts: 33
|
Posted: Mon Oct 10, 2005 2:10 pm Post subject: Update... |
|
|
Rebooted and restarted, looked good again. (That was ~4 hours ago, just before leaving for work.) Just checked, no updates going anyplace - local folder for website, Weather Underground, etc. - nothing. So, I deleted the old source, downloaded a fresh copy, recompiled and reinstalled. Checked the .conf files and verified they're correct. Restarted, all looks good again.
Only issue is, I get the following when FTP'ing (which is the mesonet.txt file):
Code: | 'EPRT |1|192.168.127.10|32769|': command not understood. |
The file sends, I just get that error beforehand... Not a new error (it's done that since 1.81), not a show-stopper, just an oddity. It also didn't do it when FTP'ing to my server, just since I started FTP'ing to NOAA mesonet... I'm guessing the HP-UX FTP server at NOAA is confused.
I'll post again if this hangs/dies/whatever again in the next day or so. |
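For context on that message: EPRT is the extended form of the FTP PORT command (RFC 2428), and an older server that predates it may reply "command not understood". The Python sketch below only shows how the two command strings are built for the address and port in the error above; that the client then falls back to PORT is an assumption, though it would be consistent with the file still being sent.

```python
def eprt_command(ip, port):
    """RFC 2428 form: EPRT |protocol|address|port| (protocol 1 = IPv4)."""
    return f"EPRT |1|{ip}|{port}|"

def port_command(ip, port):
    """RFC 959 form: PORT h1,h2,h3,h4,p1,p2 with the port split into two bytes."""
    high, low = divmod(port, 256)
    return "PORT " + ",".join(ip.split(".") + [str(high), str(low)])

print(eprt_command("192.168.127.10", 32769))  # EPRT |1|192.168.127.10|32769|
print(port_command("192.168.127.10", 32769))  # PORT 192,168,127,10,128,1
```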
|
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
|
Posted: Mon Oct 10, 2005 6:57 pm Post subject: |
|
|
OK, if it happens again, before you restart wview (leave it running), download this version of radlib: http://prdownloads.sourceforge.net/radlib/radlib-2.3.0.tar.gz?download
Install it as normal (wview still running):
tar zxvf radlib-2.3.0.tar.gz
cd radlib-2.3.0
./configure
make install
cd debug
make
./raddebug 1
Post the output of that utility...
Last edited by mteel on Wed Oct 12, 2005 5:14 am; edited 1 time in total |
|
THogland
Joined: 26 Sep 2005 Posts: 33
|
Posted: Mon Oct 10, 2005 10:23 pm Post subject: raddebug output... |
|
|
Froze up again - 12:05 to 13:10
Code: |
tom@fred:/usr/src/radlib-2.3.0/debug$ sudo ./raddebug 1
Password:
Attached to wview radlib system 1: UP 0 years, 0 months, 0 days, 8 hours, 20 minutes, 44 seconds
Buffer Allocation by Size:
Dumping index 0: size 64: Free/Total 1/64
Dumping index 1: size 128: Free/Total 100/128
Dumping index 2: size 256: Free/Total 247/256
Dumping index 3: size 512: Free/Total 253/256
Dumping index 4: size 1024: Free/Total 128/128
Dumping index 5: size 2048: Free/Total 64/64
Dumping index 6: size 4096: Free/Total 32/32
Buffer Summary:
Total Free: 825
Total Allocated: 103
Total Allocations Since Started: 3199
Semaphore Info:
INDEX COUNT WAITERS ZCNT PID
0 0 0 0 0
1 0 0 0 0
2 1 0 0 30484
3 0 0 0 0
4 1 0 0 12200
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
11 0 0 0 0
12 0 0 0 0
13 0 0 0 0
14 0 0 0 0
15 0 0 0 0
tom@fred:/usr/src/radlib-2.3.0/debug$
|
|
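As a cross-check on the summary figures, this small Python sketch re-adds the per-size lines from the output above. The parsing pattern is an assumption based only on the text shown here, not on raddebug's source.

```python
import re

# Per-size lines copied from the raddebug output above
RADDEBUG_LINES = """\
Dumping index 0: size 64: Free/Total 1/64
Dumping index 1: size 128: Free/Total 100/128
Dumping index 2: size 256: Free/Total 247/256
Dumping index 3: size 512: Free/Total 253/256
Dumping index 4: size 1024: Free/Total 128/128
Dumping index 5: size 2048: Free/Total 64/64
Dumping index 6: size 4096: Free/Total 32/32
"""

def tally(text):
    """Return (total free, total allocated) summed over all size classes."""
    free = total = 0
    for _size, f, t in re.findall(r"size (\d+): Free/Total (\d+)/(\d+)", text):
        free += int(f)
        total += int(t)
    return free, total - free

free, allocated = tally(RADDEBUG_LINES)
print(free, allocated)  # 825 103
```

The sums match the "Total Free" and "Total Allocated" lines, and most of the 103 allocated buffers sit in the two smallest pools (63 of size 64 and 28 of size 128).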
|
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
|
Posted: Mon Oct 10, 2005 10:35 pm Post subject: |
|
|
OK, you are not running out of buffers, although there are more allocated than there should be (should be more like 15 or so nominally)...
Makes me think that libcurl is having a problem...
Post results of:
ps aux | grep wv
I bet wvwunderd is hung up waiting on libcurl to finish a transaction - you could do a little research on libcurl - see how to debug it or look for error output (I think it has its own log file), etc. I would, but I don't have much time during the week.
Your log file, anything unusual in it? Are there bufferGet failure messages or queueSend failure messages? If anything in it doesn't look like the example normal output in the User Manual, post it with some context around it. |
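To illustrate the kind of hang suspected here (a process blocked indefinitely on a network read), the following Python sketch uses plain sockets, not libcurl, to show how a timeout bounds the wait. The silent local listener is a stand-in for an unresponsive remote server; nothing here reflects how wvwunderd is actually written.

```python
import socket

def read_reply(host, port, timeout_s):
    """Return the peer's first bytes, or None if it stays silent past timeout_s."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            # without a timeout, recv() would block here for as long as the peer is silent
            return None

# A listener that never calls accept() stands in for a stalled remote server
silent = socket.socket()
silent.bind(("127.0.0.1", 0))
silent.listen(1)

result = read_reply("127.0.0.1", silent.getsockname()[1], timeout_s=0.5)
silent.close()
print(result)  # None
```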
|
THogland
Joined: 26 Sep 2005 Posts: 33
|
Posted: Mon Oct 10, 2005 11:24 pm Post subject: ps aux output... |
|
|
I'm guessing you want this when it's hung... I've already restarted it, so it looks fine now. I ran it and left it sitting in a window; if/when it hangs I'll run it again and post.
Any particular version of libcurl you're using? Mine shows libcurl3 7.13.2-2... Also, any sense in upping the 64 buffer to 128? Or is one free normal? |
|
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
|
Posted: Tue Oct 11, 2005 5:07 am Post subject: |
|
|
No, 1 free does not appear to be normal (as I said earlier), but it would only mean that radlib would go up to the next bigger size when allocating. My guess is the ones which are allocated (other than the 15 or so that stay allocated) were for the Archive Notify message to wvwunderd, and they are sitting in his FIFO waiting to be read while he waits on a libcurl transaction. Just a guess at this point...
I am using 7.13.1 but I doubt the newer version is the problem.
Also, you could look in your log file for the unusual messages even after restarting, and do the libcurl error log investigation as well... |
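The "go up to the next bigger size" behaviour can be pictured with a short Python sketch of a size-class pool. This is an illustration only, not radlib's implementation; the free counts are taken from the raddebug output posted earlier in the thread, and the `allocate` helper is hypothetical.

```python
# Free buffers per size class, from the raddebug output earlier in the thread
free_counts = {64: 1, 128: 100, 256: 247, 512: 253, 1024: 128, 2048: 64, 4096: 32}

def allocate(pool, nbytes):
    """Take a buffer from the smallest class that fits and still has one free."""
    for size in sorted(pool):
        if size >= nbytes and pool[size] > 0:
            pool[size] -= 1
            return size
    return None  # nothing large enough is free

first = allocate(free_counts, 48)   # uses the last free 64-byte buffer
second = allocate(free_counts, 48)  # the 64-byte class is empty, so 128 is used
print(first, second)  # 64 128
```

Under this model an exhausted small pool wastes some memory but does not by itself make an allocation fail.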
|
mhweather
Joined: 07 Aug 2005 Posts: 54
|
Posted: Tue Oct 11, 2005 7:13 am Post subject: |
|
|
I saw this in Gary's Weathernet Forum (http://www.weatherforum.net); some people are complaining about WU not updating:
Quote: |
Hello,
Two major network providers (Cogent and Level3) are having an argument
about who should pay for the traffic between their two networks. This has
resulted in a major Internet outage that affects wunderground.com, who
uses Cogent. See:
http://www.broadbandreports.com/shownews/68174
Thanks,
Christine Stowe
Customer Service
Weather Underground, Inc
|
I haven't had any issues myself on WU and I'm running 1.8.1. Just curious, have you looked at a TCP/IP packet trace using a program like Ethereal? Is data really being sent? |
|
THogland
Joined: 26 Sep 2005 Posts: 33
|
Posted: Tue Oct 18, 2005 11:28 am Post subject: Frustrated update... |
|
|
So, I checked all the configs (again), went through dselect and loaded all the libcurl, libgd, etc. libraries (the various debug, dev, etc. versions, including the -gssapi and -ocaml ones), did a few other system updates, shut down (to swap UPS connections), and since then there have been *NO* problems. No problems for over 4 days. (I thought that it might be "working until you shut it down and start it manually, when it starts having issues", but I ran 'wview stop', changed the .conf to turn off verbose logging, then started it again and so far it's looking fine.)
I get an occasional wakeup error, or invalid data packet, this kinda stuff:
Oct 18 00:05:07 fred wviewd[2648]: <46308281> : wakeupConsole: Read ERROR!
Oct 18 00:05:07 fred wviewd[2648]: <46308281> : daemonRunState: WAKEUP failed
Oct 18 00:20:05 fred wviewd[2648]: <47206702> : daemonReceiveArchiveState: WAKEUP2 failed
Oct 18 00:20:05 fred wviewd[2648]: <47206702> : wakeupConsole: Invalid data: 39 20
Oct 18 00:23:07 fred wviewd[2648]: <47388270> : wakeupConsole: Read ERROR!
Oct 18 00:23:07 fred wviewd[2648]: <47388270> : daemonRunState: WAKEUP failed
... but nothing that looks like real errors, and nothing that repeats regularly.
I even got a note from mesonet at NOAA that they love the regularly-updating data feed I'm providing them! I have *no* idea what happened, or didn't happen, or broke-which-what-way-when-where, but apparently it's happily working again, at least for the past hour or two. I'll call it fixed when it's up for another couple days (after the manual stop/start, which was the referenced hour or two ago), but I'm thinking I've got a case of the weird gremlins, not an actual repeatable software bug...
Thanks for the time on this, even though I'm not sure how productive it was. |
|
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
|
Posted: Tue Oct 18, 2005 12:02 pm Post subject: |
|
|
I suspect the network problems that Wunderground was having (as reported by MullicaHill above) were your problem...
BTW, there is no difference between a wview start at system boot and a manual start...
Keep an eye on those console wakeup errors. I NEVER get them. I think it is an indication of a serial port/cable that is on the hairy edge. If you keep seeing them, you may want to try another serial cable. If that doesn't fix it, you may have a shaky serial port... Those are real errors, just not catastrophic errors.
I've added the raddebug utility to the radlib distro, so that was productive anyway.
Mark |
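One way to keep an eye on those console wakeup errors is to count them from the log. The Python sketch below tallies messages by the routine that logged them, using the lines THogland posted above; the line format is assumed from that excerpt only.

```python
from collections import Counter

# Log lines copied from the earlier post in this thread
LOG = """\
Oct 18 00:05:07 fred wviewd[2648]: <46308281> : wakeupConsole: Read ERROR!
Oct 18 00:05:07 fred wviewd[2648]: <46308281> : daemonRunState: WAKEUP failed
Oct 18 00:20:05 fred wviewd[2648]: <47206702> : daemonReceiveArchiveState: WAKEUP2 failed
Oct 18 00:20:05 fred wviewd[2648]: <47206702> : wakeupConsole: Invalid data: 39 20
Oct 18 00:23:07 fred wviewd[2648]: <47388270> : wakeupConsole: Read ERROR!
Oct 18 00:23:07 fred wviewd[2648]: <47388270> : daemonRunState: WAKEUP failed
"""

def count_by_routine(log_text):
    """Count log lines per routine name (the text before the first colon of the message)."""
    counts = Counter()
    for line in log_text.splitlines():
        _, sep, message = line.partition("> : ")
        if sep:
            counts[message.split(":", 1)[0]] += 1
    return counts

counts = count_by_routine(LOG)
print(counts["wakeupConsole"])  # 3
```

Running something like this over a day of logs before and after swapping the cable would show whether the error rate actually changes.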
|
THogland
Joined: 26 Sep 2005 Posts: 33
|
Posted: Mon Oct 24, 2005 8:07 am Post subject: Repost... |
|
|
Don't know how that last post showed up - I didn't send it...
Since I re-spliced the cable ends, I get those errors fairly regularly, but it's still reading and working properly. They're coming from a pretty new Dell server (less than a year old), so I don't think it's a bad port. I'm thinking I'll try setting the serial port at the correct speed manually and see if that's part of the problem. Otherwise I'm guessing the cable, so I'll have to dig up about 40' of new wire and re-string it... |
|
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
|
Posted: Mon Oct 24, 2005 8:12 am Post subject: |
|
|
It won't be the serial port setup done by wview; that has kinda been tested by a whole bunch of folks (not to mention my initial testing) and is pretty solid.
40 feet of serial cable sounds like a pretty likely culprit, especially if you are wiring it up yourself. |
|