wviewweather.com Forum Index wviewweather.com
wview and Weather Topics
 

wvwunderd 'hangs'
THogland



Joined: 26 Sep 2005
Posts: 33

Posted: Sun Oct 09, 2005 7:15 pm    Post subject: wvwunderd 'hangs'

No idea why this is happening, but after a day or so wvwunderd stops sending data. There are no errors in the system logs (even on the verbose settings), and the init.d script (or a manual kill -15) won't kill it - only kill -9 does. Killing it and restarting it doesn't upload any of the missing data, so the system thinks it's been working.

This is 1.81, fresh install. I've tried using init.d/wview stop/start, and also stopping and starting wvwunderd manually, but neither makes a difference.

Is there a good way to debug this, or is this a known issue?
THogland



Joined: 26 Sep 2005
Posts: 33

Posted: Sun Oct 09, 2005 8:02 pm    Post subject: Log info...

Okay - I get security updates e-mailed to me from the server that's running wview, and checking them tonight I found a string of errors starting yesterday...

Code:
radBufferGet failed: notify

They're listed as security errors, not system events; the standard "I'm doing x" messages are showing up in the events log - stuff like this:

Code:
Oct  9 03:05:05 fred wviewd[28641]: <3574473730> : storing record for 10/09/2005 03:05
Oct  9 03:05:05 fred htmlgend[28645]: <3574474395> : Adding 5 minute sample...
Oct  9 03:05:10 fred wvcwopd[28653]: <3574478522> : CWOP-sending: CW1074>APRS,TCPXX*,qAX,CW1074:@091105z6137.09N/14922.07W_031/007g010t038P000h73b09939.wview181

... and so on, and so on...
mteel



Joined: 30 Jun 2005
Posts: 435
Location: Collinsville, TX

Posted: Mon Oct 10, 2005 5:34 am

It is not a known issue; there are only a couple of vaguely described issues that I cannot reproduce. I don't let known issues that I can reproduce "remain" in the system - I fix them. One other guy reported that Wunderground submissions would just "stop" but gave no other details of value (you have).

You apparently have your system log messages of type "critical" being routed somewhere other than the normal log file; that is why they are not showing up in the /var/log/messages file. Check your /etc/syslog.conf file and look for where "*.crit" messages are being routed - add that to your "/var/log/messages" line so they are logged to that file as well.

That log message indicates that there are no system buffers available of the size required for the archive notify message, which should be the smallest or certainly the next smallest size. Very odd considering your CWOP submissions seem to be working and that process receives the same archive notify message just before the wunderground process should receive it...

Did you change the "sysBufferCounts" values in radsysdefs.c prior to building radlib? This is where the numbers of system buffers are defined.

If not, then there would appear to be a buffer leak, although my wview has been running for well over a week with no CWOP or Wunderground problems. Other folks' installs have been running even longer.

I will try to whip up a simple debug utility to attach to the wview system and dump out system buffer counts tonight and send it to you so we can get a better idea what is going on...
THogland



Joined: 26 Sep 2005
Posts: 33

Posted: Mon Oct 10, 2005 8:59 am

I dug through syslog.conf and there's no *.crit in there... Added that line and restarted - we'll see what it does...

I didn't change anything in my 1.81 source files except the name of the mesonet output file, and I'm not getting any FTP daemon errors. My original 1.80 (pre-mesonet) ran quite stably for more than a day at a time, so I've no idea what might have happened.

This morning nothing was running - I *thought* it was running last night, but the logs show different :( So, started it a few minutes ago and will see what happens.
THogland



Joined: 26 Sep 2005
Posts: 33

Posted: Mon Oct 10, 2005 2:10 pm    Post subject: Update...

Rebooted and restarted, looked good again. (That was ~4 hours ago, just before leaving for work.) Just checked, no updates going anyplace - local folder for website, Weather Underground, etc. - nothing. So, I deleted the old source, downloaded a fresh copy, recompiled and reinstalled. Checked the .conf files, verified they're correct. Restarted, all looks good again.

Only issue is, I get the following when FTP'ing (which is the mesonet.txt file):
Code:
'EPRT |1|192.168.127.10|32769|': command not understood.

The file sends, I just get that error beforehand... Not a new error (it's done that since 1.81), not a show-stopper, just an oddity. It also didn't do it when FTP'ing to my server, just since I started FTP'ing to NOAA mesonet... I'm guessing the HP/UX FTP server at NOAA is confused :)

I'll post again if this hangs/dies/whatever again in the next day or so.
mteel



Joined: 30 Jun 2005
Posts: 435
Location: Collinsville, TX

Posted: Mon Oct 10, 2005 6:57 pm

OK, if it happens again, before you restart wview (leave it running), download this version of radlib: http://prdownloads.sourceforge.net/radlib/radlib-2.3.0.tar.gz?download

Install it as normal (wview still running):

tar zxvf radlib-2.3.0.tar.gz
cd radlib-2.3.0
./configure
make install
cd debug
make
./raddebug 1

Post the output of that utility...


Last edited by mteel on Wed Oct 12, 2005 5:14 am; edited 1 time in total
THogland



Joined: 26 Sep 2005
Posts: 33

Posted: Mon Oct 10, 2005 10:23 pm    Post subject: raddebug output...

Froze up again - 12:05 to 13:10 :(

Code:

tom@fred:/usr/src/radlib-2.3.0/debug$ sudo ./raddebug 1
Password:

Attached to wview radlib system 1: UP 0 years, 0 months, 0 days, 8 hours, 20 minutes, 44 seconds

Buffer Allocation by Size:
Dumping index 0: size 64: Free/Total 1/64
Dumping index 1: size 128: Free/Total 100/128
Dumping index 2: size 256: Free/Total 247/256
Dumping index 3: size 512: Free/Total 253/256
Dumping index 4: size 1024: Free/Total 128/128
Dumping index 5: size 2048: Free/Total 64/64
Dumping index 6: size 4096: Free/Total 32/32

Buffer Summary:
        Total Free: 825
        Total Allocated: 103
        Total Allocations Since Started: 3199

Semaphore Info:
INDEX   COUNT  WAITERS  ZCNT   PID
  0       0      0        0     0
  1       0      0        0     0
  2       1      0        0     30484
  3       0      0        0     0
  4       1      0        0     12200
  5       0      0        0     0
  6       0      0        0     0
  7       0      0        0     0
  8       0      0        0     0
  9       0      0        0     0
 10       0      0        0     0
 11       0      0        0     0
 12       0      0        0     0
 13       0      0        0     0
 14       0      0        0     0
 15       0      0        0     0

tom@fred:/usr/src/radlib-2.3.0/debug$
mteel



Joined: 30 Jun 2005
Posts: 435
Location: Collinsville, TX

Posted: Mon Oct 10, 2005 10:35 pm

OK, you are not running out of buffers, although there are more allocated than there should be (should be more like 15 or so nominally)...
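To see where those allocated buffers sit, the per-size lines from the raddebug dump can be tallied. This is a minimal Python sketch, not part of radlib or wview; the regular expression simply assumes the "Dumping index N: size S: Free/Total F/T" line format shown in the output posted above, and the names `summarize`, `pools`, `total_free` and `total_in_use` are made up for illustration.

```python
import re

# Buffer lines copied from the raddebug output posted earlier in the thread.
RADDEBUG_OUTPUT = """\
Dumping index 0: size 64: Free/Total 1/64
Dumping index 1: size 128: Free/Total 100/128
Dumping index 2: size 256: Free/Total 247/256
Dumping index 3: size 512: Free/Total 253/256
Dumping index 4: size 1024: Free/Total 128/128
Dumping index 5: size 2048: Free/Total 64/64
Dumping index 6: size 4096: Free/Total 32/32
"""

# Assumed line format, taken from the dump above (not from radlib source).
LINE_RE = re.compile(r"Dumping index (\d+): size (\d+): Free/Total (\d+)/(\d+)")


def summarize(text):
    """Return one dict per buffer size class with free, total and in-use counts."""
    pools = []
    for match in LINE_RE.finditer(text):
        index, size, free, total = map(int, match.groups())
        pools.append({"index": index, "size": size, "free": free,
                      "total": total, "in_use": total - free})
    return pools


pools = summarize(RADDEBUG_OUTPUT)
total_free = sum(p["free"] for p in pools)
total_in_use = sum(p["in_use"] for p in pools)

for p in pools:
    print(f"size {p['size']:>4}: {p['in_use']:>3} of {p['total']:>3} in use")
print(f"total free {total_free}, total in use {total_in_use}")
```

The totals agree with raddebug's own summary (825 free, 103 allocated), and 63 of the 103 allocated buffers are in the size 64 pool.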

Makes me think that libcurl is having a problem...

Post results of:
ps aux | grep wv

I bet wvwunderd is hung up waiting on libcurl to finish a transaction - you could do a little research on libcurl - see how to debug it or look for error output (I think it has its own log file), etc. I would, but I don't have much time during the week.
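If a process really is blocked inside a network call, the usual defense is a transfer timeout. libcurl exposes this through the CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT options; whether wvwunderd sets them is not something this thread establishes. The sketch below is plain Python using only the standard library, not wview or libcurl code: a local listener that never answers stands in for an unresponsive server, and the hypothetical `upload_with_timeout` helper gives up instead of waiting forever.

```python
import socket


def start_silent_listener():
    """Listen on a free localhost port but never accept or answer.

    The kernel still completes the TCP handshake for queued connections,
    so a client connects fine and then waits for a reply that never comes.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv


def upload_with_timeout(port, payload, timeout_s):
    """Send payload and wait for a reply; return None if none arrives in time."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as s:
        s.sendall(payload)
        try:
            return s.recv(1024)
        except socket.timeout:
            # Without a timeout this recv() would block indefinitely.
            return None


listener = start_silent_listener()
port = listener.getsockname()[1]
reply = upload_with_timeout(port, b"GET /update HTTP/1.0\r\n\r\n", 0.5)
print("reply:", reply)
listener.close()
```

With no timeout on the socket, the same `recv()` call would never return, which is the kind of hang that a kill -15 may not interrupt cleanly.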

Your log file, anything unusual in it? Are there bufferGet failure messages or queueSend failure messages? If anything in it doesn't look like the example normal output in the User Manual, post it with some context around it.
THogland



Joined: 26 Sep 2005
Posts: 33

Posted: Mon Oct 10, 2005 11:24 pm    Post subject: ps aux output...

I'm guessing you want this when it's hung... I've already restarted it, so it looks fine now. I ran it and left it sitting in a window; if/when it hangs I'll run it again and post.

Any particular version of libcurl you're using? Mine shows libcurl3 7.13.2-2... Also, any sense in upping the number of size-64 buffers from 64 to 128? Or is one free normal?
mteel



Joined: 30 Jun 2005
Posts: 435
Location: Collinsville, TX

Posted: Tue Oct 11, 2005 5:07 am

No, 1 free does not appear to be normal (as I said earlier), but it would only mean that radlib would go up to the next bigger size when allocating. My guess is the ones which are allocated (other than the 15 or so that stay allocated) were for the Archive Notify message to wvwunderd, and they are sitting in his FIFO waiting to be read while he waits on a libcurl transaction. Just a guess at this point...
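The "go up to the next bigger size" behavior can be illustrated with a toy size-class pool. This is a hedged Python sketch of the general technique only; it is not radlib's implementation, and `BufferPool`, `get` and `release` are hypothetical names. The starting free counts are borrowed from the first three lines of the raddebug dump.

```python
class BufferPool:
    """Toy fixed-size buffer pool with fallback to the next larger size class."""

    def __init__(self, free_counts):
        # free_counts maps buffer size -> number of free buffers of that size.
        self.sizes = sorted(free_counts)
        self.free = dict(free_counts)

    def get(self, nbytes):
        """Reserve the smallest free buffer that fits nbytes.

        Returns the size class used, or None when no large-enough buffer is free.
        """
        for size in self.sizes:
            if size >= nbytes and self.free[size] > 0:
                self.free[size] -= 1
                return size
        return None

    def release(self, size):
        """Return a buffer of the given size class to the pool."""
        self.free[size] += 1


# Free counts taken from the raddebug dump: only one size 64 buffer left.
pool = BufferPool({64: 1, 128: 100, 256: 247})
first = pool.get(48)    # takes the last size 64 buffer
second = pool.get(48)   # size 64 is empty now, so this falls back to size 128
print("first:", first, "second:", second, "free:", pool.free)

# Only when every large-enough class is empty does the request fail outright.
empty = BufferPool({64: 0, 128: 0})
failed = empty.get(48)
print("failed:", failed)
```

In a scheme like this, buffers that are allocated but never read back (for example, messages queued for a process that is stuck) are never passed to `release`, so the small classes drain first.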

I am using 7.13.1 but I doubt the newer version is the problem.

Also, you could look in your log file for the unusual messages even after restarting, and do the libcurl error log investigation as well...
mhweather



Joined: 07 Aug 2005
Posts: 54

Posted: Tue Oct 11, 2005 7:13 am

I saw this in Gary's Weathernet Forum (http://www.weatherforum.net), where some people are complaining about WU not updating:

Quote:

Hello,
Two major network providers (Cogent and Level3) are having an argument
about who should pay for the traffic between their two networks. This has
resulted in a major Internet outage that affects wunderground.com, who
uses Cogent. See:

http://www.broadbandreports.com/shownews/68174

Thanks,
Christine Stowe

Customer Service
Weather Underground, Inc


I haven't had any issues myself on WU and I'm running 1.8.1. Just curious, have you looked at a TCP/IP packet trace using a program like Ethereal? Is data really being sent?
THogland



Joined: 26 Sep 2005
Posts: 33

Posted: Tue Oct 18, 2005 11:28 am    Post subject: Frustrated update...

So, I checked all the configs (again), went through dselect and loaded all the libcurl, libgd, etc. libraries (the various debug, dev, etc. versions, including the -gssapi and -ocaml ones), did a few other system updates, shut down (to swap UPS connections), and since then there have been *NO* problems. No problems for over 4 days. (I thought that it might be "working until you shut it down and start it manually, when it starts having issues" but I ran 'wview stop', changed the .conf to turn off verbose logging, then started it again and so far it's looking fine.)

I get an occasional wakeup error, or invalid data packet, this kinda stuff:

Oct 18 00:05:07 fred wviewd[2648]: <46308281> : wakeupConsole: Read ERROR!
Oct 18 00:05:07 fred wviewd[2648]: <46308281> : daemonRunState: WAKEUP failed
Oct 18 00:20:05 fred wviewd[2648]: <47206702> : daemonReceiveArchiveState: WAKEUP2 failed
Oct 18 00:20:05 fred wviewd[2648]: <47206702> : wakeupConsole: Invalid data: 39 20
Oct 18 00:23:07 fred wviewd[2648]: <47388270> : wakeupConsole: Read ERROR!
Oct 18 00:23:07 fred wviewd[2648]: <47388270> : daemonRunState: WAKEUP failed

... but nothing that looks like real errors, and nothing that repeats regularly.

I even got a note from mesonet at NOAA that they love the regularly-updating data feed I'm providing them! I have *no* idea what happened, or didn't happen, or broke-which-what-way-when-where, but apparently it's happily working again, at least for the past hour or two. I'll call it fixed when it's up for another couple days (after the manual stop/start, which was the referenced hour or two ago), but I'm thinking I've got a case of the weird gremlins, not an actual repeatable software bug...

Thanks for the time on this, even though I'm not sure how productive it was :(
mteel



Joined: 30 Jun 2005
Posts: 435
Location: Collinsville, TX

Posted: Tue Oct 18, 2005 12:02 pm

I suspect the network problems that Wunderground was having (as reported by MullicaHill above) were your problem...

BTW, there is no difference between a wview start at system boot and a manual start...

Keep an eye on those console wakeup errors. I NEVER get them. I think it is an indication of a serial port/cable that is on the hairy edge. If you keep seeing them, you may want to try another serial cable. If that doesn't fix it, you may have a shaky serial port... Those are real errors, just not catastrophic errors.

I've added the raddebug utility to the radlib distro, so that was productive anyway :)

Mark
THogland



Joined: 26 Sep 2005
Posts: 33

Posted: Mon Oct 24, 2005 8:07 am    Post subject: Repost...

Don't know how that last post showed up - I didn't send it...

Since I re-spliced the cable ends, I get those errors fairly regularly, but it's still reading and working properly. They're coming from a pretty new Dell server (less than a year old), so I don't think it's a bad port. I'm thinking I'll try setting the serial port at the correct speed manually and see if that's part of the problem. Otherwise I'm guessing the cable, so I'll have to dig up about 40' of new wire and re-string it...
mteel



Joined: 30 Jun 2005
Posts: 435
Location: Collinsville, TX

Posted: Mon Oct 24, 2005 8:12 am

It won't be the serial port setup done by wview; that has kinda been tested by a whole bunch of folks (not to mention my initial testing) and is pretty solid ;)

40 feet of serial cable sounds like a pretty likely culprit, especially if you are wiring it up yourself.
All times are GMT - 6 Hours
Page 1 of 2

 