wviewweather.com Forum Index wviewweather.com
wview and Weather Topics
 

Assorted Problems

 
Corvar
Joined: 16 Sep 2005
Posts: 8

Posted: Fri Sep 16, 2005 9:49 pm    Post subject: Assorted Problems

I am using Debian Sid, and I am having a couple of issues. The first problem is with compiling radlib 2.2.4 and 2.2.5 with --enable-mysql. Currently, the mysql dev package I have installed is libmysqlclient15-dev (5.0.11beta). I get the following error:

if gcc -DHAVE_CONFIG_H -I. -I. -I. -I./h -I/usr/local/include -D_GNU_SOURCE -I/usr/local/include/mysql -I/usr/include/mysql -g -O2 -MT my_database.o -MD -MP -MF ".deps/my_database.Tpo" -c -o my_database.o `test -f './database/mysql/my_database.c' || echo './'`./database/mysql/my_database.c; \
then mv -f ".deps/my_database.Tpo" ".deps/my_database.Po"; else rm -f ".deps/my_database.Tpo"; exit 1; fi
In file included from ./database/mysql/my_database.c:49:
./h/radlist.h:71: error: conflicting types for 'LIST'
/usr/include/mysql/my_list.h:27: error: previous declaration of 'LIST' was here
make[2]: *** [my_database.o] Error 1
make[2]: Leaving directory `/tmp/radlib-2.2.5'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/radlib-2.2.5'
make: *** [all] Error 2


If I compile without MySQL enabled, everything compiles fine and starts up, but after a couple of days of running, wviewd seems to crash. I get the following in my system log:

Sep 12 13:25:04 wrt wviewd[4866]: <1268072995> : storing record for 09/12/2005 13:25
Sep 12 13:25:10 wrt htmlgend[4870]: <1268078698> : radQueueSend: write failed on fd 8: Broken pipe
Sep 12 13:25:10 wrt htmlgend[4870]: <1268078698> : exiting normally...

Corvar
Posted: Fri Sep 16, 2005 9:52 pm

Oh, BTW, the crash was happening on 1.7.7. I just upgraded to 1.7.8 and don't know whether that will fix the problem.

mteel
Joined: 30 Jun 2005
Posts: 435
Location: Collinsville, TX

Posted: Sat Sep 17, 2005 7:53 am

Don't use MySQL 5 - use one of the accepted 4.x releases. radlib has had a construct called "LIST" for about 10 years... A lot of code would have to be changed to change that name. I may do that in the future, but not any time soon. MySQL 4.x works quite well.
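
For anyone who has to build against MySQL 5 headers anyway, here is a minimal sketch of one possible workaround: renaming one of the two typedefs with a macro at the point of inclusion. Both struct layouts below are stand-ins (the radlib fields in particular are invented), RAD_LIST and rad_list_count are hypothetical names, and this has not been tried against the real radlist.h or my_list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the LIST typedef in MySQL 5's my_list.h (simplified). */
typedef struct st_list {
    struct st_list *prev, *next;
    void *data;
} LIST;

/*
 * Stand-in for radlib's radlist.h. The fields are invented for this
 * sketch. In a real build the #define/#undef pair would wrap the
 * #include of the header instead of an inline typedef.
 */
#define LIST RAD_LIST
typedef struct rad_list_tag {
    void *first;
    void *last;
    int count;
} LIST; /* the preprocessor turns this into RAD_LIST */
#undef LIST

/* Hypothetical helper that uses the renamed type. */
int rad_list_count(const RAD_LIST *list)
{
    assert(list != NULL);
    return list->count;
}
```

After the #undef, LIST refers to the MySQL type and RAD_LIST to the radlib one, so only the file that includes both headers needs to change. Typedef names are not part of the linker symbols, which is why the rename can stay local to that file.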

That is hardly a crash - but it does look like the pipe file is getting corrupted. Old hard drive? Bad sector?

Corvar
Posted: Sat Sep 17, 2005 6:06 pm

Possibly a note in the docs that you have to use MySQL4 would be in order. I half assumed that, but decided when it didn't compile cleanly that I didn't need SQL support anyways.

I interpreted that error message to mean that wviewd silently died, and when htmlgend tried to talk to wviewd the pipe was no longer there. But the machine is in general quite stable, generally new, the drives are passing S.M.A.R.T., and nothing else is failing.

mteel
Posted: Sat Sep 17, 2005 6:37 pm

Possibly, although I don't make it a point to keep track of the latest SQL software release versions. They are both horrible about making big changes in their new versions - very disconcerting...

You can quickly check whether wviewd is running by executing "ps aux | grep wview" - there should be two processes listed for wviewd. It looks more like a corrupted pipe device file or corrupted IPC pipe data. No idea why. But htmlgend is not crashing, just exiting gracefully.

Corvar
Posted: Sat Sep 17, 2005 7:00 pm

I can say for sure that wviewd stops around the same time as htmlgend gives that message, i.e. wviewd is no longer running when I look and the last log entry from wviewd is right before htmlgend prints that.

Corvar
Posted: Mon Sep 19, 2005 4:07 pm

Any recommendations on what I should do to try and get to the bottom of this problem?

I really don't think this is a problem with htmlgend, but instead with wviewd. I get the same messages in the syslog from htmlgend if I `killall -9 wviewd`. So it seems to me wviewd is dying for some unknown reason, and then htmlgend exits because wviewd is no longer there.
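
A minimal sketch of why a vanished reader shows up as "Broken pipe" on the writer's side. It uses an anonymous pipe as a stand-in, which is an assumption: how radlib's queues are actually implemented is not shown in this thread, and write_to_orphaned_pipe is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Writes one byte to a pipe whose read end has already been closed.
 * Returns the errno of the failed write (EPIPE is expected), 0 if the
 * write unexpectedly succeeds, or -1 if the pipe cannot be created.
 */
int write_to_orphaned_pipe(void)
{
    int fds[2];
    void (*old_handler)(int);
    ssize_t written;
    int err;

    if (pipe(fds) != 0)
        return -1;

    /* Ignore SIGPIPE so write() reports an error instead of
     * terminating the process. */
    old_handler = signal(SIGPIPE, SIG_IGN);
    assert(old_handler != SIG_ERR);

    close(fds[0]); /* the reader goes away */

    written = write(fds[1], "x", 1);
    err = (written == -1) ? errno : 0;

    close(fds[1]);
    signal(SIGPIPE, old_handler); /* restore the previous disposition */
    return err;
}
```

The returned value is EPIPE, whose strerror() text on glibc is "Broken pipe", the same wording as in the htmlgend log line. That is consistent with the reader having gone away, although it does not by itself rule out other causes.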

Should I attempt to attach gdb to one of the wviewd processes and see if I can get a backtrace? Any educated guess if it would be best to attach to the first or second process?

mteel
Posted: Mon Sep 19, 2005 7:09 pm

No guess required - the first or lower pid numbered process is the wviewd process and the other is the reflector process for IPC.

I have never seen wviewd just silently exit - there is no code path for that to occur. The signal handler logs and the normal exit logs too... very strange.

Only the KILL and STOP signals are not catchable and would cause termination silently. Doing a "kill -15 [wviewd pid]" uses the same signal processing path as a SEGV or SIGFPE, etc. You will see log messages as the wviewd signal handler catches these signals. Is there some cron job or periodic process in your system which might be sending wviewd a KILL?
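
The distinction above can be checked directly. This is a minimal sketch assuming a POSIX system; try_catch, record_signal and last_caught are hypothetical names, and this is not wviewd's actual signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t last_signal = 0;

/* Handler that only records which signal arrived. */
static void record_signal(int signum)
{
    last_signal = signum;
}

/*
 * Tries to install record_signal for signum.
 * Returns 0 on success, or the errno reported by sigaction().
 */
int try_catch(int signum)
{
    struct sigaction sa;

    assert(signum > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);

    if (sigaction(signum, &sa, NULL) == -1)
        return errno;
    return 0;
}

/* Returns the number of the most recently caught signal, or 0. */
int last_caught(void)
{
    return (int)last_signal;
}
```

Calling try_catch(SIGTERM) or try_catch(SIGFPE) succeeds, while try_catch(SIGKILL) and try_catch(SIGSTOP) fail with EINVAL. A process that receives KILL therefore never gets the chance to write a log line.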

Mark

Corvar
Posted: Mon Sep 19, 2005 7:31 pm

No there was not a cronjob or anything else sending it a KILL, and there seems to be no pattern to when it stops.

I will attempt to get you a backtrace.

Corvar
Posted: Thu Sep 22, 2005 8:45 am    Post subject: Update

Figured I would give an update.

I ended up doing a system update, upgrading my kernel, and rebooting. Since then, wviewd has been running like a trouper with gdb attached the whole time. So that means to me either that whatever was breaking wviewd got fixed in the update, or that gdb is slowing down wviewd enough that whatever race condition causes the crash doesn't apply.

I am going to detach gdb and see if the problem resurfaces, and will let you know the results in a few days (or quicker if I know something).

Corvar
Posted: Mon Sep 26, 2005 10:35 am

The watched pot finally crashed, and I got a core out of it. The backtrace from gdb is:

#0 0xb7e959e7 in raise () from /lib/tls/libc.so.6
#1 0xb7e97479 in abort () from /lib/tls/libc.so.6
#2 0x0804cbb3 in defaultSigHandler (signum=Variable "signum" is not available.
) at ../daemon/daemon.c:214
#3 <signal handler called>
#4 0x0805560b in vpifGetRXCheck (work=0x805f8e0)
at ../daemon/vpinterface.c:1093
#5 0x0804ed4f in daemonReceiveArchiveState (state=6, stimulus=0xbfed20d0,
data=0x805f8e0) at ../daemon/daemonStates.c:1014
#6 0x08059e9c in radStatesProcess (id=0x8063310, stimulus=0xbfed20d0)
at ./src/radstates.c:158
#7 0x0804ca3a in stationDataCallback (fd=0, userData=0x0)
at ../daemon/daemon.c:336
#8 0x08057897 in radProcessWait (timeout=0) at ./src/radprocess.c:485
#9 0x0804d805 in main (argc=1, argv=0xbfed2494) at ../daemon/daemon.c:1058


If you want any more information out of me, just ask.

mteel
Posted: Mon Sep 26, 2005 3:32 pm

Maybe I found a little divide-by-zero bug in the newly added code to vpifGetRXCheck to support the rxcheck.png chart...
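
As an illustration of the kind of guard such a fix needs: on x86 Linux an integer divide by zero raises SIGFPE, which would fit the backtrace above, where the signal handler is entered from vpifGetRXCheck and then calls abort(). The real vpifGetRXCheck code is not shown in this thread, so rx_percent and its arguments are hypothetical, as is the choice of -1 for "no data yet".

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage of packets received, 0..100.
 * Returns -1 when no packets have been counted yet, instead of
 * dividing by zero.
 */
int rx_percent(long received, long missed)
{
    long total;

    assert(received >= 0 && missed >= 0);
    total = received + missed;

    if (total == 0)
        return -1; /* guard: nothing to divide by */

    return (int)((received * 100) / total);
}
```

A caller can then skip the chart point when the result is -1. Whether to skip, reuse the previous value, or plot 100 in that case is a design choice this sketch does not settle.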

Let's see how the one I sent you behaves...

Mark

All times are GMT - 6 Hours