Corvar
Joined: 16 Sep 2005 Posts: 8
Posted: Fri Sep 16, 2005 9:49 pm Post subject: Assorted Problems
I am using Debian Sid, and I am having a couple of issues. The first problem is with compiling radlib 2.2.4 and 2.2.5 with --enable-mysql. Currently, the mysql dev package I have installed is libmysqlclient15-dev (5.0.11beta). I get the following error:
if gcc -DHAVE_CONFIG_H -I. -I. -I. -I./h -I/usr/local/include -D_GNU_SOURCE -I/usr/local/include/mysql -I/usr/include/mysql -g -O2 -MT my_database.o -MD -MP -MF ".deps/my_database.Tpo" -c -o my_database.o `test -f './database/mysql/my_database.c' || echo './'`./database/mysql/my_database.c; \
then mv -f ".deps/my_database.Tpo" ".deps/my_database.Po"; else rm -f ".deps/my_database.Tpo"; exit 1; fi
In file included from ./database/mysql/my_database.c:49:
./h/radlist.h:71: error: conflicting types for 'LIST'
/usr/include/mysql/my_list.h:27: error: previous declaration of 'LIST' was here
make[2]: *** [my_database.o] Error 1
make[2]: Leaving directory `/tmp/radlib-2.2.5'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/radlib-2.2.5'
make: *** [all] Error 2
If I compile without MySQL enabled, everything compiles fine and starts up, but after a couple of days of running, wviewd seems to crash. I get the following in my system log:
Sep 12 13:25:04 wrt wviewd[4866]: <1268072995> : storing record for 09/12/2005 13:25
Sep 12 13:25:10 wrt htmlgend[4870]: <1268078698> : radQueueSend: write failed on fd 8: Broken pipe
Sep 12 13:25:10 wrt htmlgend[4870]: <1268078698> : exiting normally...
Corvar
Joined: 16 Sep 2005 Posts: 8
Posted: Fri Sep 16, 2005 9:52 pm
Oh, BTW, the crash was happening on 1.7.7. I just upgraded to 1.7.8 and don't know yet whether that will fix the problem.
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
Posted: Sat Sep 17, 2005 7:53 am
Don't use MySQL 5 - use one of the accepted 4.x releases. radlib has had a construct called "LIST" for about 10 years... A lot of code would have to be changed to change that name. I may do that in the future, but not any time soon. MySQL 4.x works quite well.
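For readers who do need both headers in one translation unit, a commonly used workaround for this kind of typedef collision is to rename one of the two names with the preprocessor while its header is being read. The following is only a minimal sketch of that technique: both typedefs are hypothetical stand-ins, not the real contents of my_list.h or radlist.h, and whether this alone is enough for the real MySQL 5 headers has not been checked here.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vendor header that declares its own LIST.
 * While the macro is active, every "LIST" token is read as "vendor_LIST",
 * so the typedef below creates the renamed identifier instead. */
#define LIST vendor_LIST
typedef struct vendor_list_node {
    struct vendor_list_node *prev;
    struct vendor_list_node *next;
    void *data;
} LIST;
#undef LIST

/* Hypothetical stand-in for the application's own, unrelated LIST type. */
typedef struct {
    size_t count;
} LIST;

/* Both types can now be used side by side in the same file. */
size_t app_list_count(const LIST *list)
{
    return list->count;
}

int vendor_list_is_single(const vendor_LIST *node)
{
    return node->prev == NULL && node->next == NULL;
}
```

In real code the #define/#undef pair would wrap the #include of the vendor header rather than an inline typedef. One caveat: any macro from that header which mentions LIST and is expanded after the #undef would pick up the application's type instead, so this is a stopgap and not a substitute for renaming the construct.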
That is hardly a crash - but it does look like the pipe file is getting corrupted. Old hard drive? Bad sector?
Corvar
Joined: 16 Sep 2005 Posts: 8
Posted: Sat Sep 17, 2005 6:06 pm
Possibly a note in the docs that you have to use MySQL 4 would be in order. I half assumed that, but decided when it didn't compile cleanly that I didn't need SQL support anyway.
I interpreted that error message to mean that wviewd silently died, and when htmlgend tried to talk to wviewd the pipe was no longer there. But the machine is in general quite stable and fairly new, the drives are passing S.M.A.R.T., and nothing else is failing.
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
Posted: Sat Sep 17, 2005 6:37 pm
Possibly, although I don't make it a point to keep track of the latest SQL software release versions. They are both horrible about making big changes in their new versions - very disconcerting...
You can quickly check whether wviewd is running by executing "ps aux | grep wview" - there should be two processes listed for wviewd. It looks more like a corrupted pipe device file or corrupted IPC pipe data. No idea why. But htmlgend is not crashing, just exiting gracefully.
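The same "is it still running?" check can also be done from a small watchdog program instead of by eye. The sketch below uses the standard POSIX idiom of sending signal 0 with kill(), which performs the existence and permission checks without delivering anything. It is a different technique from the ps command above, and the function name is made up for illustration; the PID would have to come from somewhere else, such as the ps output.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if a process with this PID exists, 0 if it does not,
 * and -1 for an unusable PID or an unexpected error. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;      /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 1;       /* signal 0: checks only, nothing is delivered */

    if (errno == EPERM)
        return 1;       /* the process exists but belongs to another user */
    if (errno == ESRCH)
        return 0;       /* no such process */
    return -1;
}
```

Note that a PID can be reused by an unrelated process after the original one exits, so a check like this is only a hint, not proof that the same daemon is still alive.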
Corvar
Joined: 16 Sep 2005 Posts: 8
Posted: Sat Sep 17, 2005 7:00 pm
I can say for sure that wviewd stops around the same time as htmlgend gives that message, i.e. wviewd is no longer running when I look and the last log entry from wviewd is right before htmlgend prints that.
Corvar
Joined: 16 Sep 2005 Posts: 8
Posted: Mon Sep 19, 2005 4:07 pm
Any recommendations on what I should do to try and get to the bottom of this problem?
I really don't think this is a problem with htmlgend, but rather with wviewd. I get the same messages in the syslog from htmlgend if I `killall -9 wviewd`. So it seems to me wviewd is dying for some unknown reason, and then htmlgend exits because wviewd is no longer there.
Should I attempt to attach gdb to one of the wviewd processes and see if I can get a backtrace? Any educated guess as to whether it would be best to attach to the first or second process?
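The "write failed ... Broken pipe" message is what POSIX specifies when a process writes to a pipe or FIFO that no longer has a reader: the write fails with EPIPE (and SIGPIPE is raised unless it is ignored or handled). The sketch below reproduces that with a plain pipe() in a single process. It is not radlib code and says nothing about how radQueueSend is implemented; the function name is invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Writes one byte into a pipe whose read end has already been closed,
 * mimicking a peer process that has gone away.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the setup itself failed. */
int errno_after_reader_gone(void)
{
    int fds[2];
    struct sigaction ignore, previous;
    int result = 0;

    if (pipe(fds) != 0)
        return -1;

    /* With the default SIGPIPE action the process would be terminated
     * before write() could return, so ignore the signal for this test. */
    ignore.sa_handler = SIG_IGN;
    ignore.sa_flags = 0;
    sigemptyset(&ignore.sa_mask);
    if (sigaction(SIGPIPE, &ignore, &previous) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[0]);                      /* the "reader" disappears */
    if (write(fds[1], "x", 1) == -1)
        result = errno;                 /* expected to be EPIPE */

    close(fds[1]);
    sigaction(SIGPIPE, &previous, NULL); /* restore the old disposition */
    return result;
}
```

This is consistent with the `killall -9 wviewd` experiment: once the reading side is gone, the next write from the surviving process fails in exactly this way.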
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
Posted: Mon Sep 19, 2005 7:09 pm
No guess required - the first or lower pid numbered process is the wviewd process and the other is the reflector process for IPC.
I have never seen wviewd just silently exit - there is no code path for that to occur. The signal handler logs and the normal exit logs too... very strange.
Only the KILL and STOP signals are not catchable and would cause termination silently. Doing a "kill -15 [wviewd pid]" uses the same signal processing path as a SEGV or SIGFPE, etc. You will see log messages as the wviewd signal handler catches these signals. Is there some cron job or periodic process in your system which might be sending wviewd a KILL?
Mark
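The point about KILL and STOP can be checked directly: POSIX requires sigaction() to fail with EINVAL when asked to catch a signal that cannot be caught. The following is a small sketch of that check, not wview's actual signal-handling code; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Placeholder handler; it is only installed briefly and never expected to run. */
static void noop_handler(int signum)
{
    (void)signum;
}

/* Returns 1 if a handler can be installed for signum, 0 if the system
 * refuses. The previous disposition is restored before returning. */
int can_install_handler(int signum)
{
    struct sigaction sa, previous;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);

    if (sigaction(signum, &sa, &previous) != 0)
        return 0;                       /* e.g. EINVAL for SIGKILL/SIGSTOP */

    sigaction(signum, &previous, NULL);
    return 1;
}
```

Calling it with SIGTERM, SIGSEGV or SIGFPE should return 1, while SIGKILL and SIGSTOP should return 0, which is why a daemon killed with -9 has no chance to log anything.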
Corvar
Joined: 16 Sep 2005 Posts: 8
Posted: Mon Sep 19, 2005 7:31 pm
No, there was not a cron job or anything else sending it a KILL, and there seems to be no pattern to when it stops.
I will attempt to get you a backtrace.
Corvar
Joined: 16 Sep 2005 Posts: 8
Posted: Thu Sep 22, 2005 8:45 am Post subject: Update
Figured I would give an update.
I ended up doing a system update, upgrading my kernel, and rebooting. Since then, wviewd has been running like a trouper with gdb attached the whole time. To me that means either that whatever was breaking wviewd got fixed in the update, or that gdb is slowing wviewd down enough that whatever race condition causes the crash no longer triggers.
I am going to detach gdb and see if the problem resurfaces, and will let you know the results in a few days (or sooner if I learn something).
Corvar
Joined: 16 Sep 2005 Posts: 8
Posted: Mon Sep 26, 2005 10:35 am
The watched pot finally crashed, and I got a core out of it. The backtrace from gdb is:
#0 0xb7e959e7 in raise () from /lib/tls/libc.so.6
#1 0xb7e97479 in abort () from /lib/tls/libc.so.6
#2 0x0804cbb3 in defaultSigHandler (signum=Variable "signum" is not available.
) at ../daemon/daemon.c:214
#3 <signal handler called>
#4 0x0805560b in vpifGetRXCheck (work=0x805f8e0)
at ../daemon/vpinterface.c:1093
#5 0x0804ed4f in daemonReceiveArchiveState (state=6, stimulus=0xbfed20d0,
data=0x805f8e0) at ../daemon/daemonStates.c:1014
#6 0x08059e9c in radStatesProcess (id=0x8063310, stimulus=0xbfed20d0)
at ./src/radstates.c:158
#7 0x0804ca3a in stationDataCallback (fd=0, userData=0x0)
at ../daemon/daemon.c:336
#8 0x08057897 in radProcessWait (timeout=0) at ./src/radprocess.c:485
#9 0x0804d805 in main (argc=1, argv=0xbfed2494) at ../daemon/daemon.c:1058
If you want any more information out of me, just ask.
mteel
Joined: 30 Jun 2005 Posts: 435 Location: Collinsville, TX
Posted: Mon Sep 26, 2005 3:32 pm
Maybe I found a little divide-by-zero bug in the newly added code to vpifGetRXCheck to support the rxcheck.png chart...
Let's see how the one I sent you behaves...
Mark
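For context, on x86 Linux an integer division by zero is delivered to the process as SIGFPE, which fits the "signal handler called" frame directly above vpifGetRXCheck in the backtrace. The actual fix is not shown in this thread; the sketch below only illustrates the usual guard for a percentage calculation, with a hypothetical function name and parameters.

```c
/* Hypothetical helper: percentage of expected packets actually received.
 * Returns 0 when nothing was expected, instead of dividing by zero. */
int rx_percent(long received, long expected)
{
    if (expected <= 0)
        return 0;               /* guard: avoids the divide-by-zero trap */

    if (received < 0)
        received = 0;
    if (received > expected)
        received = expected;    /* clamp so the result stays within 0..100 */

    return (int)((received * 100) / expected);
}
```

A guard like this matters most right after startup or a counter reset, when the denominator can legitimately be zero for a short time. The multiplication assumes the counts are small enough that received * 100 does not overflow a long.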