From sgunderson at bigfoot.com Sat Oct 9 15:40:33 2010
From: sgunderson at bigfoot.com (Steinar H. Gunderson)
Date: Sat, 9 Oct 2010 15:40:33 +0200
Subject: [mpm-itk] Switching vhosts during the same connection in Debian
In-Reply-To: <20100805232423.GC26311@uio.no>
References: <20100723162027.GU17040@stud.ntnu.no> <20100723162611.GB2008@uio.no> <20100730235544.GA25932@uio.no> <20100805232423.GC26311@uio.no>
Message-ID: <20101009134033.GA2904@uio.no>

On Fri, Aug 06, 2010 at 01:24:23AM +0200, Steinar H. Gunderson wrote:
> Here's an updated version, with some small changes: It fixes the formatting,
> adds an explanatory comment, fixes the logging, and makes sure the logic
> doesn't trigger in subrequests. I'm still not 100% sure it cannot trigger on
> the first request in a connection (given some obscure special case); testing
> welcome.

Ping. For the people who have reported this problem, does the patch help?
Have you noticed any long-term problems?

/* Steinar */
--
Homepage: http://www.sesse.net/

From roman at rs-labs.com Sun Oct 10 15:32:18 2010
From: roman at rs-labs.com (Roman Medina-Heigl Hernandez)
Date: Sun, 10 Oct 2010 15:32:18 +0200
Subject: [mpm-itk] Mod_evasive
In-Reply-To: <4C9795A9.5030006@rs-labs.com>
References: <4C9795A9.5030006@rs-labs.com>
Message-ID: <4CB1C062.80501@rs-labs.com>

Nobody answered me... Any clue?

On 20/09/2010 19:11, Roman Medina-Heigl Hernandez wrote:
> Hello,
>
> Did anybody get mod_evasive to work with mpm-itk? I didn't and I suspect
> it may be due to incompatibility with itk. Any idea?
>
> TIA.
>

--
Regards,
-Roman

PGP Fingerprint: 09BB EFCD 21ED 4E79 25FB 29E1 E47F 8A7D EAD5 6742
[Key ID: 0xEAD56742. Available at KeyServ]

From sgunderson at bigfoot.com Sun Oct 10 15:39:24 2010
From: sgunderson at bigfoot.com (Steinar H.
Gunderson)
Date: Sun, 10 Oct 2010 15:39:24 +0200
Subject: [mpm-itk] Mod_evasive
In-Reply-To: <4CB1C062.80501@rs-labs.com>
References: <4C9795A9.5030006@rs-labs.com> <4CB1C062.80501@rs-labs.com>
Message-ID: <20101010133924.GA6673@uio.no>

On Sun, Oct 10, 2010 at 03:32:18PM +0200, Roman Medina-Heigl Hernandez wrote:
> Nobody answered me... Any clue?

A quick Google search seems to indicate there are others having problems with
this configuration. I'd advise contacting the mod_evasive developers and
asking them.

/* Steinar */
--
Homepage: http://www.sesse.net/

From roman at rs-labs.com Mon Oct 11 10:51:17 2010
From: roman at rs-labs.com (Roman Medina-Heigl Hernandez)
Date: Mon, 11 Oct 2010 10:51:17 +0200
Subject: [mpm-itk] Mod_evasive
In-Reply-To: <20101010133924.GA6673@uio.no>
References: <4C9795A9.5030006@rs-labs.com> <4CB1C062.80501@rs-labs.com> <20101010133924.GA6673@uio.no>
Message-ID: <4CB2D005.4030907@rs-labs.com>

Yes, I know, I googled before asking :) But I didn't find a real answer
(apart from the fact that more people seem to be having problems similar to
mine). As I'm not familiar with either itk's internals or evasive's, I'm not
in the best position to research the problem for a solution :( . So I decided
to write here (I'm unsure that mod_evasive is still maintained; at least the
mod_evasive version I tested was old, and it was the one shipped with the
latest Ubuntu).

I'll try to reach the mod_evasive developers, as requested.

Cheers.
-Roman

Steinar H. Gunderson wrote:
> On Sun, Oct 10, 2010 at 03:32:18PM +0200, Roman Medina-Heigl Hernandez wrote:
>> Nobody answered me... Any clue?
>
> A quick Google search seems to indicate there are others having problems with
> this configuration. I'd advise contacting the mod_evasive developers and
> asking them.
>
> /* Steinar */

From jeka at stfalcon.com Mon Oct 18 11:27:47 2010
From: jeka at stfalcon.com (Jevgen Gyrynovych)
Date: Mon, 18 Oct 2010 12:27:47 +0300
Subject: [mpm-itk] hang apache2
Message-ID: <201010181227.48276.jeka@stfalcon.com>

Hello, I have problem with mpm-itk. When load on the server apache hangs. This
may occur during the execution of backups or under siege.
In error.log have:
[Mon Oct 18 02:33:01 2010] [error] child died with signal 11
[Mon Oct 18 02:34:46 2010] [error] child died with signal 11
[Mon Oct 18 02:37:01 2010] [error] child died with signal 11
[Mon Oct 18 02:44:45 2010] [error] child died with signal 11
[Mon Oct 18 02:46:27 2010] [error] child died with signal 11
[Mon Oct 18 02:52:04 2010] [error] child died with signal 11
zend_mm_heap corrupted

With mpm-prefork I have not had such problems.

OS: Ubuntu 10.04, mpm-itk 2.2.14

From sgunderson at bigfoot.com Mon Oct 18 11:53:47 2010
From: sgunderson at bigfoot.com (Steinar H. Gunderson)
Date: Mon, 18 Oct 2010 11:53:47 +0200
Subject: [mpm-itk] hang apache2
In-Reply-To: <201010181227.48276.jeka@stfalcon.com>
References: <201010181227.48276.jeka@stfalcon.com>
Message-ID: <20101018095347.GH29906@uio.no>

On Mon, Oct 18, 2010 at 12:27:47PM +0300, Jevgen Gyrynovych wrote:
> Hello, I have problem with mpm-itk. When load on the server apache hangs. This
> may occur during the execution of backups or under siege.
> In error.log have:
> [Mon Oct 18 02:33:01 2010] [error] child died with signal 11
> [Mon Oct 18 02:34:46 2010] [error] child died with signal 11
> [Mon Oct 18 02:37:01 2010] [error] child died with signal 11
> [Mon Oct 18 02:44:45 2010] [error] child died with signal 11
> [Mon Oct 18 02:46:27 2010] [error] child died with signal 11
> [Mon Oct 18 02:52:04 2010] [error] child died with signal 11
> zend_mm_heap corrupted

If you get a corrupted heap (or segfaults in general), it's most likely a bug
in some extension module.
Try to see if you can get a backtrace, or if you can reproduce the
bug under Valgrind.

> With mpm-prefork I have not had such problems.

Prefork is a bit faster, so if the bug is load-dependent, you might get into
it a bit faster with mpm-itk.

/* Steinar */
--
Homepage: http://www.sesse.net/

From lawrencewilliams at lcwsoft.com Mon Oct 18 17:16:02 2010
From: lawrencewilliams at lcwsoft.com (lawrencewilliams at lcwsoft.com)
Date: Mon, 18 Oct 2010 12:46:02 -0230
Subject: [mpm-itk] hang apache2
In-Reply-To: <20101018095347.GH29906@uio.no>
References: <201010181227.48276.jeka@stfalcon.com> <20101018095347.GH29906@uio.no>
Message-ID: <77a5521e2ed92c9558ac6b5ecc23af7d@lcwsoft.com>

On Mon, 18 Oct 2010 11:53:47 +0200, "Steinar H. Gunderson" wrote:
> On Mon, Oct 18, 2010 at 12:27:47PM +0300, Jevgen Gyrynovych wrote:
>> Hello, I have problem with mpm-itk. When load on the server apache hangs. This
>> may occur during the execution of backups or under siege.
>> In error.log have:
>> [Mon Oct 18 02:33:01 2010] [error] child died with signal 11
>> [Mon Oct 18 02:34:46 2010] [error] child died with signal 11
>> [Mon Oct 18 02:37:01 2010] [error] child died with signal 11
>> [Mon Oct 18 02:44:45 2010] [error] child died with signal 11
>> [Mon Oct 18 02:46:27 2010] [error] child died with signal 11
>> [Mon Oct 18 02:52:04 2010] [error] child died with signal 11
>> zend_mm_heap corrupted
>
> If you get a corrupted heap (or segfaults in general), it's most likely a bug
> in some extension module. Try to see if you can get a backtrace, or if you
> can reproduce the bug under Valgrind.
>
>> With mpm-prefork I have not had such problems.
>
> Prefork is a bit faster, so if the bug is load-dependent, you might get into
> it a bit faster with mpm-itk.
>
> /* Steinar */

Doesn't it look like an issue with PHP or triggered by it? I ask because it
says zend_mm_heap corrupted, and Zend's only main thing in Apache
configurations is PHP usually.
- Lawrence

From volker at blafoo.org Tue Oct 26 11:44:42 2010
From: volker at blafoo.org (volker)
Date: Tue, 26 Oct 2010 11:44:42 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
Message-ID: <4CC6A30A.9060003@blafoo.org>

Hi everyone,

i am running several heavy-loaded Webservers using lenny stable with the
following packages:

ii apache2-mpm-itk 2.2.6-02-1+lenny3+b1
ii apache2-utils 2.2.9-10+lenny8
ii apache2.2-common 2.2.9-10+lenny8

The known problem with children being stuck in graceful should be fixed in
this version according to info here:
http://www.mail-archive.com/debian-apache at lists.debian.org/msg12518.html

After receiving several complaints about Websites not being available with a
503 error, i noticed that these sites were not available due to the
MaxClientsVhost settings which resulted in known errors like these:

###
[Tue Oct 26 11:25:19 2010] [warn] MaxClientsVhost reached for www.anonymizeddomain.org, refusing client.
###

While investigating these complaints i noticed that some scoreboards did not
seem to get reset after doing a http-reload. Several slots in the scoreboard
were still taken/blocked with a 'G'.

The example server has an uptime of 134 days:

# uptime
11:33:52 up 134 days, 9:18, 1 user, load average: 6.97, 4.43, 3.57

and here are examples from the scoreboards prior to a reload:

###
Current Time: Tuesday, 26-Oct-2010 11:29:11 CEST
Restart Time: Thursday, 21-Oct-2010 11:20:58 CEST
Parent Server Generation: 195
Server uptime: 5 days 8 minutes 13 seconds
Total accesses: 4894295 - Total Traffic: 83.0 GB
CPU Usage: u7.31 s5.57 cu0 cs0 - .00298% CPU load
11.3 requests/sec - 201.2 kB/second - 17.8 kB/request
68 requests currently being processed, 46 idle workers
G_GGKKKGKG_GGKCKGGWK__GG_K_KG_._GG__GGGKG_GG.C..CK.GKKGGG.K_GK.G
.KGGGK.GG.GGCGCCG.KGKWKGGC.G_G.GK_K_KW.CGKK..C.KC.GK.CGK_K__..._
K..._....K......................................................
................................................................
###

Now lets do a reload and check the scoreboard

###
Current Time: Tuesday, 26-Oct-2010 11:30:19 CEST
Restart Time: Thursday, 21-Oct-2010 11:20:58 CEST
Parent Server Generation: 196
Server uptime: 5 days 9 minutes 20 seconds
Total accesses: 4895772 - Total Traffic: 83.0 GB
CPU Usage: u3.99 s3.27 cu0 cs0 - .00168% CPU load
11.3 requests/sec - 201.2 kB/second - 17.8 kB/request
102 requests currently being processed, 6 idle workers
GCGGC.WGCGKGG_CCGGGCCGGGGCW.GGCKGGGCGGGKG.GGCGCKKGGG.WGGGKK.GGGG
..GGGWGGG_GGWGGCGG.G_K.GGGCG.GKG_.CWW.G_GCG...WCKK.C.WC...K.._.K
.GS..G.G........................................................
................................................................
###

Right after the reload several processes are in status 'G'. Thats ok,
they are just finishing up to make room for new ones.

Lets reload the server-status:

###
Current Time: Tuesday, 26-Oct-2010 11:30:35 CEST
Restart Time: Thursday, 21-Oct-2010 11:20:58 CEST
Parent Server Generation: 196
Server uptime: 5 days 9 minutes 37 seconds
Total accesses: 4896086 - Total Traffic: 83.0 GB
CPU Usage: u8.27 s7.19 cu0 cs0 - .00357% CPU load
11.3 requests/sec - 201.2 kB/second - 17.8 kB/request
78 requests currently being processed, 29 idle workers
G_GGK_KGKGCGG__C_GGK_KGG_KC_G__KGGK_GGG_G_GG_K_C.KGG__GGG.._G_CG
K.GGGKKGG_GGKGC_G..G_K.GGKKG.G.G_.KK_._KG_K...CCK_.K.KK......_..
..C.._.W........................................................
................................................................
###

Odd, some Slots are still in status 'G'. Lets check a little later. Maybe
some process just took a little longer.
###
Current Time: Tuesday, 26-Oct-2010 11:31:24 CEST
Restart Time: Thursday, 21-Oct-2010 11:20:58 CEST
Parent Server Generation: 196
Server uptime: 5 days 10 minutes 25 seconds
Total accesses: 4896821 - Total Traffic: 83.0 GB
CPU Usage: u5.41 s4.08 cu0 cs0 - .00219% CPU load
11.3 requests/sec - 201.2 kB/second - 17.8 kB/request
93 requests currently being processed, 30 idle workers
G_GG__KG_G_GG_KKCGG_KCGG__K_G_KKGGKKGGGCGWGG_K_K__GGKKGGGKKKGKKG
WKGGG_KGG_GGKGWWG__G_K_GGCWGCGWGW_K__CKWG__CKCKK_KK_CWKK........
..C.._._........................................................
................................................................
###

Damn, many slots are still in graceful. After a restart everything's fine
again:

###
Current Time: Tuesday, 26-Oct-2010 11:32:44 CEST
Restart Time: Tuesday, 26-Oct-2010 11:32:42 CEST
Parent Server Generation: 197
Server uptime: 1 second
Total accesses: 28 - Total Traffic: 98 kB
CPU Usage: u.3 s.18 cu0 cs0 - 48% CPU load
28 requests/sec - 98.0 kB/second - 3584 B/request
26 requests currently being processed, 0 idle workers
WWKWWKKWWWKWWKWWWWKKKKWKWKS.....................................
................................................................
................................................................
###

Has anyone experienced or is experiencing the same problem on his servers?
Maybe its the mix of package-versions with apache2-itk being 2.2.6 and
apache2-utils/common being 2.2.9?

I have not tried replacing the apache2-itk- with the prefork-package yet,
because that is not quite as easy with all vhosts having AssignUserID in
them. But it would be possible if its needed for finding the source of the
problem.

Any comments or suggestions welcome!

regards
Volker

From sgunderson at bigfoot.com Tue Oct 26 12:36:33 2010
From: sgunderson at bigfoot.com (Steinar H. Gunderson)
Date: Tue, 26 Oct 2010 12:36:33 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <4CC6A30A.9060003@blafoo.org>
References: <4CC6A30A.9060003@blafoo.org>
Message-ID: <20101026103633.GB10784@uio.no>

On Tue, Oct 26, 2010 at 11:44:42AM +0200, volker wrote:
> GCGGC.WGCGKGG_CCGGGCCGGGGCW.GGCKGGGCGGGKG.GGCGCKKGGG.WGGGKK.GGGG
> ..GGGWGGG_GGWGGCGG.G_K.GGGCG.GKG_.CWW.G_GCG...WCKK.C.WC...K.._.K
> .GS..G.G........................................................
> ................................................................
> ###
>
> Right after the reload several processes are in status 'G'. Thats ok,
> they are just finishing up to make room for new ones.

I guess the big question is, what are the servers really doing during this?
Do you have a list of all apache2 processes on the system?

/* Steinar */
--
Homepage: http://www.sesse.net/

From volker at blafoo.org Tue Oct 26 13:16:54 2010
From: volker at blafoo.org (volker)
Date: Tue, 26 Oct 2010 13:16:54 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <20101026103633.GB10784@uio.no>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no>
Message-ID: <4CC6B8A6.4020908@blafoo.org>

hi,

thanks for the quick reply!

> I guess the big question is, what are the servers really doing during this?
> Do you have a list of all apache2 processes on the system?

I do, but not from the server i took the example-scoreboards from. The
processes on that server are fine for now because of the restart of the
httpd. I have several other servers where i can post some info from. i'll
also attach a file with the same info because its more readable (no
wordwrap).
Lets call this one server02:

The current scoreboard:

###
Current Time: Tuesday, 26-Oct-2010 12:48:19 CEST
Restart Time: Wednesday, 21-Jul-2010 09:04:16 CEST
Parent Server Generation: 576
Server uptime: 97 days 3 hours 44 minutes 2 seconds
Total accesses: 37229823 - Total Traffic: 1800.9 GB
CPU Usage: u18.04 s9.62 cu0 cs0 - .00033% CPU load
4.44 requests/sec - 225.0 kB/second - 50.7 kB/request
62 requests currently being processed, 29 idle workers
_G_.GGGGGGGG_GGG.G_GGG_GGG_G.KGGG_GGW_GGKGGG_G_GGGG_GG__GWGG..WG
G.__G.KG._GK_G__G_G._.._G..._K..K__.G_K..G......K..._...__......
................................................................
###

A line for a G-state process from the scoreboard:

###
1-187 23013 0/22/262217 G 0.00 5319290 0 0.0 0.18 10527.80 66.249.71.185 www.anonymizeddomain.org GET /index.php?cfcd208495d565ef66e7dff9f98764da=euh7u1jvl9eqk31
###

Right now the Domain "anonymizeddomain.org" is not reachable because of
MaxClientsVhost-error:

###
[Tue Oct 26 10:36:53 2010] [warn] MaxClientsVhost reached for www.california-waldmohr.de, refusing client.
###

Lets see what the process is actually doing:

###
$ ps aux | grep 23013
root 23013 0.0 2.3 131716 99068 ? S Aug25 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
###

Looks ok. What does strace say:

###
$ strace -p 23013
Process 23013 attached - interrupt to quit
waitpid(25730,
###

That does not look right. Lets try another process:

Next line for a G-state process from the scoreboard:

###
7-474 14190 0/79/589868 G 1.23 1388806 0 0.0 0.75 26984.69 66.249.65.220 www.anonymizeddomain.org GET /index.php?cfcd208495d565ef66e7dff9f98764da=be3mscv53clvbsg
###

What is the process actually doing:

###
$ ps aux | grep 14190
root 14190 0.0 5.5 265336 228016 ? S Oct10 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
###

Looks ok. What does strace say:

###
$ strace -p 14190
Process 14190 attached - interrupt to quit
waitpid(542,
###

I have checked a couple of other processes.
They all behave the same and are all stuck in wait(). I noticed that all of
them belong to root. Here is a list of the ones i checked as an 'ps aux'
(there are more but i only checked the ones below):

###
# ps aux | grep -E '(23013|14190|13950|29632|20091|21539|20775)' | grep -v grep
root 13950 0.0 4.1 209324 170900 ? S Sep18 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
root 14190 0.0 5.5 265336 228016 ? S Oct10 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
root 20091 0.0 4.1 210580 172168 ? S Sep19 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
root 20775 0.0 2.3 131716 99068 ? S Aug25 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
root 21539 0.0 1.6 102052 69572 ? S Aug12 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
root 23013 0.0 2.3 131716 99068 ? S Aug25 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
root 29632 0.0 2.3 131716 99068 ? S Aug25 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
###

I also took the anonymized.org-domain and checked how many occurrences it had
in the scoreboard:

###
$ cat scoreboard | grep anonymizeddomain.org | wc -l
48
###

The configured MaxClientsVhost for this domain is '60'. If every process in
G-state blocks a slot, that would explain why the domain is constantly not
reachable.

Is there more info that helps that i can give you?

regards
volker

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: info.txt
URL:

From sgunderson at bigfoot.com Tue Oct 26 13:55:19 2010
From: sgunderson at bigfoot.com (Steinar H. Gunderson)
Date: Tue, 26 Oct 2010 13:55:19 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <4CC6B8A6.4020908@blafoo.org>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org>
Message-ID: <20101026115519.GE10784@uio.no>

On Tue, Oct 26, 2010 at 01:16:54PM +0200, volker wrote:
> Lets see what the process is actually doing:
> ###
> $ ps aux | grep 23013
> root 23013 0.0 2.3 131716 99068 ? S Aug25 0:00 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
> ###
>
>
> Looks ok. What does strace say:
> ###
> $ strace -p 23013
> Process 23013 attached - interrupt to quit
> waitpid(25730,
> ###
>
> That does not look right.

OK. So the question is, what is this pid 25730? Is... or was. Also, can you
get a backtrace? (gdb -p 23013 and run bt).

Do you have any unusual Apache modules in your setup?

/* Steinar */
--
Homepage: http://www.sesse.net/

From volker at blafoo.org Tue Oct 26 16:22:39 2010
From: volker at blafoo.org (volker)
Date: Tue, 26 Oct 2010 16:22:39 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <20101026115519.GE10784@uio.no>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no>
Message-ID: <4CC6E42F.5010808@blafoo.org>

Hi,

here we go:

> OK. So the question is, what is this pid 25730? Is... or was. Also, can you
> get a backtrace? (gdb -p 23013 and run bt).

See the attached file. I have not used gdb at all so it does not tell me
very much. But the backtrace does not look like it should i guess.

> Do you have any unusual Apache modules in your setup?

See the attached list. Couple notes:

The 'vhost_list_modules' was used prior to the itk. It is not being used
anymore.
The 'status_module' is just a self-compiled 'status_module' that has some
strings changed.
The 'evasive20_module' is just loaded, not used.

All the rest is lenny-standard.

regards
volker

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: gdb-output.txt
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: modules.txt
URL:

From sgunderson at bigfoot.com Tue Oct 26 20:14:34 2010
From: sgunderson at bigfoot.com (Steinar H. Gunderson)
Date: Tue, 26 Oct 2010 20:14:34 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <4CC6E42F.5010808@blafoo.org>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no> <4CC6E42F.5010808@blafoo.org>
Message-ID: <20101026181434.GK10784@uio.no>

On Tue, Oct 26, 2010 at 04:22:39PM +0200, volker wrote:
>> OK. So the question is, what is this pid 25730? Is... or was. Also, can you
>> get a backtrace? (gdb -p 23013 and run bt).
> See the attached file. I have not used gdb at all so it does not tell me
> very much. But the backtrace does not look like it should i guess.

The interesting part here is:

> 0xb7e50f0e in ?? () from /lib/libc.so.6
> (gdb) bt
> #0 0xb7e50f0e in ?? () from /lib/libc.so.6
> #1 0xb7e0090b in ?? () from /lib/libc.so.6
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

which essentially says it's stuck in libc somewhere. Unfortunately, there's
not much interesting information to go on. Try installing the libc6-dbg
package and see if you get more symbols.

In any case, the waitpid is correct; the child process hasn't died yet.
The question is, of course, why the child process is still alive and what
it's doing.

> The 'vhost_list_modules' was used prior to the itk. It is not being used
> anymore.

If you don't need it, try to take it out.

> The 'evasive20_module' is just loaded, not used.

Others have reported issues with mod_evasive, although not of the kind you're
reporting. I guess you could take it out as well.

/* Steinar */
--
Homepage: http://www.sesse.net/

From volker at blafoo.org Wed Oct 27 10:30:37 2010
From: volker at blafoo.org (volker)
Date: Wed, 27 Oct 2010 10:30:37 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <20101026181434.GK10784@uio.no>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no> <4CC6E42F.5010808@blafoo.org> <20101026181434.GK10784@uio.no>
Message-ID: <4CC7E32D.7060006@blafoo.org>

hi,

>> 0xb7e50f0e in ?? () from /lib/libc.so.6
>> (gdb) bt
>> #0 0xb7e50f0e in ?? () from /lib/libc.so.6
>> #1 0xb7e0090b in ?? () from /lib/libc.so.6
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> which essentially says it's stuck in libc somewhere. Unfortunately, there's
> not much interesting information to go on. Try installing the libc6-dbg
> package and see if you get more symbols.

Just did that, see attachment. But its not that much more interesting
than the other backtrace. It pretty much just confirms that its stuck in
waitpid().

> In any case, the waitpid is correct; the child process hasn't died yet.
> The question is, of course, why the child process is still alive and what
> it's doing.

Any idea on how to proceed from here? I have plenty of servers where i
could investigate more :-)

>> The 'vhost_list_modules' was used prior to the itk. It is not being used
>> anymore.
>
> If you don't need it, try to take it out.

done.

>> The 'evasive20_module' is just loaded, not used.
>
> Others have reported issues with mod_evasive, although not of the kind you're
> reporting. I guess you could take it out as well.

done.

I restarted the httpd after removing the modules so all G-state-processes
on this server are gone for now. I'll keep an eye on the server-status.

regards
volker

Kind regards
Volker Schwicking

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: gdb-output2.txt
URL:

From sgunderson at bigfoot.com Wed Oct 27 12:02:50 2010
From: sgunderson at bigfoot.com (Steinar H. Gunderson)
Date: Wed, 27 Oct 2010 12:02:50 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <4CC7E32D.7060006@blafoo.org>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no> <4CC6E42F.5010808@blafoo.org> <20101026181434.GK10784@uio.no> <4CC7E32D.7060006@blafoo.org>
Message-ID: <20101027100250.GA13490@uio.no>

On Wed, Oct 27, 2010 at 10:30:37AM +0200, volker wrote:
>> which essentially says it's stuck in libc somewhere. Unfortunately, there's
>> not much interesting information to go on. Try installing the libc6-dbg
>> package and see if you get more symbols.
> Just did that, see attachment. But its not that much more interesting
> than the other backtrace. It pretty much just confirms that its stuck in
> waitpid().

Ah. I think you are gdb-ing the wrong process. The point is to figure out
what the _child_ process is doing, ie. the one it's waitpid-ing on, not the
waitpid-er itself.

>> In any case, the waitpid is correct; the child process hasn't died yet.
>> The question is, of course, why the child process is still alive and what
>> it's doing.
> Any idea on how to proceed from here? I have plenty of servers where i
> could investigate more :-)

If you can reproduce it reliably with synthetic traffic, I guess we could set
up a test server and I could go in and take a look? That's harder on a
production system, of course...

/* Steinar */
--
Homepage: http://www.sesse.net/

From roman at rs-labs.com Thu Oct 28 14:35:42 2010
From: roman at rs-labs.com (Roman Medina-Heigl Hernandez)
Date: Thu, 28 Oct 2010 14:35:42 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <20101026181434.GK10784@uio.no>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no> <4CC6E42F.5010808@blafoo.org> <20101026181434.GK10784@uio.no>
Message-ID: <4CC96E1E.9020703@rs-labs.com>

Steinar H. Gunderson wrote:
>> The 'evasive20_module' is just loaded, not used.
>
> Others have reported issues with mod_evasive, although not of the kind you're

Yes, still unresolved. Mod_evasive is unsupported :/ It's a pity. It would
be nice if Steinar could have a look some day :)

Cheers,
-Roman

From volker at blafoo.org Thu Oct 28 14:48:42 2010
From: volker at blafoo.org (volker)
Date: Thu, 28 Oct 2010 14:48:42 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <4CC96E1E.9020703@rs-labs.com>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no> <4CC6E42F.5010808@blafoo.org> <20101026181434.GK10784@uio.no> <4CC96E1E.9020703@rs-labs.com>
Message-ID: <4CC9712A.7000200@blafoo.org>

Hi,

>>> The 'evasive20_module' is just loaded, not used.
>>
>> Others have reported issues with mod_evasive, although not of the kind you're
>
> Yes, still unresolved. Mod_evasive is unsupported :/ It's a pity. It would
> be nice if Steinar could have a look some day :)

just fyi: im still on it. just have not got the time to investigate more
yet. at the moment im writing a parser for the server-status so i can
track which servers and domains are involved.

regards
volker

From volker at blafoo.org Thu Oct 28 17:02:11 2010
From: volker at blafoo.org (volker)
Date: Thu, 28 Oct 2010 17:02:11 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <20101027100250.GA13490@uio.no>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no> <4CC6E42F.5010808@blafoo.org> <20101026181434.GK10784@uio.no> <4CC7E32D.7060006@blafoo.org> <20101027100250.GA13490@uio.no>
Message-ID: <4CC99073.5070202@blafoo.org>

hi,

just squeezed in some minutes to write down this info.

>> Just did that, see attachment. But its not that much more interesting
>> than the other backtrace. It pretty much just confirms that its stuck in
>> waitpid().
>
> Ah. I think you are gdb-ing the wrong process. The point is to figure out
> what the _child_ process is doing, ie. the one it's waitpid-ing on, not the
> waitpid-er itself.

yes, i just realized that myself. its far more useful to gdb the pid
waitpid() is actually waiting for :-)

From the start:

From the scoreboard, pid 12477 is in G-state:

###
12477 0/50/280850 G 0.00 6002759 0 0.0 0.23 5564.76 67.195.115.246 www.anonymizeddomain.org GET /index.php?session=tdugmvc3h7vab0ra88l42nork0 HTTP/1.0
###

###
$ strace -p 12477
Process 12477 attached - interrupt to quit
waitpid(16130, ^C
###

The waitpid() we already know. Lets see what the pid 16130 its waiting for
is doing:

###
$ ps aux | grep 16130
nobody 16130 0.0 2.5 139476 107368 ? S Aug18 0:30 /usr/sbin/apache2 -f /etc/httpd/apache2-std.conf -k start
###

Its another apache-Process which is running as nobody! In this case this is
the correct user-context because the vhost is configured as:
"AssignUserID nobody nobody".

###
$ strace -p 16130
Process 16130 attached - interrupt to quit
futex(0xb7ec8160, FUTEX_WAIT_PRIVATE, 2, NULL^C
###

The gdb-output from pid 16130 is attached as gdb-output3.txt. With me
knowing pretty much nothing about backtraces it looks like its a
php-process that has died/is stuck somehow/somewhere.

I have investigated several pids like this.
Here's a summary of what i gathered so far:

- all processes in G-state are apache-root processes
- all of the root-processes are waiting for an apache-child-process running
  in different user-context using the apache-itk-module and all are stuck in
  "futex(0xb7ec8160, FUTEX_WAIT_PRIVATE,..."
- all child processes have at some point "libphp5.so" in their backtrace (at
  least the ones ive investigated so far)
- many times the backtraces look the same if the request made to the apache
  is the same. (meaning: 5 processes in G-state with the same request-string
  produced the same backtrace)
- the uptime of the httpd-process is not directly related to the number of
  processes in g-state. a host with 2 days (apache-)uptime does have the
  problem while a host with weeks of uptime does not.

while im writing this, we're building a php (5.2.12 and 5.2.14) with
debugging symbols. hopefully that will tell us a little more.

the parser i mentioned already counts the number of processes that are in
G-state for a single host. ill have to think about that a little more to
make good use of the info that gives me.

ill post an update tomorrow.

regards
volker

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: gdb-output3.txt
URL:

From sgunderson at bigfoot.com Thu Oct 28 17:33:33 2010
From: sgunderson at bigfoot.com (Steinar H. Gunderson)
Date: Thu, 28 Oct 2010 17:33:33 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <4CC99073.5070202@blafoo.org>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no> <4CC6E42F.5010808@blafoo.org> <20101026181434.GK10784@uio.no> <4CC7E32D.7060006@blafoo.org> <20101027100250.GA13490@uio.no> <4CC99073.5070202@blafoo.org>
Message-ID: <20101028153333.GD24506@uio.no>

On Thu, Oct 28, 2010 at 05:02:11PM +0200, volker wrote:
> The gdb-output from pid 16130 is attached as gdb-output3.txt.
> With me knowing pretty much nothing about backtraces it looks like its a
> php-process that has died/is stuck somehow/somewhere.

You're right -- this looks a lot like something that's PHP's fault.
However, it's a curious case; specifically, it seems to deadlock within
free(). This should not happen unless either there is memory corruption
somewhere, _or_ there are threads involved (fork() and threads is an
incredibly evil combination). That, or perhaps shared memory.

> 0xb7e63f0e in __lll_lock_wait_private () from /lib/libc.so.6
> (gdb) bt
> #0 0xb7e63f0e in __lll_lock_wait_private () from /lib/libc.so.6
> #1 0xb7dfc52d in _L_lock_4578 () from /lib/libc.so.6
> #2 0xb7df97cd in free () from /lib/libc.so.6
> #3 0xb734568f in ?? () from /usr/lib/apache2/modules/libphp5.so
> #4 0x0cd0b998 in ?? ()

FWIW, people have reported similar issues with prefork, although that is not
with free() in the backtrace. It might be something like this, though:

http://www.devcomments.com/lll_lock_wait_private-zend_hash_destroy-deadlock-at193631.htm

Also mentioned is

http://bugs.php.net/bug.php?id=31749&edit=1

which talks about set_time_limit(); I don't know if you're using that or not.
A full backtrace would show if there's a signal handler in here or not, I
guess; if there is, it's almost certainly the same PHP bug.

/* Steinar */
--
Homepage: http://www.sesse.net/

From roman at rs-labs.com Thu Oct 28 18:33:47 2010
From: roman at rs-labs.com (Roman Medina-Heigl Hernandez)
Date: Thu, 28 Oct 2010 18:33:47 +0200
Subject: [mpm-itk] apache-itk scoreboard-bug not really fixed?
In-Reply-To: <4CC9712A.7000200@blafoo.org>
References: <4CC6A30A.9060003@blafoo.org> <20101026103633.GB10784@uio.no> <4CC6B8A6.4020908@blafoo.org> <20101026115519.GE10784@uio.no> <4CC6E42F.5010808@blafoo.org> <20101026181434.GK10784@uio.no> <4CC96E1E.9020703@rs-labs.com> <4CC9712A.7000200@blafoo.org>
Message-ID: <4CC9A5EB.2010300@rs-labs.com>

:? Are you the new maintainer of mod_evasive?

On 28/10/2010 14:48, volker wrote:
> Hi,
>
>>>> The 'evasive20_module' is just loaded, not used.
>>>
>>> Others have reported issues with mod_evasive, although not of the kind you're
>>
>> Yes, still unresolved. Mod_evasive is unsupported :/ It's a pity. It would
>> be nice if Steinar could have a look some day :)
>
> just fyi: im still on it. just have not got the time to investigate more
> yet. at the moment im writing a parser for the server-status so i can
> track which servers and domains are involved.
>
> regards
> volker
>
>
> _______________________________________________
> mpm-itk mailing list
> mpm-itk at err.no
> http://lists.err.no/mailman/listinfo/mpm-itk

--
Regards,
-Roman

PGP Fingerprint: 09BB EFCD 21ED 4E79 25FB 29E1 E47F 8A7D EAD5 6742
[Key ID: 0xEAD56742. Available at KeyServ]
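
Editorial note on the scoreboard thread: volker mentions writing a parser for the server-status output to track slots stuck in 'G'. His parser was not posted, so the following is only a minimal sketch of the idea, assuming the scoreboard rows have already been copied out of the status page as plain text. The function names (count_states, graceful_share) are hypothetical; the state characters follow the key that mod_status prints under the scoreboard.

```python
from collections import Counter

# Scoreboard key as printed by Apache's mod_status below the scoreboard.
STATE_NAMES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def count_states(scoreboard: str) -> Counter:
    """Count each known state character; line breaks and spaces are ignored."""
    return Counter(ch for ch in scoreboard if ch in STATE_NAMES)


def graceful_share(scoreboard: str) -> float:
    """Fraction of occupied slots (everything except '.') that are in 'G'."""
    counts = count_states(scoreboard)
    occupied = sum(n for state, n in counts.items() if state != ".")
    return counts["G"] / occupied if occupied else 0.0


if __name__ == "__main__":
    # Hypothetical two-row sample, not taken from a real server.
    sample = "G_GK.\nGW.."
    for state, n in sorted(count_states(sample).items()):
        print(f"{state!r:4} {n:3}  {STATE_NAMES[state]}")
    print(f"share of occupied slots in G: {graceful_share(sample):.2f}")
```

Run periodically against each host, a tally like this would show whether the number of 'G' slots keeps growing between reloads, which is the symptom described in the thread.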