Drupal Load Balancer Comment Spam Problem

Drupal
Load Balancer

This week, for the first time, our office migrated its old web server topology to a load-balanced infrastructure.

It took us one week to upgrade our system! We have 7 HP G7 machines at IDC IIX. The web servers sit behind a load balancer and talk to it through private internal IPs, together with the VPN, NAS, and database server.
We only have 1 administrator and 1 webmaster (me), and 7 machines to migrate and configure. This is a hard job! I sacrificed the time I should have spent with my family, my wife and my son. I am sorry, Rakes. I am sorry, Mami. :(( I will pay this time back to you. I love you!



As a result, the public IP address of the client is replaced with the private IP of the load balancer, which directs HTTP/HTTPS traffic to the web servers. In other words, the PHP variable $_SERVER['REMOTE_ADDR'] always comes through as the private IP of the load balancer.

This is where we met our first problem!
Comment spam poured into the database like crazy, because Drupal core saw every visitor as the single proxy IP of the balancer. After searching and searching I found this tip, because we want to see the real visitor IPs in the administrator dashboard, right? :)

Here is the solution. I inspected the other server variables in $_SERVER and found one called HTTP_X_FORWARDED_FOR, which contained the public IP address of our machine, that is, of the visiting client.
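To make the idea concrete, here is a minimal sketch of how a visitor IP can be recovered from that header. It is only an illustration, not Drupal's own code: the function name resolve_client_ip() and the sample addresses are made up for this example. The header is trusted only when the request really comes from a known balancer, because any visitor can send a fake X-Forwarded-For.

```php
<?php
/**
 * Hypothetical helper: work out the visitor IP behind trusted proxies.
 *
 * @param array $server          A $_SERVER-style array.
 * @param array $trusted_proxies IPs of your own load balancers.
 */
function resolve_client_ip(array $server, array $trusted_proxies) {
  $remote = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';

  // Only believe the forwarded header if the direct peer is our balancer.
  if (!in_array($remote, $trusted_proxies, TRUE) || empty($server['HTTP_X_FORWARDED_FOR'])) {
    return $remote;
  }

  // The header is a comma-separated chain: client, proxy1, proxy2, ...
  $chain = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
  $chain[] = $remote;

  // Drop our own proxies; the right-most address left over is the visitor.
  $untrusted = array_diff($chain, $trusted_proxies);
  return $untrusted ? array_pop($untrusted) : $remote;
}
```

With a balancer at 10.0.0.5, a request carrying X-Forwarded-For: 203.0.113.9 resolves to 203.0.113.9, while the same header from an unknown address is ignored. You do not need to write this yourself: the settings shown next switch on the equivalent behavior inside Drupal.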

In your /drupalcoredir/sites/default/settings.php, find this code and change it!

I assume that you, the reader of this article, are a Drupal web developer, so you will know what to do. hehe :p

<php>
<?php
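// The cookie domain is a plain global in settings.php, not a $conf entry.
$cookie_domain = '.mydomain.com';
// Tell Drupal that it sits behind a reverse proxy (the load balancer).
$conf['reverse_proxy'] = TRUE;
// Private IP of the load balancer; replace the placeholder with your own.
$conf['reverse_proxy_addresses'] = array('aaa.bbb.ccc.ddd');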
?>
</php>

If you have multiple load balancers:

<php>
<?php
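// The cookie domain is a plain global in settings.php, not a $conf entry.
$cookie_domain = '.mydomain.com';
$conf['reverse_proxy'] = TRUE;
// List the private IP of every load balancer in front of the web servers.
$conf['reverse_proxy_addresses'] = array('aaa.bbb.ccc.ddd', 'eee.fff.ggg.hhh');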
?>
</php>
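If your balancer passes the visitor address in a header other than X-Forwarded-For, Drupal 7 lets you name that header in settings.php as well. The fragment below is a sketch under two assumptions that do not come from our own setup: that you run Drupal 7, and that your balancer uses X-Cluster-Client-Ip. Check your balancer's documentation for the header it really sends.

```php
<?php
// Assumption: Drupal 7, and a balancer that sends the client address in
// X-Cluster-Client-Ip instead of X-Forwarded-For. Adjust to your balancer.
$conf['reverse_proxy_header'] = 'HTTP_X_CLUSTER_CLIENT_IP';
```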

If you want to make it easy, you can use this module: http://drupal.org/project/smart_ip

The rest of this post is copied from the article at
http://drupal.org/node/425990

When running large Drupal installations, you may find yourself with a web server cluster that lives behind a load balancer. The pages here contain tips for configuring Drupal in this setup, as well as example configurations for various load balancers.

In addition to a large selection of commercial options, various open source load balancers exist: Pound, Varnish, ffproxy, tinyproxy, etc. Web servers (including Apache and NGINX) can also be configured as reverse proxies.

The basic layout you can expect in most high-availability environments will look something like this:


                                 ┌─→ Web server 1 ↘
Browser ──→ HTTP Reverse Proxy ──┼─→ Web server 2 → Database
                                 └─→ Web server 3 ↗


By way of explanation:

    Browsers will connect to a reverse proxy using HTTP or HTTPS. The proxy will in turn connect to web servers via HTTP.
    Web servers will likely be on private IP addresses. Use of a private network allows web servers to share a database and/or NFS server that need not be exposed to the Internet on a public IP address.
    If HTTPS is required, it is configured on the proxy, not the web server.

Most HTTP reverse proxies will also "clean" requests in some way. For example, they'll require that a browser include a valid User-Agent string, or that the requested URL contain standard characters or not exceed a certain length.
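As a rough illustration of what such "cleaning" amounts to, the same checks can be sketched in PHP. This is not how any particular proxy implements it: the function name is_clean_request(), the 2048-character limit, and the allowed character set are assumptions chosen for the example.

```php
<?php
/**
 * Hypothetical check mirroring the kind of filtering a reverse proxy does.
 *
 * @param string $user_agent The User-Agent header sent by the browser.
 * @param string $url        The requested URL path and query string.
 * @param int    $max_length Assumed upper limit for the URL length.
 */
function is_clean_request($user_agent, $url, $max_length = 2048) {
  // Reject requests without a usable User-Agent string.
  if (trim((string) $user_agent) === '') {
    return FALSE;
  }
  // Reject overly long URLs.
  if (strlen($url) > $max_length) {
    return FALSE;
  }
  // Accept only characters that are legal in a URI (RFC 3986) plus "%".
  return preg_match('/^[A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]+$/', $url) === 1;
}
```

In practice you would configure these rules on the proxy itself, so that bad requests never reach the web servers.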

In the case of Drupal, it is highly recommended that all web servers share identical copies of the Drupal DocumentRoot in use, to ensure version consistency between themes and modules. This may be achieved using an NFS mount to hold your Drupal files, or by using a revision control system (CVS, SVN, git, etc.) to maintain your files.

High availability

In order to achieve the maximum uptime, a high-availability design should have no single points of failure. For network connectivity, this may mean using BGP with multiple upstream providers, as well as perhaps using Link Aggregation (LACP) to maintain multiple physical network paths in your LAN. In the diagram above, the two server elements that need attention are the load balancer and the database.

A load balancer cannot easily be "clustered" because a single IP address usually needs to apply to a single machine. To address this issue, you may wish to read up on CARP (FreeBSD) and Heartbeat (Linux).

A database server generally needs access to a single repository of data. Various technologies exist to address this, including MySQL NDB and PgCluster. If you're willing to accept the possibility of less than 100% up-time while you recover from broken hardware, you should consider using transactional database replication to keep a live copy of your data on a secondary server. Read the documentation for your database server software to find out how to set this up.

Needless to say, always set up regular automated backups.

Note:

    If you plan to install Drupal 7 on a web server that browsers will reach only via HTTPS, there's an outstanding issue you'll want to check (#313145: Support X-Forwarded-Proto HTTP header). At this time, Drupal's AJAX callbacks use URLs based on the protocol used at the web server, regardless of the protocol used at the proxy. Your workaround is either this patch, or to set the "reverse_proxy" variable manually in your settings.php file. Unfortunately, as the Drupal installer relies on AJAX, your only other option is to install via HTTP instead of HTTPS.
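
A commonly used workaround for this situation, which is not part of the article quoted above, is to tell PHP about the original protocol in settings.php. The fragment is a sketch that assumes your proxy sets the X-Forwarded-Proto header itself and overwrites any value sent by the browser; if it does not, visitors could fake the header.

```php
<?php
// Sketch for settings.php. Assumes the proxy sets X-Forwarded-Proto itself
// and overwrites any value supplied by the browser.
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && strtolower($_SERVER['HTTP_X_FORWARDED_PROTO']) === 'https') {
  // Let Drupal build https:// URLs although the proxy talks plain HTTP to us.
  $_SERVER['HTTPS'] = 'on';
}
```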