Web Servers: Comparing and Choosing Between Apache, Nginx, and Lighttpd

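1. Strengths and weaknesses of Apache and nginx:
We have long relied heavily on Apache as our HTTP server. Apache performs very well and offers a rich set of features through its modules.
1) Apache handles client requests concurrently. Once the httpd daemon is running, it spawns multiple child processes/threads, each of which responds to client requests.
2) Apache can serve both static and dynamic content. PHP, for example, is not run through the slower CGI interface but through a module that embeds PHP in the server (typically mod_php5, which is built with the apxs2 tool).
3) Weaknesses:
A server like Apache is usually called a process-based server, that is, a multi-process HTTP server, because it dedicates a child process/thread to each user request.
The drawback is that when there are very many concurrent requests (common on large portal sites), it needs very many processes or threads, which consume a great deal of CPU and memory. Handling high concurrency is therefore not Apache's strong point.
4) Solution:
Another kind of web server has since appeared that does better under concurrency: the asynchronous server. The best known are Nginx and Lighttpd. An asynchronous server uses an event-driven programming model, so however many concurrent requests arrive, it usually needs only one or a few threads and consumes very few system resources. These servers are also called lightweight web servers.
For example, with 10,000 concurrent connections, nginx may use only a few MB of memory, while Apache may need several hundred MB.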
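
To make the event-driven model concrete, here is a minimal sketch in Python using the standard asyncio library. It is not how nginx or lighttpd is implemented; the names handle_client, fetch and run_demo, the loopback address and the figure of 50 clients are assumptions chosen for illustration. One thread runs an event loop that serves many connections, and the script records which threads actually handled them.

```python
import asyncio
import threading

THREADS_USED = set()  # ids of the threads that handled at least one connection


async def handle_client(reader, writer):
    # Each connection is a coroutine on the event loop, not a new process or thread.
    THREADS_USED.add(threading.get_ident())
    await reader.readuntil(b"\r\n\r\n")      # read the request headers
    await asyncio.sleep(0.01)                # stand-in for waiting on disk or a backend
    writer.write(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch(port):
    # Hypothetical test client: sends one request and reads until the server closes.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"GET / HTTP/1.0\r\n\r\n")
    await writer.drain()
    data = await reader.read()
    writer.close()
    await writer.wait_closed()
    return data


async def run_demo(n_clients=50):
    # Port 0 asks the operating system for any free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(*(fetch(port) for _ in range(n_clients)))


if __name__ == "__main__":
    replies = asyncio.run(run_demo())
    print(len(replies), "responses, handled by", len(THREADS_USED), "thread(s)")
```

While one connection waits in the sleep call, the loop moves on to the others, which is why no extra threads are needed. A process-based server would instead keep one process or thread blocked per connection for the same wait.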

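2. Using a single server on its own:
1) Running Apache alone as the HTTP server is so common that it needs little introduction.
As described above, Apache supports server-side scripting languages such as PHP through its own modules, and does so with good performance.
2) nginx or lighttpd can likewise be used alone as the HTTP server.
Like Apache, nginx and lighttpd can be extended with a wide range of modules, and both are configured through conf files.
Neither nginx nor lighttpd has a built-in module that embeds PHP; both support it through FastCGI.
Through its modules, Lighttpd provides CGI, FastCGI, and SCGI. Lighttpd is capable of automatically spawning FastCGI backends as well as using externally spawned processes.
nginx does not execute PHP itself; it hands PHP requests over FastCGI to PHP processes that are started separately.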
nginx has module support for FastCGI via a built-in module, SCGI and WSGI via 3rd Party module. The user must be able to spawn the processes separately because nginx is not able to automatically spawn them [9]. nginx does not support normal CGI applications [10], which is actually a security benefit.

Lighttpd vs nginx: http://www.wikivs.com/wiki/Lighttpd_vs_nginx

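3. Reverse proxy:
0) The concept of a proxy server:

A proxy server is easy to understand: it is a machine that sits between two other machines. The functions it is usually expected to provide are:

caching, security, and load balancing.

Load balancing means that when many servers sit behind one proxy, the proxy has to spread the work evenly across them.
The proxy we meet most often is a forward proxy. Suppose a machine room has 20 computers that need Internet access but only one of them is connected: that computer can act as the proxy server, and all traffic to and from the network passes through it.

A reverse proxy is the opposite of a forward proxy. A forward proxy acts on behalf of the users who consume a service; a reverse proxy, also called a server-side proxy, acts on behalf of the servers. There are multiple servers, and the reverse proxy forwards each user request to one of them for processing.

Proxy server: http://en.wikipedia.org/wiki/Proxy_server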

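1) In practice, a large website usually runs many servers as a cluster to answer user requests.
It therefore usually needs one or more reverse proxy servers in front of those servers.
A reverse proxy server is generally expected to provide:
security; caching and compression; load balancing.

Reverse proxy: http://en.wikipedia.org/wiki/Reverse_proxy

(Note that a reverse proxy is somewhat similar to a firewall, but a firewall is generally concerned only with security and does not provide caching or load balancing.)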

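3) To sum up, the server-side web architecture in practice
usually has several web servers providing the service in parallel, with one or more reverse proxy servers deployed in front of them. The reverse proxies cache static data, or cache content that the web servers generate dynamically, and use load balancing to spread concurrent user requests evenly across the web servers.
This greatly reduces the load on each web server behind the proxy and balances the load across the servers.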
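
The two jobs described above, caching and load balancing, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real proxy: the class ReverseProxy, the helper make_backend and the server names web1 and web2 are all hypothetical, the back ends are plain functions rather than network servers, and round-robin is just one possible balancing policy.

```python
from itertools import cycle


def make_backend(name):
    # Hypothetical stand-in for a web server: returns a string instead of a page.
    def backend(path):
        return f"{name} rendered {path}"
    backend.__name__ = name
    return backend


class ReverseProxy:
    """Toy reverse proxy: answers from cache when it can, otherwise
    forwards the request to the next back end in round-robin order."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)  # endless round-robin iterator
        self._cache = {}                      # path -> cached response

    def handle(self, path):
        if path in self._cache:
            # Cache hit: no back end is contacted at all.
            return self._cache[path], "cache"
        backend = next(self._next_backend)
        response = backend(path)
        self._cache[path] = response
        return response, backend.__name__


if __name__ == "__main__":
    proxy = ReverseProxy([make_backend("web1"), make_backend("web2")])
    for path in ["/a", "/b", "/a", "/c"]:
        print(path, "->", proxy.handle(path))
```

The second request for /a never reaches a web server, which is the load reduction the text describes; the other requests alternate between the two back ends.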

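4. Using nginx or lighttpd as the reverse proxy in front of several Apache HTTP servers:
1) As noted above, nginx and lighttpd are fast and lightweight, and handle many concurrent users far better than Apache.
We can therefore place them as reverse proxies in front of several Apache web servers, both to cache data and to balance load across the servers.
2) Apache itself can also act as a reverse proxy and cache through mod_proxy and mod_cache, but under high concurrency it still cannot match lightweight asynchronous servers such as nginx and lighttpd.
3) With the reverse proxy features of nginx and lighttpd, the configuration file can be set so that requests for static content (images, js, html files, and so on) are answered directly by nginx or lighttpd.
Requests for dynamic content (which usually has to be read from a database in real time) are passed through the reverse proxy to an Apache server waiting in the back end. Apache returns its result to nginx, which can also cache it when responding to the user.
4) Sometimes a dedicated caching tool such as Squid is used as well.
nginx also supports some caching options, for example memcached.

5) Shown as a diagram, the typical architecture is as follows: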

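nginx as the front-most web cache system

Advantages of this structure:

1. The nginx front end can carry many complex configurations that were impossible or awkward to do in squid, such as per-directory hotlink protection.

2. The nginx front end can directly forward the requests that do not need caching.

3. Because nginx is more efficient than squid, in some cases nginx's own cache can take pressure off squid.

4. Distribution strategies such as url hash become possible.

5. gzip compression can be enabled at the very front, so the squid layer behind it caches only uncompressed documents, which avoids a lot of needless cache pass-through.

6. Because nginx is quite stable, lvs does not need frequent adjustment; adjusting nginx is enough.

7. The default squid open-file limit of 1024 is more than enough, and yet not a single request goes unserved.

8. nginx logging can replace squid logging, so real-time hit counts can be pinned to an exact url without filtering through inefficient grep.

9. Because nginx can carry more load than squid, lvs does not have to split traffic especially evenly, and the chance of a single point of failure is fairly low.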
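
The url hash strategy mentioned in the list above can be illustrated with a short Python sketch. It shows the general idea only and is not nginx's own hashing code: the function name pick_cache_server, the choice of SHA-256 and the server names squid1 to squid3 are assumptions. Because the same url always maps to the same cache server, each cached object is stored once instead of being duplicated on every cache.

```python
import hashlib


def pick_cache_server(url, servers):
    """Map a url to one server so that repeated requests for the same
    url always land on the same cache."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it
    # to an index into the server list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    caches = ["squid1", "squid2", "squid3"]   # hypothetical cache servers
    for url in ["/img/logo.png", "/css/site.css", "/img/logo.png"]:
        print(url, "->", pick_cache_server(url, caches))
```

One limitation of this simple modulo scheme is that adding or removing a server changes the mapping for most urls; consistent hashing is the usual way to soften that effect.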

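A web server front end built with nginx and squid together

For the lvs and squid front end, follow the installation instructions, turn on epoll, and copy the configuration files as they are; there are few problems.

The difference between this architecture and the app_squid architecture, and also its key point, is the added middle-tier proxy. A middle-tier proxy brings many benefits:

1. gzip compression

Compression can be done by nginx, so the back-end application servers, whether apache, resin, lighttpd, even iis or some other odd server, do not have to worry about compression at all.

2. Load balancing and failure masking

nginx can act as a load-balancing proxy and can mask failed back ends, so defining a load-balancing policy by directory, or even by a regular expression, becomes trivial.

3. Convenient operations, with room to plan flexibly for any situation.

For example, if someone mounts a lightweight ddos attack that passes through squid, it can be dealt with at the middle-tier proxy; when traffic or back-end load changes suddenly, the requests for a domain or a directory can be pushed to a second-level cache server at any time; headers such as no-cache and expires are easy to control; and so on.

4. Clear responsibilities

This machine is looked after by operations staff who do not write code, and programmers generally do not need to manage it, so when a failure occurs it is easy to find the right person.

As for the application servers and database servers, ideally they disappear from the operations staff's view. My goal is that these services only need to be up and running, and everything else can be handled outside them.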
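
The load balancing and failure masking point above, with policies chosen by directory or regular expression, can be sketched in Python as follows. This is an illustration under assumptions rather than nginx's behaviour: the class MiddleProxy, its methods, the patterns and the back-end names are all hypothetical, and health is set by hand instead of being detected from failed requests.

```python
import re


class MiddleProxy:
    """Toy middle-tier proxy: picks a back-end pool by regular expression
    and skips back ends that are marked as down."""

    def __init__(self, rules, default_pool):
        # rules: list of (pattern, pool) pairs; the first match wins.
        self.rules = [(re.compile(pattern), pool) for pattern, pool in rules]
        self.default_pool = default_pool
        self.down = set()
        self._counter = 0   # drives a simple round-robin choice

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def route(self, path):
        pool = self.default_pool
        for pattern, candidates in self.rules:
            if pattern.search(path):
                pool = candidates
                break
        # Failure masking: only healthy back ends are eligible.
        healthy = [b for b in pool if b not in self.down]
        if not healthy:
            raise RuntimeError("no healthy back end for " + path)
        backend = healthy[self._counter % len(healthy)]
        self._counter += 1
        return backend


if __name__ == "__main__":
    proxy = MiddleProxy(
        rules=[(r"^/static/", ["squid1", "squid2"]),
               (r"\.php$", ["apache1", "apache2"])],
        default_pool=["apache1"],
    )
    proxy.mark_down("squid1")
    for path in ["/static/logo.png", "/index.php", "/about"]:
        print(path, "->", proxy.route(path))
```

Keeping the rules in one place on the middle tier is what lets the back-end servers stay unaware of routing, compression and failover decisions.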

General Architecture of LVS Clusters

For transparency, scalability, availability and manageability of the whole system, we usually adopt a three-tier architecture in LVS clusters, illustrated in the following figure.

The three-tier architecture consists of

Load Balancer , which is the front-end machine of the whole cluster system, and balances requests from clients among a set of servers, so that the clients consider that all the services are from a single IP address.
Server Cluster , which is a set of servers running actual network services, such as Web, Mail, FTP, DNS and Media service.
Shared Storage , which provides a shared storage space for the servers, so that it is easy for the servers to have the same contents and provide the same services.

The load balancer is the single entry point of the server cluster system. It can run IPVS, which implements IP load balancing techniques inside the Linux kernel, or KTCPVS, which implements application-level load balancing inside the Linux kernel. When IPVS is used, all the servers are required to provide the same services and contents, and the load balancer forwards a new client request to a server according to the specified scheduling algorithm and the load of each server. No matter which server is selected, the client should get the same result. When KTCPVS is used, servers can have different contents, and the load balancer can forward a request to a different server according to the content of the request. Since KTCPVS is implemented inside the Linux kernel, the overhead of relaying data is minimal, so that it can still have high throughput.

The number of nodes in the server cluster can be changed according to the load that the system receives. When all the servers are overloaded, more servers can be added to handle the increasing workload. For most Internet services such as web, the requests are usually not highly related and can be run in parallel on different servers. Therefore, as the number of nodes in the server cluster increases, the performance of the whole can scale up almost linearly.

Shared storage can be database systems, network file systems, or distributed file systems. The data that server nodes need to update dynamically should be stored in database systems; when server nodes read or write data in database systems in parallel, the database systems can guarantee the consistency of concurrent data access. Static data is usually kept in network file systems such as NFS and CIFS, so that data can be shared by all the server nodes. However, the scalability of a single network file system is limited; for example, a single NFS/CIFS can only support data access from 4 to 8 servers. For large-scale cluster systems, distributed/cluster file systems such as GPFS, Coda and GFS can be used for shared storage, so that shared storage can be scaled up according to system requirements too.

Load balancer, server cluster and shared storage are usually connected by high-speed networks, such as 100Mbps Ethernet and Gigabit Ethernet, so that the network will not become the bottleneck as the system grows.
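
As an illustration of forwarding a request "according to the specified scheduling algorithm and the load of each server", here is a small Python sketch of a least-connection policy, one of the scheduling ideas used by load balancers. It is a user-space toy and not IPVS code, which lives in the Linux kernel; the class LeastConnectionScheduler, its methods and the server names rs1 to rs3 are hypothetical.

```python
class LeastConnectionScheduler:
    """Toy scheduler: sends each new connection to the server that
    currently has the fewest active connections."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}  # server -> open connections

    def add_server(self, name):
        # Nodes can be added while running, as the text describes.
        self.active.setdefault(name, 0)

    def assign(self):
        # min() returns the first server with the lowest count,
        # so ties are broken by insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to the server finishes.
        if self.active[server] > 0:
            self.active[server] -= 1


if __name__ == "__main__":
    scheduler = LeastConnectionScheduler(["rs1", "rs2"])
    print([scheduler.assign() for _ in range(3)])
    scheduler.add_server("rs3")      # scale out under load
    print(scheduler.assign())        # the new, idle server is chosen
```

A freshly added node starts with zero connections, so it receives new requests first until its load catches up with the others, which is how adding nodes relieves an overloaded cluster in this simple model.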

Sources:

Some web servers in production environments (mainly the big three: apache, nginx, lighttpd): http://hudeyong926.javaeye.com/blog/813141
nginx as the front-most web cache system: http://sudone.com/archie/app-nginx-squid-nginx.html
A web server front end built with nginx and squid together: http://sudone.com/archie/app_nginx_squid.html
General Architecture of LVS Clusters: http://www.linuxvirtualserver.org/architecture.html
