
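I recently came across an interview question: can multiple server processes listen on the same TCP port at the same time?
Anyone with some experience has run into the error "Address already in use", so my answer was: yes if the IPs differ, no if the IP is the same.
The author's answer, however, was: not by default, but it is possible if SO_REUSEADDR is set.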

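Reading around, I found that SO_REUSEADDR and SO_REUSEPORT almost always come up together. The first mainly matters at the server's bind stage and the second at its listen stage. The two options also behave differently on Linux and on Unix (FreeBSD), so I wrote this post to keep a record.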

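In network programming a TCP connection has a server side and a client side. The server needs four steps and the client needs two.
The four server steps are: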

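  1. Call socket to create a socket
  2. Call bind to bind the socket to an IP+PORT address (optional: if skipped, the system binds a random port)
  3. Call listen, which allocates and initializes the accept queue (completed connections) and the SYN queue (half-open connections) and starts listening for connection requests
  4. Call accept, which returns a new socket used to handle the request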

The two client steps are:

  1. Call socket to create a socket
  2. Call connect to connect that socket to the server
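The steps above can be sketched in a few lines of Python. This is only a minimal illustration on the loopback interface: the helper name run_echo_once and the use of port 0 (let the kernel pick a free port) are my own choices for the example, not part of the original question.

```python
import socket
import threading


def run_echo_once(server_sock):
    """Server step 4: accept one connection and echo one message back."""
    conn, _peer = server_sock.accept()   # accept returns a new socket
    with conn:
        conn.sendall(conn.recv(1024))


# Server steps 1-3: socket -> bind -> listen
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))            # port 0: the kernel picks a free port
server.listen()
host, port = server.getsockname()

worker = threading.Thread(target=run_echo_once, args=(server,))
worker.start()

# Client steps 1-2: socket -> connect
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
client.sendall(b"ping")
reply = client.recv(1024)
client.close()

worker.join()
server.close()
print(reply)
```

The server thread exists only so that both sides can run in one script; in the experiments below the two sides are separate processes.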

Let's first look at what these two options do on Linux.

What SO_REUSEADDR and SO_REUSEPORT do on Linux

man 7 socket describes SO_REUSEADDR and SO_REUSEPORT as follows:

SO_REUSEADDR
Indicates that the rules used in validating addresses supplied in a bind(2) call should
allow reuse of local addresses. For AF_INET sockets this means that a socket may bind,
except when there is an active listening socket bound to the address. When the listening
socket is bound to INADDR_ANY with a specific port then it is not possible to bind
to this port for any local address. Argument is an integer boolean flag.

SO_REUSEPORT (since Linux 3.9)
Permits multiple AF_INET or AF_INET6 sockets to be bound to an identical socket address.
This option must be set on each socket (including the first socket) prior to
calling bind(2) on the socket. To prevent port hijacking, all of the processes binding
to the same address must have the same effective UID. This option can be employed with
both TCP and UDP sockets.

For TCP sockets, this option allows accept(2) load distribution in a multi-threaded
server to be improved by using a distinct listener socket for each thread. This provides
improved load distribution as compared to traditional techniques such using a
single accept(2)ing thread that distributes connections, or having multiple threads
that compete to accept(2) from the same socket.

For UDP sockets, the use of this option can provide better distribution of incoming
datagrams to multiple processes (or threads) as compared to the traditional technique
of having multiple processes compete to receive datagrams on the same socket.

The experiments below check this description.

Linux test environment:

$ uname -a
Linux DESKTOP-XXXX 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

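Binding the same address from two processes under different SO_REUSEADDR / SO_REUSEPORT values

Script:

Go's net.Listen bundles socket creation, address binding, and listening into a single call, which makes it awkward to test the bind-without-listen case, so I used Python instead.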

import sys, socket, time

addr, port = sys.argv[1], sys.argv[2]

# Create a TCP/IP socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 0)  # change this value (0 or 1) for each test case
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 0)  # change this value (0 or 1) for each test case

    s.bind((addr, int(port)))
    print(f'Bind to address {addr}:{port}...')
    # Keep the process alive after bind, without ever calling listen
    while True:
        print("Sleep for 1 second...")
        time.sleep(1)

Results

SocketA              SocketB              SO_REUSEADDR  SO_REUSEPORT  Result
172.22.147.210:8080  172.22.147.210:8080  1             1             OK
0.0.0.0:8080         0.0.0.0:8080         1             1             OK
172.22.147.210:8080  0.0.0.0:8080         1             1             OK
172.22.147.210:8080  172.22.147.210:8080  1             0             OK
0.0.0.0:8080         0.0.0.0:8080         1             0             OK
172.22.147.210:8080  0.0.0.0:8080         1             0             OK
172.22.147.210:8080  172.22.147.210:8080  0             1             OK
0.0.0.0:8080         0.0.0.0:8080         0             1             OK
172.22.147.210:8080  0.0.0.0:8080         0             1             OK
172.22.147.210:8080  172.22.147.210:8080  0             0             ADDR_ALREADY_IN_USE
0.0.0.0:8080         0.0.0.0:8080         0             0             ADDR_ALREADY_IN_USE
172.22.147.210:8080  0.0.0.0:8080         0             0             ADDR_ALREADY_IN_USE

Notes:

  • A 1 in the SO_REUSEADDR or SO_REUSEPORT column means the option is set to 1 on both SocketA and SocketB.
  • Result says whether the bind of the second socket succeeds or fails.
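To reproduce a row of this table without juggling two terminals, the same check can be done inside one process. The sketch below is my own helper (the name second_bind_result is hypothetical); it binds socket A to a kernel-chosen port on the loopback address, then tries to bind socket B to the same address and port with the same option values. It assumes the platform defines SO_REUSEPORT and skips that option otherwise.

```python
import errno
import socket


def second_bind_result(reuseaddr, reuseport, addr="127.0.0.1"):
    """Bind socket A, then bind socket B to the same ip+port; report B's outcome."""

    def make_socket():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, reuseaddr)
        if hasattr(socket, "SO_REUSEPORT"):  # not defined on every platform
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, reuseport)
        return s

    a, b = make_socket(), make_socket()
    try:
        a.bind((addr, 0))                    # port 0: the kernel picks a free port
        port = a.getsockname()[1]
        try:
            b.bind((addr, port))             # bind only, neither socket listens
            return "OK"
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return "ADDR_ALREADY_IN_USE"
            raise
    finally:
        a.close()
        b.close()


print(second_bind_result(0, 0))
```

Calling it with the other (reuseaddr, reuseport) combinations should line up with the remaining rows, although the exact outcome depends on the operating system, which is the point of this post.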

Listening on the same address from two processes under different SO_REUSEADDR / SO_REUSEPORT values

Script:

import sys, socket

addr, port = sys.argv[1], sys.argv[2]


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 0)  # change this value (0 or 1) for each test case
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 0)  # change this value (0 or 1) for each test case

    s.bind((addr, int(port)))
    print(f'Bind to address {addr}:{port}...')
    s.listen()
    print(f'Listening to address {addr}:{port}...')
    while True:
        conn, peer = s.accept()  # peer is the client address (named so it does not shadow addr)
        with conn:
            print('Connected by', peer)
            data = conn.recv(1024)
            if not data:
                continue  # client closed without sending anything; keep serving
            conn.sendall(data)  # echo the data back

Results

SocketA              SocketB              SO_REUSEADDR  SO_REUSEPORT  Result
172.22.147.210:8080  172.22.147.210:8080  1             1             OK
0.0.0.0:8080         0.0.0.0:8080         1             1             OK
172.22.147.210:8080  0.0.0.0:8080         1             1             OK
172.22.147.210:8080  172.22.147.210:8080  1             0             ADDR_ALREADY_IN_USE
0.0.0.0:8080         0.0.0.0:8080         1             0             ADDR_ALREADY_IN_USE
172.22.147.210:8080  0.0.0.0:8080         1             0             ADDR_ALREADY_IN_USE
172.22.147.210:8080  172.22.147.210:8080  0             1             OK
0.0.0.0:8080         0.0.0.0:8080         0             1             OK
172.22.147.210:8080  0.0.0.0:8080         0             1             OK
172.22.147.210:8080  172.22.147.210:8080  0             0             ADDR_ALREADY_IN_USE
0.0.0.0:8080         0.0.0.0:8080         0             0             ADDR_ALREADY_IN_USE
172.22.147.210:8080  0.0.0.0:8080         0             0             ADDR_ALREADY_IN_USE
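The SO_REUSEPORT rows can also be tried in a single process. The following is a rough sketch, assuming a Linux kernel of 3.9 or later (or another system that defines SO_REUSEPORT): two sockets listen on the same loopback ip+port, eight clients connect, and select shows which listener the kernel handed each connection to. The helper name reuseport_listener and the number of clients are arbitrary choices of mine.

```python
import select
import socket


def reuseport_listener(addr, port):
    """Create a listening TCP socket with SO_REUSEPORT set before bind."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((addr, port))
    s.listen()
    return s


first = reuseport_listener("127.0.0.1", 0)       # port 0: the kernel picks a free port
port = first.getsockname()[1]
second = reuseport_listener("127.0.0.1", port)   # same ip+port, no "Address already in use"
listeners = [first, second]
counts = [0, 0]                                   # connections accepted per listener

for _ in range(8):
    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    # Wait until one of the listeners has the new connection in its accept queue
    ready = select.select(listeners, [], [], 2.0)[0]
    for listener in ready:
        conn, _peer = listener.accept()
        conn.close()
        counts[listeners.index(listener)] += 1
    client.close()

for listener in listeners:
    listener.close()
print("accepted per listener:", counts)
```

How the eight connections are split between the two listeners is up to the kernel, so the per-listener numbers can differ from run to run.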

What SO_REUSEADDR and SO_REUSEPORT do on FreeBSD

man setsockopt describes SO_REUSEADDR and SO_REUSEPORT as follows:

SO_REUSEADDR indicates that the rules used in validating addresses
supplied in a bind(2) system call should allow reuse of local addresses.

SO_REUSEPORT allows completely duplicate bindings by multiple processes
if they all set SO_REUSEPORT before binding the port. This option
permits multiple instances of a program to each receive UDP/IP multicast
or broadcast datagrams destined for the bound port.

The experiments below check this description.

FreeBSD test environment:

$ uname -a
FreeBSD freebsd 14.1-RELEASE FreeBSD 14.1-RELEASE releng/14.1-n267679-10e31f0946d8 GENERIC amd64

Binding the same address from two processes under different SO_REUSEADDR / SO_REUSEPORT values

Script:

import sys, socket, time

addr, port = sys.argv[1], sys.argv[2]

# Create a TCP/IP socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 0)  # change this value (0 or 1) for each test case
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 0)  # change this value (0 or 1) for each test case

    s.bind((addr, int(port)))
    print(f'Bind to address {addr}:{port}...')
    # Keep the process alive after bind, without ever calling listen
    while True:
        print("Sleep for 1 second...")
        time.sleep(1)

Results

SocketA              SocketB              SO_REUSEADDR  SO_REUSEPORT  Result
172.22.147.210:8080  172.22.147.210:8080  1             1             OK
0.0.0.0:8080         0.0.0.0:8080         1             1             OK
172.22.147.210:8080  0.0.0.0:8080         1             1             OK
172.22.147.210:8080  172.22.147.210:8080  1             0             ADDR_ALREADY_IN_USE
0.0.0.0:8080         0.0.0.0:8080         1             0             ADDR_ALREADY_IN_USE
172.22.147.210:8080  0.0.0.0:8080         1             0             OK
172.22.147.210:8080  172.22.147.210:8080  0             1             OK
0.0.0.0:8080         0.0.0.0:8080         0             1             OK
172.22.147.210:8080  0.0.0.0:8080         0             1             OK
172.22.147.210:8080  172.22.147.210:8080  0             0             ADDR_ALREADY_IN_USE
0.0.0.0:8080         0.0.0.0:8080         0             0             ADDR_ALREADY_IN_USE
172.22.147.210:8080  0.0.0.0:8080         0             0             ADDR_ALREADY_IN_USE

Listening on the same address from two processes under different SO_REUSEADDR / SO_REUSEPORT values

Script:

import sys, socket

addr, port = sys.argv[1], sys.argv[2]


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 0)  # change this value (0 or 1) for each test case
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 0)  # change this value (0 or 1) for each test case

    s.bind((addr, int(port)))
    print(f'Bind to address {addr}:{port}...')
    s.listen()
    print(f'Listening to address {addr}:{port}...')
    while True:
        conn, peer = s.accept()  # peer is the client address (named so it does not shadow addr)
        with conn:
            print('Connected by', peer)
            data = conn.recv(1024)
            if not data:
                continue  # client closed without sending anything; keep serving
            conn.sendall(data)  # echo the data back

Results

SocketA              SocketB              SO_REUSEADDR  SO_REUSEPORT  Result
172.22.147.210:8080  172.22.147.210:8080  1             1             OK
0.0.0.0:8080         0.0.0.0:8080         1             1             OK
172.22.147.210:8080  0.0.0.0:8080         1             1             OK
172.22.147.210:8080  172.22.147.210:8080  1             0             ADDR_ALREADY_IN_USE
0.0.0.0:8080         0.0.0.0:8080         1             0             ADDR_ALREADY_IN_USE
172.22.147.210:8080  0.0.0.0:8080         1             0             OK
172.22.147.210:8080  172.22.147.210:8080  0             1             OK
0.0.0.0:8080         0.0.0.0:8080         0             1             OK
172.22.147.210:8080  0.0.0.0:8080         0             1             OK
172.22.147.210:8080  172.22.147.210:8080  0             0             ADDR_ALREADY_IN_USE
0.0.0.0:8080         0.0.0.0:8080         0             0             ADDR_ALREADY_IN_USE
172.22.147.210:8080  0.0.0.0:8080         0             0             ADDR_ALREADY_IN_USE

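Conclusions

Linux:

  • Setting SO_REUSEADDR lets multiple processes bind to the same ip+port, but not listen on it.
  • Setting SO_REUSEPORT lets multiple processes listen on the same ip+port; the kernel load-balances incoming connections across those processes.

FreeBSD:

  • With SO_REUSEADDR set, 0.0.0.0 and 172.22.147.210 are treated as different IP addresses: you can listen on 0.0.0.0:port and on a specific ip:port at the same time, but you cannot listen twice on the same ip+port.
  • Setting SO_REUSEPORT lets multiple processes listen on the same ip+port.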

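References:

深入理解Linux端口重用这一特性 (A deep dive into port reuse on Linux)
一个进程绑定了端口号后,创建子进程(fork),子进程是不是和父进程绑定了同一个端口号? (After a process binds a port and forks, is the child bound to the same port as the parent?)
TCP协议细节系列(9):深入解析Linux下so_reuseaddr和so_reuseport选项 (TCP protocol details, part 9: an in-depth look at so_reuseaddr and so_reuseport on Linux)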

As a Django developer I deal with uWSGI all the time, and every project I have worked on is deployed as nginx + uWSGI + Django. This post records what I learned about WSGI.

What is WSGI

WSGI stands for "Web Server Gateway Interface". It is a Python standard that defines the communication protocol (specification) between a web server and a web application or framework.
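Concretely, the specification boils down to one calling convention: the server calls the application with an environ dict and a start_response callable, and the application returns an iterable of bytes. The snippet below is a small sketch of that convention with no network involved. The names hello_app, captured and the fake start_response are made up for the illustration; setup_testing_defaults comes from the standard library's wsgiref.util.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A WSGI application: a callable taking (environ, start_response)."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]                       # an iterable of bytes


captured = {}


def start_response(status, headers, exc_info=None):
    """Stand-in for the server side: just record what the app reports."""
    captured["status"] = status
    captured["headers"] = headers


environ = {}
setup_testing_defaults(environ)         # fills in the keys a real server would provide
environ["PATH_INFO"] = "/hello"

body = b"".join(hello_app(environ, start_response))
print(captured["status"], body)
```

Everything a real WSGI server adds on top of this is turning an HTTP request into environ and turning the status, headers and body back into an HTTP response.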
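Three easily confused terms, WSGI, uwsgi and uWSGI, compared:

  • WSGI: the communication protocol (specification) between a web server and a web framework.
  • uwsgi: the uWSGI server's own wire protocol, used to talk to proxy servers such as nginx. It defines the type of information being transmitted: the first 4 bytes of every uwsgi packet are a header describing the payload. It is a different thing from WSGI.
  • uWSGI: a web server that implements both the uwsgi and WSGI protocols.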

As the comparison above shows, WSGI and uwsgi are communication protocols (specifications), while uWSGI is a web server.
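To make the "first 4 bytes" of a uwsgi packet a bit more tangible, here is a rough sketch based on my reading of the uwsgi protocol description: the header is modifier1 (1 byte), datasize (2 bytes, little-endian) and modifier2 (1 byte), and for a request the body is a sequence of key/value pairs, each prefixed by a 2-byte little-endian length. The function names are hypothetical and this is not how uWSGI itself is implemented.

```python
import struct

# Assumed header layout: modifier1 (uint8), datasize (uint16 LE), modifier2 (uint8)
UWSGI_HEADER = struct.Struct("<BHB")


def pack_uwsgi_vars(variables, modifier1=0, modifier2=0):
    """Encode a dict of request variables as header + length-prefixed key/value pairs."""
    body = b""
    for key, value in variables.items():
        k, v = key.encode(), value.encode()
        body += struct.pack("<H", len(k)) + k + struct.pack("<H", len(v)) + v
    return UWSGI_HEADER.pack(modifier1, len(body), modifier2) + body


def unpack_uwsgi_header(packet):
    """Return (modifier1, datasize, modifier2) from the first 4 bytes of a packet."""
    return UWSGI_HEADER.unpack_from(packet)


packet = pack_uwsgi_vars({"REQUEST_METHOD": "GET"})
header = unpack_uwsgi_header(packet)
print(header, len(packet))
```

This is roughly the kind of packet nginx's uwsgi_pass sends to uWSGI instead of raw HTTP; uWSGI then turns the variables into the WSGI environ.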

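Implementing a demo WSGI server

To make WSGI easier to understand, let's first look at an implementation of an HTTP-to-WSGI server.
It follows the sample WSGI implementation in the Python source tree, Lib/wsgiref/simple_server.py.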

import io
import sys
import socket
from datetime import datetime, timezone


class WSGIServer():
    application = None

    def __init__(self, host, port) -> None:
        self.client_connection = None
        self.headers_set = None
        self.server_address = (host, port)
        self.request_data = None
        self.request_method = None
        self.request_version = None
        self.path = None
        # PEP 3333 requires SERVER_NAME and SERVER_PORT to be strings
        self.server_name = host or 'localhost'
        self.server_port = str(port)

    def get_environ(self):
        env = {}
        env['wsgi.version'] = (1, 0)
        env['wsgi.url_scheme'] = 'http'
        # wsgi.input must be a file-like byte stream holding the request body
        env['wsgi.input'] = io.BytesIO(self.request_data.partition(b'\r\n\r\n')[2])
        env['wsgi.errors'] = sys.stderr
        env['REQUEST_METHOD'] = self.request_method
        env['PATH_INFO'] = self.path
        env['SERVER_NAME'] = self.server_name
        env['SERVER_PORT'] = self.server_port
        return env

    def handle_one_request(self):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sk:
            sk.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sk.bind(self.server_address)
            sk.listen(1)
            self.client_connection, client_address = sk.accept()
        request_data = self.request_data = self.client_connection.recv(1024)
        # Extract method, path and version from the HTTP request line
        self.request_method, self.path, self.request_version = self.parse_request(request_data)
        env = self.get_environ()
        result = self.application(env, self.start_response)  # call the application
        # Turn the Python objects into HTTP bytes and send them over the socket;
        # finish_response also closes the client connection
        self.finish_response(result)

    def start_response(self, status, response_headers, exc_info=None):
        now = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
        server_headers = [('Date', now), ('Server', 'WSGIServerCustomer 0.1')]
        self.headers_set = [status, response_headers + server_headers]

    def finish_response(self, result):
        try:
            status, response_headers = self.headers_set
            response = f'{self.request_version} {status}\r\n'
            for header in response_headers:
                response += '{0}: {1}\r\n'.format(*header)
            response += '\r\n'
            for data in result:
                response += data.decode()
            # Log the outgoing response, one '<'-prefixed line per response line
            print(''.join('<{line}\n'.format(line=line) for line in response.splitlines()))
            self.client_connection.sendall(response.encode())
        finally:
            self.client_connection.close()

    def parse_request(self, data):
        # Log the incoming request, one '>'-prefixed line per request line
        print(''.join('>{line}\n'.format(line=line.decode()) for line in data.splitlines()))
        data = data.splitlines()[0]
        return data.decode().split()

    def get_app(self):
        return self.application

    def set_app(self, application):
        self.application = application


# A minimal application
def demo_app(environ, start_response):
    from io import StringIO
    stdout = StringIO()
    print("Hello world!", file=stdout)
    print(file=stdout)
    # h = sorted(environ.items())
    # for k,v in h:
    #     print(k,'=',repr(v), file=stdout)
    start_response("200 OK", [('Content-Type', 'text/plain; charset=utf-8')])
    return [stdout.getvalue().encode("utf-8")]


def make_server(host, port, app):
    server = WSGIServer(host, port)
    server.set_app(app)
    return server


if __name__ == '__main__':
    httpd = make_server('', 8000, demo_app)
    # Handle a single request, then exit
    httpd.handle_one_request()

Result of running the demo

[Image: output of the running demo server]

Separating the WSGI server from the application

Next we separate the WSGI server from the application. The application is Flask.wsgi_app.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author : neighbour7
# @File : wsgi_server.py

import io
import sys
import socket
from datetime import datetime, timezone


class WSGIServer():
    application = None

    def __init__(self, host, port) -> None:
        self.client_connection = None
        self.headers_set = None
        self.server_address = (host, port)
        self.request_data = None
        self.request_method = None
        self.request_version = None
        self.path = None
        # PEP 3333 requires SERVER_NAME and SERVER_PORT to be strings
        self.server_name = host or 'localhost'
        self.server_port = str(port)

    def get_environ(self):
        env = {}
        env['wsgi.version'] = (1, 0)
        env['wsgi.url_scheme'] = 'http'
        # wsgi.input must be a file-like byte stream holding the request body
        env['wsgi.input'] = io.BytesIO(self.request_data.partition(b'\r\n\r\n')[2])
        env['wsgi.errors'] = sys.stderr
        env['REQUEST_METHOD'] = self.request_method
        env['PATH_INFO'] = self.path
        env['SERVER_NAME'] = self.server_name
        env['SERVER_PORT'] = self.server_port
        return env

    def handle_one_request(self):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sk:
            sk.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sk.bind(self.server_address)
            sk.listen(1)
            self.client_connection, client_address = sk.accept()
        request_data = self.request_data = self.client_connection.recv(1024)
        # Extract method, path and version from the HTTP request line
        self.request_method, self.path, self.request_version = self.parse_request(request_data)
        env = self.get_environ()
        result = self.application(env, self.start_response)  # call the application
        # Turn the Python objects into HTTP bytes and send them over the socket;
        # finish_response also closes the client connection
        self.finish_response(result)

    def start_response(self, status, response_headers, exc_info=None):
        now = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
        server_headers = [('Date', now), ('Server', 'WSGIServerCustomer 0.1')]
        self.headers_set = [status, response_headers + server_headers]

    def finish_response(self, result):
        try:
            status, response_headers = self.headers_set
            response = f'{self.request_version} {status}\r\n'
            for header in response_headers:
                response += '{0}: {1}\r\n'.format(*header)
            response += '\r\n'
            for data in result:
                response += data.decode()
            # Log the outgoing response, one '>'-prefixed line per response line
            print(''.join('>{line}\n'.format(line=line) for line in response.splitlines()))
            self.client_connection.sendall(response.encode())
        finally:
            self.client_connection.close()

    def parse_request(self, data):
        data = data.splitlines()[0]
        return data.decode().split()

    def get_app(self):
        return self.application

    def set_app(self, application):
        self.application = application


# A minimal application that also dumps the environ it receives
def demo_app(environ, start_response):
    from io import StringIO
    stdout = StringIO()
    print("Hello world!", file=stdout)
    print(file=stdout)
    h = sorted(environ.items())
    for k, v in h:
        print(k, '=', repr(v), file=stdout)
    start_response("200 OK", [('Content-Type', 'text/plain; charset=utf-8')])
    return [stdout.getvalue().encode("utf-8")]


def make_server(host, port, app):
    server = WSGIServer(host, port)
    server.set_app(app)
    return server


if __name__ == '__main__':
    httpd = make_server('', 8000, demo_app)
    # Handle a single request, then exit
    httpd.handle_one_request()
The Flask application (flaskapp.py):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Author : neighbour7
# @File : flaskapp.py

from flask import Flask, Response

flask_app = Flask('flaskapp')


@flask_app.route('/hello')
def hello_world():
    return "Hello World!"


# The WSGI callable that a WSGI server should invoke
app = flask_app.wsgi_app

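Flask.wsgi_app takes two arguments, (environ, start_response), which is exactly the WSGI application signature. Its source:
[Image: source code of Flask.wsgi_app]

Result of running it
[Image: output of the WSGI server serving the Flask application]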

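Summary

The examples above show that WSGI is essentially about translating a request that arrives over HTTP (or FastCGI, uwsgi, and similar protocols) into a call to a Python callable, the application.
WSGI abstracts away the low-level network communication so that developers can focus on web application logic instead of the complexity of the network layer.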
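One way to see this abstraction at work is to hand an application to a completely different WSGI server. The sketch below uses the standard library's wsgiref.simple_server instead of the hand-written server from this post. The app function, the QuietHandler class (which only silences request logging) and the choice of port 0 are assumptions made for the example.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server


def app(environ, start_response):
    """Any WSGI callable works here, e.g. a Flask wsgi_app."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello World!"]


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the example output clean


httpd = make_server("127.0.0.1", 0, app, handler_class=QuietHandler)
port = httpd.server_port                      # port actually chosen by the kernel
worker = threading.Thread(target=httpd.handle_request)  # serve exactly one request
worker.start()

conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/hello")
resp = conn.getresponse()
status, body = resp.status, resp.read()
conn.close()

worker.join()
httpd.server_close()
print(status, body)
```

The application code does not change at all when the server underneath is swapped, which is the whole point of having WSGI as the contract between the two.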

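References:

自己动手开发网络服务器(二):实现WSGI服务 (Build your own web server, part 2: implementing a WSGI server)
WSGI,uWSGI和uwsgi区别详解 (WSGI, uWSGI and uwsgi: the differences explained)
wsgiref/simple_server.py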