Golang Queue 的一个优化实现

今天看 Caddy 源码,IDE 源码一层层点点下去,最后发现 net/http/transport.go 里有个 wantConnQueue 的实现很有意思。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// A wantConnQueue is a queue of wantConns.
type wantConnQueue struct {
// This is a queue, not a deque.
// It is split into two stages - head[headPos:] and tail.
// popFront is trivial (headPos++) on the first stage, and
// pushBack is trivial (append) on the second stage.
// If the first stage is empty, popFront can swap the
// first and second stages to remedy the situation.
//
// This two-stage split is analogous to the use of two lists
// in Okasaki's purely functional queue but without the
// overhead of reversing the list when swapping stages.
head []*wantConn
headPos int
tail []*wantConn
}

实现

一般来说,一个队列主要就是 Push、Pop、Len 这三个方法,实现如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
type CommonQueue struct {
data []int
}

func (q *CommonQueue) Push(v int) {
q.data = append(q.data, v)
}

func (q *CommonQueue) Pop() int {
if len(q.data) == 0 {
panic("queue is empty")
}
w := q.data[0]
q.data = q.data[1:]
return w
}

func (q *CommonQueue) Len() int {
return len(q.data)
}

但是这个 wantConnQueue 的实现大致是这样的

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
type Queue struct {
head []int
headPos int
tail []int
}

func (q *Queue) Push(v int) {
q.tail = append(q.tail, v)
}

func (q *Queue) Pop() int {
if q.headPos >= len(q.head) {
if len(q.tail) == 0 {
panic("queue is empty")
}
q.head, q.tail, q.headPos = q.tail, q.head[:0], 0
}
w := q.head[q.headPos]
q.headPos++
return w
}

func (q *Queue) Len() int {
return len(q.head) - q.headPos + len(q.tail)
}

每次 Push 的时候都将数据 append 到 tail 数组,每次取数据的时候,优先从 head 中取,如果 head 的数据取完了,则再从 tail 中取(交换了 head 和 tail 的值)。这里注意,重置 tail 用的是 head[:0] ,也就是说,slice 底层的数组是可以复用的。

压测

这里模拟了以下的场景进行压测

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
func BenchmarkQueue(b *testing.B) {
q := &Queue{}
for i := 0; i < b.N; i++ {
q.Push(1)
q.Push(1)
q.Pop()
}
}

func BenchmarkCommonQueue(b *testing.B) {
q := &CommonQueue{}
for i := 0; i < b.N; i++ {
q.Push(1)
q.Push(1)
q.Pop()
}
}

结果如下

1
go test -run=. -bench=. -benchtime=5s -count 5 -benchmem -cpuprofile=cpu.out -memprofile=mem.out -trace=trace.out . | tee bench.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
goos: darwin
goarch: amd64
pkg: go-play/queue-plus
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkQueue-12 283320342 19.30 ns/op 64 B/op 0 allocs/op
BenchmarkQueue-12 679015843 16.75 ns/op 52 B/op 0 allocs/op
BenchmarkQueue-12 693980847 13.36 ns/op 55 B/op 0 allocs/op
BenchmarkQueue-12 762396834 12.32 ns/op 55 B/op 0 allocs/op
BenchmarkQueue-12 534547828 15.48 ns/op 66 B/op 0 allocs/op
BenchmarkCommonQueue-12 504237292 10.78 ns/op 85 B/op 0 allocs/op
BenchmarkCommonQueue-12 550798738 10.87 ns/op 87 B/op 0 allocs/op
BenchmarkCommonQueue-12 521881972 10.85 ns/op 82 B/op 0 allocs/op
BenchmarkCommonQueue-12 542052586 13.51 ns/op 89 B/op 0 allocs/op
BenchmarkCommonQueue-12 529258514 11.55 ns/op 81 B/op 0 allocs/op
PASS
ok go-play/queue-plus 96.848s

实话说,看不出多大区别。

再来试试随机 Push 和 Pop

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
func init() {
rand.Seed(time.Now().UnixNano())
}

func BenchmarkQueue(b *testing.B) {
q := &Queue{}
for i := 0; i < b.N; i++ {
if v := rand.Intn(2); v == 0 && q.Len() > 0 {
q.Pop()
} else {
q.Push(1)
}
}
}

func BenchmarkCommonQueue(b *testing.B) {
q := &CommonQueue{}
for i := 0; i < b.N; i++ {
if v := rand.Intn(2); v == 0 && q.Len() > 0 {
q.Pop()
} else {
q.Push(1)
}
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
goos: darwin
goarch: amd64
pkg: go-play/queue-plus
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkQueue-12 342283135 17.26 ns/op 0 B/op 0 allocs/op
BenchmarkQueue-12 343866208 17.27 ns/op 0 B/op 0 allocs/op
BenchmarkQueue-12 344301460 17.87 ns/op 0 B/op 0 allocs/op
BenchmarkQueue-12 332799319 18.10 ns/op 0 B/op 0 allocs/op
BenchmarkQueue-12 325960773 18.07 ns/op 0 B/op 0 allocs/op
BenchmarkCommonQueue-12 297724114 20.44 ns/op 14 B/op 0 allocs/op
BenchmarkCommonQueue-12 297911770 20.23 ns/op 15 B/op 0 allocs/op
BenchmarkCommonQueue-12 296037948 20.72 ns/op 17 B/op 0 allocs/op
BenchmarkCommonQueue-12 293979505 20.52 ns/op 16 B/op 0 allocs/op
BenchmarkCommonQueue-12 296939342 20.53 ns/op 14 B/op 0 allocs/op
PASS
ok go-play/queue-plus 79.709s

额,差别好像也不大,就是内存用的少一些。