【GoLang笔记】实例分析GoLang built

备注1：本文旨在介绍Go语言中map这个内置数据结构的引用行为，并用实例来说明如何避免这种引用行为带来的“副作用”。备注2：文末列出的参考资料均来自GoLang.org官方文档，需翻墙访问。

1. map internalsmap是go中内置的数据结构，关于其语法规则，可以查看language specification中的说明，或者查看Effective Go中关于，此处略过。map的底层是用hashmap实现的（底层hashmap源码路径为src/pkg/runtime/hashmap.c），部分注释摘出如下：

// This file contains the implementation of Go's map type.//// The map is just a hash table. The data is arranged// into an array of buckets. Each bucket contains up to// 8 key/value pairs. The low-order bits of the hash are// used to select a bucket. Each bucket contains a few// high-order bits of each hash to distinguish the entries// within a single bucket.//// If more than 8 keys hash to a bucket, we chain on// extra buckets.//// When the hashtable grows, we allocate a new array// of buckets twice as big. Buckets are incrementally// copied from the old bucket array to the new bucket array.这段注释除表明map底层确实是hashmap实现的外，还解释了hashmap部分实现细节。此外，源码中还包含遍历map的处理细节以及一个map性能的小实验，可以查看源码文件了解。目前已经清楚，map在内部维护了一个hashmap，那么语法层面的map数据结构是如何与底层的hashmap关联起来的呢？在Effective Go关于Maps的说明文档中，有这样一句话：Like slices, maps hold references to an underlying data structure. If you pass a map to a function that changes thecontents of the map, the changes will be visible in the caller.具体而言，map这个数据结构在内部维护了一个指针，该指针指向一个真正存放数据的hashmap。参考Go官网博客的文章Go Slices: usageand internals关于slice内部结构的说明，再结合map底层hashmap.c源码片段（注意下面摘出的Hmap结构体定义中的count和buckets字段，而oldbuckets只在map rehash时有用），可以看出map内部确实维护着map元素的count和指向hashmap的指针。struct Hmap {// Note: the format of the Hmap is encoded in ../../cmd/gc/reflect.c and// ../reflect/type.go. Don't change this structure without also changing that code!uintgo count;// # live cells == size of map. Must be first (used by len() builtin)uint32 flags;uint32 hash0;// hash seeduint8 B;// log_2 of # of buckets (can hold up to LOAD * 2^B items)uint8 keysize;// key size in bytesuint8 valuesize; // value size in bytesuint16 bucketsize; // bucket size in bytesbyte *buckets;// array of 2^B Buckets. may be nil if count==0.byte *oldbuckets; // previous bucket array of half the size, non-nil only when growinguintptr nevacuate; // progress counter for evacuation (buckets less than this have been evacuated)};2. map type is reference type先看下面一段简单代码：// demo.gopackage mainimport "fmt"func main() {foo := make(map[string]string)foo["foo"] = "foo_v"bar := foobar["bar"] = "bar_v"fmt.Printf("foo=%v, ptr_foo=%v\n", foo, &foo)fmt.Printf("bar=%v, ptr_bar=%v\n", bar, &bar)}编译并执行：$ go build demo.go$ ./demo输出结果如下：foo=map[foo:foo_v bar:bar_v], ptr_foo=0xc210000018bar=map[foo:foo_v bar:bar_v], ptr_bar=0xc210000020看到了吧？foo和bar的地址不同，但它们的内容是相同的。当我们执行bar := foo时，bar被自动声明为map[string][string]类型并进行赋值，而这个赋值行为并没有为bar申请一个新的hashmap并把foo底层的hashmap内容copy过去，它只是把foo指向底层hashmap的指针copy给了bar，，赋值后，它们指向同一个底层hashmap。这个行为类似于C++中的“浅拷贝”。可见，正如Go maps in action一文中提到的，Go的map类型是引用类型（Map types arereference types）。关于Go语言的设计者们为何要把map设计成reference type，可以参考Go FAQ在的解释。新手需要特别注意这种引用行为，下面开始用实例来说明。

3. handel "deep copy" manually if necessary有时候，业务场景并不希望两个map变量指向同一个底层hashmap，但若Go新手恰好对map的引用行为理解不深的话，很有可能踩到坑，我们来段有Bug的代码感受下。

// bug version: mapref.gopackage mainimport ("fmt")func main() {foo := make(map[string]map[string]map[string]float32)foo_s12 := map[string]float32{"s2": 0.1}foo_s1 := map[string]map[string]float32{"s1": foo_s12}foo["x1"] = foo_s1foo_s22 := map[string]float32{"s2": 0.5}foo_s2 := map[string]map[string]float32{"s1": foo_s22}foo["x2"] = foo_s2x3 := make(map[string]map[string]float32)for _, v := range foo {for sk, sv := range v {if _, ok := x3[sk]; ok {for tmpk, tmpv := range sv {if _, ok := x3[sk][tmpk]; ok {x3[sk][tmpk] += tmpv} else {x3[sk][tmpk] = tmpv}}} else {x3[sk] = sv ## 注意这里，map的赋值是个引用行为！}}}fmt.Printf("foo=%v\n", foo)fmt.Printf("x3=%v\n", x3)}上述代码的目的是对一个3层map根据第2层key做merge（本例中是值累加），最终结果存入x3。比如，foo的一级key是"x1"和"x2"，其对应的value都是个两级map结构，我们要对1级key的value这个两级map根据其key做merge，具体在上述代码中，一级key对应的value分别是map[s1:map[s2:0.1]]和map[s1:map[s2:0.5]]，由于它们有公共的key "s1"，所以需要merge s1的value，而由于s1 value也是个map（分别是map[s2:0.1]和map[s2:0.5]）且它们仍有公共key "s2"，所以需要对两个s2的value做累加。总之，我们预期的结果是x3 = map[s1:map[s2:0.6]]，同时不改变原来的那个3层map的值。下面是上述代码的执行结果：foo=map[x1:map[s1:map[s2:0.6]] x2:map[s1:map[s2:0.5]]]x3=map[s1:map[s2:0.6]]可以看到，x3确实得到了预期的结果，但是，foo的值却被修改了（注意foo["x1"]["s1"]["s2"]的值由原来的0.1变成了0.6），如果应用程序后面要用到foo，那这个坑肯定是踩定了。bug是哪里引入的呢？请看代码中x3[sk] = sv那句（第29行），由于前面提到的map的reference特性，s3[sk]的值与sv指向的是同一个hashmap，而代码在第30-34行对x3[sk]的值做进一步merge时，修改了这个hashmap！这会导致foo["x1"]["s1"]["s2"]的值也被修改（因为它们共用底层存储区）。所以，这种情况下，我们必须手动对目的map做“深拷贝”，避免源map也被修改，下面是bug fix后的代码。// bug fix version: mapref.gopackage mainimport ("fmt")func main() {foo := make(map[string]map[string]map[string]float32)foo_s12 := map[string]float32{"s2": 0.1}foo_s1 := map[string]map[string]float32{"s1": foo_s12}foo["x1"] = foo_s1foo_s22 := map[string]float32{"s2": 0.5}foo_s2 := map[string]map[string]float32{"s1": foo_s22}foo["x2"] = foo_s2x3 := make(map[string]map[string]float32)for _, v := range foo {for sk, sv := range v {if _, ok := x3[sk]; ok {for tmpk, tmpv := range sv {if _, ok := x3[sk][tmpk]; ok {x3[sk][tmpk] += tmpv} else {x3[sk][tmpk] = tmpv}}} else {// handel "deep copy" manually if necessarytmp := make(map[string]float32)for k, v := range sv {tmp[k] = v}x3[sk] = tmp}}}fmt.Printf("foo=%v\n", foo)fmt.Printf("x3=%v\n", x3)}执行结果符合预期：foo=map[x1:map[s1:map[s2:0.1]] x2:map[s1:map[s2:0.5]]]x3=map[s1:map[s2:0.6]]以上就是本文要说明的问题及避免写出相关bug代码的方法。其实Go map的这个行为与Python中的dict行为非常类似，引入bug的原因都是由于它们的赋值行为是by reference的。关于Python的类似问题，之前的一篇笔记有过说明，感兴趣的话可以去查看。【参考资料】1. Go source code – src/pkg/runtime/hashmap.c2. The Go Programming Language Specification – Map types3. Effective Go – Maps4. Go Slices: usage and internals5. Go maps in action6. Go FAQ – Why are maps, slices, and channels references while arrays are values?

============================ EOF ========================

人生谁无少年时，甜苦酸辛各自知。

相关文章：

你感兴趣的文章：

标签云：