Related posts: "kubernetes/k8s CSI analysis: Container Storage Interface" and "kubernetes/k8s CRI analysis: Container Runtime Interface"
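Overview
Kubernetes was designed around a pluggable architecture that makes its functionality easy to extend. Following this idea, Kubernetes defines three interfaces for specific functions: the Container Network Interface (CNI), the Container Runtime Interface (CRI), and the Container Storage Interface (CSI). Kubernetes calls these interfaces to carry out the corresponding functions.
This post introduces and analyzes the Container Network Interface, CNI.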
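What is CNI
CNI stands for Container Network Interface.
CNI is the standard interface through which Kubernetes invokes network implementations. The kubelet uses it to call different network plugins, each of which implements a different way of configuring the network.
A CNI network plugin is an executable that follows the Container Network Interface (CNI) specification. Common CNI plugins include Calico, flannel, Terway, and Weave Net.
When the kubelet is configured to use a CNI network plugin (through a kubelet startup flag), it calls that plugin whenever it creates or deletes a pod, to set up or tear down the pod's network.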
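The kubelet's network plugins
The kubelet supports three types of network plugin:
(1) CNI;
(2) kubenet;
(3) Noop, meaning no network plugin is configured.
This post focuses on the CNI-related source code in the kubelet.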
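CNI architecture
When the kubelet creates or deletes a pod, it calls the CRI, and the CRI in turn calls CNI to set up or tear down the pod's network.
How the kubelet builds a pod's network, in outline
(1) The kubelet first creates the pause container (the pod sandbox) through the CRI, which creates the network namespace;
(2) Based on its startup flags, the kubelet calls the configured network plugin, such as a CNI network plugin;
(3) The network plugin configures the network of the pause container (pod sandbox);
(4) All other containers in the pod share the network of the pause container (pod sandbox).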
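Analysis of the CNI-related source code in the kubelet
The analysis covers the following parts:
(1) CNI-related startup flags;
(2) key structs/interfaces;
(3) CNI initialization;
(4) how CNI builds the pod network;
(5) how CNI tears down the pod network.
Based on tag v1.17.4
https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4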
1. CNI-related startup flags of the kubelet
The code that registers the kubelet's CNI-related flags is as follows:
```go
// pkg/kubelet/config/flags.go
func (s *ContainerRuntimeOptions) AddFlags(fs *pflag.FlagSet) {
	...
	// Network plugin settings for Docker.
	fs.StringVar(&s.NetworkPluginName, "network-plugin", s.NetworkPluginName, fmt.Sprintf("<Warning: Alpha feature> The name of the network plugin to be invoked for various events in kubelet/pod lifecycle. %s", dockerOnlyWarning))
	fs.StringVar(&s.CNIConfDir, "cni-conf-dir", s.CNIConfDir, fmt.Sprintf("<Warning: Alpha feature> The full path of the directory in which to search for CNI config files. %s", dockerOnlyWarning))
	fs.StringVar(&s.CNIBinDir, "cni-bin-dir", s.CNIBinDir, fmt.Sprintf("<Warning: Alpha feature> A comma-separated list of full paths of directories in which to search for CNI plugin binaries. %s", dockerOnlyWarning))
	fs.StringVar(&s.CNICacheDir, "cni-cache-dir", s.CNICacheDir, fmt.Sprintf("<Warning: Alpha feature> The full path of the directory in which CNI should store cache files. %s", dockerOnlyWarning))
	fs.Int32Var(&s.NetworkPluginMTU, "network-plugin-mtu", s.NetworkPluginMTU, fmt.Sprintf("<Warning: Alpha feature> The MTU to be passed to the network plugin, to override the default. Set to 0 to use the default 1460 MTU. %s", dockerOnlyWarning))
	...
}
```
The defaults for the CNI-related flags are set in the `NewContainerRuntimeOptions` function:
```go
// cmd/kubelet/app/options/container_runtime.go

// NewContainerRuntimeOptions will create a new ContainerRuntimeOptions with
// default values.
func NewContainerRuntimeOptions() *config.ContainerRuntimeOptions {
	dockerEndpoint := ""
	if runtime.GOOS != "windows" {
		dockerEndpoint = "unix:///var/run/docker.sock"
	}

	return &config.ContainerRuntimeOptions{
		ContainerRuntime:           kubetypes.DockerContainerRuntime,
		RedirectContainerStreaming: false,
		DockerEndpoint:             dockerEndpoint,
		DockershimRootDirectory:    "/var/lib/dockershim",
		PodSandboxImage:            defaultPodSandboxImage,
		ImagePullProgressDeadline:  metav1.Duration{Duration: 1 * time.Minute},
		ExperimentalDockershim:     false,

		//Alpha feature
		CNIBinDir:   "/opt/cni/bin",
		CNIConfDir:  "/etc/cni/net.d",
		CNICacheDir: "/var/lib/cni/cache",
	}
}
```
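Here are the most important CNI-related flags:
(1) `--network-plugin`: the type of network plugin to use. Valid values are `cni`, `kubenet`, and `""`. The default is the empty string, which means Noop: no network plugin is configured and no pod network is built. Setting it to `cni` makes the kubelet use a CNI network plugin.
(2) `--cni-conf-dir`: the directory that holds the CNI configuration files. Default: `/etc/cni/net.d`.
(3) `--cni-bin-dir`: the directory (or, per the flag's help text, a comma-separated list of directories) that holds the CNI plugin executables. The kubelet looks up plugin executables here to perform pod network operations. Default: `/opt/cni/bin`.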
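2. Key structs/interfaces
interface NetworkPlugin
We start with the key interface, `NetworkPlugin`.
The `NetworkPlugin` interface declares the operations of a kubelet network plugin, and each type of network plugin only needs to implement these methods. The most important ones are `SetUpPod` and `TearDownPod`, which set up and tear down the pod network respectively. `cniNetworkPlugin` implements this interface.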
```go
// pkg/kubelet/dockershim/network/plugins.go

// NetworkPlugin is an interface to network plugins for the kubelet
type NetworkPlugin interface {
	// Init initializes the plugin. This will be called exactly once
	// before any other methods are called.
	Init(host Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) error

	// Called on various events like:
	// NET_PLUGIN_EVENT_POD_CIDR_CHANGE
	Event(name string, details map[string]interface{})

	// Name returns the plugin's name. This will be used when searching
	// for a plugin by name, e.g.
	Name() string

	// Returns a set of NET_PLUGIN_CAPABILITY_*
	Capabilities() utilsets.Int

	// SetUpPod is the method called after the infra container of
	// the pod has been created but before the other containers of the
	// pod are launched.
	SetUpPod(namespace string, name string, podSandboxID kubecontainer.ContainerID, annotations, options map[string]string) error

	// TearDownPod is the method called before a pod's infra container will be deleted
	TearDownPod(namespace string, name string, podSandboxID kubecontainer.ContainerID) error

	// GetPodNetworkStatus is the method called to obtain the ipv4 or ipv6 addresses of the container
	GetPodNetworkStatus(namespace string, name string, podSandboxID kubecontainer.ContainerID) (*PodNetworkStatus, error)

	// Status returns error if the network plugin is in error state
	Status() error
}
```
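Because dockershim only depends on this interface, it never needs to know which plugin it is talking to. The sketch below illustrates that idea with a heavily trimmed, hypothetical interface `netPlugin`: sandbox IDs are plain strings and only three methods remain. A hypothetical `fakePlugin` satisfies it by recording which sandboxes are "wired up". It is not kubelet code.

```go
package main

import (
	"fmt"
	"sync"
)

// netPlugin is a hypothetical, heavily trimmed version of NetworkPlugin.
type netPlugin interface {
	Name() string
	SetUpPod(namespace, name, sandboxID string) error
	TearDownPod(namespace, name, sandboxID string) error
}

// fakePlugin "configures" networking by recording which sandboxes are set up.
type fakePlugin struct {
	mu    sync.Mutex
	ready map[string]bool
}

// Compile-time check that fakePlugin satisfies netPlugin.
var _ netPlugin = (*fakePlugin)(nil)

func newFakePlugin() *fakePlugin { return &fakePlugin{ready: map[string]bool{}} }

func (p *fakePlugin) Name() string { return "fake" }

func (p *fakePlugin) SetUpPod(namespace, name, sandboxID string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.ready[sandboxID] = true
	return nil
}

func (p *fakePlugin) TearDownPod(namespace, name, sandboxID string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.ready, sandboxID) // deleting a missing key is a no-op, so teardown is idempotent
	return nil
}

func (p *fakePlugin) isReady(sandboxID string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.ready[sandboxID]
}

func main() {
	var plugin netPlugin = newFakePlugin() // callers only see the interface
	if err := plugin.SetUpPod("default", "nginx", "sandbox-1"); err != nil {
		panic(err)
	}
	fmt.Println(plugin.Name(), plugin.(*fakePlugin).isReady("sandbox-1"))
}
```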
struct cniNetworkPlugin
The `cniNetworkPlugin` struct implements the `NetworkPlugin` interface, including the `SetUpPod` and `TearDownPod` methods.
```go
// pkg/kubelet/dockershim/network/cni/cni.go
type cniNetworkPlugin struct {
	network.NoopNetworkPlugin

	loNetwork *cniNetwork

	sync.RWMutex
	defaultNetwork *cniNetwork

	host        network.Host
	execer      utilexec.Interface
	nsenterPath string
	confDir     string
	binDirs     []string
	cacheDir    string
	podCidr     string
}
```
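struct PluginManager
The `plugin` field of the `PluginManager` struct has the interface type `NetworkPlugin`. This means any concrete network plugin implementation can be stored in it, for example the `cniNetworkPlugin` struct.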
```go
// pkg/kubelet/dockershim/network/plugins.go

// The PluginManager wraps a kubelet network plugin and provides synchronization
// for a given pod's network operations. Each pod's setup/teardown/status operations
// are synchronized against each other, but network operations of other pods can
// proceed in parallel.
type PluginManager struct {
	// Network plugin being wrapped
	plugin NetworkPlugin

	// Pod list and lock
	podsLock sync.Mutex
	pods     map[string]*podLock
}
```
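The comment above promises that operations on the same pod are serialized while different pods proceed in parallel. Below is one way to get that behavior from a `map[string]*podLock`. The names `podLockManager`, `lock` and `unlock` are hypothetical, and this is not the kubelet's code: its `podLock` returns the per-pod mutex for the caller to lock, as `PluginManager.SetUpPod` shows. In the sketch, one global mutex guards only the map. Each pod gets its own mutex plus a reference count, and the map entry is removed when the last user unlocks.

```go
package main

import (
	"fmt"
	"sync"
)

// podLock is a per-pod mutex plus a count of goroutines holding or waiting on it.
type podLock struct {
	refcount uint
	mu       sync.Mutex
}

// podLockManager is a hypothetical stand-in for the locking part of PluginManager.
type podLockManager struct {
	podsLock sync.Mutex // guards pods only, never held while waiting for a pod
	pods     map[string]*podLock
}

func newPodLockManager() *podLockManager {
	return &podLockManager{pods: map[string]*podLock{}}
}

// lock blocks until the caller holds the lock for fullPodName.
func (m *podLockManager) lock(fullPodName string) {
	m.podsLock.Lock()
	l, ok := m.pods[fullPodName]
	if !ok {
		l = &podLock{}
		m.pods[fullPodName] = l
	}
	l.refcount++
	m.podsLock.Unlock()
	l.mu.Lock() // acquired outside podsLock so other pods are not blocked
}

// unlock releases the pod's lock and drops the map entry once nobody uses it.
func (m *podLockManager) unlock(fullPodName string) {
	m.podsLock.Lock()
	defer m.podsLock.Unlock()
	l, ok := m.pods[fullPodName]
	if !ok || l.refcount == 0 {
		return // unbalanced unlock; a real implementation would log this
	}
	l.refcount--
	l.mu.Unlock()
	if l.refcount == 0 {
		delete(m.pods, fullPodName)
	}
}

func main() {
	m := newPodLockManager()
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.lock("nginx_default")
			counter++ // safe: every goroutine uses the same pod name
			m.unlock("nginx_default")
		}()
	}
	wg.Wait()
	fmt.Println(counter, len(m.pods)) // 100 0
}
```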
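struct dockerService
The `dockerService` struct was analyzed in detail in the CRI post, which you may want to revisit. Here is a brief recap.
The `dockerService` struct implements the server side of the CRI shim's runtime service and image service, so it represents the server side of dockershim (the CRI shim built into the kubelet).
The `network` field of `dockerService` has type `*PluginManager`. When the struct is initialized, the concrete network plugin (such as `cniNetworkPlugin`) is wrapped in a `PluginManager` and stored in this field.
When a pod is created or deleted, dockershim goes through the `network` field and calls the `SetUpPod` or `TearDownPod` method of the concrete plugin (for example `cniNetworkPlugin`) to set up or tear down the pod's network.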
```go
// pkg/kubelet/dockershim/docker_service.go
type dockerService struct {
	client           libdocker.Interface
	os               kubecontainer.OSInterface
	podSandboxImage  string
	streamingRuntime *streamingRuntime
	streamingServer  streaming.Server

	network *network.PluginManager
	// Map of podSandboxID :: network-is-ready
	networkReady     map[string]bool
	networkReadyLock sync.Mutex

	containerManager cm.ContainerManager
	// cgroup driver used by Docker runtime.
	cgroupDriver      string
	checkpointManager checkpointmanager.CheckpointManager
	// caches the version of the runtime.
	// To be compatible with multiple docker versions, we need to perform
	// version checking for some operations. Use this cache to avoid querying
	// the docker daemon every time we need to do such checks.
	versionCache *cache.ObjectCache
	// startLocalStreamingServer indicates whether dockershim should start a
	// streaming server on localhost.
	startLocalStreamingServer bool

	// containerCleanupInfos maps container IDs to the `containerCleanupInfo` structs
	// needed to clean up after containers have been removed.
	// (see `applyPlatformSpecificDockerConfig` and `performPlatformSpecificContainerCleanup`
	// methods for more info).
	containerCleanupInfos map[string]*containerCleanupInfo
}
```
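3. CNI initialization
During startup, the kubelet does two network-related things: it probes for the network plugins available in the current environment, and it initializes the chosen plugin. CNI is initialized only when the container runtime is the built-in dockershim. Once initialized, CNI is handed over to dockershim.
Call chain for CNI initialization:
main (cmd/kubelet/kubelet.go)
-> NewKubeletCommand (cmd/kubelet/app/server.go)
-> Run (cmd/kubelet/app/server.go)
-> run (cmd/kubelet/app/server.go)
-> RunKubelet (cmd/kubelet/app/server.go)
-> CreateAndInitKubelet (cmd/kubelet/app/server.go)
-> kubelet.NewMainKubelet (pkg/kubelet/kubelet.go)
-> dockershim.NewDockerService (pkg/kubelet/dockershim/docker_service.go)
-> cni.ProbeNetworkPlugins (pkg/kubelet/dockershim/network/cni/cni.go) & network.InitNetworkPlugin (pkg/kubelet/dockershim/network/plugins.go)
The call chain is long, so we go straight to the key function, `NewMainKubelet`.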
NewMainKubelet
The call to focus on in `NewMainKubelet` is `dockershim.NewDockerService`.
```go
// pkg/kubelet/kubelet.go

// NewMainKubelet instantiates a new Kubelet object along with all the required internal modules.
// No initialization of Kubelet and its modules should happen here.
func NewMainKubelet(kubeCfg *kubeletconfiginternal.KubeletConfiguration, ...) {
	...
	switch containerRuntime {
	case kubetypes.DockerContainerRuntime:
		// Create and start the CRI shim running as a grpc server.
		streamingConfig := getStreamingConfig(kubeCfg, kubeDeps, crOptions)
		ds, err := dockershim.NewDockerService(kubeDeps.DockerClientConfig, crOptions.PodSandboxImage, streamingConfig,
			&pluginSettings, runtimeCgroups, kubeCfg.CgroupDriver, crOptions.DockershimRootDirectory, !crOptions.RedirectContainerStreaming)
		...
	}
	...
}
```
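We analyze the case where `containerRuntime` equals `docker`, that is, when the kubelet flag `--container-runtime` is `docker`. In this case the kubelet uses its built-in CRI shim, dockershim, as the container runtime, and it initializes and starts dockershim.
Calling `dockershim.NewDockerService` creates and initializes the dockershim server. This includes initializing the docker client and setting up the CNI network configuration, among other things.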
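The CNI part consists of three main steps:
(1) Call `cni.ProbeNetworkPlugins`. Using the kubelet's CNI-related flags, it locates the CNI configuration files, plugin executables, and related information, then uses them to initialize and return a `cniNetworkPlugin` struct.
(2) Call `network.InitNetworkPlugin`. It selects the network plugin that matches `networkPluginName` (the kubelet flag `--network-plugin`) and calls its `Init()` method. Initialization mainly starts a goroutine that periodically re-reads the CNI configuration files and executables, so they can be hot-reloaded.
(3) Wrap the plugin returned by the previous step (a `cniNetworkPlugin` when `--network-plugin=cni`) in a `PluginManager` and assign it to the `network` field of the `dockerService` struct. Later, when pods are created or deleted, its `SetUpPod` and `TearDownPod` methods can then be called to set up or tear down the pod network.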
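The main kubelet code implementing CNI:
pkg/kubelet/dockershim/network/cni/cni.go - SetUpPod/TearDownPod
(set up and tear down the pod network)
The value of the `pluginSettings *NetworkPluginSettings` parameter comes from the kubelet's startup flags, which were analyzed earlier. Refer back if needed.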
```go
// pkg/kubelet/dockershim/docker_service.go

// NewDockerService creates a new `DockerService` struct.
// NOTE: Anything passed to DockerService should be eventually handled in another way when we switch to running the shim as a different process.
func NewDockerService(config *ClientConfig, podSandboxImage string, streamingConfig *streaming.Config, pluginSettings *NetworkPluginSettings,
	cgroupsName string, kubeCgroupDriver string, dockershimRootDir string, startLocalStreamingServer bool, noJsonLogPath string) (DockerService, error) {
	...
	ds := &dockerService{
		client:          c,
		os:              kubecontainer.RealOS{},
		podSandboxImage: podSandboxImage,
		streamingRuntime: &streamingRuntime{
			client:      client,
			execHandler: &NativeExecHandler{},
		},
		containerManager:          cm.NewContainerManager(cgroupsName, client),
		checkpointManager:         checkpointManager,
		startLocalStreamingServer: startLocalStreamingServer,
		networkReady:              make(map[string]bool),
		containerCleanupInfos:     make(map[string]*containerCleanupInfo),
		noJsonLogPath:             noJsonLogPath,
	}
	...
	// dockershim currently only supports CNI plugins.
	pluginSettings.PluginBinDirs = cni.SplitDirs(pluginSettings.PluginBinDirString)
	// (1) Using the kubelet's CNI-related flags, locate the CNI config files, plugin executables, etc.,
	// and use them to initialize and return a cniNetworkPlugin struct.
	cniPlugins := cni.ProbeNetworkPlugins(pluginSettings.PluginConfDir, pluginSettings.PluginCacheDir, pluginSettings.PluginBinDirs)
	cniPlugins = append(cniPlugins, kubenet.NewPlugin(pluginSettings.PluginBinDirs, pluginSettings.PluginCacheDir))
	netHost := &dockerNetworkHost{
		&namespaceGetter{ds},
		&portMappingGetter{ds},
	}
	// (2) Select the plugin matching networkPluginName (the --network-plugin flag) and call its Init() method,
	// which mainly starts a goroutine that periodically re-reads the CNI config files and executables (hot reload).
	plug, err := network.InitNetworkPlugin(cniPlugins, pluginSettings.PluginName, netHost, pluginSettings.HairpinMode, pluginSettings.NonMasqueradeCIDR, pluginSettings.MTU)
	if err != nil {
		return nil, fmt.Errorf("didn't find compatible CNI plugin with given settings %+v: %v", pluginSettings, err)
	}
	// (3) Store the chosen plugin (a cniNetworkPlugin) in the network field of dockerService, so that its
	// SetUpPod/TearDownPod methods can be called later when pods are created or deleted.
	ds.network = network.NewPluginManager(plug)
	klog.Infof("Docker cri networking managed by %v", plug.Name())
	...
}
```
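First, what does `pluginSettings` look like? It is a `NetworkPluginSettings` struct. It holds the network plugin name, the directories with the plugin executables, the directory with the plugin configuration, and a few other fields: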
```go
// pkg/kubelet/dockershim/docker_service.go
type NetworkPluginSettings struct {
	// HairpinMode is best described by comments surrounding the kubelet arg
	HairpinMode kubeletconfig.HairpinMode
	// NonMasqueradeCIDR is the range of ips which should *not* be included
	// in any MASQUERADE rules applied by the plugin
	NonMasqueradeCIDR string
	// PluginName is the name of the plugin, runtime shim probes for
	PluginName string
	// PluginBinDirString is a list of directories delimited by commas, in
	// which the binaries for the plugin with PluginName may be found.
	PluginBinDirString string
	// PluginBinDirs is an array of directories in which the binaries for
	// the plugin with PluginName may be found. The admin is responsible for
	// provisioning these binaries before-hand.
	PluginBinDirs []string
	// PluginConfDir is the directory in which the admin places a CNI conf.
	// Depending on the plugin, this may be an optional field, eg: kubenet
	// generates its own plugin conf.
	PluginConfDir string
	// PluginCacheDir is the directory in which CNI should store cache files.
	PluginCacheDir string
	// MTU is the desired MTU for network devices created by the plugin.
	MTU int
}
```
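3.1 cni.ProbeNetworkPlugins
`cni.ProbeNetworkPlugins` uses the kubelet's CNI-related flags to locate the CNI configuration files, plugin executables, and related information. It then uses them to initialize and return a `cniNetworkPlugin` struct.
Note the call to `plugin.syncNetworkConfig()`. Its job is to set the `defaultNetwork` field of the `cniNetworkPlugin` struct.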
```go
// pkg/kubelet/dockershim/network/cni/cni.go

// ProbeNetworkPlugins : get the network plugin based on cni conf file and bin file
func ProbeNetworkPlugins(confDir, cacheDir string, binDirs []string) []network.NetworkPlugin {
	old := binDirs
	binDirs = make([]string, 0, len(binDirs))
	for _, dir := range old {
		if dir != "" {
			binDirs = append(binDirs, dir)
		}
	}

	plugin := &cniNetworkPlugin{
		defaultNetwork: nil,
		loNetwork:      getLoNetwork(binDirs),
		execer:         utilexec.New(),
		confDir:        confDir,
		binDirs:        binDirs,
		cacheDir:       cacheDir,
	}

	// sync NetworkConfig in best effort during probing.
	plugin.syncNetworkConfig()
	return []network.NetworkPlugin{plugin}
}
```
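Main logic of `plugin.syncNetworkConfig()`:
(1) `getDefaultCNINetwork()`: searches the CNI conf directory given by the kubelet flags for a CNI configuration file and returns a `cniNetwork` struct describing it.
(2) `plugin.setDefaultNetwork()`: assigns the `cniNetwork` struct from the previous step to the `defaultNetwork` field of `cniNetworkPlugin`.
If step (1) fails, the error is only logged as a warning and the previous `defaultNetwork` is kept.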
```go
// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) syncNetworkConfig() {
	network, err := getDefaultCNINetwork(plugin.confDir, plugin.binDirs)
	if err != nil {
		klog.Warningf("Unable to update cni config: %s", err)
		return
	}
	plugin.setDefaultNetwork(network)
}
```
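Main logic of `getDefaultCNINetwork()`:
(1) It recognizes three kinds of CNI configuration file in the conf directory: `.conf`, `.conflist`, and `.json`.
(2) It calls `sort.Strings()` to sort all configuration files in the directory in ascending lexicographic order.
(3) It walks the sorted files and returns as soon as it finds the first one that loads and validates successfully. Files that fail to parse or validate are skipped with a warning. So even if several CNI configuration files are present, only one of them takes effect.
(4) As part of that validation, it calls `cniConfig.ValidateNetworkList()` to check that every plugin referenced by the configuration has an executable in the CNI bin directories.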
```go
// pkg/kubelet/dockershim/network/cni/cni.go
func getDefaultCNINetwork(confDir string, binDirs []string) (*cniNetwork, error) {
	files, err := libcni.ConfFiles(confDir, []string{".conf", ".conflist", ".json"})
	switch {
	case err != nil:
		return nil, err
	case len(files) == 0:
		return nil, fmt.Errorf("no networks found in %s", confDir)
	}

	cniConfig := &libcni.CNIConfig{Path: binDirs}

	sort.Strings(files)
	for _, confFile := range files {
		var confList *libcni.NetworkConfigList
		if strings.HasSuffix(confFile, ".conflist") {
			confList, err = libcni.ConfListFromFile(confFile)
			if err != nil {
				klog.Warningf("Error loading CNI config list file %s: %v", confFile, err)
				continue
			}
		} else {
			conf, err := libcni.ConfFromFile(confFile)
			if err != nil {
				klog.Warningf("Error loading CNI config file %s: %v", confFile, err)
				continue
			}
			// Ensure the config has a "type" so we know what plugin to run.
			// Also catches the case where somebody put a conflist into a conf file.
			if conf.Network.Type == "" {
				klog.Warningf("Error loading CNI config file %s: no 'type'; perhaps this is a .conflist?", confFile)
				continue
			}

			confList, err = libcni.ConfListFromConf(conf)
			if err != nil {
				klog.Warningf("Error converting CNI config file %s to list: %v", confFile, err)
				continue
			}
		}
		if len(confList.Plugins) == 0 {
			klog.Warningf("CNI config list %s has no networks, skipping", string(confList.Bytes[:maxStringLengthInLog(len(confList.Bytes))]))
			continue
		}

		// Before using this CNI config, we have to validate it to make sure that
		// all plugins of this config exist on disk
		caps, err := cniConfig.ValidateNetworkList(context.TODO(), confList)
		if err != nil {
			klog.Warningf("Error validating CNI config list %s: %v", string(confList.Bytes[:maxStringLengthInLog(len(confList.Bytes))]), err)
			continue
		}

		klog.V(4).Infof("Using CNI configuration file %s", confFile)

		return &cniNetwork{
			name:          confList.Name,
			NetworkConfig: confList,
			CNIConfig:     cniConfig,
			Capabilities:  caps,
		}, nil
	}
	return nil, fmt.Errorf("no valid networks found in %s", confDir)
}
```
plugin.setDefaultNetwork
Assigns the `cniNetwork` struct obtained above to the `defaultNetwork` field of `cniNetworkPlugin`, while holding the plugin's write lock.
```go
// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) setDefaultNetwork(n *cniNetwork) {
	plugin.Lock()
	defer plugin.Unlock()
	plugin.defaultNetwork = n
}
```
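3.2 network.InitNetworkPlugin
`network.InitNetworkPlugin()` selects the network plugin that matches `networkPluginName` (the kubelet flag `--network-plugin`) and calls its `Init()` method to initialize it. If the name is empty, it falls back to the Noop plugin.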
```go
// pkg/kubelet/dockershim/network/plugins.go

// InitNetworkPlugin inits the plugin that matches networkPluginName. Plugins must have unique names.
func InitNetworkPlugin(plugins []NetworkPlugin, networkPluginName string, host Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) (NetworkPlugin, error) {
	if networkPluginName == "" {
		// default to the no_op plugin
		plug := &NoopNetworkPlugin{}
		plug.Sysctl = utilsysctl.New()
		if err := plug.Init(host, hairpinMode, nonMasqueradeCIDR, mtu); err != nil {
			return nil, err
		}
		return plug, nil
	}

	pluginMap := map[string]NetworkPlugin{}

	allErrs := []error{}
	for _, plugin := range plugins {
		name := plugin.Name()
		if errs := validation.IsQualifiedName(name); len(errs) != 0 {
			allErrs = append(allErrs, fmt.Errorf("network plugin has invalid name: %q: %s", name, strings.Join(errs, ";")))
			continue
		}

		if _, found := pluginMap[name]; found {
			allErrs = append(allErrs, fmt.Errorf("network plugin %q was registered more than once", name))
			continue
		}
		pluginMap[name] = plugin
	}

	chosenPlugin := pluginMap[networkPluginName]
	if chosenPlugin != nil {
		err := chosenPlugin.Init(host, hairpinMode, nonMasqueradeCIDR, mtu)
		if err != nil {
			allErrs = append(allErrs, fmt.Errorf("network plugin %q failed init: %v", networkPluginName, err))
		} else {
			klog.V(1).Infof("Loaded network plugin %q", networkPluginName)
		}
	} else {
		allErrs = append(allErrs, fmt.Errorf("network plugin %q not found", networkPluginName))
	}

	return chosenPlugin, utilerrors.NewAggregate(allErrs)
}
```
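chosenPlugin.Init()
When the kubelet flag `--network-plugin` is set to `cni`, this ends up calling the `Init()` method of `cniNetworkPlugin`, shown below.
It calls `plugin.syncNetworkConfig()` once, then starts a goroutine that calls `plugin.syncNetworkConfig` every 5 seconds (`defaultSyncConfigPeriod`). Recall what `plugin.syncNetworkConfig()` does: it searches the CNI conf directory given by the kubelet flags for a configuration file, builds a `cniNetwork` struct from it, and assigns that struct to the `defaultNetwork` field of `cniNetworkPlugin`. As a result, when the CNI configuration or binaries change, the kubelet notices and updates `cniNetworkPlugin` accordingly.
This is why the goroutine exists: it lets the CNI configuration files and executables be hot-reloaded without restarting the kubelet.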
```go
// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) Init(host network.Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) error {
	err := plugin.platformInit()
	if err != nil {
		return err
	}

	plugin.host = host

	plugin.syncNetworkConfig()

	// start a goroutine to sync network config from confDir periodically to detect network config updates in every 5 seconds
	go wait.Forever(plugin.syncNetworkConfig, defaultSyncConfigPeriod)

	return nil
}
```
`plugin.platformInit()` (the non-Windows version in `cni_others.go`) only checks that the `nsenter` binary can be found and does nothing else.
```go
// pkg/kubelet/dockershim/network/cni/cni_others.go
func (plugin *cniNetworkPlugin) platformInit() error {
	var err error
	plugin.nsenterPath, err = plugin.execer.LookPath("nsenter")
	if err != nil {
		return err
	}
	return nil
}
```
4. How CNI builds the pod network
When the kubelet creates a pod, it creates and starts the pod sandbox through the CRI, and the CRI then calls the CNI network plugin to build the pod's network.
In the kubelet, CNI builds the pod network through `SetUpPod` in pkg/kubelet/dockershim/network/cni/cni.go.
The call chain of `SetUpPod` is as follows (key steps only):
main (cmd/kubelet/kubelet.go)
…
-> klet.syncPod (pkg/kubelet/kubelet.go)
-> kl.containerRuntime.SyncPod (pkg/kubelet/kubelet.go)
-> m.createPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_manager.go)
-> m.runtimeService.RunPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_sandbox.go)
-> ds.network.SetUpPod (pkg/kubelet/dockershim/docker_sandbox.go)
-> pm.plugin.SetUpPod (pkg/kubelet/dockershim/network/plugins.go)
-> SetUpPod (pkg/kubelet/dockershim/network/cni/cni.go)
The code below only traces the call chain that leads to the key method `cniNetworkPlugin.SetUpPod()`, without detailed analysis.
```go
// pkg/kubelet/kuberuntime/kuberuntime_manager.go
func (m *kubeGenericRuntimeManager) SyncPod(pod *v1.Pod, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, backOff *flowcontrol.Backoff) (result kubecontainer.PodSyncResult) {
	...
	podSandboxID, msg, err = m.createPodSandbox(pod, podContainerChanges.Attempt)
	...
}
```
```go
// pkg/kubelet/kuberuntime/kuberuntime_sandbox.go

// createPodSandbox creates a pod sandbox and returns (podSandBoxID, message, error).
func (m *kubeGenericRuntimeManager) createPodSandbox(pod *v1.Pod, attempt uint32) (string, string, error) {
	...
	podSandBoxID, err := m.runtimeService.RunPodSandbox(podSandboxConfig, runtimeHandler)
	...
}
```
`RunPodSandbox` shows the order of operations: the pod sandbox is created first, then started, and only then is its network set up.
```go
// pkg/kubelet/dockershim/docker_sandbox.go
func (ds *dockerService) RunPodSandbox(ctx context.Context, r *runtimeapi.RunPodSandboxRequest) (*runtimeapi.RunPodSandboxResponse, error) {
	...
	createResp, err := ds.client.CreateContainer(*createConfig)
	...
	err = ds.client.StartContainer(createResp.ID)
	...
	err = ds.network.SetUpPod(config.GetMetadata().Namespace, config.GetMetadata().Name, cID, config.Annotations, networkOptions)
	...
}
```
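`PluginManager.SetUpPod` calls `pm.plugin.SetUpPod` while holding the per-pod lock. As described in the CNI initialization section, `pm.plugin` holds the chosen plugin, so this call reaches the `SetUpPod` method of `cniNetworkPlugin`.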
```go
// pkg/kubelet/dockershim/network/plugins.go
func (pm *PluginManager) SetUpPod(podNamespace, podName string, id kubecontainer.ContainerID, annotations, options map[string]string) error {
	defer recordOperation("set_up_pod", time.Now())
	fullPodName := kubecontainer.BuildPodFullName(podName, podNamespace)
	pm.podLock(fullPodName).Lock()
	defer pm.podUnlock(fullPodName)

	klog.V(3).Infof("Calling network plugin %s to set up pod %q", pm.plugin.Name(), fullPodName)
	if err := pm.plugin.SetUpPod(podNamespace, podName, id, annotations, options); err != nil {
		return fmt.Errorf("networkPlugin %s failed to set up pod %q network: %v", pm.plugin.Name(), fullPodName, err)
	}

	return nil
}
```
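cniNetworkPlugin.SetUpPod
`cniNetworkPlugin.SetUpPod` is the entry point through which the CNI network plugin builds the pod network. Its main logic:
(1) Call `plugin.checkInitialized()` to check that the network plugin has finished initializing;
(2) Call `plugin.host.GetNetNS()` to get the path of the container's network namespace, of the form `/proc/${container PID}/ns/net`;
(3) Call `context.WithTimeout()` to set a timeout for invoking the CNI network plugin;
(4) If `plugin.loNetwork` is set (Linux only), call `plugin.addToNetwork()` to have the CNI plugin set up the pod's loopback network;
(5) Call `plugin.addToNetwork()` to have the CNI plugin set up the pod's default network.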
```go
// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) SetUpPod(namespace string, name string, id kubecontainer.ContainerID, annotations, options map[string]string) error {
	if err := plugin.checkInitialized(); err != nil {
		return err
	}
	netnsPath, err := plugin.host.GetNetNS(id.ID)
	if err != nil {
		return fmt.Errorf("CNI failed to retrieve network namespace path: %v", err)
	}

	// Todo get the timeout from parent ctx
	cniTimeoutCtx, cancelFunc := context.WithTimeout(context.Background(), network.CNITimeoutSec*time.Second)
	defer cancelFunc()
	// Windows doesn't have loNetwork. It comes only with Linux
	if plugin.loNetwork != nil {
		if _, err = plugin.addToNetwork(cniTimeoutCtx, plugin.loNetwork, name, namespace, id, netnsPath, annotations, options); err != nil {
			return err
		}
	}

	_, err = plugin.addToNetwork(cniTimeoutCtx, plugin.getDefaultNetwork(), name, namespace, id, netnsPath, annotations, options)
	return err
}
```
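plugin.addToNetwork
`plugin.addToNetwork` calls the CNI network plugin to attach the pod to a given network. Its main logic:
(1) Call `plugin.buildCNIRuntimeConf()` to build the runtime configuration passed to the CNI plugin;
(2) Call `cniNet.AddNetworkList()` to invoke the CNI plugin(s) and build the network.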
```go
// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) addToNetwork(ctx context.Context, network *cniNetwork, podName string, podNamespace string, podSandboxID kubecontainer.ContainerID, podNetnsPath string, annotations, options map[string]string) (cnitypes.Result, error) {
	rt, err := plugin.buildCNIRuntimeConf(podName, podNamespace, podSandboxID, podNetnsPath, annotations, options)
	if err != nil {
		klog.Errorf("Error adding network when building cni runtime conf: %v", err)
		return nil, err
	}

	pdesc := podDesc(podNamespace, podName, podSandboxID)
	netConf, cniNet := network.NetworkConfig, network.CNIConfig
	klog.V(4).Infof("Adding %s to network %s/%s netns %q", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, podNetnsPath)
	res, err := cniNet.AddNetworkList(ctx, netConf, rt)
	if err != nil {
		klog.Errorf("Error adding %s to network %s/%s: %v", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, err)
		return nil, err
	}
	klog.V(4).Infof("Added %s to network %s: %v", pdesc, netConf.Name, res)
	return res, nil
}
```
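cniNet.AddNetworkList
`AddNetworkList` calls `addNetwork` once for each plugin in the configuration list, in order. Each call receives the previous plugin's result, and the final result is cached. The logic of `addNetwork`:
(1) Call `c.exec.FindInPath()` to resolve the absolute path of the CNI plugin executable;
(2) Call `buildOneConfig()` to build the plugin's configuration;
(3) Call `c.args()` to build the arguments for invoking the CNI plugin;
(4) Call `invoke.ExecPluginWithResult()` to run the CNI plugin and build the pod network.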
```go
// vendor/github.com/containernetworking/cni/libcni/api.go
func (c *CNIConfig) AddNetworkList(ctx context.Context, list *NetworkConfigList, rt *RuntimeConf) (types.Result, error) {
	var err error
	var result types.Result
	for _, net := range list.Plugins {
		result, err = c.addNetwork(ctx, list.Name, list.CNIVersion, net, result, rt)
		if err != nil {
			return nil, err
		}
	}

	if err = setCachedResult(result, list.Name, rt); err != nil {
		return nil, fmt.Errorf("failed to set network %q cached result: %v", list.Name, err)
	}

	return result, nil
}

func (c *CNIConfig) addNetwork(ctx context.Context, name, cniVersion string, net *NetworkConfig, prevResult types.Result, rt *RuntimeConf) (types.Result, error) {
	c.ensureExec()
	pluginPath, err := c.exec.FindInPath(net.Network.Type, c.Path)
	if err != nil {
		return nil, err
	}

	newConf, err := buildOneConfig(name, cniVersion, net, prevResult, rt)
	if err != nil {
		return nil, err
	}

	return invoke.ExecPluginWithResult(ctx, pluginPath, newConf.Bytes, c.args("ADD", rt), c.exec)
}
```
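c.args
`c.args` builds the arguments passed to the CNI plugin executable.
As the code shows, they are:
- `Command`: `ADD` to set up the network, `DEL` to tear it down;
- `ContainerID`: the container ID;
- `NetNS`: the path of the container's network namespace;
- `IfName`: the network interface name;
- `PluginArgs`: extra arguments such as the pod name and namespace;
- `Path`: the plugin search directories.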
```go
// vendor/github.com/containernetworking/cni/libcni/api.go
func (c *CNIConfig) args(action string, rt *RuntimeConf) *invoke.Args {
	return &invoke.Args{
		Command:     action,
		ContainerID: rt.ContainerID,
		NetNS:       rt.NetNS,
		PluginArgs:  rt.Args,
		IfName:      rt.IfName,
		Path:        strings.Join(c.Path, string(os.PathListSeparator)),
	}
}
```
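The CNI specification passes these runtime parameters to the plugin as environment variables: `CNI_COMMAND`, `CNI_CONTAINERID`, `CNI_NETNS`, `CNI_IFNAME`, `CNI_ARGS` (semicolon-separated `KEY=VALUE` pairs) and `CNI_PATH`. The sketch below shows that mapping. `cniArgs` and `asEnv` are hypothetical mirrors of `invoke.Args` and its env conversion; the real conversion also prepends the parent process environment. The `K8S_POD_*` keys in `main` are example values resembling what the kubelet passes.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cniArgs is a hypothetical mirror of the fields set by c.args.
type cniArgs struct {
	Command     string // "ADD" or "DEL"
	ContainerID string
	NetNS       string
	IfName      string
	PluginArgs  [][2]string // becomes CNI_ARGS
	Path        []string    // plugin search dirs, joined into CNI_PATH
}

// asEnv renders the args as the CNI_* environment variables from the CNI spec.
func (a cniArgs) asEnv() []string {
	pairs := make([]string, len(a.PluginArgs))
	for i, kv := range a.PluginArgs {
		pairs[i] = kv[0] + "=" + kv[1]
	}
	return []string{
		"CNI_COMMAND=" + a.Command,
		"CNI_CONTAINERID=" + a.ContainerID,
		"CNI_NETNS=" + a.NetNS,
		"CNI_ARGS=" + strings.Join(pairs, ";"),
		"CNI_IFNAME=" + a.IfName,
		"CNI_PATH=" + strings.Join(a.Path, string(os.PathListSeparator)),
	}
}

func main() {
	env := cniArgs{
		Command:     "ADD",
		ContainerID: "abc123",
		NetNS:       "/proc/1234/ns/net",
		IfName:      "eth0",
		PluginArgs:  [][2]string{{"K8S_POD_NAMESPACE", "default"}, {"K8S_POD_NAME", "nginx"}},
		Path:        []string{"/opt/cni/bin"},
	}.asEnv()
	for _, kv := range env {
		fmt.Println(kv)
	}
}
```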
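invoke.ExecPluginWithResult
`invoke.ExecPluginWithResult` turns the invocation arguments into environment variables and runs the CNI plugin executable with the network configuration on its standard input. It then decodes the result the plugin writes to standard output, using the CNI version declared in the configuration.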
```go
// vendor/github.com/containernetworking/cni/pkg/invoke/exec.go
func ExecPluginWithResult(ctx context.Context, pluginPath string, netconf []byte, args CNIArgs, exec Exec) (types.Result, error) {
	if exec == nil {
		exec = defaultExec
	}

	stdoutBytes, err := exec.ExecPlugin(ctx, pluginPath, netconf, args.AsEnv())
	if err != nil {
		return nil, err
	}

	// Plugin must return result in same version as specified in netconf
	versionDecoder := &version.ConfigDecoder{}
	confVersion, err := versionDecoder.Decode(netconf)
	if err != nil {
		return nil, err
	}

	return version.NewResult(confVersion, stdoutBytes)
}
```
5. How CNI tears down the pod network
When the kubelet deletes a pod, the CRI calls the CNI network plugin to tear down the pod's network.
In the kubelet, CNI tears down the pod network through `TearDownPod` in pkg/kubelet/dockershim/network/cni/cni.go.
The call chain of `TearDownPod` is as follows (key steps only):
main (cmd/kubelet/kubelet.go)
…
-> m.runtimeService.StopPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_sandbox.go)
-> ds.network.TearDownPod (pkg/kubelet/dockershim/docker_sandbox.go)
-> pm.plugin.TearDownPod (pkg/kubelet/dockershim/network/plugins.go)
-> TearDownPod (pkg/kubelet/dockershim/network/cni/cni.go)
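The code below only traces the call chain that leads to the key method `cniNetworkPlugin.TearDownPod()`, without detailed analysis.
`StopPodSandbox` first tears down the pod network, skipping this step for host-network pods or when the network is already marked as not ready, and then stops the pod sandbox. If either operation fails, the kubelet keeps retrying until both succeed, so the order in which the two operations succeed does not matter. Removing the pod sandbox itself is left to the kubelet's garbage collection.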
```go
// pkg/kubelet/dockershim/docker_sandbox.go
func (ds *dockerService) StopPodSandbox(ctx context.Context, r *runtimeapi.StopPodSandboxRequest) (*runtimeapi.StopPodSandboxResponse, error) {
	...
	// WARNING: The following operations made the following assumption:
	// 1. kubelet will retry on any error returned by StopPodSandbox.
	// 2. tearing down network and stopping sandbox container can succeed in any sequence.
	// This depends on the implementation detail of network plugin and proper error handling.
	// For kubenet, if tearing down network failed and sandbox container is stopped, kubelet
	// will retry. On retry, kubenet will not be able to retrieve network namespace of the sandbox
	// since it is stopped. With empty network namespace, CNI bridge plugin will conduct best
	// effort clean up and will not return error.
	errList := []error{}
	ready, ok := ds.getNetworkReady(podSandboxID)
	if !hostNetwork && (ready || !ok) {
		// Only tear down the pod network if we haven't done so already
		cID := kubecontainer.BuildContainerID(runtimeName, podSandboxID)
		err := ds.network.TearDownPod(namespace, name, cID)
		if err == nil {
			ds.setNetworkReady(podSandboxID, false)
		} else {
			errList = append(errList, err)
		}
	}
	if err := ds.client.StopContainer(podSandboxID, defaultSandboxGracePeriod); err != nil {
		// Do not return error if the container does not exist
		if !libdocker.IsContainerNotFoundError(err) {
			klog.Errorf("Failed to stop sandbox %q: %v", podSandboxID, err)
			errList = append(errList, err)
		} else {
			// remove the checkpoint for any sandbox that is not found in the runtime
			ds.checkpointManager.RemoveCheckpoint(podSandboxID)
		}
	}
	...
}
```
`PluginManager.TearDownPod` calls `pm.plugin.TearDownPod` while holding the per-pod lock. As described in the CNI initialization section, this call reaches the `TearDownPod` method of `cniNetworkPlugin`.
```go
// pkg/kubelet/dockershim/network/plugins.go
func (pm *PluginManager) TearDownPod(podNamespace, podName string, id kubecontainer.ContainerID) error {
	defer recordOperation("tear_down_pod", time.Now())
	fullPodName := kubecontainer.BuildPodFullName(podName, podNamespace)
	pm.podLock(fullPodName).Lock()
	defer pm.podUnlock(fullPodName)

	klog.V(3).Infof("Calling network plugin %s to tear down pod %q", pm.plugin.Name(), fullPodName)
	if err := pm.plugin.TearDownPod(podNamespace, podName, id); err != nil {
		return fmt.Errorf("networkPlugin %s failed to teardown pod %q network: %v", pm.plugin.Name(), fullPodName, err)
	}

	return nil
}
```
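cniNetworkPlugin.TearDownPod
`cniNetworkPlugin.TearDownPod` is the entry point through which the CNI network plugin tears down the pod network. Its main logic:
(1) Call `plugin.checkInitialized()` to check that the network plugin has finished initializing;
(2) Call `plugin.host.GetNetNS()` to get the path of the container's network namespace, of the form `/proc/${container PID}/ns/net`. Unlike in `SetUpPod`, a failure here is only logged as a warning, because a missing namespace should not be fatal on teardown;
(3) Call `context.WithTimeout()` to set a timeout for invoking the CNI network plugin;
(4) If `plugin.loNetwork` is set (Linux only), call `plugin.deleteFromNetwork()` to tear down the pod's loopback network. A failure here is also only logged as a warning;
(5) Call `plugin.deleteFromNetwork()` to tear down the pod's default network and return its error.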
```go
// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) TearDownPod(namespace string, name string, id kubecontainer.ContainerID) error {
	if err := plugin.checkInitialized(); err != nil {
		return err
	}

	// Lack of namespace should not be fatal on teardown
	netnsPath, err := plugin.host.GetNetNS(id.ID)
	if err != nil {
		klog.Warningf("CNI failed to retrieve network namespace path: %v", err)
	}

	// Todo get the timeout from parent ctx
	cniTimeoutCtx, cancelFunc := context.WithTimeout(context.Background(), network.CNITimeoutSec*time.Second)
	defer cancelFunc()
	// Windows doesn't have loNetwork. It comes only with Linux
	if plugin.loNetwork != nil {
		// Loopback network deletion failure should not be fatal on teardown
		if err := plugin.deleteFromNetwork(cniTimeoutCtx, plugin.loNetwork, name, namespace, id, netnsPath, nil); err != nil {
			klog.Warningf("CNI failed to delete loopback network: %v", err)
		}
	}

	return plugin.deleteFromNetwork(cniTimeoutCtx, plugin.getDefaultNetwork(), name, namespace, id, netnsPath, nil)
}
```
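plugin.deleteFromNetwork
`plugin.deleteFromNetwork` calls the CNI network plugin to detach the pod from a given network. Its main logic:
(1) Call `plugin.buildCNIRuntimeConf()` to build the runtime configuration passed to the CNI plugin;
(2) Call `cniNet.DelNetworkList()` to invoke the CNI plugin(s) and tear down the pod network. An error containing "no such file or directory" is ignored, because the network may already have been deleted by an earlier attempt.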
// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) deleteFromNetwork(ctx context.Context, network *cniNetwork, podName string, podNamespace string, podSandboxID kubecontainer.ContainerID, podNetnsPath string, annotations map[string]string) error {
	rt, err := plugin.buildCNIRuntimeConf(podName, podNamespace, podSandboxID, podNetnsPath, annotations, nil)
	if err != nil {
		klog.Errorf("Error deleting network when building cni runtime conf: %v", err)
		return err
	}

	pdesc := podDesc(podNamespace, podName, podSandboxID)
	netConf, cniNet := network.NetworkConfig, network.CNIConfig
	klog.V(4).Infof("Deleting %s from network %s/%s netns %q", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, podNetnsPath)
	err = cniNet.DelNetworkList(ctx, netConf, rt)
	// The pod may not get deleted successfully at the first time.
	// Ignore "no such file or directory" error in case the network has already been deleted in previous attempts.
	if err != nil && !strings.Contains(err.Error(), "no such file or directory") {
		klog.Errorf("Error deleting %s from network %s/%s: %v", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, err)
		return err
	}
	klog.V(4).Infof("Deleted %s from network %s/%s", pdesc, netConf.Plugins[0].Network.Type, netConf.Name)
	return nil
}
cniNet.DelNetworkList
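DelNetworkList first handles the cache. For CNI spec version 0.4.0 and later, it loads the cached result of the earlier ADD. It then walks list.Plugins in reverse order, calling delNetwork for each plugin, and finally deletes the cached result. The per-plugin work happens in delNetwork, whose logic is:
(1) Call c.exec.FindInPath(): resolve the absolute path of the CNI plugin executable.
(2) Call buildOneConfig(): build the configuration for this plugin.
(3) Call c.args(): build the arguments for invoking the CNI plugin.
(4) Call invoke.ExecPluginWithoutResult(): run the CNI plugin to tear down the pod network.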
// vendor/github.com/containernetworking/cni/libcni/api.go
// DelNetworkList executes a sequence of plugins with the DEL command
func (c *CNIConfig) DelNetworkList(ctx context.Context, list *NetworkConfigList, rt *RuntimeConf) error {
	var cachedResult types.Result

	// Cached result on DEL was added in CNI spec version 0.4.0 and higher
	if gtet, err := version.GreaterThanOrEqualTo(list.CNIVersion, "0.4.0"); err != nil {
		return err
	} else if gtet {
		cachedResult, err = getCachedResult(list.Name, list.CNIVersion, rt)
		if err != nil {
			return fmt.Errorf("failed to get network %q cached result: %v", list.Name, err)
		}
	}

	for i := len(list.Plugins) - 1; i >= 0; i-- {
		net := list.Plugins[i]
		if err := c.delNetwork(ctx, list.Name, list.CNIVersion, net, cachedResult, rt); err != nil {
			return err
		}
	}
	_ = delCachedResult(list.Name, rt)

	return nil
}

func (c *CNIConfig) delNetwork(ctx context.Context, name, cniVersion string, net *NetworkConfig, prevResult types.Result, rt *RuntimeConf) error {
	c.ensureExec()
	pluginPath, err := c.exec.FindInPath(net.Network.Type, c.Path)
	if err != nil {
		return err
	}

	newConf, err := buildOneConfig(name, cniVersion, net, prevResult, rt)
	if err != nil {
		return err
	}

	return invoke.ExecPluginWithoutResult(ctx, pluginPath, newConf.Bytes, c.args("DEL", rt), c.exec)
}
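DelNetworkList runs the plugin chain back to front, while ADD runs it front to back. Each plugin is therefore undone only after the plugins layered on top of it have been cleaned up. A minimal sketch of that ordering, where `netPlugin` and `runDel` are hypothetical stand-ins for `NetworkConfigList.Plugins` and the loop above:

```go
package main

// netPlugin stands in for one entry of NetworkConfigList.Plugins.
type netPlugin struct {
	Type string // plugin binary name, e.g. "calico", "portmap"
}

// runDel invokes each plugin's DEL in reverse list order and, like
// DelNetworkList, stops at the first plugin that fails.
func runDel(plugins []netPlugin, invoke func(pluginType string) error) error {
	for i := len(plugins) - 1; i >= 0; i-- {
		if err := invoke(plugins[i].Type); err != nil {
			return err
		}
	}
	return nil
}
```

For a conflist of `calico`, `bandwidth`, `portmap`, the DEL order is `portmap`, `bandwidth`, `calico`.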
c.args
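c.args builds the arguments passed when executing the CNI plugin binary. As the code shows, they include:
- Command: "ADD" builds the network and "DEL" tears it down.
- ContainerID: the container ID.
- NetNS: the path of the container's network namespace.
- IfName: the network interface name.
- PluginArgs: extra arguments such as the pod name and pod namespace.
- Path: the plugin search directories, joined with the OS path-list separator.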
// vendor/github.com/containernetworking/cni/libcni/api.go
func (c *CNIConfig) args(action string, rt *RuntimeConf) *invoke.Args {
	return &invoke.Args{
		Command:     action,
		ContainerID: rt.ContainerID,
		NetNS:       rt.NetNS,
		PluginArgs:  rt.Args,
		IfName:      rt.IfName,
		Path:        strings.Join(c.Path, string(os.PathListSeparator)),
	}
}
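Per the CNI specification, these arguments reach the plugin as environment variables, not command-line flags. The sketch below shows that mapping, assuming a simplified `cniArgs` struct rather than libcni's `invoke.Args`. To our understanding, the real `AsEnv` also carries over the kubelet process's own environment, which this sketch leaves out so the output is deterministic.

```go
package main

import "strings"

// cniArgs is a simplified, hypothetical mirror of the fields c.args fills in.
type cniArgs struct {
	Command     string      // "ADD", "DEL", ...
	ContainerID string      // sandbox container ID
	NetNS       string      // e.g. /proc/<pid>/ns/net
	IfName      string      // interface name inside the pod, e.g. eth0
	PluginArgs  [][2]string // extra key/value pairs such as K8S_POD_NAME
	Path        string      // plugin dirs, already joined with the path-list separator
}

// asEnv renders the arguments as the CNI_* variables defined by the spec.
// PluginArgs are joined as "K1=V1;K2=V2" into CNI_ARGS.
func (a cniArgs) asEnv() []string {
	pairs := make([]string, 0, len(a.PluginArgs))
	for _, kv := range a.PluginArgs {
		pairs = append(pairs, kv[0]+"="+kv[1])
	}
	return []string{
		"CNI_COMMAND=" + a.Command,
		"CNI_CONTAINERID=" + a.ContainerID,
		"CNI_NETNS=" + a.NetNS,
		"CNI_ARGS=" + strings.Join(pairs, ";"),
		"CNI_IFNAME=" + a.IfName,
		"CNI_PATH=" + a.Path,
	}
}
```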
invoke.ExecPluginWithResult
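invoke.ExecPluginWithResult converts the invocation arguments into environment variables, runs the CNI plugin executable, and decodes the result it returns. This function is used on the ADD path. The DEL path above calls invoke.ExecPluginWithoutResult, which runs the plugin the same way but discards stdout instead of decoding a result.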
func ExecPluginWithResult(ctx context.Context, pluginPath string, netconf []byte, args CNIArgs, exec Exec) (types.Result, error) {
	if exec == nil {
		exec = defaultExec
	}

	stdoutBytes, err := exec.ExecPlugin(ctx, pluginPath, netconf, args.AsEnv())
	if err != nil {
		return nil, err
	}

	// Plugin must return result in same version as specified in netconf
	versionDecoder := &version.ConfigDecoder{}
	confVersion, err := versionDecoder.Decode(netconf)
	if err != nil {
		return nil, err
	}

	return version.NewResult(confVersion, stdoutBytes)
}
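Summary
CNI
CNI stands for Container Network Interface.
CNI is the standard interface Kubernetes uses to invoke network implementations. Kubelet calls different network plugins through this interface to support different ways of configuring the network.
A CNI network plugin is an executable that follows the Container Network Interface (CNI) specification. Common CNI network plugins include Calico, flannel, Terway, and Weave Net.
When kubelet is configured (via its startup flags) to use a CNI network plugin, it calls that plugin through the CRI whenever a pod is created or deleted, to set up or tear down the pod's network.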
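How kubelet builds a pod network
(1) kubelet first creates the pause container (pod sandbox) through the CRI, which creates the network namespace.
(2) kubelet invokes the specific network plugin, such as a CNI network plugin, according to its startup configuration.
(3) The network plugin configures networking for the pause container (pod sandbox).
(4) All other containers in the pod share the network of the pause container (pod sandbox).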
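CNI-related kubelet startup flags
(1) --network-plugin: the type of network plugin to use. Valid values are cni, kubenet, and "". The default is the empty string, which means Noop: no network plugin is configured and no pod network is built. Setting it to cni makes kubelet use the CNI network plugin type.
(2) --cni-conf-dir: the directory containing the CNI configuration files. Default: /etc/cni/net.d.
(3) --cni-bin-dir: the directory (or comma-separated list of directories) containing the CNI plugin executables. kubelet searches this path for the plugin binaries that perform pod network operations. Default: /opt/cni/bin.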
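CNI initialization in kubelet
After startup, kubelet uses its CNI-related startup flags to load the CNI configuration files and initialize the CNI network plugin. Later, when pods are created or deleted, kubelet calls SetUpPod and TearDownPod to set up and tear down pod networks. Initialization also starts a goroutine that periodically re-checks the CNI configuration files and plugin executables, so changes to them take effect without restarting kubelet.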
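Setting up the pod network with CNI
When kubelet creates a pod, it creates and starts the pod sandbox through the CRI, and the CRI then calls the CNI network plugin to build the pod network.
In kubelet, the method that builds the pod network through CNI is pkg/kubelet/dockershim/network/cni/cni.go-SetUpPod.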
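Tearing down the pod network with CNI
When kubelet deletes a pod, the CRI calls the CNI network plugin to tear down the pod network.
In kubelet, the method that tears down the pod network through CNI is pkg/kubelet/dockershim/network/cni/cni.go-TearDownPod.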