Author: Zhang Hua  Published: 2019-08-14
Copyright notice: this article may be reposted freely, but please credit the original source and author with a hyperlink and keep this copyright notice (https://zhhuabj.blog.csdn.net)
Problem
For security-audit purposes, we may need to record every ACCEPT and DROP connection handled by neutron security groups, FWaaS and LBaaS. There are several ways to do this, such as sg-logging, ovs IPFIX and ulog. This post focuses on the ulog method; the other two are listed at the end but were not fully debugged.
ulog basic
With ulog, the kernel sends log messages over netlink directly to the user-space ulogd process for handling.
sudo apt install ulogd2 ulogd2-json
# cat /etc/ulogd.conf
# Example configuration for ulogd
# Adapted to Debian by Achilleas Kotsis <[email protected]>
[global]
######################################################################
# GLOBAL OPTIONS
######################################################################
# logfile for status messages
logfile="syslog"
# loglevel: debug(1), info(3), notice(5), error(7) or fatal(8) (default 5)
loglevel=3
######################################################################
# PLUGIN OPTIONS
######################################################################
# We have to configure and load all the plugins we want to use
# general rules:
# 1. load the plugins _first_ from the global section
# 2. options for each plugin in separate sections below
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_inppkt_NFLOG.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_inppkt_ULOG.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_inppkt_UNIXSOCK.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_inpflow_NFCT.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_filter_IFINDEX.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_filter_IP2STR.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_filter_IP2BIN.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_filter_IP2HBIN.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_filter_PRINTPKT.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_filter_HWHDR.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_filter_PRINTFLOW.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_filter_MARK.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_LOGEMU.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_SYSLOG.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_XML.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_SQLITE3.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_GPRINT.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_NACCT.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_PCAP.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_PGSQL.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_MYSQL.so"
#plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_DBI.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_raw2packet_BASE.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_inpflow_NFACCT.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_GRAPHITE.so"
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_JSON.so"
# stacks for flow-based logging via LOGEMU and JSON
stack=ct1:NFCT,ip2str1:IP2STR,print1:PRINTFLOW,emu1:LOGEMU
stack=ct2:NFCT,ip2str1:IP2STR,json1:JSON
[ct1]
#netlink_socket_buffer_size=217088
#netlink_socket_buffer_maxsize=1085440
#netlink_resync_timeout=60 # seconds to wait to perform resynchronization
#pollinterval=10 # use poll-based logging instead of event-driven
# If pollinterval is not set, NFCT plugin will work in event mode
# In this case, you can use the following filters on events:
#accept_src_filter=192.168.1.0/24,1:2::/64 # source ip of connection must belong to these networks
#accept_dst_filter=192.168.1.0/24 # destination ip of connection must belong to these networks
#accept_proto_filter=tcp,sctp # layer 4 proto of connections
event_mask=0x00000001
hash_enable=0
[emu1]
file="/var/log/ulog/syslogemu.log"
sync=1
[ct2]
hash_enable=0
event_mask=0x00000004
#accept_src_filter=10.211.55.0/24
[emu2]
file="/var/log/ulog/syslogemu2.log"
sync=1
[json1]
sync=1
file="/var/log/ulog/ulogd.json"
Run the ulogd service inside the router namespace on neutron-gateway:
sudo ip netns exec qrouter-82ff3e71-e021-4ce2-aa92-5ec6cb6d7c3d /usr/sbin/ulogd -d -p /run/ulogd-test.pid -u ulog
Then create a VM, assign it a floating IP and ping it; the following log entries can be observed:
$ ping 10.5.150.82
PING 10.5.150.82 (10.5.150.82) 56(84) bytes of data.
64 bytes from 10.5.150.82: icmp_seq=1 ttl=63 time=5.71 ms
$ tail -f /var/log/ulog/syslogemu.log
Aug 14 05:38:20 juju-da14dc-stein-5 [NEW] ORIG: SRC=10.5.0.8 DST=10.5.150.82 PROTO=ICMP TYPE=0 CODE=8 PKTS=0 BYTES=0 , REPLY: SRC=192.168.21.73 DST=10.5.0.8 PROTO=ICMP TYPE=0 CODE=8 PKTS=0 BYTES=0
Or in JSON format:
# tail -f /var/log/ulog/ulogd.json
{"timestamp": "2019-08-14T06:10:55", "dvc": "Netfilter", "orig.ip.protocol": 17, "orig.l4.sport": 34308, "orig.l4.dport": 123, "orig.raw.pktlen": 0, "orig.raw.pktcount": 0, "reply.ip.protocol": 17, "reply.l4.sport": 123, "reply.l4.dport": 34308, "reply.raw.pktlen": 0, "reply.raw.pktcount": 0, "ct.mark": 0, "ct.id": 2009647936, "ct.event": 4, "flow.end.sec": 1565763055, "flow.end.usec": 77990, "oob.family": 2, "oob.protocol": 0, "src_ip": "10.5.0.25", "dest_ip": "91.189.94.4", "reply.ip.saddr.str": "91.189.94.4", "reply.ip.daddr.str": "10.5.0.25"}
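The JSON lines can be post-processed with standard tools. A minimal sketch, assuming the field names shown in the sample above, that pulls out the source/destination pair with sed:

```shell
# One ulogd JSON record (abridged from the sample above).
rec='{"timestamp": "2019-08-14T06:10:55", "ct.event": 4, "src_ip": "10.5.0.25", "dest_ip": "91.189.94.4"}'

# Extract src_ip and dest_ip; this simple pattern relies on src_ip
# appearing before dest_ip in the record.
pair=$(echo "$rec" | sed -n 's/.*"src_ip": "\([^"]*\)".*"dest_ip": "\([^"]*\)".*/\1 -> \2/p')
echo "$pair"   # 10.5.0.25 -> 91.189.94.4
```

For anything beyond quick filtering, a real JSON parser (e.g. jq) is a safer choice than sed.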
The meaning of event_mask in the configuration file is as follows (see https://gist.github.com/chagridsada/20fde58e49205af0d33a for a more complete example):
NF_NETLINK_CONNTRACK_NEW: 0x00000001
NF_NETLINK_CONNTRACK_UPDATE: 0x00000002
NF_NETLINK_CONNTRACK_DESTROY: 0x00000004
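event_mask is a bitmask, so these values can be OR-ed together. A quick sketch of combining NEW and DESTROY to log both connection creation and teardown:

```shell
# Conntrack event bits (values from the list above).
NEW=0x00000001
UPDATE=0x00000002
DESTROY=0x00000004

# Combine NEW and DESTROY: 0x1 | 0x4 = 0x5
mask=$(printf '0x%08x' $(( NEW | DESTROY )))
echo "$mask"   # 0x00000005
```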
Besides the usage above, logging can also go through a netlink multicast group. The log1 configuration below uses multicast group 32, which must match the group passed to iptables via the --nflog-group parameter:
plugin="/usr/lib/x86_64-linux-gnu/ulogd/ulogd_output_LOGEMU.so"
stack=log1:NFLOG,base1:BASE,ifi1:IFINDEX,ip2str1:IP2STR,print1:PRINTPKT,emu1:LOGEMU
# Logging of system packets through NFLOG
[log1]
# netlink multicast group (the same as the iptables --nflog-group param)
# Group 0 is used by the kernel to log connection tracking invalid messages
group=32
[emu1]
file="/var/log/iptables.log"
#sync=1
nc -v 10.0.8.203 8080 #10.0.8.203 is FIP of VM
iptables -I FORWARD -p tcp --dport 8080 -j NFLOG --nflog-group 32 --nflog-prefix "dbg: FORWARD "
iptables -I neutron-openvswi-FORWARD -p tcp --dport 8080 -j NFLOG --nflog-group 32 --nflog-prefix "dbg: n-o-FORWARD "
iptables -I neutron-openvswi-sg-chain -p tcp --dport 8080 -j NFLOG --nflog-group 32 --nflog-prefix "dbg: n-o-sg-chain "
iptables -I neutron-openvswi-ic42f8a9b-6 -p tcp --dport 8080 -j NFLOG --nflog-group 32 --nflog-prefix "dbg: n-o-ic42f8a9b-6 "
iptables -I neutron-openvswi-sg-fallback -p tcp --dport 8080 -j NFLOG --nflog-group 32 --nflog-prefix "dbg: n-o-sg-fallback "
Note that the iptables target above can be LOG, NFLOG or the legacy ULOG: LOG prints via printk, which is slow; NFLOG uses netlink and allows bidirectional communication between kernel and user space; ULOG uses the older netlink-log protocol and can only send logs from kernel to user space (its options also differ, e.g. --ulog-nlgroup instead of --nflog-group).
Integrating ulog with neutron
To integrate ulog with neutron, a wrapper script like the following also needs to be added:
cat /usr/local/sbin/ip_neutron_wrapper
#!/bin/bash
# This wrapper is used to ensure an instance of the ulogd daemon is
# running in each router namespace that is created.
ULOGD=/usr/sbin/ulogd
if [ "$1" == "netns" -a "$2" == "add" ]
then
    # A new namespace is being created.
    # Run the requested command.
    ip "$@"
    rc=$?
    ns=$3
    # Check whether we are creating a router namespace
    if [ -n "$ns" ] && expr "$ns" : 'qrouter-' >& /dev/null
    then
        # Router namespace. Run ulogd.
        ip netns exec "$ns" $ULOGD -d -p "/run/ulogd.$ns.pid" -u ulog
    fi
    exit $rc
elif [ "$1" == "netns" -a "$2" == "delete" ]
then
    # Namespace deleted: *first* kill ulogd, then remove the namespace
    ns=$3
    if [ -n "$ns" -a -f "/run/ulogd.$ns.pid" ]
    then
        kill $(cat "/run/ulogd.$ns.pid")
    fi
    exec ip "$@"
else
    exec ip "$@"
fi
Then modify rootwrap.d/*.filter, replacing "ip" with "/usr/local/sbin/ip_neutron_wrapper" in all "IpFilter" entries.
The purpose of this script is to start a ulogd process in every router namespace as soon as neutron creates it, so that we do not have to start ulogd manually as we did above.
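The wrapper's router-namespace detection can be exercised on its own. A small sketch using the portable `expr str : regex` form (equivalent to `expr match`), with hypothetical namespace names:

```shell
# Classify a namespace name the same way the wrapper does:
# only names starting with "qrouter-" get a ulogd instance.
classify() {
    if expr "$1" : 'qrouter-' >/dev/null; then
        echo router
    else
        echo other
    fi
}

classify qrouter-82ff3e71-e021-4ce2-aa92-5ec6cb6d7c3d   # router
classify qdhcp-12345678                                 # other
```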
Appendix - other methods - sg-logging
Neutron has this feature built in. The drawback is that it can cause MQ performance problems, which in turn lead to rpc_timeout errors in various components. Using heat together with dhcp-agent makes this worse: heat creates many ports concurrently while dhcp-agent can only process ports one at a time, causing rpc-timeout on the dhcp-agent side and failed port bindings, which then break vif creation on the nova side; if heat also retries because of anti-affinity or similar policies and creates yet another batch of ports, it becomes a vicious circle. In addition, sg-logging currently does not support SNAT/DNAT nor the FIP-to-fixed-ip mappings used by LBaaS VIPs.
The packet logging service is designed as a Neutron plug-in that captures network packets for relevant resources (e.g. security group or firewall group) when the registered events occur. See https://docs.openstack.org/neutron/rocky/admin/config-logging.html
#./generate-bundle.sh -s bionic -r stein --num-compute 13 --heat --neutron-sg-logging
#juju deploy ./b/openstack.yaml --overlay ./b/o/neutron-gateway.yaml --overlay ./b/o/heat.yaml --overlay ./b/o/neutron-sg-logging.yaml
./generate-bundle.sh -s bionic -r stein --num-compute 13
juju add-model stein
juju deploy ./b/openstack.yaml --overlay ./b/o/neutron-gateway.yaml
juju deploy heat
juju add-relation heat mysql
juju add-relation heat keystone
juju add-relation heat rabbitmq-server
./configure
#source ~/stsstack-bundles/openstack/novarc
#http_proxy=http://squid.internal:3128 wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
#openstack image create --disk-format=qcow2 --container-format=bare --public cirros --file ./cirros-0.4.0-x86_64-disk.img
#juju config neutron-openvswitch firewall-driver=openvswitch
juju config neutron-api enable-security-group-logging=True
#juju config neutron-api enable-firewall-group-logging=True
juju config neutron-openvswitch security-group-log-burst-limit=25
#juju config neutron-openvswitch security-group-log-rate-limit=unlimited
openstack network loggable resources list
openstack network log create mylog --resource-type security_group --description "test" --event ALL
#openstack network log create mylog --resource-type security_group --resource sg1
#openstack network log create mylog --resource-type security_group --target portA
#openstack network log create mylog --resource-type security_group --target portB --resource sg1
openstack network log set --disable mylog
# fix error 'Quota exceeded for resources: ['security_group'].'
neutron quota-show
openstack quota set --secgroup-rules 10000 --secgroups 1000 5275d0bc16fe42c39b96469b9019f7ce
openstack stack list
openstack stack create -e green.test.nonprd.yaml -t juju_k8s_stack.yaml mystack
Appendix - other methods - ovs IPFIX
ovs can export packets as IPFIX, but it may only support exporting ACCEPTed packets, not DROPped ones.
#https://community.riverbed.com/s/article/DOC-5783
sudo apt install libfixbuf3 libfixbuf3-dev libpcap-dev
wget https://sourceforge.net/projects/libipfix/files/latest/download && tar -xf download
cd libipfix_110224 && ./configure && make && sudo make install
sudo ipfix_collector -p 4740
#juju config neutron-openvswitch ipfix-target=10.5.0.8:4740
ovs-vsctl set Bridge br-int ipfix=@i -- --id=@i create IPFIX targets=\"10.5.0.8:4740\" sampling=64 cache_active_timeout=60 cache_max_flows=128
ovs-vsctl list-br| xargs -l -I{} ovs-vsctl set IPFIX {} sampling=1
ovs-vsctl list Bridge br-int |grep ipfix
ovs-vsctl list ipfix <ipfix-id>
#ovs-vsctl clear Bridge br-int ipfix
for node in {0,1}; do juju ssh neutron-gateway/$node "sudo ovs-vsctl list-br | xargs -l -I{} sudo ovs-vsctl set IPFIX {} sampling=1"; done
for node in {0,1}; do juju ssh neutron-gateway/$node "sudo ovs-vsctl list ipfix"; done
for node in {0..30}; do juju ssh nova-compute-kvm/$node "sudo ovs-vsctl list-br | xargs -l -I{} sudo ovs-vsctl set IPFIX {} sampling=1"; done
for node in {0..30}; do juju ssh nova-compute-kvm/$node "sudo ovs-vsctl list ipfix"; done
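For reference, `sampling=N` exports roughly one packet in N, which is why the commands above force sampling=1 for complete auditing. A simple arithmetic sketch (hypothetical traffic rate) of the expected export volume:

```shell
sampling=64        # 1-in-64 sampling, as in the earlier example
pkts_per_sec=128000  # hypothetical packet rate on the bridge

# Expected IPFIX records per second at 1-in-N sampling.
expected=$(( pkts_per_sec / sampling ))
echo "$expected"   # 2000
```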
#Turn on IPFIX with juju
juju config neutron-gateway ipfix-target=172.30.140.47:5500
juju config neutron-openvswitch ipfix-target=172.30.140.47:5500
#Turn off ipfix with the command below if we need to revert
# juju config neutron-gateway --reset ipfix-target
# juju config neutron-openvswitch --reset ipfix-target
Problem: slowness after enabling sg-logging
A customer reported that things became slow after enabling sg-logging, with heat reporting errors such as:
2019-11-11 23:58:25.577 1552647 ERROR heat.engine.resource ResourceInError: Went to status ERROR due to "Message: Build of instance b9056495-475d-46c4-8696-e04ca7ac1952 aborted: Failed to allocate the network(s), not rescheduling., Code: 500"
This kind of error was also seen:
re-scheduled: Build of instance b9056495-475d-46c4-8696-e04ca7ac1952 was re-scheduled: Anti-affinity instance group policy was violated
The following suggestions were made:
1, juju config neutron-api rpc-response-timeout=180
2, disable anti-affinity check by setting [filter_scheduler]/build_failure_weight_multiplier = 0
3, disable heartbeat by setting [oslo_messaging_rabbit]heartbeat_timeout_threshold=0
On all your rabbit nodes, add "{heartbeat, 0}," to /etc/rabbitmq/rabbitmq.config as follows:
[
{rabbit, [
{collect_statistics_interval, 30000},
{tcp_listeners, [5672]},
{heartbeat, 0},
{cluster_partition_handling, autoheal}
]}
].
## Restart rabbit on each node in turn. Note that a restart may take a while since pending messages may need to be synced.
sudo systemctl restart rabbitmq-server
rabbitmqctl cluster_status
In the end, the investigation showed that the slowness was caused by having too many security groups, not by sg-logging itself.
Neutron has 3 RPC mechanisms:
1, Plugin RPC, used for messaging between the neutron-server process and the various service agent processes.
2, Callback System, used for in-process communication between core resources and service components, e.g. making the vpn service aware of lifecycle event changes for a network resource.
3, Messaging Callback System, used for inter-process communication between core resources and service agents. Please refer to [1] for more details.
Topic name format is as follows:
neutron-vo-<resource_class_name>-<version>
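For example, plugging the SecurityGroup resource class at version 1.0 into this format yields the prefix of the fanout queue names seen below (a simple sketch):

```shell
# Build a versioned-objects topic name from the format
# neutron-vo-<resource_class_name>-<version>.
resource_class=SecurityGroup
version=1.0
topic="neutron-vo-${resource_class}-${version}"
echo "$topic"   # neutron-vo-SecurityGroup-1.0
```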
There are 10 resources (QosPolicy, Trunk, SubPort, Port, Subnet, Network, SecurityGroup, SecurityGroupRule, Log, PortForwarding) according to - https://github.com/openstack/neutron/blob/stable/stein/neutron/api/rpc/callbacks/resources.py#L38
As for the multiple queues that each hold the same large number of messages, they appear to be related to SecurityGroup and SecurityGroupRule, not Log:
neutron-vo-SecurityGroupRule-1.0_fanout_d8e73717f6544f22ac38b05cd0adf924 410
neutron-vo-SecurityGroup-1.0_fanout_51ee6075715a4f2d831029def5eb8ead 168
# see the number of AMQP packets from each client
tshark -r xxx.pcap |grep AMQP |awk '{arr[$5]++}END{for (a in arr) print a, arr[a]}' |sort -n -k 2 -r
10.55.12.80 166682
10.55.12.62 33172
10.55.12.61 18538
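The awk part of that pipeline is a generic frequency count on field 5. It can be sanity-checked on synthetic tshark-style lines (hypothetical addresses, not from the real capture):

```shell
# Three fake capture lines; field 5 is the address being counted.
lines='1 0.01 10.0.0.1 -> 10.55.12.80 AMQP
2 0.02 10.0.0.1 -> 10.55.12.80 AMQP
3 0.03 10.0.0.1 -> 10.55.12.62 AMQP'

# Count occurrences of field 5 and sort by count, descending.
echo "$lines" | awk '{arr[$5]++} END{for (a in arr) print a, arr[a]}' | sort -n -k 2 -r
```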
https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1817877
randomize_allocation_candidates = true
openstack security group list | wc -l