Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: 6.14.0
- Component/s: None
- Labels: None
Description
It seems that for servers launched in an Amazon VPC, at least one of the default interfaces does not "look like ec2" and thus none of the ec2 attributes are populated
[Thu, 27 Oct 2011 18:45:47 -0400] DEBUG: Using default interface for default ip and mac address
[Thu, 27 Oct 2011 18:45:47 -0400] DEBUG: has_ec2_mac? == true
[Thu, 27 Oct 2011 18:45:47 -0400] DEBUG: has_ec2_mac? == false
[Thu, 27 Oct 2011 18:45:47 -0400] DEBUG: looks_like_ec2? == false
I can't view the link from Jessica's comment: "This discussion is private or deleted."
Neither node.cloud nor node.ec2 is populated. Is there a work-around or bug-fix for this on the horizon? The plugin is trying to check for a special ARP MAC, but the VPC machines come up with unique ARP MACs rather than the specific EC2 version.
In ohai::plugins::ec2.rb, looks_like_ec2? short-circuits on has_ec2_mac?, which checks each interface's ARP table for a specific value (fe:ff:ff:ff:ff:ff) as a pre-check for EC2 candidacy. Presumably this is done to avoid a time-consuming HTTP timeout when hitting the Amazon EC2 metadata address.
This check fails because Amazon VPC instances have unique ARP iface values based on the routing for the VPC subnet.
def looks_like_ec2?
  # Try non-blocking connect so we don't "block" if
  # the Xen environment is *not* EC2
  has_ec2_mac? && can_metadata_connect?(EC2_METADATA_ADDR,80)
end

def has_ec2_mac?
  network[:interfaces].values.each do |iface|
    unless iface[:arp].nil?
      has_mac = iface[:arp].value?("fe:ff:ff:ff:ff:ff")
      # Note: this is logged before has_mac is tested, so "== true" appears
      # in the debug output even when no matching MAC is found.
      Ohai::Log.debug("has_ec2_mac? == true")
      return true if has_mac
    end
  end
  Ohai::Log.debug("has_ec2_mac? == false")
  false
end
Example ARP information from a VPC configured amazon instance:
ubuntu@ip-10-7-203-210:~$ arp
Address      HWtype  HWaddress          Flags Mask  Iface
10.7.203.1   ether   02:0f:44:c0:00:02  C           eth0
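To make the failure concrete, here is a minimal standalone sketch of the same ARP test. It is not the plugin code: the constant and method names are made up for illustration, the hashes only mimic the shape of network[:interfaces], and the "classic" entry is a hypothetical example. The VPC entry uses the ARP data shown above.

```ruby
# Sentinel MAC that the quoted has_ec2_mac? looks for (name is hypothetical).
EC2_ARP_MAC = "fe:ff:ff:ff:ff:ff"

# Returns true if any interface has an ARP entry with the sentinel MAC.
# `interfaces` mimics the shape of Ohai's network[:interfaces] hash.
def ec2_arp_mac?(interfaces)
  interfaces.values.any? do |iface|
    arp = iface[:arp]
    !arp.nil? && arp.value?(EC2_ARP_MAC)
  end
end

# Hypothetical classic EC2 node: the ARP table contains the sentinel MAC.
classic = { "eth0" => { arp: { "10.140.4.1" => "fe:ff:ff:ff:ff:ff" } } }

# VPC node, using the ARP entry from the output above.
vpc = { "eth0" => { arp: { "10.7.203.1" => "02:0f:44:c0:00:02" } } }

puts ec2_arp_mac?(classic)  # prints true
puts ec2_arp_mac?(vpc)      # prints false
```

Because looks_like_ec2? combines the two checks with &&, a false result here means the metadata server is never contacted on a VPC node.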
This bit Darrin and me today, so it has our votes and eyes. Here are some additional diagnostics:
aj@haproxy1:~$ ohai -l debug|grep ec2
[Mon, 30 Jan 2012 23:49:11 +0000] DEBUG: 2.6.32-341-ec2
[Mon, 30 Jan 2012 23:49:11 +0000] DEBUG: Loading plugin ec2
[Mon, 30 Jan 2012 23:49:11 +0000] DEBUG: has_ec2_mac? == true
[Mon, 30 Jan 2012 23:49:11 +0000] DEBUG: has_ec2_mac? == false
[Mon, 30 Jan 2012 23:49:11 +0000] DEBUG: looks_like_ec2? == false
"os_version": "2.6.32-341-ec2",
"release": "2.6.32-341-ec2",
aj@haproxy1:~$ curl 169.254.169.254/latest/meta-data/
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
hostname
instance-action
instance-id
instance-type
kernel-id
local-hostname
local-ipv4
mac
metrics/
network/
placement/
profile
public-ipv4
public-keys/
reservation-id
security-groups
http://help.opscode.com/discussions/problems/1098-ohai-not-detecting-ec2-node
This discussion has some suggestions for modifying the ohai cookbook locally.
This just bit me so it gets my vote to be fixed.
I would just remove the network ARP check because that just won't ever work in VPC reliably. The other check will pass and we will get EC2 data, hurrah.
Going along with Mitchell's suggestion and the Opscode help discussion, is this an acceptable patch to ohai?
https://github.com/hectcastro/ohai/commit/1cc5d0524afed98e27651e95fd4ad9926165e39d
Debug output from a VPC node running Amazon Linux:
[ec2-user@ip-10-0-0-136 ohai]$ ohai -l debug | grep ec2
[Tue, 14 Feb 2012 19:28:28 +0000] DEBUG: Loading plugin ec2
[Tue, 14 Feb 2012 19:28:28 +0000] DEBUG: has_ec2_mac? == true
[Tue, 14 Feb 2012 19:28:28 +0000] DEBUG: has_ec2_mac? == false
[Tue, 14 Feb 2012 19:28:28 +0000] DEBUG: looks_like_ec2? == true
"current_user": "ec2-user",
"ec2": {
"ec2-user": {
"ec2-user"
"ec2-user": {
"dir": "/home/ec2-user",
"provider": "ec2",
Debug output from a non-VPC node running Amazon Linux:
[ec2-user@ip-10-140-4-111 ohai]$ ohai -l debug | grep ec2
[Tue, 14 Feb 2012 20:00:29 +0000] DEBUG: ip-10-140-4-111.ec2.internal
[Tue, 14 Feb 2012 20:00:30 +0000] DEBUG: Loading plugin ec2
[Tue, 14 Feb 2012 20:00:30 +0000] DEBUG: has_ec2_mac? == true
[Tue, 14 Feb 2012 20:00:30 +0000] DEBUG: looks_like_ec2? == true
"fqdn": "ip-10-140-4-111.ec2.internal",
"provider": "ec2",
"local_hostname": "ip-10-140-4-111.ec2.internal",
"public_hostname": "ec2-174-129-87-247.compute-1.amazonaws.com",
"domain": "ec2.internal",
"current_user": "ec2-user",
"ec2": {
"hostname": "ip-10-140-4-111.ec2.internal",
"local_hostname": "ip-10-140-4-111.ec2.internal",
"public_hostname": "ec2-174-129-87-247.compute-1.amazonaws.com",
"ec2-user": {
"ec2-user"
"ec2-user": {
"dir": "/home/ec2-user",
Hector,
I think that the connection to the EC2 metadata server is the better check, so I recommend just removing the has_ec2_mac? check entirely.
A simple "can I connect" check should be pretty quick.
Mitchell
The comment implied there was something slow about this check for Xen instances?
The post I linked to (http://tickets.opscode.com/browse/OHAI-310?focusedCommentId=21818&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21818) makes the patch overly complicated by trying to override ec2.rb with the new copy, hence the bit about changing << to unshift. Renaming the plugin, e.g. to ec2-local.rb, keeps the two from colliding (the file still "provides ec2").
- get ohai cookbook
- download ec2.rb into ec2-local.rb
- modify ec2-local.rb as per Hector's patch: https://github.com/hectcastro/ohai/commit/1cc5d0524afed98e27651e95fd4ad9926165e39d
- add /etc/chef/ohai-plugins to /etc/chef/client.rb (automatic if client.rb is managed by the chef-client cookbook).
This is not a smooth or simple workaround, but it does work and I've been launching VPC instances all day.
Of course our desire to commoditize cloud platforms is making determining cloud environments a very difficult problem and we're only just starting to see the cloud providers think about providing a mechanism for us. MAC addresses are incredibly fallible indicators, as we saw on OHAI-267 where we found that the MAC address that was assumed to be unique to Rackspace was actually a very common Cisco HSRP address. The EC2 plugin has used two step identification to first suspect that we're on EC2, and then confirm it through the presence of the metadata server.
This first check ensures there is a decent chance that we are on an EC2 system before making the HTTP call to the metadata server. If we aren't in EC2 and we make this call, we're going to have to wait two seconds for the timeout to elapse. We can't reduce the timeout because if we were in EC2, it isn't unreasonable that we could have high latency now and then on the network, local system, or the metadata service itself, and we wouldn't want to suddenly proclaim that we are no longer in EC2.
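The cost described above comes from a connect attempt that has to wait out its timeout when nothing answers. A rough sketch of such a bounded connect check follows. It is not Ohai's can_metadata_connect? implementation: the method name is hypothetical and it uses Ruby's standard Socket.tcp with connect_timeout in place of the plugin's own non-blocking connect.

```ruby
require "socket"

# Returns true if a TCP connection to host:port completes within `timeout`
# seconds; false if it is refused, unreachable, unresolvable, or times out.
# (Hypothetical helper, not the plugin's can_metadata_connect?.)
def can_connect?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { |_sock| true }
rescue SystemCallError, SocketError
  false
end

# Local demonstration that does not depend on being inside EC2.
server = TCPServer.new("127.0.0.1", 0)    # throwaway listener on a free port
port = server.addr[1]
puts can_connect?("127.0.0.1", port, 1)   # a listener is accepting connections
server.close
puts can_connect?("127.0.0.1", port, 1)   # nothing is listening any more
```

Against the metadata address this would be called as can_connect?("169.254.169.254", 80). On a host where nothing answers at that address, the call can block for the full timeout, which is the delay the MAC pre-check was meant to avoid.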
- Ohai 0.6.10 on my VM
real 0m1.893s
user 0m0.910s
sys 0m0.650s
- Ohai 0.6.10 on my VM with Hector's patch
real 0m3.720s
user 0m0.740s
sys 0m0.670s
This one plugin would consequently double the run time of Ohai everywhere that wasn't on EC2.
Bryan,
Completely fair. Perhaps, then, the question is: should there be a feature in Ohai to FORCE a specific plugin to run? Surely the user of Ohai knows better than Ohai what environment we're in.
Mitchell
By using a SYN scanner I can determine that port 80 is responsive on 169.254.169.254 in about 0.10 seconds. Latency to 169.254.169.254 in my test client VPC subnet is roughly 0.00015s.
Perhaps we could implement a SYN scanner for 169.254.169.254:80, instead of doing the non-blocking connect regardless. Instead of just scanning, the full connect() in the second phase of the SYN scanner could just connect and grab the data fo'realz.
Starting Nmap 5.00 ( http://nmap.org ) at 2012-02-16 23:46 UTC
NSE: Loaded 0 scripts for scanning.
Initiating Ping Scan at 23:46
Scanning 169.254.169.254 [4 ports]
Completed Ping Scan at 23:46, 0.00s elapsed (1 total hosts)
Initiating SYN Stealth Scan at 23:46
Scanning 169.254.169.254 [1 port]
Discovered open port 80/tcp on 169.254.169.254
Completed SYN Stealth Scan at 23:46, 0.00s elapsed (1 total ports)
Host 169.254.169.254 is up (0.00014s latency).
Scanned at 2012-02-16 23:46:45 UTC for 0s
Interesting ports on 169.254.169.254:
PORT STATE SERVICE
80/tcp open http
Read data files from: /usr/share/nmap
Nmap done: 1 IP address (1 host up) scanned in 0.11 seconds
Raw packets sent: 5 (196B) | Rcvd: 2 (84B)
edit: thinking about this again, it's probably hard to implement a synscanner in ruby w/o pcap or something.
I built a prototype standalone syn scanner using packetfu, pcaprub and rex (mostly all stolen from metasploit framework). It can determine whether the API is responsive quite quickly – probably the fastest scan method available. Requires root privileges and libpcap-dev.
aj@ec2-vpn:~/fujin-synscanner-10d2f74$ time sudo bin/syn
TCP OPEN 169.254.169.254:80
0.100000 0.060000 0.170000 ( 0.160949)
real 0m1.155s
user 0m0.980s
sys 0m0.190s
https://github.com/fujin/synscanner/blob/master/lib/syn/scanner.rb
Maybe a generic TCP port scanner would be useful for Ohai? How many other services do we have to connect to, only if they are up? Thoughts?
I think timeouts are still a concern with even a syn scan. What happens if this packet gets dropped or it takes a second for it to come back? It can be relatively disastrous when an EC2 node suddenly believes it has been removed from EC2 and makes configuration changes accordingly for that one run.
We've been discussing this and our ideas so far are:
- (Are we on Linux && are we a Xen Guest) || Are we on Windows?
- Do a DNS lookup on the IP address of eth0, does it end with 'compute-1.internal'?
- We must be on EC2, look at the metadata server
I don't have a VPC node handy, so I wonder if the first two steps are true there.
Also, I haven't tested outside of us-east, and I have a hint in my memory about DNS being different in some EC2 zones.
If we're willing to assume that DNS should either always fail immediately (no DNS or blocked, our resolver should not misbehave) or succeed quickly (DNS is almost always local, or should always be local), then we can use this approach.
This may be useful for Rackspace Cloud as well, although the problem we usually have there is differentiating between Rackspace Cloud and Rackspace managed hosting, which we fortunately do not have with AWS.
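The decision logic in the three steps above could be sketched roughly as follows. This is only an illustration: the method and parameter names are hypothetical, the inputs are passed in as plain values so no real OS, Xen, or DNS probing happens, and the suffix defaults to the one named above but is left overridable since DNS naming may differ between zones.

```ruby
# Decides whether it is worth contacting the metadata server (step 3),
# based on the platform test (step 1) and the reverse DNS test (step 2).
# All names here are hypothetical; callers supply the probed values.
def worth_checking_metadata?(os:, xen_guest:, reverse_dns:, suffix: "compute-1.internal")
  # Step 1: (Linux && Xen guest) || Windows
  platform_ok = (os == "linux" && xen_guest) || os == "windows"
  return false unless platform_ok

  # Step 2: the reverse DNS name of eth0's address must end with the suffix.
  # A nil name (no PTR record) fails this step.
  !reverse_dns.nil? && reverse_dns.chomp(".").end_with?(suffix)
end

# Hypothetical host whose address resolves to a matching name.
puts worth_checking_metadata?(os: "linux", xen_guest: true,
                              reverse_dns: "ip-10-1-2-3.compute-1.internal")
# Host with no reverse DNS name at all.
puts worth_checking_metadata?(os: "linux", xen_guest: true, reverse_dns: nil)
```

Any host whose address has no reverse DNS entry would be rejected at the second step, so the heuristic is only as good as the DNS assumption it rests on.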
The IP addresses in VPC subnets have no reverse DNS.
aj@ec2-vpn:~$ ip addr list eth0|grep inet
inet 10.0.0.200/24 brd 10.0.0.255 scope global eth0
inet6 fe80::d5:7ff:feb3:ac99/64 scope link
aj@ec2-vpn:~$ cat /etc/resolv.conf
nameserver 10.0.0.2
aj@ec2-vpn:~$ dig 10.0.0.200 +short
HTH (unfortunately it doesn't)!
Lacking a definitive method for determining the cloud type on the host itself, the hint system developed in OHAI-267 will be our solution.
With Ohai 6.14.0, create /etc/chef/ohai/hints/ec2.json to enable EC2 attribute collection.
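As a minimal sketch, the hint file could be created with a few lines of Ruby. The helper name is hypothetical, and writing an empty JSON object as the file's content is an assumption; the ticket only says the file must exist. The demonstration writes into a temporary directory rather than /etc/chef.

```ruby
require "fileutils"
require "json"
require "tmpdir"

# Writes <hints_dir>/<name>.json and returns its path.
# (Hypothetical helper; the "{}" default content is an assumption.)
def write_hint(hints_dir, name, data = {})
  FileUtils.mkdir_p(hints_dir)
  path = File.join(hints_dir, "#{name}.json")
  File.write(path, JSON.generate(data))
  path
end

# Demonstration in a scratch directory.
Dir.mktmpdir do |dir|
  path = write_hint(File.join(dir, "hints"), "ec2")
  puts File.basename(path)  # prints ec2.json
  puts File.read(path)      # prints {}
end

# On a real node this would be (needs root):
#   write_hint("/etc/chef/ohai/hints", "ec2")
```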
Related Ticket: http://help.opscode.com/discussions/problems/859-nodeec2-attribute-not-automatically-being-set-in-hosted-opscode