Network Engineering


All things enterprise network engineering, design, and architecture.

Rules

  1. No low effort posts
  2. No home networking topics
  3. No memes

founded 1 year ago
MODERATORS
1
 
 

Consider a Ping Request packet arriving on a computer with 2 NICs (multi-homed PC). The packet is received on one of the interfaces. Now the computer has to send the Ping Response packet. To fill in the source IP and source MAC address, which of the following does the computer do?

  • Computer first determines which interface should be used as the egress interface by looking at the destination IP address (taken from the source IP address field of the Ping Request packet). Once it has determined the egress port, it enters that interface's IP and MAC address in the Ping Response packet.
  • Computer takes the destination IP and MAC address of the Ping Request packet and just flips them over to fill source IP and MAC address in Ping Response packet.
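The routing lookup described in the first option can be observed from user space. A minimal sketch, assuming an ordinary socket API: connecting a UDP socket sends nothing on the wire, but it makes the OS pick a route and a local address for that destination. The helper name `source_ip_for` and the placeholder address are made up for illustration. This only shows what the routing table would choose for locally originated traffic; it does not by itself prove how a given OS fills in an echo reply.

```python
import socket


def source_ip_for(dest_ip: str, port: int = 9) -> str:
    """Return the local IP the OS would use to reach dest_ip.

    connect() on a UDP socket sends no packet; it only makes the kernel
    perform the route lookup and bind a local address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((dest_ip, port))
        return s.getsockname()[0]


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; substitute the address of the pinging host.
    print(source_ip_for("127.0.0.1"))
```

Comparing this result with the source IP seen in a capture of the actual echo reply shows which of the two options the host follows.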
2
 
 

I've seen companies do all sorts of home grown things.

One uses a spreadsheet that is just the configuration row by row; they turn it into a text file, copy it to the startup config, and reload.

I have used git servers to do the same thing, with the obvious benefit of git's change tracking and history.

What real or home grown things are you using?
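For the change tracking part, here is a minimal sketch of what a git-style diff between two config snapshots looks like, using only Python's standard `difflib`. The function name `config_diff` and the sample config lines are made up for illustration; a real setup would pull the snapshots from the devices or the repository.

```python
import difflib


def config_diff(old: str, new: str, name: str = "device.cfg") -> str:
    """Return a unified diff between two configuration snapshots."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (current)",
        lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical snapshots, only here to show the output format.
    before = "hostname sw1\ninterface Gi1/0/1\n description uplink\n"
    after = "hostname sw1\ninterface Gi1/0/1\n description uplink to core\n"
    print(config_diff(before, after))
```

An empty result means the two snapshots are identical, which is handy as a quick "did anything change since last night" check.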

3
 
 

Currently using an ISR4461x. Now 17.7+ supports SSL VPN.

Should we learn FlexVPN or do SSL VPN?

4
 
 

So, every network engineer knows it: everyone else will blame the network and you have to prove them wrong.

There are multiple reasons:

  • lack of knowledge
  • ignorance
  • passing on responsibility
  • laziness
  • ... There are more.

I am interested in how you react to 'The network is causing the problems' requests.

  • do you request certain information?
  • need an explanation?
  • what are your first steps?
  • do you have a runbook or some policy in place?

Without getting into too much detail, I request some or all of the following information before I start looking:

  • what are they trying to do? What is the desired outcome?
  • what is the error message? (preferably a screenshot!) + timestamp (for logs)
  • has it ever worked before?
  • since when isn't it working?
  • can you resolve domains?
  • Source Host > Destination Host:Port
  • Results of Ping + Powershell Test-NetConnection on Windows and Netcat on Linux (to test general connection, assuming TCP connection)
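When neither Test-NetConnection nor Netcat is available, the same basic TCP check can be done with a few lines of Python. This is a rough sketch, not a replacement for those tools; the function name `tcp_port_open` and the default timeout are assumptions for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder target; replace with the Destination Host:Port in question.
    print(tcp_port_open("127.0.0.1", 22))
```

A `False` here does not say where the connection fails (DNS, firewall, or a service that is not listening), so it is a starting point for the questions above rather than proof either way.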

What I ask for and in what order depends on the person I am talking to. By the way, monitoring is my friend. If it says everything is fine, it usually is.

Side note: Describing the actual proof that it is not the network depends heavily on the infrastructure and the problem, so this may be a discussion for another thread.


What are your first steps?

5
 
 

Thanks to Jerry for bringing this community back to life. I'll be playing moderator for a while and may tweak the design a bit.

Enjoy!

6
 
 

I am interested in your ways to identify a bottleneck within a network.

In my case, I've got 2 locations, one in the UK, one in Germany. Hardware is FortiGates for FW/routing, and switches are Cisco/HPE. Locations are connected through an IPsec VPN over the internet, and all internet connections have a bandwidth of at least 100 Mbps.

The problem occurs as soon as one client in the UK tries to download data via SSH from a server in Germany. The max download speed is 10 Mbps, and for the duration of the download the whole UK location has problems accessing resources in Germany through the VPN (Citrix, Exchange, SharePoint, etc).

I've changed some information for privacy reasons, but I'd be interested in your first steps on how to tackle such a problem. Do you have some kind of runbook that you follow? What are common errors that you encounter? (independently of my case too, just in general)

EDIT: Current list

  • packet capture on client and server to check for packet loss, latency, etc. - if packets dropped, check intermediate devices
  • check utilization of intermediate devices (CPU, RAM, etc)
  • check throughput with different tools (iperf3, nc, etc) and protocols (TCP, UDP, etc) and compare
  • check if traffic shaper/ QoS are in place
  • check ports on intermediate devices for port speed mismatch
  • MTU/MSS mismatch
  • is the internet connection affected too, or just traffic through the VPN
  • IPsec configuration
  • temporarily turn off the security functions of the FW and check if it is still reproducible
  • traceroute from A to B, any latency spikes?
  • check RTT, RWND, MSS/MTU, TTL via pcap, on the transferring client itself and reference client, without and while an active data transfer
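On the RTT/RWND item: a single TCP flow cannot move more than one receive window per round trip, so the two values from the pcap give a quick upper bound. A minimal sketch of that arithmetic follows; the function name and the example numbers (a 65535-byte window without window scaling, 50 ms RTT) are illustrative assumptions, not measurements from this case.

```python
def window_limited_mbps(rwnd_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-flow TCP throughput: one window per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_second = rwnd_bytes * 8 / (rtt_ms / 1000)
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    # Assumed values for illustration only; take the real ones from the capture.
    print(f"{window_limited_mbps(65535, 50):.2f} Mbit/s")
```

If the measured transfer rate sits close to this bound, the window or the RTT is the limit rather than the link itself; if it is far below, something else (loss, shaping, CPU) is more likely.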

Prob not related but noteworthy:

  • check I/O of server and client

I'll keep this list updated and appreciate further tips.


Update: I had to postpone the session and will do the stress test on Monday or Tuesday evening. I'll update you as soon as I have the results.


Update 2: So, I'll try to keep it short.

The first iperf3 run over TCP (UK < DE) with the same FW rules reproduced the problem. Max speed was 10 Mbps, and DE < UK was even slower, down to 1-2 Mbps. The pattern of the test implies an unreliable connection (briefly up to 30 Mbps, then 0, and so on). Traceroute shows the same hops in both directions, no latency spikes, all good.

BUT ICMP and iperf3 runs over UDP show a packet loss of at least 10% and up to 30% in both directions! Multiple speed tests to endpoints over the internet (UK>Internet) showed a download of 80 Mbps and an upload of about 30 Mbps, which indicates a problem with the IPsec tunnel.
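To put loss figures like these into perspective, the well-known Mathis approximation estimates a TCP throughput ceiling as (MSS / RTT) * C / sqrt(p), with C = sqrt(3/2) and p the loss rate. A rough sketch follows; the MSS and RTT values are assumptions for illustration, and the model was derived for low, random loss, so at 10-30% loss real TCP behaviour (timeouts, stalls) is likely even worse than the number it prints.

```python
import math


def mathis_mbps(mss_bytes: int, rtt_ms: float, loss: float) -> float:
    """Approximate TCP throughput ceiling: (MSS / RTT) * C / sqrt(loss)."""
    if not 0 < loss <= 1:
        raise ValueError("loss must be a fraction in (0, 1]")
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    c = math.sqrt(3 / 2)
    bits_per_second = (mss_bytes * 8 / (rtt_ms / 1000)) * c / math.sqrt(loss)
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    # Assumed MSS (1360 bytes) and RTT (30 ms); plug in values from the pcap.
    for p in (0.001, 0.1, 0.3):
        print(f"loss {p:.1%}: ~{mathis_mbps(1360, 30, p):.2f} Mbit/s")
```

The point is only the order of magnitude: with double-digit loss, a single TCP flow has no chance of filling a 100 Mbps link, regardless of the tool used.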

Some smaller things we've tried without any positive effect:

  • routing changes
  • disabling all security features for affected rule set
  • removed traffic shaper
  • checked port speed/duplex negotiation (looks good)
  • and some other things that I already forgot

Things we prepared:

  • We have opened some tickets with our ISPs to let them check it on their side > waiting for response
  • Set up smokeping to ping all provider/public/gw/IPsec endpoint/host IPs and see where packets could be dropped (server located in DE)
  • Planned a new session with a FortiGate expert to look in depth into the IPsec configuration.

Need to do:

  • look through all packet captures (takes some time)
  • MSS/MTU mismatches / DF flags
  • further iperf3 tests with smaller/larger packet sizes
  • double check IPsec configuration
  • QoS on Switches

I wish I had more time. I'll keep you updated


Update 3: Most likely the last big update.

So, the actual infrastructure is a little bit more complex than I've described in this post, so nobody could have suggested tips for this case.

We think that we have found the problem, but we couldn't implement the fix yet since it requires some downtime, and I was on a business trip. We've got multiple locations in the UK that are connected to a third party (MPLS), which is also where their internet breakout points are. We've now got multiple IPsec tunnels that terminate on the same FW in Germany. The problem is that the third-party FW uses the same IP AND port for all IPsec tunnels too, which most likely causes all the issues. In short: only use one tunnel or change the GW on the German side.
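A quick way to spot this kind of overlap in a tunnel inventory is to group tunnels by the remote IP and port the local gateway sees. This is only a sketch: the `Tunnel` fields, the tunnel names, and the documentation-range addresses are hypothetical, and how a specific firewall actually tells such tunnels apart depends on the vendor and the configured peer IDs.

```python
from collections import defaultdict
from typing import NamedTuple


class Tunnel(NamedTuple):
    name: str         # hypothetical tunnel label
    remote_ip: str    # peer's public IP as seen by the local gateway
    remote_port: int  # peer's UDP port (500 for IKE, 4500 with NAT-T)


def shared_endpoints(tunnels):
    """Group tunnel names by (remote_ip, remote_port); keep groups with >1 member."""
    groups = defaultdict(list)
    for t in tunnels:
        groups[(t.remote_ip, t.remote_port)].append(t.name)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Made-up inventory using documentation addresses (203.0.113.0/24).
    inventory = [
        Tunnel("uk-site-a", "203.0.113.10", 4500),
        Tunnel("uk-site-b", "203.0.113.10", 4500),
        Tunnel("uk-site-c", "203.0.113.20", 4500),
    ]
    print(shared_endpoints(inventory))
```

Any group that shows up here is a candidate for consolidation into one tunnel or for a separate gateway address on one side.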

Don't ask me why, please! - It is a cluster fuck, and the goal is to fix it in the future. One site had a large flat /16 network not long ago.

I might share a final update when we get the fix implemented.

7
 
 

I had the weirdest problem. Two computers can communicate with each other: ping and TFTP both work. When I boot one of them into U-Boot (a bootloader that supports TFTP boot), it can’t ping the other machine or load anything from it via TFTP, and it complains about ARP timeouts.

I swapped in a dumb switch and it all works. Everything else (machines, cables) is the same. The managed switch is a Cisco switch and I have a serial console to it, but I’m not familiar with managing those switches. What feature is potentially blocking U-Boot's ARP packets?

I’ve double-checked with tcpdump: the other machine never sees U-Boot's ARP packets, but it does when the same board is booted into Linux. I’ve also checked Cisco's monitor event-trace arp continuous and it didn’t print any packets, but it did say the link status went from up to down and back up when I rebooted.

Is there some sort of MAC filter on Cisco switches?