Time: January 6th, 2025
Artificial Intelligence Generated Content (AIGC) technology uses generative models to produce a wide range of content, including natural language text, images, and audio. Within an AIGC network, the Network Interface Card (NIC) is the device that connects a computer to the network: it transmits the data generated by the computer to the network and receives incoming data. NICs are therefore essential to the performance and reliability of AIGC networks, and they provide the foundation for data transmission and network connectivity.


Reasons for the dual uplink of the network card

Dual NIC bonding is a network architecture that establishes simultaneous connections between two physical network interface cards (NICs) of a server or network device and distinct upper-level devices or switches.



In the traditional single network card architecture, a failure of either a fibre optic connection or a switch can interrupt AIGC training tasks. Such interruptions increase training costs and can damage the customer's brand. Furthermore, AIGC training jobs must be migrated before a switch is upgraded, which poses significant challenges to user experience, system stability, and network operation and maintenance.

Conversely, the dual uplink architecture for network cards enhances reliability by connecting the two ports of all network cards on the server to separate switches. By binding these two ports to form a bonded port, this architecture ensures seamless service provision. Consequently, in the event of a failure of one uplink link or its corresponding access layer switch, traffic can be rerouted to the alternate port, thereby preventing disruptions to training tasks.
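The failover behaviour described above can be illustrated with a minimal sketch. It is a toy model, not a real bonding driver: the class names, the port and switch names (`port0`, `leaf-A`, and so on), and the modulo-based flow placement are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Uplink:
    """One NIC port and the access switch it connects to (illustrative names)."""
    port: str
    switch: str
    up: bool = True


@dataclass
class BondedPort:
    """Toy model of a bonded port built from two uplinks to separate switches."""
    uplinks: list = field(default_factory=list)

    def active_uplinks(self):
        """Return only the uplinks that are currently healthy."""
        return [u for u in self.uplinks if u.up]

    def pick_uplink(self, flow_id: int) -> Uplink:
        """Spread flows over the healthy uplinks; fail only if none remain."""
        healthy = self.active_uplinks()
        if not healthy:
            raise RuntimeError("both uplinks are down: traffic is interrupted")
        return healthy[flow_id % len(healthy)]

    def fail_switch(self, switch: str) -> None:
        """Mark every uplink attached to the given switch as down."""
        for u in self.uplinks:
            if u.switch == switch:
                u.up = False


bond = BondedPort([Uplink("port0", "leaf-A"), Uplink("port1", "leaf-B")])
before = [bond.pick_uplink(f).switch for f in range(4)]
bond.fail_switch("leaf-A")  # simulate a switch failure or a hot upgrade
after = [bond.pick_uplink(f).switch for f in range(4)]
print("before failure:", before)
print("after failure: ", after)
```

In this model, the flows that were using the failed switch move to the surviving uplink, which is the property that lets training continue during a link failure or a switch upgrade.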



The dual uplink architecture removes the single point of failure that arises when a network card connects to only one switch, significantly improving the robustness of the overall system interconnection. It also allows switches in a cluster to be upgraded while in service (hot upgrade), which makes network operation and maintenance, as well as feature iteration, more convenient.


Network Card Dual Uplink Architecture Network Solution

The following outlines several dual uplink architecture solutions for network interface cards that are currently supported by the existing switch infrastructure:

● Dual IP Network Card
In this configuration, the two ports of the network interface card are each assigned a distinct IP address, and traffic is dispersed over different paths according to the network card's configuration. The network card is effectively presented as two separate network cards, which makes use of the mature IP forwarding capabilities of switches. If one port or IP address fails, the other port and IP address remain operational. The dual IP configuration is a versatile and efficient networking solution that is applicable across various settings. However, some collective communication libraries provide inadequate support for dual IP configurations, which can degrade performance, particularly with multiple Queue Pairs (QPs). In addition, this solution requires twice as many IP addresses for each network interface card, which can lead to inefficient use of IP address resources.
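The address cost of the dual IP approach can be sketched with Python's standard `ipaddress` module. The subnets, the NIC names, and the choice of one subnet per access switch are hypothetical values chosen for illustration; the source does not specify an addressing plan.

```python
import ipaddress


def plan_dual_ip(nic_names, subnet_a: str, subnet_b: str) -> dict:
    """Assign each NIC one address per port, one subnet per access switch.

    subnet_a serves the port cabled to the first switch and subnet_b the
    port cabled to the second switch (an assumed layout).
    """
    hosts_a = iter(ipaddress.ip_network(subnet_a).hosts())
    hosts_b = iter(ipaddress.ip_network(subnet_b).hosts())
    return {name: (next(hosts_a), next(hosts_b)) for name in nic_names}


nics = ["nic0", "nic1", "nic2", "nic3"]
plan = plan_dual_ip(nics, "10.0.0.0/28", "10.0.1.0/28")
addresses_used = sum(len(ips) for ips in plan.values())

for name, (ip_port0, ip_port1) in plan.items():
    print(f"{name}: port0={ip_port0} port1={ip_port1}")
print(f"{len(nics)} NICs consume {addresses_used} IP addresses")
```

Each NIC consumes two addresses instead of one, which is the doubling of IP address usage noted above.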


● De-stacking
The de-stacking solution is an approach introduced by our organization. The network interface card and the switches work together to establish an aggregation port. On the network interface card side, ARP/ND messages are broadcast simultaneously on both ports, so that the two connected switches learn the network card's ARP/ND information at the same time. The connected switches then convert these ARP/ND entries into Border Gateway Protocol (BGP) routes, which are advertised to other devices.
With de-stacking, the method of business access remains unchanged, and no physical connection is needed between the two switches, while dual uplink access for the network card is still supported.
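The learning and route-conversion steps can be sketched as a small simulation. This is an illustrative model only: the class and function names are invented, and representing each converted entry as a /32 host route is an assumption, since the source only states that ARP/ND entries are converted into BGP routes.

```python
from dataclasses import dataclass, field


@dataclass
class LeafSwitch:
    """Toy access switch that learns ARP entries and exports them as routes."""
    name: str
    arp_table: dict = field(default_factory=dict)  # IP address -> MAC address

    def learn_arp(self, ip: str, mac: str) -> None:
        self.arp_table[ip] = mac

    def bgp_routes(self) -> list:
        """Convert each ARP entry into a (prefix, next hop) pair.

        The /32 host-route form is an assumption made for illustration.
        """
        return [(f"{ip}/32", self.name) for ip in sorted(self.arp_table)]


def broadcast_arp(ip: str, mac: str, switches) -> None:
    """Model the NIC sending the same ARP message out of both ports."""
    for sw in switches:
        sw.learn_arp(ip, mac)


leaf_a, leaf_b = LeafSwitch("leaf-A"), LeafSwitch("leaf-B")
broadcast_arp("10.0.0.11", "aa:bb:cc:00:00:11", [leaf_a, leaf_b])

routes = leaf_a.bgp_routes() + leaf_b.bgp_routes()
next_hops = sorted(nh for prefix, nh in routes if prefix == "10.0.0.11/32")
print("routes advertised:", routes)
print("next hops for 10.0.0.11/32:", next_hops)
```

Because both switches learn the entry independently from the NIC, remote devices see two next hops for the same address without the two switches exchanging table entries over an inter-switch link.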


● De-stacking + Dual Planes
The de-stacking plus dual planes solution builds on the de-stacking concept by partitioning the switches into distinct forwarding planes. The two uplink ports of each network interface card are allocated to different network planes: the two ports connect to separate switches, and each switch belongs to a different plane.

By employing the de-stacking and dual-plane configuration, the network card's sending end can ensure even traffic distribution across the two ports, leading to a balanced flow of network traffic at the receiving side's access layer switch. This notably diminishes the likelihood of hash polarization.
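Hash polarization in general can be illustrated with a toy model. The sketch below is not the vendor's mechanism: the CRC32-based `ecmp_hash`, the two-way fan-out at each tier, and the sender alternating flows across its two ports are all assumptions chosen to make the effect visible.

```python
import zlib
from collections import Counter

FLOWS = range(1000)  # hypothetical flow identifiers


def ecmp_hash(flow_id: int) -> int:
    """Deterministic stand-in for a switch's ECMP hash over flow fields."""
    return zlib.crc32(flow_id.to_bytes(4, "big"))


def polarized_second_tier_load(flows) -> Counter:
    """Load on the two uplinks of the tier-2 switch behind tier-1 link 0,
    when both tiers apply the same hash to the same fields."""
    load = Counter({0: 0, 1: 0})
    for f in flows:
        if ecmp_hash(f) % 2 == 0:        # tier 1 sent this flow to link 0
            load[ecmp_hash(f) % 2] += 1  # tier 2 repeats the same decision
    return load


def per_port_sender_load(flows) -> Counter:
    """Sender-side alternative: the NIC alternates flows across its two
    ports, so each plane receives half of the flows regardless of any hash."""
    return Counter(i % 2 for i, _ in enumerate(flows))


polarized = polarized_second_tier_load(FLOWS)
balanced = per_port_sender_load(FLOWS)
print("tier-2 uplink load, same hash at both tiers:", dict(polarized))
print("per-port load with sender-side alternation: ", dict(balanced))
```

In the polarized case one of the two tier-2 uplinks carries no traffic at all, because the second hashing decision is fully correlated with the first. Splitting traffic at the sending NIC, one port per plane, does not depend on a repeated hash decision.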

Furthermore, this design of dual uplink and dual-plane access effectively increases the maximum expansion scale of a single cluster within a two-layer CLOS network, yielding advantages such as simplified overall cluster communication topology, reduced latency, and lower operational costs.
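The scale benefit can be checked with simple arithmetic. The sketch assumes a non-blocking two-layer CLOS built from identical switches, where each access switch uses half of its ports as downlinks and half as uplinks; the switch radix of 64 and the function name are hypothetical.

```python
def two_tier_clos_nics(radix: int, planes: int, ports_per_nic: int = 2) -> int:
    """Maximum number of dual-port NICs in a non-blocking two-layer CLOS.

    Assumptions: every switch has `radix` ports; each access switch uses
    radix/2 downlinks and radix/2 uplinks; a plane can hold up to `radix`
    access switches (one port per access switch on each upper-layer switch).
    """
    if radix % 2 != 0 or ports_per_nic % planes != 0:
        raise ValueError("radix must be even and ports must divide across planes")
    access_ports_per_plane = (radix // 2) * radix
    nic_ports_per_plane = ports_per_nic // planes
    return access_ports_per_plane // nic_ports_per_plane


single_plane = two_tier_clos_nics(radix=64, planes=1)
dual_plane = two_tier_clos_nics(radix=64, planes=2)
print("single plane:", single_plane, "NICs")
print("dual plane:  ", dual_plane, "NICs")
```

With one plane, both ports of every NIC consume access ports in the same fabric. With two planes, each port lands in its own fabric, so the same two-layer design holds twice as many NICs under these assumptions.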


● M-LAG
Multi-chassis Link Aggregation Group (M-LAG) represents a cross-device link aggregation technology that creates an active-active configuration through the interconnection of two devices. The control planes of these devices operate independently, while communication and synchronization occur via a peer link. Collectively, the two devices in the M-LAG group present themselves as a single device to downstream counterparts, with connections established through aggregated ports.

The M-LAG architecture provides high reliability; in the event of switch or link failures, traffic is automatically rerouted to functioning links or switches, ensuring network reliability and redundancy. Additionally, this architecture simplifies network topology, allowing multiple physical links to be viewed as a single logical link, thus streamlining network configuration and management. However, it is essential to note that M-LAG is a proprietary implementation, necessitating that the devices within the same M-LAG group originate from the same vendor, although cross-group M-LAG devices face no such restriction.


● E-AP
Enhanced Aggregation Port (E-AP) is another cross-device link aggregation technology. This system comprises multiple independent devices that support link aggregation and collectively function as a single device, facilitating link aggregation with downstream devices. This configuration enhances link reliability at the device level and addresses the demands of high-availability scenarios. In instances of link or device failures, E-AP automatically redirects data services to alternative links or devices within the E-AP group, thereby achieving device-level reliability.

E-AP operates through a proprietary protocol and supports the dual uplink architecture of network interface cards without necessitating additional interconnection links between devices. The E-AP design also features high reliability, allows for redundancy in physical links, and minimizes the risks associated with single-point failures.


● VXLAN Multi-Homing
VXLAN Multi-Homing refers to a configuration within a VXLAN network where a single VXLAN instance (typically a tenant or virtual network) connects to the VXLAN framework via multiple physical network interfaces or various network pathways, all regarded as a single EVI access. The entries of the VXLAN instance are synchronized utilizing BGP EVPN. By employing VXLAN Multi-Homing technology, organizations can achieve dual connections from the network interface card to different switches, thereby enhancing network resilience and performance.



Solution Comparison

We conducted a comprehensive evaluation of various solutions, considering multiple dimensions, including resource utilization, the scalability of supported network cards, traffic distribution, deployment complexity, and operational challenges. Each solution presents distinct advantages and limitations; therefore, the selection should be made with careful consideration of specific requirements and available resources.

Scores are on a five-point scale. Judging from the advantages and disadvantages listed for each solution, a higher score is more favorable; for example, a higher Deployment Difficulty score indicates easier deployment.

| Dimension | Dual IP Network Card | De-stacking | De-stacking + Dual Planes | M-LAG | E-AP | VXLAN Multi-Homing |
| --- | --- | --- | --- | --- | --- | --- |
| Recommended Index | 3.5 | 4.5 | 5 | 4 | 4.5 | 4.5 |
| Resource Occupancy | 2 | 5 | 5 | 4 | 5 | 5 |
| Network Card Support Scale | 4.5 | 4 | 5 | 4 | 4 | 4 |
| Traffic Balance | 3.5 | 4.5 | 5 | 4.5 | 5 | 5 |
| Deployment Difficulty | 5 | 4.5 | 4.5 | 4.5 | 4.5 | 3.5 |
| Difficulty in Operation and Maintenance | 4 | 5 | 5 | 4.5 | 4.5 | 4.5 |

| Solution | Advantages | Disadvantages |
| --- | --- | --- |
| Dual IP Network Card | Simple to implement; reuses the existing capabilities of network cards and switches. | Additional IP addresses must be allocated, and traffic balancing is insufficient on some network cards. |
| De-stacking | A mature solution with an unchanged business access method; improves the reliability of the network architecture and eliminates the need to synchronize MAC/ARP/ND table entries between devices. | The Linux server must be modified to support ARP/ND broadcast packet forwarding. |
| De-stacking + Dual Planes | Adds dual-plane behaviour on top of de-stacking. Doubles the number of network cards that can be supported and greatly reduces the probability of hash polarization. | The Linux server must be modified to support ARP/ND broadcast packet forwarding. |
| M-LAG | Enhances network reliability, virtualizes the devices as a single switch, and simplifies network operation and maintenance. | The interconnection link occupies a port. |
| E-AP | Occupies no additional ports; high reliability; supports redundant backup of physical links and reduces the risk of a single point of failure. | Uses a private communication protocol, so an E-AP group cannot be formed with other vendors' products. |
| VXLAN Multi-Homing | Provides redundant connections for VXLAN networks with high flexibility and scalability. Low requirements for operational and server capabilities. | A VXLAN network must be deployed to support it. |

Selecting the most appropriate dual uplink solution for a network card requires a thorough analysis of both the current and anticipated requirements of the network architecture, together with an assessment of the implications for performance, reliability, and cost. Systematically weighing the strengths and weaknesses of each option makes it possible to identify the most effective solution. The goal is an AIGC network that is both efficient and reliable.
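One way to make this evaluation systematic is a weighted score over the comparison dimensions. The sketch below uses the scores from the comparison above; the weights are hypothetical and should be replaced with values that reflect a given deployment's priorities. It is an illustration of the selection process, not a recommendation method stated by the source.

```python
SOLUTIONS = [
    "Dual IP Network Card", "De-stacking", "De-stacking + Dual Planes",
    "M-LAG", "E-AP", "VXLAN Multi-Homing",
]

# Scores per dimension, in the same order as SOLUTIONS (higher is better).
SCORES = {
    "Resource Occupancy": [2, 5, 5, 4, 5, 5],
    "Network Card Support Scale": [4.5, 4, 5, 4, 4, 4],
    "Traffic Balance": [3.5, 4.5, 5, 4.5, 5, 5],
    "Deployment Difficulty": [5, 4.5, 4.5, 4.5, 4.5, 3.5],
    "Difficulty in Operation and Maintenance": [4, 5, 5, 4.5, 4.5, 4.5],
}


def rank(weights: dict) -> list:
    """Return (solution, weighted average score) pairs, best first."""
    total_weight = sum(weights.values())
    ranked = []
    for i, name in enumerate(SOLUTIONS):
        score = sum(w * SCORES[dim][i] for dim, w in weights.items()) / total_weight
        ranked.append((name, round(score, 2)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical weighting: every dimension counts equally.
equal_weights = {dim: 1.0 for dim in SCORES}
ranking = rank(equal_weights)
for name, score in ranking:
    print(f"{score:.2f}  {name}")
```

With equal weights the ordering agrees with the Recommended Index at both ends: De-stacking + Dual Planes ranks first and Dual IP Network Card ranks last. Raising the weight of a single dimension, such as Deployment Difficulty, shows how the ranking shifts for a team with different constraints.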

Ruijie Networks, as a comprehensive service provider in the era of Generative AI, is dedicated to delivering a full spectrum of products and solutions that span from Infrastructure as a Service (IaaS) to Platform as a Service (PaaS). Our offerings include high-performance networking and optimized scheduling for GPU computing, intending to enable customers to achieve significant productivity advancements and optimize operational expenditures through innovative technological solutions. We are confident that our endeavours will contribute to the realization of a more intelligent, efficient, and dependable future for our clients. Together, let us explore the myriad opportunities presented by the era of Generative AI.



