


Title:
TECHNIQUE FOR FORWARDING AN INCOMING BYTE STREAM BETWEEN A FIRST WORKLOAD AND A SECOND WORKLOAD BY MEANS OF A VIRTUAL SWITCH
Document Type and Number:
WIPO Patent Application WO/2017/137093
Kind Code:
A1
Abstract:
A technique for forwarding an incoming byte stream between a first workload and a second workload by means of a virtual switch, vSwitch, in a cloud environment is disclosed, the first workload being local to the vSwitch. A method aspect (in the vSwitch), comprises the steps of receiving the incoming byte stream at a first socket, determining a forwarding state of the incoming byte stream, piping, if the forwarding state indicates forwarding to a second socket, the incoming byte stream from the first socket to the second socket, wherein the second workload is another local workload having opened the second socket, and packetizing, if the forwarding state indicates that the second workload is remote to the vSwitch, the byte stream into a plurality of packets. A further method aspect (in the first workload) comprises the steps of opening, by an application running on the first workload, the socket on the vSwitch, transmitting, to the vSwitch, a first socket handle for the corresponding socket and a respective network context of the corresponding socket.

Inventors:
KIS ZOLTÁN LAJOS (HU)
Application Number:
PCT/EP2016/053031
Publication Date:
August 17, 2017
Filing Date:
February 12, 2016
Assignee:
ERICSSON TELEFON AB L M (PUBL) (SE)
International Classes:
H04L29/06; G06F9/455; G06F9/54; H04L12/931
Foreign References:
US20090328073A12009-12-31
US20140282889A12014-09-18
Other References:
None
Attorney, Agent or Firm:
RÖTHINGER, Rainer (DE)
Claims:
Claims

1. A method for forwarding an incoming byte stream between a first workload (1000) and a second workload (3000) by means of a virtual switch, vSwitch (2000), the first workload being local to the vSwitch, in a cloud environment, the method being performed by the vSwitch and comprising the steps of:

receiving (S2-3) the incoming byte stream at a first socket (200031);

determining (S2-4) a forwarding state of the incoming byte stream;

piping (S2-5), if the forwarding state indicates forwarding to a second socket, the incoming byte stream from the first socket to the second socket, wherein the second workload is another local workload having opened the second socket; and packetizing (S2-6a), if the forwarding state indicates that the second workload is remote to the vSwitch, the byte stream into a plurality of packets.

2. The method of claim 1, wherein the method further comprises forwarding (S2-6a), if the forwarding state indicates that the second workload is remote to the vSwitch, the plurality of packets to one of a regular pipeline and a network interface card, NIC (200, 20032, 20042).

3. The method of claim 2, wherein the packetizing comprises utilizing (S2-6a-1) an offload feature of the NIC hardware.

4. The method of claim 3, wherein the offload feature of the NIC hardware is generic segmentation offload, GSO.

5. The method of any one of claims 1 to 4, further comprising emulating (S2-7) packet usage, if a flow state of the incoming byte stream indicates that the incoming byte stream is to be inspected packet-by-packet.

6. The method of claim 1, wherein, if the forwarding state indicates forwarding to a second socket, the second socket is opened such that the second socket is directly attached to the vSwitch, wherein packetization by a network stack is bypassed.

7. The method of claim 6, wherein bypassing packetization includes a dedicated channel for passing the non-packetized byte stream.

8. The method of any one of claims 1 to 7, wherein opening the first socket is performed by a first application (1005) running on the first workload and opening the second socket is performed by a second application (3005) running on the second workload, the method further comprising:

receiving (S2-1), from the first and second workloads, a respective first and second socket handle for the corresponding socket; and

receiving (S2-2), from the first and second workloads, a respective network context of the corresponding socket.

9. The method of claim 8, further comprising:

monitoring (S2-2a) a network stack of the first workload for a change of the network context.

10. The method of claim 9, wherein the monitoring comprises monitoring updates of a corresponding routing table.

11. The method of any one of claims 8, 9 or 10, wherein the network context comprises source and destination layer 2 to layer 4 addresses associated with the socket.

12. The method according to any one of claims 8 to 11, wherein the determining step further comprises:

determining (S2-4-1) the forwarding state of the incoming byte stream based on the network context.

13. A method for forwarding an incoming byte stream between a first workload (1000) and a second workload (3000) by means of a virtual switch, vSwitch (2000), the first workload being local to the vSwitch, in a cloud environment, the method being performed by the first workload and comprising the steps of:

opening (S1-0), by an application running on the first workload, a socket (200031, 200041) on the vSwitch,

transmitting (S1-1), to the vSwitch, a first socket handle for the corresponding socket; and

transmitting (S1-2), to the vSwitch, a respective network context of the corresponding socket.

14. The method of any one of claims 1 to 13, wherein at least one of the first workload and second workload is one of a virtual machine, VM, and a container.

15. The method of any one of claims 1 to 14, wherein at least one of the first workload and second workload is an entity being capable of running applications.

16. A computer program comprising program code portions for performing the method of any one of claims 1 to 15 when the computer program is executed on one or more computing devices.

17. The computer program of claim 16, stored on a computer readable recording medium.

18. A virtual switch, vSwitch (2000), for forwarding an incoming byte stream between a first workload (1000) and a second workload (3000), the first workload being local to the vSwitch, in a cloud environment, the vSwitch comprising:

a first socket (20041) configured to receive the incoming byte stream;

a module (20001, 20009) configured to determine a forwarding state of the incoming byte stream;

a module (20001, 20005) configured to pipe, if the forwarding state indicates forwarding to a second socket (200031), the incoming byte stream from the first socket to the second socket, wherein the second workload is another local workload having opened the second socket; and

a module (20001, 20007) configured to packetize, if the forwarding state indicates that the second workload is remote to the vSwitch, the byte stream into a plurality of packets.

19. A first workload (1000) for forwarding an incoming byte stream between the first workload and a second workload (3000) by means of a virtual switch, vSwitch (2000), the first workload being local to the vSwitch, in a cloud environment, the first workload comprising:

a module (10005) configured to open, by an application running on the first workload, a socket (200031, 20041) on the vSwitch; and

a module (10004) configured to:

- transmit, to the vSwitch, a first socket handle for the corresponding socket, and

- transmit, to the vSwitch, a respective network context of the corresponding socket.

Description:
Technique for forwarding an incoming byte stream between a first workload and a second workload by means of a virtual switch

Technical Field

The present disclosure generally relates to forwarding an incoming byte stream between a first workload and a second workload by means of a virtual switch, vSwitch, the first workload being local to the vSwitch, in a cloud environment. The techniques of the present disclosure may be embodied in methods and/or apparatuses.

Background

In cloud environments, a compute host is typically shared among multiple workloads. These workloads are packaged as virtual machines (VMs) or as containers.

Communication between local workloads and to remote workloads is provided by virtual switches (vSwitches).

Fig. 1 shows an exemplary prior art cloud environment 10, which comprises a host 100 and an associated network interface card (NIC) 200. In turn, the host 100 comprises a first workload 1000, a vSwitch 2000 and a second workload 3000, wherein the host is placed - from a logical point of view - on top of a network stack 101, which in turn is placed on top of a host operating system (OS) 102. In the present example, the first workload 1000 is embodied by a container, whereas the second workload 3000 is embodied by a VM. Further, the first workload 1000 comprises a first application 1005, and the second workload 3000 comprises a second application 3005. In addition, the second workload 3000 being a VM may comprise, logically on top of the network stack 101, a hypervisor 3008, on top of that a guest OS 3007, and on top of that a network stack 3006.

During operation, the vSwitch 2000 essentially emulates a packet switched network: the vSwitch 2000 forwards Ethernet frames between local workloads 1000, 3000 as well as between workloads and remote hosts via the NIC 200 when communicating to other hosts. Typical applications 1005, 3005 use the socket abstraction of the operating system (guest OS 3007 in VMs, host OS 102 in containers) for network communication. With sockets, an application 1005, 3005 can open a communication channel to remote addresses. The application 1005, 3005 can then send streams of bytes over the channel and the underlying network stack 101 will take care of packetization and forwarding of the data (as is shown with the curved dashed double-arrows in Fig. 1).
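By way of a non-limiting illustration (not part of the application as filed), the following Python sketch shows how a typical application 1005, 3005 uses the socket abstraction described above; the peer address and port are hypothetical, and the packetization is performed transparently by the underlying network stack 101:

```python
import socket

# Hypothetical sender: the application only deals in a byte stream.
# The kernel network stack (101) transparently packetizes the stream
# and hands the resulting packets to the vSwitch (2000).
with socket.create_connection(("10.0.0.7", 5000)) as sock:  # peer address is illustrative
    sock.sendall(b"payload bytes produced by the application")
```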

In the cloud environment 10, these packets will arrive at the vSwitch 2000. The vSwitch 2000 compares each incoming packet against its internal flow tables, and as a result (possibly modifies and) forwards the packet to another local workload, or to a remote host via the NIC 200.

Existing solutions suffer from certain problems that have not hitherto been recognized, as will be detailed below.

The problem with the scheme depicted in Fig. 1 is that packetization happens too early, namely already at the workloads 1000, 3000. This results in an unnecessary waste of resources, and also makes it impossible to exploit some hardware offload features of NICs 200.

Local communication

In the current cloud architecture 10, when two collocated workloads communicate, they do so based on sending / receiving packets.

1) The sending application 1005, 3005 opens a socket to the network address of the receiver and sends a byte stream to the socket (cf. e.g. the left-hand dashed double arrow in Fig. 1 between application 1005 and vSwitch 2000).

2) The network stack 101 packetizes this stream and forwards the packets to the vSwitch 2000.

3) The vSwitch 2000 in turn forwards these packets to the destination workload 3000 - based on the network address - one-by-one (cf. e.g. the right-hand dashed double arrow in Fig. 1 between vSwitch 2000 and application 3005).

4) The network stack 3006 of the receiving workload 3000 recombines the packets to a byte stream.

5) The receiving application 3005 reads the stream of bytes on a socket.

Communication in the opposite direction is performed in a similar fashion. In this scenario, the use of packets is completely unnecessary.

In this regard, the CPU and RAM resources spent by the workloads 1000, 3000 on packetizing and then reassembling the streams are completely wasted. Similarly, the vSwitch resources spent on packet-by-packet forwarding of the same stream are wasted, as in most cases only a simple go/no-go decision is needed for the complete stream. In this regard, as recent workload schedulers will favor collocation of communicating workloads (e.g., pods in Kubernetes), this scenario is likely to occur more often.

External communication

In the current architecture when a workload communicates with an external workload, the flow of information is as follows.

1) The sending application 1005, 3005 opens a socket to the network address of the receiver and sends a byte stream to the socket.

2) The network stack 101 packetizes this stream and forwards the packets to the vSwitch 2000.

3) The vSwitch 2000 forwards these packets to the NIC 200 - based on the network address - one-by-one.

4) The NIC 200 sends the packets out on the network.

Also in this case, communication in the opposite direction happens in a similar fashion. In this scenario, too, the packetization of the stream happens too early, and so the hardware offload features of the NIC 200 (e.g., generic segmentation offload, GSO) cannot be exploited.

Summary

Accordingly, there is a need for an implementation of a scheme that avoids one or more of the problems discussed above, or other related problems.

In a first aspect, there is provided a method for forwarding an incoming byte stream between a first workload and a second workload by means of a virtual switch, vSwitch, the first workload being local to the vSwitch, in a cloud environment, the method being performed by the vSwitch and comprising the steps of receiving the incoming byte stream at a first socket; determining a forwarding state of the incoming byte stream; piping, if the forwarding state indicates forwarding to a second socket, the incoming byte stream from the first socket to the second socket, wherein the second workload is another local workload having opened the second socket; and packetizing, if the forwarding state indicates that the second workload is remote to the vSwitch, the byte stream into a plurality of packets. In this way, unnecessary packetization in case of host-internal traffic is avoided.
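Purely as a non-limiting illustration of the first aspect, the following Python sketch outlines the decision made by the vSwitch; the helpers lookup_forwarding_state, packetize and forward_to_nic as well as the network_context attribute are assumptions introduced for the example and are not defined in the application as filed:

```python
def handle_incoming_stream(vswitch, first_socket):
    """Sketch of the first-aspect method: receive, decide, then pipe or packetize."""
    data = first_socket.recv(65536)  # receive the incoming byte stream at the first socket
    state = vswitch.lookup_forwarding_state(first_socket.network_context)  # assumed helper
    if state.local_destination_socket is not None:
        # the second workload is local and has opened a second socket: pipe, no packetization
        state.local_destination_socket.sendall(data)
    else:
        # the second workload is remote: packetize and hand over to the regular pipeline / NIC
        for packet in vswitch.packetize(data, first_socket.network_context):  # assumed helper
            vswitch.forward_to_nic(packet)
```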

In a first refinement of the first aspect, the method may further comprise forwarding, if the forwarding state indicates that the second workload is remote to the vSwitch, the plurality of packets to one of a regular pipeline and a network interface card, NIC. In this case, the packetizing may comprise utilizing an offload feature of the NIC hardware. If so, the offload feature of the NIC hardware may be generic segmentation offload, GSO. In addition or alternatively, the method may further comprise emulating packet usage, if a flow state of the incoming byte stream indicates that the incoming byte stream is to be inspected packet-by-packet. Thus, the advantages of the NIC's hardware features may be exploited in case of host-external traffic.

In a second refinement of the first aspect, if the forwarding state indicates forwarding to a second socket, the second socket may be opened such that the second socket is directly attached to the vSwitch, wherein packetization by a network stack is bypassed. In this case, bypassing packetization may include a dedicated channel for passing the non-packetized byte stream. Accordingly, minimum effort is needed in case of host-internal traffic.

In a third refinement of the first aspect (involving also the first and second refinements), opening the first socket may be performed by a first application running on the first workload and opening the second socket is performed by a second application running on the second workload, and the method may further comprise receiving, from the first and second workloads, a respective first and second socket handle for the corresponding socket; and receiving, from the first and second workloads, a respective network context of the corresponding socket. In this case, the method may further comprise monitoring a network stack of the first workload for a change of the network context. If so, the monitoring may comprise monitoring updates of a corresponding routing table. Still further, the network context may be source and destination layer 2 to layer 4 addresses associated with the socket. In addition or alternatively, the determining step may further comprise determining the forwarding state of the incoming byte stream based on the network context. Thus, the socket structure is exploited for implementing the advantageous features of the present disclosure.

In a second aspect, there is provided a method for forwarding an incoming byte stream between a first workload and a second workload by means of a virtual switch, vSwitch, the first workload being local to the vSwitch, in a cloud environment, the method being performed by the first workload and comprising the steps of opening, by an application running on the first workload, a socket on the vSwitch; transmitting, to the vSwitch, a first socket handle for the corresponding socket; and transmitting, to the vSwitch, a respective network context of the corresponding socket. The second aspect is the implementation counterpart to the first aspect, thus allowing for the same advantages as the first aspect.
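As a non-limiting sketch of the second aspect (workload side), the following Python fragment assumes a Unix-domain control channel at a hypothetical path and a JSON message format; neither is specified in the application, and a real implementation would pass the descriptor itself (e.g., via SCM_RIGHTS) rather than its integer value:

```python
import json
import socket

def open_vswitch_socket(control_channel, network_context):
    """Sketch of the workload-side steps: open a socket on the vSwitch, then
    transmit its handle and its network context over an assumed control channel."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect("/run/vswitch/streams.sock")     # hypothetical vSwitch endpoint (step S1-0)
    control_channel.send(json.dumps({
        "socket_handle": sock.fileno(),           # step S1-1: first socket handle
        "network_context": network_context,       # step S1-2: L2-L4 source/destination addresses
    }).encode())
    return sock
```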

In a fourth refinement of the first and second aspects, at least one of the first workload and second workload may be one of a virtual machine, VM, and a container. In addition or alternatively, at least one of the first workload and second workload may be an entity being capable of running applications.

In a third aspect, there is provided a computer program comprising program code portions for performing the first and/or second aspects, when the computer program is executed on one or more computing devices. Here, the computer program may be stored on a computer readable recording medium, particularly a non-transitory storage medium.

In a fourth aspect, there is provided a virtual switch, vSwitch, for forwarding an incoming byte stream between a first workload and a second workload, the first workload being local to the vSwitch, in a cloud environment, the vSwitch comprising a first socket configured to receive the incoming byte stream; a module configured to detect, from the incoming byte stream, the forwarding state of the incoming byte stream; a module configured to pipe, if the forwarding state indicates forwarding to a second socket, the incoming byte stream from the first socket to the second socket, wherein the second workload is another local workload having opened the second socket; and a module configured to packetize, if the forwarding state indicates that the second workload is remote to the vSwitch, the byte stream into a plurality of packets.

In a fifth aspect, there is provided a first workload for forwarding an incoming byte stream between the first workload and a second workload by means of a virtual switch, vSwitch, the first workload being local to the vSwitch, in a cloud environment, the first workload comprising a module configured to open, by an application running on the first workload, a socket on the vSwitch; and a module configured to transmit, to the vSwitch, a first socket handle for the corresponding socket, and transmit, to the vSwitch, a respective network context of the corresponding socket.

Still further, it is to be noted that the method aspects may also be embodied on the apparatus of the fourth and fifth aspects comprising at least one processor and/or appropriate means for carrying out any one of the method steps. Accordingly, the apparatus aspects may attain the same or similar advantages as the method aspects.

Brief Description of the Drawings

The embodiments of the technique presented herein are described herein below with reference to the accompanying drawings, in which:

Fig. 1 shows an exemplary prior art cloud environment, which comprises a host and an associated NIC;

Fig. 2 shows a principle of a communications network and components involved in which embodiments of the present disclosure can be performed;

Fig. 3 shows components comprised in an exemplary device embodiment realized in the form of an apparatus (which may reside e.g. in a vSwitch and/or a (first) workload);

Fig. 3A shows implementation details of the vSwitch; and

Fig. 4 shows a method embodiment which also reflects the interaction between the components of the apparatus embodiment.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth (such as particular signalling steps) in order to provide a thorough understanding of the technique presented herein. It will be apparent to one skilled in the art that the present technique may be practiced in other embodiments that depart from these specific details. For example, the embodiments will primarily be described in the context of 3rd generation (3G) or 4th generation/long term evolution (4G/LTE); however, this does not rule out the use of the present technique in connection with (future) technologies consistent with 3G or 4G/LTE, be it a wire-bound communications network or a wireless communications network. In addition, the present disclosure may also be implemented, if applicable in a working form, in legacy devices.

Moreover, those skilled in the art will appreciate that the services, functions and steps explained herein may be implemented using software functioning in conjunction with a programmed microprocessor, or using an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a field programmable gate array (FPGA) or general purpose computer. It will also be appreciated that while the following embodiments are described in the context of methods and devices, the technique presented herein may also be embodied in a computer program product as well as in a system comprising a computer processor and a memory coupled to the processor, wherein the memory is encoded with one or more programs that execute the services, functions and steps disclosed herein.

Without loss of generality, the present disclosure may be summarized as follows (this does not preclude that the embodiments described herein below may provide generalizations/broadenings with respect to the following summary):

• The idea is to modify the network environment 10 of the workloads 1000, 3000 so that the sockets opened by applications directly connect to a modified vSwitch 2000, and also provide their networking context (source/destination addresses). The vSwitch 2000 would decide, based on the forwarding state, the fate of the byte stream arriving on a socket.

• If the byte stream is to be forwarded to another socket (opened by another local workload 1000, 3000), the vSwitch 2000 pipes the byte stream from one socket to the other, thus avoiding packetization.

• Otherwise, the switch can do the packetization in place and feed the packets to the original pipeline or to the NIC 200, exploiting the NIC hardware offload features.

• When the flow state requires the vSwitch 2000 to inspect the traffic packet-by-packet (e.g., stateful DPI, or just simple per-flow counters), the vSwitch 2000 can emulate the use of packets.

• As a fall-back solution, the vSwitch 2000 can do the packetization on its own when receiving bytes from a socket, and can feed these packets to its forwarding pipeline (in a transparent manner to the workloads).

• In other words, in the proposed solution, when a socket is opened by a workload, it bypasses packetization by the network stack, and is directly attached to the vSwitch.

• The proposed changes consist of two parts: changes to the workload environments and changes to the vSwitch.

Fig. 2 shows a principle of a communications network 10 in which embodiments of the present disclosure can be performed. The communications network (or environment) 10 may comprise substantially the same or at least similar components as already shown in Fig. 1.

Fig. 3 shows components comprised in an exemplary device embodiment realized in the form of the first workload 1000, the vSwitch 2000 and the second workload 3000. As shown in Fig. 3, the first workload 1000 comprises a core functionality (e.g., one or more of a Central Processing Unit (CPU), dedicated circuitry and/or a software module) 10001, an optional memory (and/or database) 10002, a transmitter 10003 and a receiver 10004. Moreover, the first workload 1000 comprises an opener 10005.

Further, the vSwitch 2000 comprises a core functionality (e.g., one or more of a Central Processing Unit (CPU), dedicated circuitry and/or a software module) 20001, an optional memory (and/or database) 20002, a transmitter 20003 and a receiver 20004. Moreover, the vSwitch 2000 may comprise a stream pipe 20005, a reassembler 20006, a packetizer 20007, a pipeline 20008, a determiner 20009, a utilizer 200010, a monitor 200011 and an emulator 200012. As further shown in Figs. 3 and 3A, the vSwitch 2000 may comprise input sockets (denoted "Insock") 200041, an input side 200042 of the NIC 200, input ports (denoted "Inport") 200043, output sockets (denoted "Outsock") 200031, an output side 200032 of the NIC 200 and output ports (denoted "Outport") 200033. In this regard, the sockets 200041, 200031 may be considered an endpoint of a channel on the workload side (i.e., the socket is the interface towards the application). In turn, the channel connects the workload 1000, 3000 and the vSwitch 2000.

In addition, the second workload 3000 comprises a core functionality (e.g., one or more of a Central Processing Unit (CPU), dedicated circuitry and/or a software module) 30001, an optional memory (and/or database) 30002, a transmitter 30003 and a receiver 30004. Moreover, the second workload 3000 comprises an opener 30005.

As partly indicated by the dashed extensions of the functional blocks of the CPUs 10001, 20001, 30001, the opener 10005 (of the first workload 1000), the stream pipe 20005, the reassembler 20006, the packetizer 20007, the pipeline 20008, the determiner 20009, the utilizer 200010, the monitor 200011 and the emulator 200012 (of the vSwitch 2000) and the opener 30005 (of the second workload 3000) as well as the respective memories 10002, 20002, 30002, the respective transmitters 10003, 20003, 30003 and the respective receivers 10004, 20004, 30004 may at least partially be functionalities running on the CPUs 10001, 20001, 30001, or may alternatively be separate functional entities or means controlled by the CPUs 10001, 20001, 30001 and supplying the same with information. The transmitter and receiver components 10003, 20003, 30003, 10004, 20004, 30004 may be realized to comprise suitable interfaces and/or suitable signal generation and evaluation functions.

The CPUs 10001, 20001, 30001 may be configured, for example, using software residing in the memories 10002, 20002, 30002, to process various data inputs and to control the functions of the memories 10002, 20002, 30002, the transmitters 10003, 20003, 30003 and the receivers 10004, 20004, 30004 (as well as of the opener 10005 (of the first workload 1000), the stream pipe 20005, the reassembler 20006, the packetizer 20007, the pipeline 20008, the determiner 20009, the utilizer 200010, the monitor 200011 and the emulator 200012 (of the vSwitch 2000) and the opener 30005 (of the second workload 3000)). The memories 10002, 20002, 30002 may serve for storing program code for carrying out the methods according to the aspects disclosed herein, when executed by the CPUs 10001, 20001, 30001. It is to be noted that the transmitters 10003, 20003, 30003 and the receivers 10004, 20004, 30004 may be provided as respective integral transceivers, as is indicated in Fig. 3. It is further to be noted that the transmitters/receivers 10003, 20003, 30003, 10004, 20004, 30004 may be implemented as physical transmitters/receivers for transceiving via an air interface or a wired connection, as routing/forwarding entities/interfaces between network elements, as functionalities for writing/reading information into/from a given memory area or as any suitable combination of the above. At least one of the opener 10005 (of the first workload 1000), the stream pipe 20005, the reassembler 20006, the packetizer 20007, the pipeline 20008, the determiner 20009, the utilizer 200010, the monitor 200011 and the emulator 200012 (of the vSwitch 2000) and the opener 30005 (of the second workload 3000), or the respective functionalities, may also be implemented as a chipset, module or subassembly.

Fig. 4 shows a method embodiment which also reflects the interaction between the components of the device embodiment. In the signalling diagram of Fig. 4, time aspects between signalling are reflected in the vertical arrangement of the signalling sequence as well as in the sequence numbers. It is to be noted that the time aspects indicated in Fig. 4 do not necessarily restrict any one of the method steps shown to the step sequence outlined in Fig. 4. This applies in particular to method steps that are functionally disjunctive with each other.

In a first step S1-0, the opener 10005, 30005 of the respective workload opens, by an application 1005, 3005 running on the respective workload 1000, 3000, a socket 200031, 200041 on the vSwitch 2000. Then, in step S1-1, the respective transmitter 10003, 30003 of the respective workload 1000, 3000 transmits, to the vSwitch 2000, a first socket handle for the corresponding socket 200031, 200041. Further, in step S1-2, the respective transmitter 10003, 30003 of the respective workload 1000, 3000 transmits, to the vSwitch 2000, a respective network context of the corresponding socket.

In this context, at least one of the first workload 1000 and second workload 3000 may be an entity being capable of running applications. Concerning the workloads 1000, 3000 in the present disclosure, the changes according to the disclosure should be transparent to applications 1005, 3005, so that the applications 1005, 3005 can be used substantially unmodified. In this regard, typical applications use the socket facilities (provided e.g. by the libc library) to establish network sockets for communication. When opening such a socket 200031, 200041, the original libc implementation then may contact the kernel networking stack 101 to set up the required structures, buffers, etc. As a result, the byte stream sent by the application 1005, 3005 may be packetized by the network stack 101 and packets are sent out on a (virtual) interface.

According to the disclosure, the libc implementation of the socket facilities may be modified. Accordingly, when an application 1005, 3005 opens a socket 200031, 200041, the application 1005, 3005 will communicate to the vSwitch 2000 the following:

1) The application 1005, 3005 provides the socket handle to the vSwitch 2000 (as described above for step Sl-1), so that byte streams can be exchanged by the applications and the vSwitch 2000.

2) Further, the application 1005, 3005 provides the network context of the socket 200031, 200041 (e.g. source and destination L2-L4 addresses associated with the socket 200031, 200041) to the vSwitch 2000. Furthermore, the monitor 200011 of the vSwitch may constantly monitor the workload's network stack 101 (e.g., using netlink facilities) and notify the vSwitch 2000 whenever the context is changed (e.g., by updates to the routing table).
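The monitoring of routing-table updates mentioned under item 2) could, purely by way of a non-limiting example, be realized with an rtnetlink subscription; the following Python sketch is an assumption for illustration only (message parsing and the on_change callback are left open):

```python
import socket

RTMGRP_IPV4_ROUTE = 0x40  # rtnetlink multicast group for IPv4 routing-table changes

def watch_routing_updates(on_change):
    """Sketch of the monitor (200011): subscribe to rtnetlink route updates and
    notify the vSwitch whenever the workload's network context may have changed."""
    nl = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    nl.bind((0, RTMGRP_IPV4_ROUTE))   # pid 0 lets the kernel assign a port id
    while True:
        msg = nl.recv(65536)          # raw RTM_NEWROUTE / RTM_DELROUTE messages
        on_change(msg)                # e.g. re-derive the socket's network context
```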

Concerning the security aspect, the host OS 102 may ascertain that the libc implementation will only report valid network addresses that security policies enable the application to use.

Here, it must be noted that at least one of the first workload 1000 and second workload 3000 is one of a virtual machine, VM, and a container.

In case of containers, libc exists in the same context as the vSwitch 2000. Here, the implementation can use any method to communicate with the vSwitch 2000, e.g., sockets 200031, 200041 (as described above) or IPC channels.

In case of VMs, the libc may communicate with the guest OS 3007, whereas the vSwitch 2000 runs on the host OS 102. Here, the modified socket implementation may need to communicate via the hypervisor 3008. One possibility may reside in using hypercalls, another in using paravirtual drivers, such as vhost. Using the hypervisor 3008, it becomes possible to pass a handle from the guest socket to the host OS 102 and in turn to the vSwitch 2000 running on the host 100.

Turning to the vSwitch 2000, opening the first socket 200031, 200041 is performed by a first application 1005 running on the first workload 1000 and opening the second socket 200031, 200041 is performed by a second application 3005 running on the second workload. Accordingly, in an optional step S2-1, the receiver 20004 of the vSwitch 2000 receives, from the first and second workloads 1000, 3000, the respective first and second socket handles for the corresponding socket 200031, 200041. Further, in an optional step S2-2, the receiver 20004 of the vSwitch 2000 receives, from the first and second workloads 1000, 3000, a respective network context of the corresponding socket 200031, 200041.

In the above case, as an optional step S2-2a, the monitor 200011 of the vSwitch 2000 monitors a network stack of the first workload for a change of the network context. Here, the monitoring may comprise monitoring updates of a corresponding routing table. Accordingly, the network context may be source and destination layer 2 to layer 4 addresses associated with the socket.

Still further, in step S2-3, the receiver 20004 of the vSwitch 2000 receives the incoming byte stream at a first socket 200031. That is, the components of the vSwitch 2000 may be extended to handle sockets / byte streams (cf. Fig. 3A). The usage of sockets involves socket endpoints, which each represent an endpoint of a socket. In turn, the socket may provide the socket handle to the vSwitch 2000, which handle can be used to poll for new bytes in the stream as well as to write new bytes to a stream. Additionally, each endpoint may maintain the current network context / state associated to the socket (as provided by the above-described socket implementations). In essence, these may be the layer 2 (L2) to layer 4 (L4) source and destination addresses that would have been used by the network stack 101 for packetizing the data in the byte stream.
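A minimal, illustrative data-structure sketch of such a socket endpoint (names and fields are assumptions, not taken from the application as filed) could look as follows in Python:

```python
from dataclasses import dataclass

@dataclass
class NetworkContext:
    """L2-L4 source/destination addresses the network stack would have used."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int

@dataclass
class SocketEndpoint:
    """vSwitch-side representation of a workload socket (illustrative only)."""
    handle: int              # socket handle used to poll/read and write the byte stream
    context: NetworkContext  # kept up to date by the monitor (200011)
```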

Accordingly, in step S2-4, the determiner 20009 of the vSwitch 2000 determines a forwarding state of the incoming byte stream. In that case, in an optional step S2-4-1, the determiner 20009 of the vSwitch 2000 may determine the forwarding state of the incoming byte stream based on the network context. Then, in step S2-5, the stream pipe 20005 pipes, if the forwarding state indicates forwarding to a second socket 200041, the incoming byte stream from the first socket 200031 to the second socket 200041, wherein the second workload 3000 is another local workload having opened the second socket. Here, the stream pipe 20005 may be summarized as being responsible for receiving byte streams from sockets or the reassembler 20006, and piping them to sockets or the packetizer 20007, based on the forwarding state of the switch.
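A non-limiting Python sketch of the piping performed by the stream pipe 20005, assuming plain file-descriptor handles for the first and second sockets and simplified error handling, could read:

```python
import os
import select

def pipe_streams(src_fd, dst_fd, chunk=65536):
    """Sketch of the stream pipe (20005): copy bytes from one socket handle to
    another without ever forming packets."""
    poller = select.poll()
    poller.register(src_fd, select.POLLIN)
    while True:
        for fd, _events in poller.poll():
            data = os.read(fd, chunk)
            if not data:              # the sending workload closed its socket
                return
            os.write(dst_fd, data)    # bytes flow straight to the second socket
```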

Further, in step S2-6, the packetizer 20007 packetizes, if the forwarding state indicates that the second workload is remote to the vSwitch 2000, the byte stream into a plurality of packets. In that case, in an optional step S2-6a, the socket 200031, 200041 (and possibly the NIC 200, 200032, 20042) may forward, if the forwarding state indicates that the second workload 3000 is remote to the vSwitch 2000, the plurality of packets to one of a regular pipeline and the NIC 200, 20032, 20042. In that case, in an optional step S2-6a-1, the utilizer 200010 of the vSwitch 2000 utilizes an offload feature of the NIC hardware, wherein the offload feature of the NIC hardware may be generic segmentation offload, GSO.

The operations performed by the packetizer 20007 may be summarized as follows: if a byte stream from a socket 200031, 200041 is to be forwarded to a port 200033, 200043, this component may split the stream and put the segments into packets, using the L2 to L4 addresses from the network context of the socket 200031, 200041. If the data is to be forwarded outside the host 100, the packetizer 20007 may also exploit the hardware offload features of the NIC 200, 200032, 200042. For example, if the NIC 200, 200032, 200042 supports GSO, the packetizer 20007 may set up the network context and forward the stream to the NIC 200, 200032, 200042 for packetization.
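For illustration only, the packetizer 20007 could be sketched as follows in Python; the build_headers placeholder stands in for header construction from the socket's L2-L4 network context, and the MSS value and GSO flag are assumptions introduced for the example:

```python
def build_headers(context):
    # Placeholder: a real packetizer would emit Ethernet/IP/TCP headers from the
    # socket's L2-L4 network context; here the context is merely echoed as bytes.
    return repr(context).encode()

def packetize(stream_bytes, context, mss=1460, nic_supports_gso=False):
    """Sketch of the packetizer (20007): segment a byte stream into packets, or
    hand the whole buffer over when the NIC can segment it in hardware (GSO)."""
    if nic_supports_gso:
        return [build_headers(context) + stream_bytes]
    return [build_headers(context) + stream_bytes[i:i + mss]
            for i in range(0, len(stream_bytes), mss)]
```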

In contrast thereto, the operation of the reassembler 20006 may be summarized as follows: when packets are to be forwarded to a socket, this component may be responsible for removing the packet headers, thus creating a stream of bytes, which is then fed to the stream pipe 20005. Furthermore, when the packets arrive from the NIC 200, 200032, 200042, this component may also exploit the hardware offload features described above.

Finally, in an optional step S2-7, the emulator 200012 of the vSwitch 2000 emulates packet usage, if a flow state of the incoming byte stream indicates that the incoming byte stream is to be inspected packet-by-packet.
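A correspondingly simplified, non-limiting Python sketch of the reassembler 20006 described above (assuming a fixed header length, which a real implementation would derive per packet) could be:

```python
def reassemble(packets, header_len):
    """Sketch of the reassembler (20006): strip the packet headers and concatenate
    the payloads into the byte stream that is fed to the stream pipe (20005)."""
    return b"".join(pkt[header_len:] for pkt in packets)
```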

Still further, it is noted that if the forwarding state indicates forwarding to a second socket 200031, 200041, the second socket 200031, 200041 may be opened such that the second socket is directly attached to the vSwitch 2000, wherein packetization by a network stack is bypassed. If so, bypassing packetization may include a dedicated channel for passing the non-packetized byte stream; in this context, as noted above, the sockets 200041, 200031 may be considered an endpoint of a channel on the workload side (i.e., the socket is the interface towards the application). In turn, the channel connects the workload 1000, 3000 and the vSwitch 2000.

In a non-restricting use case example, the behaviour of the first/second workloads 1000, 3000 and the vSwitch 2000 according to the present disclosure may be summarized as follows:

• When an application 1005, 3005 creates a socket 200031, 200041, the modified libc component (as described above) may provide the socket handle to the vSwitch 2000, as well as the current network context. The vSwitch 2000 may then establish the appropriate objects in memory.

• When new data arrives in a stream, the stream pipe 20005 may compare the stream's network context information to the forwarding state. If the packet is to be forwarded to a destination address, the stream pipe 20005 may check whether any established socket 200031, 200041 has that address. If so, bytes may be piped to that socket. Otherwise, the stream may be forwarded to the regular pipeline or to the NIC 200, 200032, 200042 to be packetized.

• Note that it is not necessary to check the forwarding state each time packets are received. It is also possible to cache the decision by comparing the network context to the forwarding state at a single time. This cached decision, however, needs to be re-evaluated whenever the socket's network context information or the forwarding state of the vSwitch 2000 changes.
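The caching behaviour described in the last bullet point may, purely as a non-limiting illustration, be sketched as follows in Python; the version counters and the lookup_forwarding_state helper are assumptions introduced for the example:

```python
class CachedForwardingDecision:
    """Sketch of the caching behaviour: the forwarding-state lookup is done once
    and reused until either the socket's network context or the vSwitch
    forwarding state changes (tracked here via assumed version counters)."""

    def __init__(self, vswitch, endpoint):
        self._vswitch = vswitch
        self._endpoint = endpoint
        self._decision = None
        self._seen_versions = (None, None)

    def get(self):
        versions = (self._endpoint.context_version, self._vswitch.state_version)
        if self._decision is None or versions != self._seen_versions:
            # re-evaluate whenever the network context or forwarding state changed
            self._decision = self._vswitch.lookup_forwarding_state(self._endpoint.context)
            self._seen_versions = versions
        return self._decision
```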

The present disclosure provides one or more of the following advantages:

• Reduced CPU, RAM and PCI bandwidth usage.

• Reduced latency and jitter in the system.

• Avoiding unnecessary packetization, per-packet forwarding and buffering, and stream reassembly.

• Enabling the use of hardware offload features of the host NICs.

• Forwarding byte streams between workloads and vSwitches without intermediate packetization. This includes modified libraries that can communicate the socket handle and network context information, and a modified vSwitch that is capable of piping byte streams from sockets to sockets.

It is believed that the advantages of the technique presented herein will be fully understood from the foregoing description, and it will be apparent that various changes may be made in the form, constructions and arrangement of the exemplary aspects thereof without departing from the scope of the disclosure or without sacrificing all of its advantageous effects. Because the technique presented herein can be varied in many ways, it will be recognized that the disclosure should be limited only by the scope of the claims that follow.