


Title:
SYSTEM AND METHOD FOR OPTIMAL MEMORY MANAGEMENT BETWEEN CPU AND FPGA UNIT
Document Type and Number:
WIPO Patent Application WO/2014/092551
Kind Code:
A1
Abstract:
The present invention provides a system for synchronising memory data transfer. The system comprises a host Central Processing Unit (CPU) having a memory optimizer; and a Field Programmable Gate Array (FPGA) having a memory load identifier. When data is received on the host CPU, the memory optimizer operationally checks memory utilization at an idle state in the FPGA and determines whether to drop/retain data based on the memory utilization on the FPGA; and when data is processed on the FPGA, the memory load identifier operationally checks memory utilization at an idle state in the host CPU and determines whether to drop/retain data based on the memory utilization on the host CPU. A method of synchronizing memory data is also provided herewith.

Inventors:
MOHD SHAFIQ BIN ALIAS (MY)
KARUPPIAH ETTIKAN KANDASAMY A L (MY)
HOE ONG HONG (MY)
TAHIR SHAHIRINA BINTI MOHD (MY)
Application Number:
PCT/MY2013/000265
Publication Date:
June 19, 2014
Filing Date:
December 13, 2013
Assignee:
MIMOS BERHAD (MY)
International Classes:
G06F13/28
Domestic Patent References:
WO2003050692A1 (2003-06-19)
WO2012120690A1 (2012-09-13)
Attorney, Agent or Firm:
YAP, Kah, Hong (Suite 8-02, 8th Floor, Plaza First Nationwide, 16, Jalan Tun H.S. Lee, Kuala Lumpur, MY)
Claims:

1. A system for synchronising memory data transfer, the system comprising: (a) a host Central Processing Unit (CPU) having a memory optimizer; and (b) a Field Programmable Gate Array (FPGA) having a memory load identifier; wherein when data is received on the host CPU, the memory optimizer operationally checks memory utilization at an idle state in the FPGA and determines whether to drop/retain data based on the memory utilization on the FPGA;

and when data is processed on the FPGA, the memory load identifier operationally checks memory utilization at an idle state in the host CPU and determines whether to drop/retain data based on the memory utilization on the host CPU.

2. The system of claim 1, wherein the CPU memory optimizer comprises a dynamic size memory and the FPGA comprises a fixed size memory.

3. The system of claim 1, wherein the FPGA comprises a Direct Memory Access (DMA) controller and an external memory.

4. The system of claim 1, wherein the memory utilization in the FPGA and the memory utilization in the host CPU are received from the host CPU and the FPGA respectively through a feedback signal therefrom.

5. A method for synchronizing memory during data transfer between a host CPU and a Field Programmable Gate Array (FPGA), the method comprising:

mapping the data from a memory of the host CPU to one or more memories associated with the FPGA;

self-checking memory utilization at an idle state in the FPGA;

sending a first feedback regarding the memory utilization information to the host CPU;

determining whether data drop/retain is required on the host CPU based on the first feedback;

mapping data from a memory associated with the FPGA to one or more memories in the host CPU;

self-checking memory utilization at an idle state in the host CPU;

sending a second feedback regarding the memory utilization information from the host CPU to the FPGA; and

determining whether data drop/retain is required on the FPGA based on the second feedback.

6. The method as claimed in claim 5, wherein the memory mapping on the host CPU comprises:

checking a memory threshold on the host CPU upon receiving input data;

tagging the data with label information on a virtual memory of the host CPU;

controlling the flow of the input data based on the first feedback at the beginning, during and/or ending of the data transfer from the host CPU to the FPGA;

retaining data by continuing the mapping transfer execution as the FPGA is demanding more data;

dropping data at the beginning of packet transmission by slowing down the transmission rate at the host CPU;

monitoring and checking the FPGA memory status flags of being either almost full, almost empty, full or empty, and the memory rate/speed and error signal, during the transfer and/or before data is sent back to the host CPU memory optimizer while the FPGA is sending data to an endpoint of the FPGA memory.

7. The method of claim 5, wherein the memory mapping on the FPGA comprises: continuously checking the memory status flag of the host CPU through the first feedback;

tagging the output packet in the DMA controller register, wherein data is tagged with ID parameters including Input/Output, memory flag, memory rate, error signal and camera ID;

controlling the flow of data on the FPGA based on the host CPU memory threshold, memory rate/speed and error signal at the beginning, during and/or ending of the data transfer from the FPGA;

retaining data by continuing the mapping transfer as the host CPU is demanding more data; and

dropping frames at the beginning of packet transmission by slowing down the transmission rate at the FPGA on the DMA control register.

8. The method of claim 5, further comprising updating a synchronization status of the host CPU and the FPGA.

9. The method of claim 5, wherein the data comprises a sequence of image frames.

10. The system of any one of claims 1 to 4, adapted for carrying out the method according to any one of claims 5 to 9.

Description:
System and Method for Optimal Memory Management Between CPU and FPGA Unit

Field of the Invention

[0001] The present invention relates to memory management between devices. More particularly, the present invention relates to a system and method for managing memory between a host device and an FPGA.

Background

[0002] Inefficient memory management and data transfer during real-time processing of a large number of audio-video (AV) streams between a CPU and an FPGA causes un-optimized resource utilization. The present solution utilizes optimal memory management techniques to dynamically move parallel AV data between the CPU and the FPGA, so as to ensure timely AV data transfer and harmonization of the total memory size between the computing units.

[0003] FIG. 1 illustrates a typical set up of a system that includes a host central processing unit (CPU) 102 and a Field Programmable Gate Array (FPGA) 104. The host CPU 102 is adapted to receive video streams from multiple imaging devices 101. The host CPU has a CPU 111, an internal host memory 112 that holds applications 114 therein and a root complex module 113. The root complex module 113 connects the CPU 111 and the internal host memory 112 to a Peripheral Component Interconnect Express (PCIe) interface. The PCIe interface can be used to connect the host CPU 102 with the FPGA 104. The FPGA 104 has a Direct Memory Access (DMA) module 121 to receive transmission from the root complex module 113 through the PCIe interface to allow the data to be stored on the FPGA external memory 150.

[0004] Operationally, the host CPU 102 serves as an out-memory and the FPGA 104 as an in-memory when writing data to the external memory 150, and vice versa when reading data from the external memory 150.

Summary

[0005] In accordance with one aspect of the present invention, there is provided a system for synchronising memory data transfer. The system comprises a host Central Processing Unit (CPU) having a memory optimizer; and a Field Programmable Gate Array (FPGA) having a memory load identifier. When data is received on the host CPU, the memory optimizer operationally checks memory utilization at an idle state in the FPGA and determines whether to drop/retain AV data based on the memory utilization on the FPGA; and when data is processed on the FPGA, the memory load identifier operationally checks memory utilization at an idle state in the host CPU and determines whether to drop/retain AV data based on the memory utilization on the host CPU.

[0006] In one embodiment, the CPU memory optimizer comprises a dynamic size memory and the FPGA memory load identifier comprises a fixed size memory. In another embodiment, the FPGA comprises a Direct Memory Access (DMA) controller and an external memory. Further, the memory utilization in the FPGA and the memory utilization in the host CPU are received from the host CPU and the FPGA respectively through a feedback signal therefrom.
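By way of a non-limiting illustration, the drop/retain decision described above may be sketched in C as follows. The structure fields, function names and the 90% threshold are assumptions for illustration only and are not prescribed by the embodiments; the mirrored decision on the FPGA side would use the host CPU utilization in the same way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feedback record reported by the FPGA memory load identifier. */
struct mem_feedback {
    uint8_t utilization_pct;   /* fill level of the FPGA memory, 0..100 */
    bool    almost_full;       /* memory status flag                    */
    bool    error;             /* error signal from the last transfer   */
};

enum data_action { RETAIN_DATA, DROP_DATA };

/* Host-side memory optimizer: decide whether to drop or retain incoming
 * data based on the memory utilization reported back by the FPGA. */
static enum data_action memory_optimizer_decide(const struct mem_feedback *fb)
{
    if (fb->error || fb->almost_full || fb->utilization_pct >= 90)
        return DROP_DATA;    /* FPGA memory under pressure */
    return RETAIN_DATA;      /* FPGA can accept more data  */
}
```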

[0007] In another aspect of the present invention, there is also provided a method for synchronizing memory during data transfer between a host CPU and a Field Programmable Gate Array (FPGA). The method comprises mapping the data from a memory of the host CPU to one or more memories associated with the FPGA; self-checking memory utilization at an idle state in the FPGA; sending a first feedback regarding the memory utilization information to the host CPU; determining whether data drop/retain is required on the host CPU based on the first feedback; mapping data from a memory associated with the FPGA to one or more memories in the host CPU; self-checking memory utilization at an idle state in the host CPU; sending a second feedback regarding the memory utilization information from the host CPU to the FPGA; and determining whether data drop/retain is required on the FPGA based on the second feedback.

[0008] In one embodiment, the memory mapping on the host CPU comprises checking a memory threshold on the host CPU upon receiving input data; tagging the data with label information on a virtual memory of the host CPU; controlling the flow of the input data based on the first feedback at the beginning, during and/or ending of the data transfer from the host CPU to the FPGA; retaining data by continuing the mapping transfer execution as the FPGA is demanding more data; dropping data at the beginning of packet transmission by slowing down the transmission rate at the host CPU; and monitoring and checking the FPGA memory status flags of being either almost full, almost empty, full or empty, and monitoring the memory rate/speed and error signal, during the transfer and/or before data is sent back to the host CPU memory optimizer while the FPGA is sending data to an endpoint of the FPGA memory.
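The step of dropping data at the beginning of packet transmission by slowing down the transmission rate can be illustrated with the following C sketch; the chunk size, pacing delay and the write_chunk() stub are hypothetical and stand in for whatever driver call actually moves the data toward the FPGA.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>   /* usleep() */

/* Placeholder for the driver call that moves one chunk toward the FPGA. */
static void write_chunk(const uint8_t *buf, size_t len)
{
    (void)buf;
    (void)len;
}

/* Pace the start of a packet transfer when the first feedback reports
 * memory pressure on the FPGA, so that excess frames are shed early. */
static void send_packet_paced(const uint8_t *packet, size_t len, int fpga_busy)
{
    const size_t chunk = 4096;      /* assumed transfer granularity */
    size_t sent = 0;

    while (sent < len) {
        size_t n = (len - sent < chunk) ? (len - sent) : chunk;
        write_chunk(packet + sent, n);
        sent += n;

        /* Throttle only at the beginning of the packet transmission. */
        if (fpga_busy && sent < 4 * chunk)
            usleep(500);            /* assumed pacing delay in microseconds */
    }
}
```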

[0009] In a further embodiment, the memory mapping on the FPGA comprises continuously checking the memory status flag of the host CPU through the first feedback; tagging the output packet in the DMA controller register, wherein data is tagged with ID parameters including Input/Output, memory flag, memory rate, error signal and camera ID; controlling the flow of data on the FPGA based on the host CPU memory threshold, memory rate/speed and error signal at the beginning, during and/or ending of the data transfer from the FPGA; retaining data by continuing the mapping transfer as the host CPU is demanding more data; and dropping frames at the beginning of packet transmission by slowing down the transmission rate at the host CPU and/or on the DMA control register in the FPGA.
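The tagging of the output packet in the DMA controller register may, for example, pack the listed ID parameters into a single register word as sketched below; the bit widths and positions are assumptions, since the embodiments name the fields but not their layout.

```c
#include <stdint.h>

/* Hypothetical bit layout for the tag written to the DMA controller register. */
#define TAG_IO_SHIFT        31   /* 1 bit : 0 = input, 1 = output               */
#define TAG_MEMFLAG_SHIFT   28   /* 3 bits: empty/almost-empty/almost-full/full */
#define TAG_MEMRATE_SHIFT   16   /* 12 bits: coded memory rate/speed            */
#define TAG_ERROR_SHIFT     15   /* 1 bit : error signal                        */
#define TAG_CAMERA_SHIFT     0   /* 8 bits: camera ID                           */

static uint32_t pack_dma_tag(unsigned io, unsigned mem_flag, unsigned mem_rate,
                             unsigned error, unsigned camera_id)
{
    return ((uint32_t)(io        & 0x1)   << TAG_IO_SHIFT)      |
           ((uint32_t)(mem_flag  & 0x7)   << TAG_MEMFLAG_SHIFT) |
           ((uint32_t)(mem_rate  & 0xFFF) << TAG_MEMRATE_SHIFT) |
           ((uint32_t)(error     & 0x1)   << TAG_ERROR_SHIFT)   |
           ((uint32_t)(camera_id & 0xFF)  << TAG_CAMERA_SHIFT);
}
```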

[0010] In yet a further embodiment, the method may comprise updating the synchronization status of the host CPU and the FPGA. It is possible that the data comprises a sequence of image frames.

[0011] In a further aspect, there is also provided an aforesaid system adapted for carrying out the aforesaid method.

Brief Description of the Drawings

[0012] Preferred embodiments according to the present invention will now be described with reference to the figures accompanied herein, in which like reference numerals denote like elements:

[0013] FIG. 1 illustrates a typical set up of a system that includes a host central processing unit (CPU) and a Field Programmable Gate Array (FPGA);

[0014] FIG. 2 illustrates a schematic block diagram of a host CPU and a Field Programmable Gate Array (FPGA) and a memory management thereon in accordance with one embodiment of the present invention;

[0015] FIG. 3 illustrates a block diagram of read and write operations of a system having a host CPU and a FPGA in accordance with one embodiment of the present invention;

[0016] FIG. 4A exemplifies a header of a data packet for sending from the host CPU to the FPGA of FIG. 3 in accordance with one embodiment of the present invention;

[0017] FIG. 4B exemplifies a header of a data packet for sending from the FPGA to the host CPU of FIG. 3 in accordance with one embodiment of the present invention;

[0018] FIG. 5 illustrates a memory mapping of a system having a host and a FPGA in accordance with one embodiment of the present invention;

[0019] FIGs. 6A and 6B illustrate flow charts of data processes in accordance with one embodiment of the present invention;

[0020] FIG. 7 illustrates a state machine diagram of the memory management of the system in accordance with one embodiment of the present invention;

[0021] FIG. 8A illustrates a process carried out in the state A of the state machine of FIG. 7 in accordance with one embodiment of the present invention;

[0022] FIG. 8B illustrates a process carried out in the state B of the state machine of FIG. 7 in accordance with one embodiment of the present invention;

[0023] FIG. 8C illustrates a process carried out in the state C of the state machine of FIG. 7 in accordance with one embodiment of the present invention; and

[0024] FIG. 8D illustrates a process carried out in the state D of the state machine of FIG. 7 in accordance with one embodiment of the present invention.

Detailed Description

[0025] Embodiments of the present invention shall now be described in detail, with reference to the attached drawings. It is to be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated device, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.

[0026] FIG. 2 illustrates a schematic block diagram of a host CPU 202 and a Field Programmable Gate Array (FPGA) 204 and a memory management thereon in accordance with one embodiment of the present invention. The present embodiment is implementable on a system adapted to receive data from an Audio Video (AV) device that operationally captures audio-video data streams. More specifically, the system is adapted to handle data streams transmitted from multiple AV devices, wherein the data streams are transmitted to the host CPU 202 at a dynamic arrival rate. The system adopts a tagging mechanism for the parallel streaming whereby the packet header provides information based on priority and camera number sequence to ensure optimal data transfer.

[0027] As shown in FIG. 2, the host CPU 202 is connected to the FPGA 204 via a PCIe link 206. The host CPU 202 comprises a memory optimizer 221, software applications 222, an application programming interface (API) 223 and an API driver 224. The CPU memory optimizer 221 is adapted to measure the memory threshold on the host CPU 202 and to provide a tagging mechanism in virtual memory to label information on the parallel AV streams as their parameters for optimal data transmission to the FPGA. The information may include camera ID, number of images, priority, frame size, sequence number, etc.

[0028] Still referring to FIG. 2, the FPGA 204 comprises a PCIe intellectual property (IP) core 221 and a Direct Memory Access (DMA) module 222. The PCIe IP core 221 receives/transmits data through a transaction, data link and physical layers. The PCIe IP core 221 also facilitates a virtual memory for managing memory allocations. Specifically, the DMA module 222 comprises a DMA controller 223 and a memory load identifier 225. The memory load identifier 225 in the FPGA 204 operationally measures the memory load and provides a tagging mechanism in the FPGA 204 to label information on the memory parameters, such as memory flags, memory rate, etc., for optimal data transmission with the host CPU 202. The memory optimizer 221 on the host CPU 202 and the memory load identifier 225 on the FPGA 204 feed back to each other based on tags associated with the data. The memory optimizer 221 and the memory load identifier 225 are capable of dropping or retaining frames as required before the data is channeled to the external memory, such as SDRAM, for processing. Motion-JPEG (MJPEG) frames may be skipped at the beginning of packet transfer to ensure harmonization between the host CPU 202 and the FPGA 204.

[0029] According to the implementation of the present invention, it is desired that the data frames are queued in the memory optimizer before tagging takes place for completing the packetizing on the host CPU 202. With different incoming frame rates from multiple cameras, the data frames are dynamically queued in the dynamic buffer of the memory optimizer 221 until it is full. Once the buffer is full, the data frames are tagged accordingly and excessive data frames from each camera are dropped. On the other hand, frames are skipped or dropped by the FPGA memory load identifier at the beginning of packet transfer, and/or by slowing down the transmission rate, which takes place at the transaction layer protocol of PCIe before memory transfer to the host CPU 202.

[0030] FIG. 3 illustrates a block diagram of read and write operations of a system having a host CPU 302 and a FPGA 304 in accordance with one embodiment of the present invention. The host CPU 302 is connected with the FPGA 304 via a PCIe link 306. The host CPU 302 has application modules 312 and a driver 314. The application modules 312 are adapted for handling data, such as acquiring data, processing data, transmitting data to the FPGA 304 and the like. The FPGA comprises a PCIe mega core 322 and a DMA module 324. The PCIe mega core 322 maps the data transmitted from the host CPU 302 via the PCIe link 306. Once the data is received through the PCIe link 306, it is handled by the DMA module 324, which has a decipher engine 326, a DMA controller 328 and a high performance memory controller 330. The decipher engine 326 provides cipher/decipher read/write operations to the data. The ciphered/deciphered data passes through the DMA controller 328, where status flags are tagged to the data. The data that is tagged will be channeled to the high performance memory controller 330 before being transmitted to the external memory 350, such as SDRAM.

[0031] FIG. 4A exemplifies a header of a data packet for sending from the host CPU 302 to the FPGA 304 of FIG. 3 in accordance with one embodiment of the present invention. The header contains an input/output, a number of images, a frame size, a sequence number, a priority and a camera ID.

[0032] FIG. 4B exemplifies a header of a data packet for sending from the FPGA 304 to the host CPU 302 of FIG. 3 in accordance with one embodiment of the present invention. The header contains an input/output, a memory flag, a memory rate, an error signal, and a camera ID.

[0033] FIG. 5 illustrates a memory mapping of a system having a host and a FPGA in accordance with one embodiment of the present invention. Data acquired by the host will be processed through a virtual memory 502, a physical memory 504, a PCIe BAR 506 and the FPGA external memory 508. The virtual memory 502 and the physical memory 504 reside at the host. When the frame data, which is of a dynamic memory size, is received at the host, a memory optimizer determines the availability of the memory. Once the availability of the memory is determined, the frame drop or retain is carried out. When a frame drop is required, a drop frame collector calls a function to dispose of the image frames to be dropped before mapping back from the memory frame buffer of the memory optimizer to the physical memory for transferring. The frame data that is retained is mapped to the physical memory and resides in a managed heap of the physical memory 504. The frame data is further mapped to the PCIe BAR 506. Thereafter, the frame data is processed through the DMA controller before being channeled to the FPGA external memory 508.
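The headers exemplified in FIG. 4A and FIG. 4B may be represented, purely for illustration, by the following C structures; the field widths and ordering are assumptions, as only the field names are given in the embodiments.

```c
#include <stdint.h>

/* Header for data sent from the host CPU to the FPGA (FIG. 4A). */
struct host_to_fpga_hdr {
    uint8_t  io;              /* input/output direction          */
    uint8_t  num_images;      /* number of images in the packet  */
    uint32_t frame_size;      /* frame size in bytes             */
    uint32_t sequence_number; /* sequence number of the frame    */
    uint8_t  priority;        /* stream priority                 */
    uint8_t  camera_id;       /* originating camera              */
};

/* Header for data sent from the FPGA back to the host CPU (FIG. 4B). */
struct fpga_to_host_hdr {
    uint8_t  io;              /* input/output direction                    */
    uint8_t  memory_flag;     /* empty / almost empty / almost full / full */
    uint16_t memory_rate;     /* coded memory rate or speed                */
    uint8_t  error_signal;    /* error seen during the transfer            */
    uint8_t  camera_id;       /* camera the frames belong to               */
};
```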

[0034] Once the frame data is processed and resides in the FPGA external memory 508, the DMA controller determines frame drop/retain at the beginning of the transmission. The data is mapped to the PCIe BAR 506 for transmitting to the physical memory of the host. Finally, the frame data is processed by the application modules in the virtual memory when the memory threshold allows.

[0035] FIGs. 6A and 6B illustrate flow charts of data processes in accordance with one embodiment of the present invention. The data processes include an in-memory data process (as in FIG. 6A) and an out-memory data process (as in FIG. 6B) with respect to the host CPU. Both the in-memory data process and the out-memory data process show data processing between application software 602 and FPGA hardware 604 connected through a PCIe link 606. The application software 602 resides at the host CPU.
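On a Linux host, one possible (but not prescribed) way to realise the virtual memory to PCIe BAR portion of the mapping of FIG. 5 is to map the FPGA endpoint's BAR through sysfs, as sketched below; the device address, BAR size and register usage are placeholders.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Minimal Linux-only sketch of mapping a PCIe BAR of the FPGA endpoint into
 * the host's virtual address space. */
int main(void)
{
    const char *bar_path = "/sys/bus/pci/devices/0000:01:00.0/resource0"; /* placeholder BDF */
    const size_t bar_size = 1 << 20;   /* assumed 1 MiB BAR */

    int fd = open(bar_path, O_RDWR | O_SYNC);
    if (fd < 0) { perror("open"); return 1; }

    volatile uint32_t *bar = mmap(NULL, bar_size, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    uint32_t status = bar[0];          /* e.g. read a status register */
    printf("BAR register 0: 0x%08x\n", (unsigned int)status);

    munmap((void *)bar, bar_size);
    close(fd);
    return 0;
}
```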

[0036] As shown in FIG. 6A, the data process starts with capturing audio-video (AV) data at step 611 and streaming it into the host CPU at step 612. The AV data includes sequences of image frames that form the video data. The image frames are wrapped at step 613 and are tagged by the memory optimizer at step 614. The data is then mapped from the user space to the physical memory of the host CPU at step 616. The data is sent to the FPGA hardware 604 through the PCIe link 606, where the data is mapped from the physical memory to the PCIe Base Address Register (BAR) at step 620, and subsequently channeled to the DMA controller at step 621. At step 625, the FPGA memory load identifier measures memory loads and tags memory parameters into the header of each data package. Measuring of the memory loads may include measuring the memory rate at step 631, measuring the error signal at step 632 and checking the memory status flags at step 633. Once the data is tagged with the memory parameters, it is mapped accordingly at step 626 for channeling to the external memory of the FPGA at step 627.
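The memory status flags checked at step 633 may be derived from the fill level of the FPGA memory as in the following sketch; the watermark percentages and the intermediate "normal" level are assumptions for illustration.

```c
#include <stdint.h>

enum mem_flag { MEM_EMPTY, MEM_ALMOST_EMPTY, MEM_NORMAL, MEM_ALMOST_FULL, MEM_FULL };

/* Map the current occupancy of a memory to one of the status flags. */
static enum mem_flag memory_status(uint64_t used_bytes, uint64_t total_bytes)
{
    uint64_t pct = (total_bytes == 0) ? 0 : (used_bytes * 100) / total_bytes;

    if (pct == 0)   return MEM_EMPTY;
    if (pct <= 10)  return MEM_ALMOST_EMPTY;   /* assumed low watermark: 10%  */
    if (pct >= 100) return MEM_FULL;
    if (pct >= 90)  return MEM_ALMOST_FULL;    /* assumed high watermark: 90% */
    return MEM_NORMAL;
}
```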

[0037] Returning to step 625, once the memory load is determined and the data is tagged, the memory load identifier transmits a feedback signal to the memory optimizer at step 628 through the PCIe link 606. At step 618, the memory optimizer 615 receives the feedback signal from the FPGA hardware 604. The memory optimizer 615 then processes the images to decide the image frames to be dropped or retained at step 617.
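The decision at step 617 on which image frames to drop or retain may be illustrated by the per-camera queue below, following the queueing behaviour described for the memory optimizer in paragraph [0029]; the queue depth and frame structure are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUED_FRAMES 32   /* assumed capacity of the dynamic buffer */

struct frame {
    uint8_t  camera_id;
    uint32_t sequence_number;
    /* payload omitted for brevity */
};

struct camera_queue {
    struct frame frames[MAX_QUEUED_FRAMES];
    size_t       count;
    size_t       dropped;      /* frames discarded by the drop frame collector */
};

/* Queue a frame for one camera; once the buffer is full, excess frames
 * from that camera are dropped.  Returns 1 if queued, 0 if dropped. */
static int enqueue_frame(struct camera_queue *q, const struct frame *f)
{
    if (q->count >= MAX_QUEUED_FRAMES) {
        q->dropped++;          /* buffer full: drop the excess frame */
        return 0;
    }
    q->frames[q->count++] = *f;
    return 1;
}
```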

[0038] Referring now to FIG. 6B, the AV data is taken from the external memory in step 651 and sent to the DMA controller in step 652. Memory allocation done by the DMA controller in step 652 is affected by the memory load identifier in step 655. The memory load identifier determines the frames to be dropped/retained in step 657 based on the frame tagging done on the memory load identifier in step 656. The AV data from the external memory of the FPGA is mapped to the PCIe BAR for transmitting to the host CPU in step 654. Once transmitted, the data is further mapped from the PCIe BAR to the physical memory of the host CPU in step 661. The data is further mapped from the physical memory into the user space in step 662. The data mapped in the user space is channeled to the memory optimizer at step 665. The memory optimizer measures the memory threshold of the host CPU in step 666, checks the data arrival rate in step 667 and checks the status flag in step 668. The image frames are unwrapped from the data in step 663 for streaming in step 664.

[0039] Operationally, the memory optimizer sends feedback signals to the FPGA hardware 604 in step 669. Once the feedback signal is received in step 658, the feedback signal is fed to the memory load identifier.

[0040] FIG. 7 illustrates a state machine diagram of the memory management of the system in accordance with one embodiment of the present invention. The state machine is carried out in four states: state A, state B, state C and state D. State A and state D are carried out on the host CPU side whilst state B and state C are carried out on the FPGA.

[0041] The conditions to trigger state A and state D on the host CPU include the memory threshold (either full or empty), the arrival rate/speed and the sequence number. On the other hand, the conditions to trigger state B and state C on the FPGA include the memory flags (almost full/almost empty/full/empty), the memory rate and speed, the error signal, etc.
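The four-state cycle of FIG. 7 may be sketched as the following C state machine; the trigger conditions mirror those listed above, while the structure fields and the exact transition ordering are assumptions for illustration.

```c
#include <stdbool.h>

enum mgmt_state { STATE_A, STATE_B, STATE_C, STATE_D };

struct sync_status {
    bool host_memory_ok;   /* host CPU memory threshold not exceeded */
    bool fpga_memory_ok;   /* FPGA memory flags not (almost) full    */
    bool error;            /* error signal raised during a transfer  */
};

/* Compute the next state of the memory management cycle. */
static enum mgmt_state next_state(enum mgmt_state s, const struct sync_status *st)
{
    switch (s) {
    case STATE_A:  /* host: frames received, wrapped and mapped     */
        return st->fpga_memory_ok ? STATE_B : STATE_A;
    case STATE_B:  /* FPGA: data written to external memory via DMA */
        return st->error ? STATE_A : STATE_C;
    case STATE_C:  /* FPGA: processed data ready to be returned     */
        return st->host_memory_ok ? STATE_D : STATE_C;
    case STATE_D:  /* host: data received, application processing   */
    default:
        return STATE_A;
    }
}
```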

[0042] FIG. 8A illustrates a process carried out in the state A of the state machine of FIG. 7 in accordance with one embodiment of the present invention. The host CPU receives AV streaming from the camera in step 801. The user space memory block is then locked at the virtual memory for memory mapping in step 802. The image frames are then wrapped with their ID parameters in step 803. The memory is further mapped from the virtual memory to the physical memory in step 804. In step 806, the virtual memory reads a feedback from the FPGA at state B in step 805. The feedback contains the memory state of the FPGA. In step 807, the memory state of the FPGA is checked to determine whether it is sufficient. If the memory is sufficient, all the frames are retained in step 808. The frames are then tagged based on the feedback signal received from state B in step 809. If the memory is determined to be insufficient in step 807, image frames are dropped at the beginning of the transmission in step 810. A feedback signal regarding the dropped frames is sent to the state D in step 811.

[0043] FIG. 8B illustrates a process carried out in the state B of the state machine of FIG. 7 in accordance with one embodiment of the present invention. In step 821, data received from the previous state, i.e. state A, is mapped from the physical memory to the PCIe BAR and then to the DMA. The memory load identifier of the FPGA updates the status flags, image size, etc. on the DMA controller in step 822. The memory load identifier further determines if the FPGA's external memory is sufficient in step 823. If the external memory is sufficient, memory mapping from the PCIe BAR to the external memory is carried out in step 824, and the data is sent to the FPGA external memory in step 825. A trigger is then sent to activate state C in step 826. Returning to step 823, if the external memory of the FPGA is determined to be insufficient, a feedback signal is sent to the host CPU in step 827 during the first interrupt to the base address of the stolen region in virtual memory. A trigger is sent in step 828 to the state A. Besides looping back to the state A, the process reverts to step 821 for self-checking on the outgoing data rate from the FPGA for balancing the memory load between the host CPU and the FPGA.

[0044] FIG. 8C illustrates a process carried out in the state C of the state machine of FIG. 7 in accordance with one embodiment of the present invention. In step 841, the FPGA completes the video processing. In step 842, the memory load identifier checks the ID parameters from the DMA memory controller. The memory load identifier checks if the CPU memory is sufficient in step 843. Information regarding the CPU memory is contained in a feedback signal from the memory optimizer of the host CPU at step 850. The feedback signal is sent from the memory optimizer to the DMA controller. The DMA controller reads the feedback signal upon receiving a trigger from state D in step 851.

[0045] Returning to step 843, if the host CPU memory is sufficient, the FPGA retains the frames in step 844. Accordingly, the status flags, image size, etc. are updated on the DMA controller in step 845. The data is then mapped from the PCIe BAR to the physical memory for transmitting to the host CPU in step 846. If the host CPU memory is determined to be insufficient, the image frames are dropped at the beginning of transmission in step 847. The memory load identifier sends a trigger to state C. The trigger is sent to state C as a propagation signal to inform the FPGA's load identifier to update the status of the FPGA on the host CPU, so that the FPGA's memory load identifier can either supply more frames or limit the total frames to be sent over to the host CPU in step 848. In step 849, the memory needed is checked in state C.

[0046] FIG. 8D illustrates a process carried out in the state D of the state machine of FIG. 7 in accordance with one embodiment of the present invention. In step 861, the FPGA sends back the data by writing it to the host CPU. In step 862, the memory optimizer receives a function call to measure the memory threshold at the virtual memory. Such a function call is initiated through a trigger received from the state C in step 863. Accordingly, the CPU memory is checked for sufficiency at step 864. If the CPU memory is sufficient, memory mapping is carried out between the CPU and the FPGA in step 865. The data is then received from the FPGA in step 866 and in turn sent to the application of the host CPU for processing in step 867. Returning to step 864, if the CPU memory is not sufficient, a feedback is sent to the FPGA during the first interrupt to the base address of the stolen region in the virtual memory in step 868. The camera's ID parameter is updated while receiving the image frames from the FPGA in step 869. In step 870, the FPGA drops frames at the base address level at the FPGA load identifier. In step 871, a trigger is sent to activate state A.

[0047] While specific embodiments have been described and illustrated, it is understood that many changes, modifications, variations, and combinations thereof could be made to the present invention without departing from the scope of the invention.