

Title:
METHOD AND APPARATUS FOR COMPARING AND RANKING LONG DOCUMENTS
Document Type and Number:
WIPO Patent Application WO/2022/170092
Kind Code:
A1
Abstract:
A method includes dividing a first document and a second document into sections, the first and second documents comprising a sequence of tokens. Each of the sections is split into chunks and a positional embedding is added to each chunk. The chunks are input in a pre-trained BERT system to obtain a chunk embedding per chunk. For each of the first document and the second document, an attention vector between chunk embeddings of different sections within their respective document is computed and integrated in an embedding of the corresponding section. An agreement is maximized between the embeddings of the sections within their respective document. An agreement is minimized between the embeddings of the sections with respect to different sections across the first document and the second document.

Inventors:
MOHAN VINEETH RAKESH (US)
JHA AKSHITA (US)
CHANDRASHEKAR JAIDEEP (US)
Application Number:
PCT/US2022/015305
Publication Date:
August 11, 2022
Filing Date:
February 04, 2022
Assignee:
INTERDIGITAL PATENT HOLDINGS INC (US)
International Classes:
G06F40/30; G06F16/33; G06N3/02
Other References:
LIU YANG ET AL: "Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 13 October 2020 (2020-10-13), XP081916195, DOI: 10.1145/3340531.3411908
AKSHITA JHA ET AL: "Supervised Contrastive Learning for Interpretable Long Document Comparison", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 20 August 2021 (2021-08-20), XP091036327
Attorney, Agent or Firm:
SPICER, Andrew W. (US)
Claims:
CLAIMS

1. A method comprising: dividing a first document and a second document into sections, the first and second documents comprising a sequence of tokens; splitting each of the sections into chunks and adding a positional embedding to each chunk; inputting the chunks in a pre-trained BERT system to obtain a chunk embedding per chunk; for each of the first document and the second document, computing an attention vector between chunk embeddings of different sections within their respective document and integrating the attention vector in an embedding of the corresponding section; maximizing an agreement between the embeddings of the sections within their respective document; and minimizing an agreement between the embeddings of the sections with respect to different sections across the first document and the second document.

2. The method of claim 1, wherein the first document and the second document belong to a same category of documents.

3. The method of claim 1 or 2, wherein the positional embedding is an index of a token of a chunk in the corresponding document.

4. The method of claim 1 or 2, wherein the positional embedding is a unique value for the section corresponding to the chunk.

5. The method of claim 1 or 2, wherein the positional embedding is a unique identifier for the chunk.

6. The method of claims 1 to 5, wherein chunks with less than 512 tokens are padded with zeros.

7. The method of claims 1 to 6, wherein computing the attention vector comprises: using a first chunk embedding of a first section of the respective document as a query; using a second chunk embedding of a second section of the respective document, different from the first section, as a key; computing a vector from the second chunk embedding; and outputting the attention vector as a function of the query, the key and the vector.

8. A device comprising a memory associated with a processor configured to, for a first and a second document, a document being a sequence of tokens: divide each document into sections; split each section into chunks and add a positional embedding to each chunk; input the chunks in a pre-trained BERT system to obtain a chunk embedding per chunk; compute an attention vector between chunk embeddings of different sections of a same document and integrate the attention vector in an embedding of the corresponding section; maximize an agreement between the embeddings of the sections of a same document; and minimize an agreement between the embeddings of different sections of the first and the second documents.

9. The device of claim 8, wherein the first and the second document belong to a same category of documents.

10. The device of claim 8 or 9, wherein a positional embedding is an index of a token of a chunk in the document.

11. The device of claim 9 or 10, wherein a positional embedding is a unique value for the section of the chunk.

12. The device of claim 9 or 10, wherein a positional embedding is a unique identifier for the chunk.

13. The device of claims 9 to 12, wherein chunks with less than 512 tokens are padded with zeros.

14. The device of claims 9 to 13, wherein the processor is configured to compute the attention vector by: using a first chunk embedding of a first section of the document as a query; using a second chunk embedding of a second section of the document, different from the first section, as a key; computing a vector from the second chunk embedding; and outputting the attention vector as a function of the query, the key and the vector.


Description:
METHOD AND APPARATUS FOR COMPARING AND RANKING LONG DOCUMENTS

Cross-Reference to Related Applications

This application claims the benefit of U.S. Patent Application No. 63/146,018, filed February 5, 2021, which is incorporated herein by reference in its entirety.

Technical Field

The present principles generally relate to the domain of semantic text matching and in particular to semantic text matching that can handle long lists of tokens, that can use the interrelations between sections of documents and that can output interpretable comparisons.

Background

The present section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present principles that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present principles. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.

Semantic text matching (STM) is an important problem in the field of information retrieval. Given a query, which could range from a few lines of text to a few pages in the case of a long document, the objective of STM is to retrieve a set of documents related to the query. While recent advancements in language modeling have certainly produced promising results, their effectiveness has been confined to matching short queries to short documents or long documents. Semantic text matching of long-documents to long-documents is a more challenging problem. Important use cases for long-document matching comprise, for example, ranking of research papers, comparison of patent documents or clustering of Wikipedia articles spanning several pages. Words are represented as high dimensional vectors in which each entry is a measure of the syntactic, semantic and contextual association between words. These vectors could either be sparse or dense. Modern neural networks represent words as dense vectors that are derived by various training methods inspired by neural-network language modeling. These representations are referred to as “neural” or “word” embeddings. An embedding is a vector representative of an element of the present system. The element can be a token (for instance a word or a part of a word), a chunk (i.e., a sequence of tokens), a section (i.e., a sequence of chunks) or the document itself. Documents are sequences of tokens, a token being a numerical identifier. Thus, a document is an electronic data sequence.
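For illustration purposes only, the following sketch (with hypothetical token identifiers and a toy chunk size; the described system uses 512-token chunks) shows a document as a sequence of numerical tokens grouped into sections, each section being split into fixed-size chunks.

```python
# Hypothetical token identifiers and a toy chunk size: a document is a
# sequence of numerical tokens grouped into sections, and each section is
# split into fixed-size chunks.
document = {
    "abstract": [101, 2054, 2003, 1037, 2944, 102],
    "claims":   [101, 1037, 4118, 2164, 1996, 2944, 102],
}

CHUNK_SIZE = 4  # toy value for the example

def split_into_chunks(tokens, chunk_size=CHUNK_SIZE):
    """Split a section's token sequence into fixed-size chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

chunks_per_section = {name: split_into_chunks(tokens) for name, tokens in document.items()}
print(chunks_per_section)
# {'abstract': [[101, 2054, 2003, 1037], [2944, 102]],
#  'claims':   [[101, 1037, 4118, 2164], [1996, 2944, 102]]}
```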

Developing an STM model for long-documents involves difficult challenges. Firstly, since long-documents span several pages, it is not possible, from a stochastic approach, to ‘learn’ the semantics of words, phrases or sentences by considering only a few lines of text (or paragraphs). Indeed, learning a representative embedding that encapsulates the context of an entire document is extremely challenging. For example, Transformer models, such as the well-known BERT, cannot handle more than 512 tokens (or words) during a single feed-forward pass. A second challenging task is to take into account various levels of (dis)similarity between different sections of text. Another challenge is the interpretability of the output. When comparing documents, it is often not sufficient to just provide a score that describes the amount of similarity. It is also important to explain what makes the documents (dis)similar.

There is a lack of a solution for semantic text matching that can handle long lists of tokens, that can use the inter-relations between sections of documents and that can output interpretable comparisons.

Brief Description of Drawings

The present disclosure will be better understood, and other specific features and advantages will emerge upon reading the following description, the description making reference to the annexed drawings wherein:

Fig. 1 presents a simple example of semantic text matching of long-documents to long-documents;

Fig. 2 diagrammatically illustrates a method and architecture of a Contrastive Learning Framework according to the present principles;

Fig. 3 shows an example architecture of a device which may be configured to implement a method described in relation with Fig. 2; and

Fig. 4 illustrates the different kinds of positional embeddings.

Summary

The following presents a simplified summary of the present principles to provide a basic understanding of some aspects of the present principles. This summary is not an extensive overview of the present principles. It is not intended to identify key or critical elements of the present principles. The following summary merely presents some aspects of the present principles in a simplified form as a prelude to the more detailed description provided below.

In an embodiment, a method includes dividing a first document and a second document into sections, the first and second documents comprising a sequence of tokens. Each of the sections is split into chunks and a positional embedding is added to each chunk. The chunks are input in a pre-trained BERT system to obtain a chunk embedding per chunk. For each of the first document and the second document, an attention vector is computed between chunk embeddings of different sections within their respective document and the attention vector is integrated in an embedding of the corresponding section. An agreement is maximized between the embeddings of the sections within their respective document. An agreement is minimized between the embeddings of the sections with respect to different sections across the first document and the second document.

The present principles also relate to a device comprising a processor associated with a memory, the processor being configured to execute the method above.

Detailed Description of Embodiments

The present principles will be described more fully hereinafter with reference to the accompanying figures, in which examples of the present principles are shown. The present principles may, however, be embodied in many alternate forms and should not be construed as limited to the examples set forth herein. Accordingly, while the present principles are susceptible to various modifications and alternative forms, specific examples thereof are shown by way of examples in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the present principles to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present principles as defined by the claims.

The terminology used herein is for the purpose of describing particular examples only and is not intended to be limiting of the present principles. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises", "comprising," "includes" and/or "including" when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Moreover, when an element is referred to as being "responsive" or "connected" to another element, it can be directly responsive or connected to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly responsive" or "directly connected" to other element, there are no intervening elements present. As used herein the term "and/or" includes any and all combinations of one or more of the associated listed items and may be abbreviated as"/".

It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element without departing from the teachings of the present principles. Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.

Some examples are described with regard to block diagrams and operational flowcharts in which each block represents a circuit element, module, or portion of code which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in other implementations, the function(s) noted in the blocks may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.

Reference herein to “in accordance with an example” or “in an example” means that a particular feature, structure, or characteristic described in connection with the example can be included in at least one implementation of the present principles. The appearances of the phrase “in accordance with an example” or “in an example” in various places in the specification are not necessarily all referring to the same example, nor are separate or alternative examples necessarily mutually exclusive of other examples.

Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims. While not explicitly described, the present examples and variants may be employed in any combination or sub-combination.

Fig. 1 presents a simple example of semantic text matching within, between, or among various documents. In the example of Fig. 1, three documents are presented. The Query document 10 is compared against Target document 11 and Target document 12. In the example of Fig. 1, the documents homogeneously refer to compression (although the context varies). The documents Query 10 and Target 12 talk about neural network compression, whereas Target 11 talks about image compression. In the three presented documents 10, 11 and 12, three levels of similarity are presented: (a) similar boilerplate text 13 does not contribute to the distinguishing content of the documents; (b) semantically similar words 14: Query 10 has a ‘semantic match’ with both Target 11 and Target 12 (‘reducing the size’ semantically matches ‘compress’ and ‘compression’); and (c) contextually similar content: Query 10 is contextually similar 15 to Target 12 - they discuss the same concepts of ‘compressing neural networks’, unlike Target 11 that talks about ‘image compression’. In a long-form document, such contextual information often cannot be readily learned within just a few lines (or paragraphs) as the distinguishing information might have been presented at different places in each document.

According to the present principles, sections within the same document are considered to be closer to each other and sections across different documents to be farther apart in the latent space, while learning document representations. Additionally, sections of documents belonging to the same class are closer to each other in the latent space than sections of documents belonging to a different class. According to the present principles, three levels of similarity scores are provided: (i) similarity score between documents; (ii) similarity score between sections within and across different documents; and (iii) similarity scores between different text chunks in a section within and across documents. Similarity scoring may be performed by using a combination of contrastive loss and a multi-headed attention layer for different text chunks within a section as described in relation to Fig. 2.
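For illustration purposes only, the sketch below shows how similarity scores at the three granularities could be read off once chunk, section and document embeddings are available; the random embeddings, the mean pooling and the cosine similarity used here are assumptions made for the example and are not the claimed scoring itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical learned embeddings: 2 documents x 2 sections x 3 chunks x dim 8.
chunk_emb = rng.normal(size=(2, 2, 3, 8))
section_emb = chunk_emb.mean(axis=2)   # toy pooling of chunks into sections
doc_emb = section_emb.mean(axis=1)     # toy pooling of sections into documents

# (i) document-level similarity score
print("doc 0 vs doc 1:", cosine(doc_emb[0], doc_emb[1]))
# (ii) section-level score across documents
print("sec 0 of doc 0 vs sec 1 of doc 1:", cosine(section_emb[0, 0], section_emb[1, 1]))
# (iii) chunk-level score within a section across documents
print("chunk 2 of (doc 0, sec 0) vs chunk 0 of (doc 1, sec 0):",
      cosine(chunk_emb[0, 0, 2], chunk_emb[1, 0, 0]))
```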

Fig. 2 diagrammatically illustrates a method and architecture 20 of a Contrastive Learning Framework according to the present principles. Two embodiments of this framework may be developed: (i) a Self-Supervised Contrastive Learning Framework (hereafter V1), and (ii) a Supervised Contrastive Learning Framework (hereafter V2). The goal of V1 is to maximize the similarity between sections within the same document by using a self-supervised loss. The goal of V2 is to maximize the similarity between different sections having the same label across different documents. Both these models have a similar method/framework and include three primary steps/components as discussed with reference to Fig. 2: (i) Data Augmentation 21; (ii) Data Encoder 22; and (iii) Contrastive Loss Function 23 (either Self-Supervised Contrastive or Supervised Contrastive Loss depending upon the model).

The data augmentation step/component 21 includes an automatic data augmentation of the Query and Target documents. Due to the size and the number of the documents, this step may be performed by a device. Indeed, performing such a step manually would require a significant effort from a group of human operators. Long documents 210, unlike short texts, follow an inherent structure and are organized in a way that demonstrates the logical development of ideas. A long document presents a hierarchical relationship of the information being conveyed, by representing different chunks of information with headings and sub-headings, resulting in corresponding sections and sub-sections. These sections and sub-sections 211, 212 may be considered as different ‘views’ of the same document, where each view represents partial information within the document. As shown in Fig. 2, the data augmentation step/component 21 may be applied to multiple long-documents 210.

The task of learning meaningful representations for the long-documents 210 can be broken down into learning better representations for the sub-sections 211, 212 of the long documents 210. In the self-supervised model V1, representations of sections within the same document are semantically closer in the latent space than sections of a different document. Furthermore, in the supervised model V2, representations of sections of documents belonging to the same class are closer than the representations of sections from a document of a different class.

Data Augmentation step/component 21 divides long documents 210 into different sections 211 and 212 (for instance, on the basis of recognition of titles and paragraphs or on the basis of a document structure like HTML or XML; lexical and syntax analysis methods may also be used for achieving step 21). Data augmentation step/component 21 takes a long document 210 as input and extracts different sections from the document. Each of these sections contains a subset of the text present in the original long document, and thus represents a subset of the information present in the document. These sections are then fed to a Data Encoder step/component 22 for further processing.
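For illustration purposes only, the following sketch gives one possible heuristic for step/component 21 on a plain-text document, splitting at short all-uppercase lines treated as headings; the heuristic and the example text are assumptions for the example, not the claimed augmentation itself.

```python
import re

def split_into_sections(text):
    """Illustrative splitter: treat short all-uppercase lines as section
    headings and group the following lines under them (heuristic only)."""
    sections, current_title, current_lines = {}, "PREAMBLE", []
    for line in text.splitlines():
        stripped = line.strip()
        # Heuristic heading test: short line, upper-case letters, digits and punctuation only.
        if stripped and len(stripped) < 60 and stripped.isupper() \
                and re.fullmatch(r"[A-Z0-9 ,.-]+", stripped):
            if current_lines:
                sections[current_title] = "\n".join(current_lines).strip()
            current_title, current_lines = stripped, []
        else:
            current_lines.append(line)
    if current_lines:
        sections[current_title] = "\n".join(current_lines).strip()
    return sections

doc = "ABSTRACT\nA method for comparing documents...\n\nCLAIMS\n1. A method comprising..."
print(split_into_sections(doc))
# {'ABSTRACT': 'A method for comparing documents...', 'CLAIMS': '1. A method comprising...'}
```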

Data Encoder step/component 22 uses a particular configuration of the well-known BERT system or model, according to a non-limiting embodiment of the present principles. BERT is generally well-known in the art and a detailed description thereof is omitted here for convenience only and should not be considered limiting. One of the major limitations of BERT is that it can only handle 512 tokens at a time, making it ineffective for encoding long documents. According to the present principles, multiple chunks of 512 tokens are used. Additional positional embeddings are added to encode the long-document structure. Furthermore, an additional step/component 220 (hereafter “multi-head chunkwise attention”) is used before computing the contrastive loss at step 23 between and across different sections of documents. From step/component 21, unique embeddings 221 are obtained. They comprise token embeddings combined with additional long-document structure aware positional embeddings before being used as input to the BERT model.

Fig. 4 illustrates the different kinds of positional embeddings 221. They comprise:

• Token Embeddings: These embeddings are the index of the token in the input sequence and may be the same as used in connection with BERT.

• Section Embeddings: Section embeddings are unique values given to each section by step/component 21.

• Chunk Embeddings: Component 22 is based on BERT which can only handle 512 tokens at a time. Each section is further divided into several chunks of 512 tokens. Each of these chunks is given a unique ID. These embeddings are summed together and form the final input embeddings that are provided to distinct BERT instances for encoding.

In the example of Fig. 2, the encoder is, for instance, built on BERTBASE which has 12 Transformer blocks, a hidden size of 768 and 12 attention blocks, for a total of 110M parameters. A pre-trained BERT model may be used in such an architecture. BERT is limited by the number of tokens it can handle at once. To overcome this issue, different chunks of 512 tokens are fed to several different BERT models at the same time. Zero-padding is done for chunks smaller than 512 tokens. For example, a document section which is tokenized into 2500 tokens using a case-preserving WordPiece model provided by BERT is divided into 4 chunks of 512 tokens and a fifth chunk of 452 tokens which is zero-padded to 512 tokens. These chunks are then enhanced with their corresponding unique embeddings as described above and given to a pre-trained BERT model for processing. The output from BERT is then used to compute the multi-head chunkwise attention 220 between different sections.
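For illustration purposes only, the following sketch shows how a section could be split into 512-token chunks, zero-padded and combined with section and chunk identifier embeddings before being encoded by a pre-trained BERT model; it assumes the Hugging Face transformers library, the bert-base-uncased checkpoint and toy lookup tables standing in for the structure-aware positional embeddings 221.

```python
import torch
from transformers import BertModel, BertTokenizerFast

CHUNK = 512
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# Toy lookup tables for the long-document structure (hypothetical sizes).
section_table = torch.nn.Embedding(num_embeddings=32, embedding_dim=bert.config.hidden_size)
chunk_table = torch.nn.Embedding(num_embeddings=64, embedding_dim=bert.config.hidden_size)

def encode_section(text, section_id):
    """Return one embedding per 512-token chunk of a section."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunk_embeddings = []
    for chunk_id, start in enumerate(range(0, len(ids), CHUNK)):
        piece = ids[start:start + CHUNK]
        mask = [1] * len(piece) + [0] * (CHUNK - len(piece))   # zero-pad short chunks
        piece = piece + [0] * (CHUNK - len(piece))
        input_ids = torch.tensor([piece])
        attention_mask = torch.tensor([mask])
        # Token embeddings + section identifier embedding + chunk identifier embedding.
        emb = bert.get_input_embeddings()(input_ids)
        emb = emb + section_table(torch.tensor(section_id)) + chunk_table(torch.tensor(chunk_id))
        with torch.no_grad():
            out = bert(inputs_embeds=emb, attention_mask=attention_mask)
        chunk_embeddings.append(out.last_hidden_state[:, 0])   # vector at the first position
    return torch.cat(chunk_embeddings, dim=0)                  # (num_chunks, 768)

print(encode_section("Neural network compression reduces model size.", section_id=0).shape)
```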

Multi-head chunkwise attention step/component 220, which is a self-attention layer across chunks, provides an added level of interpretability to the model. A section is divided into multiple chunks. The goal of multi-head chunkwise attention step/component 220 is to compute attention between different chunks of different sections. Thus, a fine-grained understanding of which chunk in one section is most similar to another chunk in a different section based on what they pertain to may be obtained. A chunk with the highest attention weight with respect to a particular chunk plays the most important part in computing the similarity score.

Multi-head chunkwise attention step/component 220 is performed between sections by treating each chunk in a given section as a Query (Q). The chunks of a second section are treated as Keys (K) against which the attention is computed. The chunks of the second section are also used in calculating the Values (V). These values have a dimension of $d_k$. The weights for calculating the queries Q, keys K, and values V are learned during training. The matrix of the outputs is computed according to equation Eq1:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \qquad \text{(Eq1)}$$

where the function softmax is the well-known softmax function. According to the present principles, a scaled dot-product attention function with four heads is used as the attention function.
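For illustration purposes only, the sketch below uses PyTorch's built-in multi-head attention as a stand-in for step/component 220: chunk embeddings of a first section act as queries and those of a second section as keys and values, with four heads of scaled dot-product attention; the chunk counts and the random embeddings are assumptions made for the example.

```python
import torch

torch.manual_seed(0)
dim, heads = 768, 4

# Hypothetical chunk embeddings: section 1 has 3 chunks, section 2 has 5 chunks.
section1_chunks = torch.randn(3, 1, dim)   # (num_chunks, batch, dim)
section2_chunks = torch.randn(5, 1, dim)

# Four-head scaled dot-product attention across chunks of the two sections.
chunkwise_attention = torch.nn.MultiheadAttention(embed_dim=dim, num_heads=heads)

# Queries come from section 1, keys and values from section 2 (cf. Eq1).
attended, weights = chunkwise_attention(query=section1_chunks,
                                        key=section2_chunks,
                                        value=section2_chunks)

print(attended.shape)          # (3, 1, 768): one attended vector per query chunk
print(weights.shape)           # (1, 3, 5): attention of each section-1 chunk over section-2 chunks
print(weights.argmax(dim=-1))  # most attended section-2 chunk for each section-1 chunk
```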

The encoded BERT output for different chunks along with the multi-head chunkwise attention for each section is fed sequentially into a Bidirectional LSTM 222 to get representations of dimension 768 for each section. This section representation is further passed through a projection layer 223 that reduces the dimension of the section representations to 256. These reduced section representations are then used to compute the contrastive loss between and across sections which is then backpropagated through the network during training. The BERT model is fine-tuned during the training process.
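For illustration purposes only, the following sketch passes the per-chunk vectors of a section through a bidirectional LSTM to obtain a 768-dimensional section representation and projects it down to 256 dimensions; taking the last time step as the section vector is an assumption of the sketch.

```python
import torch

torch.manual_seed(0)

# Per-chunk vectors for one section (e.g., BERT outputs combined with the
# chunkwise attention): 5 chunks, 768 features each, batch of 1.
chunk_sequence = torch.randn(1, 5, 768)             # (batch, num_chunks, dim)

# Bidirectional LSTM: hidden size 384 per direction -> 768-dimensional outputs.
bilstm = torch.nn.LSTM(input_size=768, hidden_size=384,
                       bidirectional=True, batch_first=True)
projection = torch.nn.Linear(768, 256)              # projection layer 223

outputs, (h_n, c_n) = bilstm(chunk_sequence)        # outputs: (1, 5, 768)
section_repr = outputs[:, -1, :]                    # assumption: last time step as section vector
z = projection(section_repr)                        # (1, 256) reduced representation

print(section_repr.shape, z.shape)
```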

In a step 23, a set of N long-documents and their corresponding labels {x_k, y_k}, for k = 1, ..., N, is randomly sampled. This results in S·N data points when each document is divided into S sections. Herein, the set of N documents is referred to as a ‘batch’ and the set of S·N data points as an ‘augmented batch’. The supervised contrastive loss function of step/component 23 is defined, for instance, by equations Eq2 and Eq3.

$$\mathcal{L}^{sup} = \sum_{i=1}^{S \cdot N} \mathcal{L}^{sup}_{i} \qquad \text{(Eq2)}$$

$$\mathcal{L}^{sup}_{i} = \frac{-1}{N_{y_i} - 1} \sum_{j=1}^{S \cdot N} \mathbb{1}_{i \neq j} \cdot \mathbb{1}_{y_i = y_j} \cdot \log \frac{\exp(z_i \cdot z_j / \tau)}{\sum_{k=1}^{S \cdot N} \mathbb{1}_{i \neq k} \, \exp(z_i \cdot z_k / \tau)} \qquad \text{(Eq3)}$$

where N is the batch size; $z_i = \mathrm{Proj}(f(x_i))$, where $x_i$ is a section of a document, $\mathrm{Proj}(\cdot)$ is the projection layer and $f(\cdot)$ is the encoder; and $N_{y_i}$ is the number of sections in the augmented batch that share the label $y_i$ of section $x_i$. The three indicator functions $\mathbb{1}_{i \neq j} \in \{0, 1\}$, $\mathbb{1}_{y_i = y_j} \in \{0, 1\}$ and $\mathbb{1}_{i \neq k} \in \{0, 1\}$ evaluate to 1 if and only if, respectively, $i \neq j$, the labels of the two sections are the same, and $i \neq k$. The symbol $\cdot$ in $z_i \cdot z_j$ indicates the inner (dot) product; $\tau$ denotes a temperature parameter. The final loss is summed across all samples. The triplet loss, one of the widely used losses for supervised training, is a special case of the contrastive loss when the batch size N = 2 and it contains only one positive and one negative sample.
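For illustration purposes only, the sketch below implements a supervised contrastive loss of the form of Eq2 and Eq3 over projected section representations; the temperature value, the unit normalization of z and the averaging over positives are assumptions consistent with the standard supervised contrastive formulation rather than the exact claimed loss.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, temperature=0.07):
    """Supervised contrastive loss over an augmented batch of S*N section
    representations z of shape (S*N, d) with one label per section."""
    z = F.normalize(z, dim=1)                        # work with unit-norm projections
    sim = (z @ z.t()) / temperature                  # pairwise z_i . z_j / tau
    n = z.size(0)
    not_self = ~torch.eye(n, dtype=torch.bool)       # indicators 1[i != j] and 1[i != k]
    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)
    positives = same_label & not_self                # 1[i != j] * 1[y_i == y_j]

    # log( exp(sim_ij) / sum_{k != i} exp(sim_ik) ) for every pair (i, j)
    exp_sim = torch.exp(sim) * not_self
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True))

    # Average over positives per anchor (the -1/(N_yi - 1) factor), then sum over anchors.
    n_pos = positives.sum(dim=1).clamp(min=1)
    loss_per_anchor = -(log_prob * positives).sum(dim=1) / n_pos
    return loss_per_anchor.sum()

# Toy augmented batch: 3 documents x 2 sections with labels 1, 1, 0,
# mirroring the N = 3 example discussed below.
z = torch.randn(6, 256, requires_grad=True)
labels = torch.tensor([1, 1, 1, 1, 0, 0])
print(supervised_contrastive_loss(z, labels))
```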

The numerator incorporates all positive sections in an augmented batch, i.e., every section having the same label in the augmented batch is treated as a positive sample. The denominator, on the other hand, performs a summation over the negative samples. In a scenario where N = 3, the batch has three documents, D1, D2 and D3, with labels 1, 1 and 0. Each document is split into S = 2 sections, which results in an augmented batch of S·N = 6. If document D1 is treated as an anchor, all sections of D1 and D2 are considered positive samples for each other, whereas sections from D3 are considered negative samples. The loss encourages the encoder to output representations for D1 and D2 that are closer to each other in the latent space. Although this example is directed to a binary-class setting, the loss generalizes to a multi-class setting as well. In a multi-class setting, for each document anchor a, its corresponding positive and negative samples are computed depending on the label for a.
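For illustration purposes only, a short enumeration of the positive and negative pairs in the augmented batch of the example above (the ordering of sections within the augmented batch is an assumption):

```python
import itertools

# Augmented batch for N = 3 documents (D1, D2, D3 with labels 1, 1, 0), S = 2 sections each.
sections = [("D1", 1), ("D1", 1), ("D2", 1), ("D2", 1), ("D3", 0), ("D3", 0)]

for (i, (doc_i, y_i)), (j, (doc_j, y_j)) in itertools.combinations(enumerate(sections), 2):
    kind = "positive" if y_i == y_j else "negative"
    print(f"section {i} ({doc_i}) vs section {j} ({doc_j}): {kind}")
```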

In a batch of N documents, all documents having the same label are treated as positive samples and every other document in the batch is considered negative for these documents. The negative samples may belong to one of three categories: (i) hard negatives; (ii) semi-hard negatives; and (iii) easy negatives. These three categories of negatives reflect the distance between the representations of the negative samples and the representation of the anchor document in the latent space, ranging from very close for hard negatives to very far for easy negatives. In most cases, these different kinds of negatives are explicitly fed to the model.

Using the supervised contrastive loss, hard negative mining is implicitly performed by the model: the gradient contributions for hard negatives are large, whereas those for easy negatives are small. The discriminatory power of the encoder increases with the number of negative samples for an anchor document. A straightforward way to increase the number of all kinds of negative samples is to increase the batch size N. The model performance is shown to improve as the batch size increases.

Fig. 3 shows an example architecture of a device 30 which may be configured to implement a method described in relation with Fig. 2. Alternatively, each component of Fig. 2 may be a separate device, the devices being linked together, for instance, via a bus 31 and/or via an I/O interface 36.

Device 30 comprises the following elements, which are linked together by a data and address bus 31:

• a microprocessor (or CPU) 32, which is, for example, a Digital Signal Processor (DSP);

• a Read Only Memory (ROM) 33;

• a Random Access Memory (RAM) 34;

• a storage interface 35;

• an I/O interface 36 for transmission of data to/from an application; and

• a power supply, e.g., a battery (not shown in Fig. 3).

In accordance with an example, the power supply is external to the device. In each of the mentioned memories, the word « register » used in the specification may correspond to an area of small capacity (a few bits) or to a very large area (e.g., a whole program or a large amount of received or decoded data). The ROM 33 comprises at least a program and parameters. The ROM 33 may store algorithms and instructions to perform techniques in accordance with the present principles. When switched on, the CPU 32 uploads the program in the RAM 34 and executes the corresponding instructions.

The RAM 34 comprises, in a register, the program executed by the CPU 32 and uploaded after switch-on of the device 30, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.

In accordance with examples, the device 30 is configured to implement a method described in relation with Fig. 2, and belongs, for example, to a set comprising but not limited to:

• a mobile device;

• a communication device;

• a game device;

• a tablet (or tablet computer);

• a laptop;

• a still picture camera rig;

• a video camera;

• an encoding chip; and

• a server (e.g. a broadcast server, a video-on-demand server or a web server).

The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a computer program product, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, Smartphones, tablets, computers, mobile phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.

Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, view generation, texture processing, and other processing of images and related texture information and/or depth information. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.

Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.

As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.