

Title:
SYSTEM AND METHOD FOR ANALYZING CROWDFUNDING PLATFORMS
Document Type and Number:
WIPO Patent Application WO/2019/069138
Kind Code:
A1
Abstract:
Systems and methods are provided for analyzing crowdfunding platforms. The method includes connecting, using an electronic device, to a plurality of individual lending platforms, and retrieving loan book data from each of the individual lending platforms, storing the loan book data, using a memory coupled to the electronic device, wherein the loan book data includes metadata generated in a Structured Query Language database, and wherein the metadata includes a name of a platform associated with the loan book data and a list of data attributes. The method further includes transforming, using a processor coupled to the electronic device, the loan book data from each of the platforms such that the transformed loan book data uses common data, reading, using the processor, the transformed loan book data, and documenting, for each pair of platform and attribute, a destination unified data attribute.

Inventors:
WALES KIM (US)
BUTY JULIEN (FR)
FROST HARALD (DE)
Application Number:
PCT/IB2018/001260
Publication Date:
April 11, 2019
Filing Date:
October 03, 2018
Assignee:
CROWDBUREAU (US)
International Classes:
G06F16/00; G06Q40/00
Domestic Patent References:
WO2012097171A2 (2012-07-19)
Foreign References:
US20130304724A1 (2013-11-14)
US20130185228A1 (2013-07-18)
US20150199239A1 (2015-07-16)
US20070011175A1 (2007-01-11)
Other References:
See also references of EP 3692451A4
Attorney, Agent or Firm:
POSTOLSKI, David (US)
Claims:

What is claimed is:

1. A method for analyzing crowdfunding platforms, the method comprising:

connecting, using an electronic device, to a plurality of individual lending platforms;

retrieving loan book data from each of the individual lending platforms;

storing the loan book data, using a memory coupled to the electronic device,

wherein the loan book data includes metadata generated in a Structured Query Language database, and

wherein the metadata includes a name of a platform associated with the loan book data and a list of data attributes;

transforming, using a processor coupled to the electronic device, the loan book data from each of the platforms such that the transformed loan book data uses common data;

reading, using the processor, the transformed loan book data; and

documenting, for each pair of platform and attribute, a destination unified data attribute.

2. The method as recited in claim 1, wherein the metadata further includes a timestamp for when the loan book data has been received.

3. The method as recited in claim 1, wherein the list of attributes is associated with each borrower listing and loan origination associated with the platform.

4. The method as recited in claim 1, wherein the common data is selected from the group consisting of: a common language; a common currency; a common time zone; common units; and common numeric ranges.

5. The method as recited in claim 1, wherein the storing the loan book data further includes storing the loan book data, for each platform, in its natural state, in real time.

6. The method as recited in claim 1, wherein the documenting is performed according to a mapping table.

7. The method as recited in claim 1, further comprising predicting if a loan associated with a platform is likely to be repaid or not.

8. A system for analyzing crowdfunding platforms, the system comprising:

an electronic device configured to:

connect to a plurality of individual lending platforms; and

retrieve loan book data from each of the individual lending platforms;

a memory coupled to the electronic device, the memory configured to store the loan book data,

wherein the loan book data includes metadata generated in a Structured Query Language database, and

wherein the metadata includes a name of a platform associated with the loan book data and a list of data attributes; and

a processor, coupled to the electronic device, the processor configured to:

transform the loan book data from each of the platforms such that the transformed loan book data uses common data;

read the transformed loan book data; and

document, for each pair of platform and attribute, a destination unified data attribute.

9. The system as recited in claim 8, wherein the metadata further includes a timestamp for when the loan book data has been received.

10. The system as recited in claim 8, wherein the list of attributes is associated with each borrower listing and loan origination associated with a primary platform listed and identified across other platforms.

11. The system as recited in claim 8, wherein the common data is selected from the group consisting of: a common language; a common currency; a common time zone; common units; and common numeric ranges.

12. The system as recited in claim 8, wherein the memory is further configured to store the loan book data, for each platform, in its natural state, in real time.

13. The system as recited in claim 8, wherein the processor is configured to document according to a mapping table.

14. The system as recited in claim 8, wherein the processor is further configured to predict if a loan associated with a platform is likely to be repaid or not.

15. The system as recited in claim 8, wherein the electronic device is selected from the group consisting of: a desktop computer; a laptop computer; a tablet computer; and a smartphone.

16. The system as recited in claim 8, further comprising a graphical user interface, and wherein the memory is further configured to store a digital application configured to enable a user to access the destination unified data attributes, using the graphical user interface.

Description:
SYSTEM AND METHOD FOR ANALYZING CROWDFUNDING PLATFORMS

Inventors:

Kim Wales

Julien Buty

Harald Frost

Claim of Priority

This application is a PCT International non-provisional application and claims priority to U.S. Provisional Patent Application No. 62/568,105, filed October 4, 2017 and herein incorporated by reference in its entirety.

Field of the Embodiments

This invention relates to loan analyzing and, in particular, to analyzing data pertaining to peer-to-peer lending and equities crowdfunding platforms.

Background of the Embodiments

From Main Street storefronts to high-tech startups, two thirds of new jobs over recent decades have been created by American small and medium-sized businesses. The ability of individuals to pursue an idea, to start a company, and to grow a business is the foundation of the U.S. economy.

The Obama Administration sought to ensure that the benefits of the United States' continuing economic recovery from the 2008 financial recession reach all Americans through the Jumpstart Our Business Startups (JOBS) Act of 2012, which allows for securities crowdfunding (equity and debt) online through an intermediary (a broker-dealer or registered funding platform). This initiative spurred another 40 countries to change their securities laws to address this crisis. It is important that consumers and small and medium-sized businesses have broad access to safe and affordable credit and equity facilities. Without capital formation, entrepreneurs cannot put innovative ideas into action. Without sufficient funding, Americans cannot grow their businesses to create new jobs and opportunities for the next generation.

Since the launch of the first peer-to-peer lending platform, the United Kingdom-based Zopa, in 2004, followed by Prosper Marketplace in 2007 and by Kickstarter, the first donation- and rewards-based platform in the United States, in 2009, crowdfunding has become exceedingly popular. This "democratization of fundraising" gives entrepreneurs and innovators the opportunity to raise vital capital from individuals and institutions around the world, bypassing the traditional method of fundraising from pre-existing relationships with friends, family and investors. Kickstarter, Indiegogo and GoFundMe are familiar names accounting for billions in rewards and donations. These crowdfunding platforms are only a small portion of a rapidly growing global industry. If someone were to plan a crowdfunding campaign, that person would likely first turn to one of these platforms.

Staff, alumni and students at universities across the United States in turn are beginning to leverage these new mechanisms to fund tuition, projects and businesses through exclusive crowdfunding platforms hosted by their schools.

Most crowdfunding platforms can be assigned to one of the four crowdfunding categories presented below, even though the business models sometimes differ strongly within these groups; an overview of each is given below. In the crowd investing category, for example, there are large differences between business models depending on which part of the JOBS Act is being leveraged. Note that one or more models can be adopted in order to create a "graduation" model that serves as an incubator throughout the lifecycle of a project or business.

Crowdfunding Definitions

Crowd Donation: Funding contributions are donations that do not earn any direct measurable compensation or perk. Examples include social, charitable and cultural projects. Crowd donating can also be used to raise funds for political campaigns. For crowd donating to be successful, it is imperative that an emotional bond be established and maintained between the providers of capital and the recipients.

Crowd Rewards: Crowd rewards include creative and cultural projects as well as sport projects. However, commercial projects can also be subsumed under this category. With this type of financing, the contributor receives a perk (e.g., reward) in the form of products, works of art, or services. There are no limits to the creativity of the parties looking for funding.

Crowd Investing (Equity/Debt): Instead of financing a project, crowd investing focuses on purchasing equity (common shares) or debt (e.g., convertible notes, mini-bonds) in a company. Crowd investing also offers investors with only limited amounts to invest the opportunity to support the growth of startups, small- and medium-sized businesses, or lifestyles. In return, these investors receive shares in the company or interest repayment based on specified terms. In the case of equity investments, these are often silent partnerships in which the investors have no or only limited voting rights.

Crowd Lending/Peer-to-Peer Lending: Crowd lending mainly refers to the financing of companies or private individuals (e.g., lifestyles, student loans, real estate, cars, etc.) with loans (borrowed capital). In return for their loan, lenders expect a risk-adjusted return on their investment. As products and business models have evolved, the investor base for online marketplace lenders has expanded to institutional investors, hedge funds, and financial institutions.

Depending on the country, securities-based crowdfunding encompasses the selling of shares (common stock) and all forms of credit including, but not limited to, mini-bonds, peer-to-peer lending, convertible notes, etc.

This next section provides an overview of the primary business models in online peer-to-peer lending as well as the structures used to fund this activity.

Companies in this industry have developed three primary business models: (1) direct lenders that originate loans to hold in their own portfolios, commonly referred to as balance sheet lenders; (2) platform lenders that partner with an issuing depository institution to originate loans that are funded by all types of lenders and then, in some cases, purchase the loans for sale to investors as whole loans or by issuing securities such as member-dependent notes; and (3) a combination of the aforementioned models in which rights and obligations are transferred through securitization.

Direct lenders that do not rely on depository institutions to originate loans are generally required to obtain licenses from each state in which they lend. Direct lenders that use state lending licenses to originate loans directly are not subject to a federal banking regulator's supervisory authority, except to the extent the lenders may be subject to CFPB supervision.

Summary of the Embodiments

According to an aspect of the present invention, a method for analyzing crowdfunding platforms is provided. The method includes connecting, using an electronic device, to a plurality of individual lending platforms, and retrieving loan book data from each of the individual lending platforms, storing the loan book data, using a memory coupled to the electronic device, wherein the loan book data includes metadata generated in a Structured Query Language database, and wherein the metadata includes a name of a platform associated with the loan book data and a list of data attributes. The method further includes transforming, using a processor coupled to the electronic device, the loan book data from each of the platforms such that the transformed loan book data uses common data, reading, using the processor, the transformed loan book data, and documenting, for each pair of platform and attribute, a destination unified data attribute.

It is an object of the present invention to provide the method for analyzing crowdfunding platforms, wherein the metadata further includes a timestamp for when the loan book data has been received.

It is an object of the present invention to provide the method for analyzing crowdfunding platforms, wherein the list of attributes is associated with each borrower listing and loan origination associated with the platform.

It is an object of the present invention to provide the method for analyzing crowdfunding platforms, wherein the common data is selected from the group consisting of: a common language; a common currency; a common time zone; common units; and common numeric ranges.

It is an object of the present invention to provide the method for analyzing crowdfunding platforms, wherein the storing the loan book data further includes storing the loan book data, for each platform, in its natural state, in real time.

It is an object of the present invention to provide the method for analyzing crowdfunding platforms, wherein the documenting is performed according to a mapping table.

It is an object of the present invention to provide the method for analyzing crowdfunding platforms, wherein the method further includes predicting if a loan associated with a platform is likely to be repaid or not.

According to another aspect of the present invention, a system for analyzing crowdfunding platforms is provided. The system includes an electronic device configured to connect to a plurality of individual lending platforms and retrieve loan book data from each of the individual lending platforms, a memory coupled to the electronic device, the memory configured to store the loan book data, wherein the loan book data includes metadata generated in a Structured Query Language database, and wherein the metadata includes a name of a platform associated with the loan book data and a list of data attributes, and a processor, coupled to the electronic device, the processor configured to transform the loan book data from each of the platforms such that the transformed loan book data uses common data, read the transformed loan book data, and document, for each pair of platform and attribute, a destination unified data attribute.

It is an object of the present invention to provide the system for analyzing crowdfunding platforms, wherein the metadata further includes a timestamp for when the loan book data has been received.

It is an object of the present invention to provide the system for analyzing crowdfunding platforms, wherein the list of attributes is associated with each borrower listing and loan origination associated with a primary platform listed and identified across other platforms.

It is an object of the present invention to provide the system for analyzing crowdfunding platforms, wherein the common data is selected from the group consisting of: a common language; a common currency; a common time zone; common units; and common numeric ranges.

It is an object of the present invention to provide the system for analyzing crowdfunding platforms, wherein the memory is further configured to store the loan book data, for each platform, in its natural state, in real time.

It is an object of the present invention to provide the system for analyzing crowdfunding platforms, wherein the processor is configured to document according to a mapping table.

It is an object of the present invention to provide the system for analyzing crowdfunding platforms, wherein the processor is further configured to predict if a loan associated with a platform is likely to be repaid or not.

It is an object of the present invention to provide the system for analyzing crowdfunding platforms, wherein the electronic device is selected from the group consisting of: a desktop computer; a laptop computer; a tablet computer; and a smartphone.

It is an object of the present invention to provide the system for analyzing crowdfunding platforms, wherein the system further includes a graphical user interface, and wherein the memory is further configured to store a digital application configured to enable a user to access the destination unified data attributes, using the graphical user interface.

Brief Description of the Drawings

FIG. 1 shows a block/flow diagram of a method/system for analyzing crowdfunding platforms is illustratively depicted, according to an embodiment of the present invention.

FIG. 2 shows a screenshot of a login screen for a digital application for analyzing crowdfunding platforms, according to an embodiment of the present invention.

FIG. 3 shows a screenshot of an alert system configuration screen for the digital application for analyzing borrower capital limits and investor investment limits based on regulatory mandates and specific crowdfunding business models across platforms by using an encrypted unique identifier, according to an embodiment of the present invention.

FIG. 4 shows a screenshot for setting up a user account for the digital application for analyzing crowdfunding platforms, according to an embodiment of the present invention.

FIG. 5 shows a screenshot for configuring alerts for the digital application for analyzing crowdfunding platforms, according to an embodiment of the present invention.

FIG. 6 shows a screenshot for configuring alerts for the digital application for analyzing crowdfunding platforms, according to an embodiment of the present invention.

FIG. 7 shows a screenshot of a platform, using the digital application for analyzing crowdfunding platforms, according to an embodiment of the present invention.

FIG. 8 shows a screenshot of alerts for a platform, using the digital application for analyzing crowdfunding platforms, according to an embodiment of the present invention.

Description of the Preferred Embodiments

The preferred embodiments of the present invention will now be described with reference to the drawings. Identical elements in the various figures are identified with the same reference numerals.

Reference will now be made in detail to each embodiment of the present invention. Such embodiments are provided by way of explanation of the present invention, which is not intended to be limited thereto. In fact, those of ordinary skill in the art may appreciate upon reading the present specification and viewing the present drawings that various modifications and variations can be made thereto.

Recent legislative changes have made it possible for companies in the United States to raise the capital they need by means of peer-to-peer marketplace lending and securities (equity and debt (e.g., peer-to-peer lending)) crowdfunding. This allows accredited and non-accredited investors to buy and sell securities in Small Cap Private Companies and Non-Publicly Traded Funds. The present invention describes an integrated approach for addressing the challenges of this market, including the development of ratings and the creation of an online financial technology platform that provides a transparent framework for investors and creates the mechanisms for market participants to comply with regulations and benchmark their performance. The design of the rating framework starts with data collection, consolidation, and unification of the peer-to-peer marketplace lending and securities (equity and debt) crowdfunding market.

According to an embodiment, the present system has two components. The first component is the technology stack. According to an embodiment, three subcomponents make up the technology stack (system): a crawler that performs the first subcomponent, data collection; a sanitization feature that allows for the second subcomponent, consolidation; and a third subcomponent, unification of loan book data from securities crowdfunding platforms, known as marketplace lenders, peer-to-peer lenders, and crowdfunding platforms (equity and debt). It is noted, however, that the nomenclature often changes based on the country of origin.

The second component involves data collection. According to an embodiment, crawled peer-to-peer loan book data is collected in the first layer/component in each country's natural language (e.g., Chinese, Hindi, English and more), computer encoding, and computer format.

Referring now to FIG. 1, a block/flow diagram of a method/system 100 for analyzing crowdfunding platforms is illustratively depicted, in accordance with an embodiment of the present invention.

Globally, over 2,500 platforms have started issuing consumer personal loans, small- and medium-sized business loans, real estate loans (commercial and residential), student loans, agriculture/agribusiness loans, solar/renewable energy loans and automobile loans via web-enabled lending platforms. Financial loan data are published by each lending platform as each borrower is listed on the platform seeking funding. Each marketplace lender/peer-to-peer lender updates and releases its data at different time intervals, through different mediums, in different formats, and across different jurisdictions.

Some platforms provide the data through the WebSocket real-time protocol (essentially pushing new loan data and events to protocol subscribers). Others provide the data through a RESTful API from which a script can pull new loan data at a pre-defined time interval (hourly, every 3 hours, daily, monthly, quarterly, etc.). The output depends on the age of the peer-to-peer lending platform, the business model of the platform (some simply update their loan listings at the time the borrowers "ask" for a loan amount, or update an event when an investor lends money toward the "ask" amount), and when the loan originates (e.g., when the loan is fully funded) on a public web page providing a comma-separated values (CSV) file for download. Other platforms provide direct application programming interfaces (APIs) for retail and institutional investors and partners.
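As an illustration of the pull-based retrieval described above, the following is a minimal sketch of a polling script; the endpoint URL, API key, query parameter, and field names are hypothetical and are not those of any actual platform:

    import time
    import requests  # third-party HTTP client

    # Hypothetical endpoint and credentials -- illustrative only, not an actual platform API.
    API_URL = "https://platform.example.com/api/v1/loans"
    API_KEY = "YOUR_API_KEY"
    POLL_INTERVAL_SECONDS = 3600  # pull on a pre-defined interval (here, hourly)

    def fetch_new_loans(since_timestamp):
        """Pull loan listings created after the given timestamp from the RESTful API."""
        response = requests.get(
            API_URL,
            params={"created_after": since_timestamp},
            headers={"Authorization": "Bearer " + API_KEY},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()  # assumes the platform returns a JSON list of loan records

    def poll_forever():
        last_seen = "1970-01-01T00:00:00Z"
        while True:
            for loan in fetch_new_loans(last_seen):
                print(loan)  # in practice, hand the record off to the data collection component
                last_seen = max(last_seen, loan.get("created_at", last_seen))
            time.sleep(POLL_INTERVAL_SECONDS)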

These peer-to-peer lending platforms offer their data in different formats, including, but not limited to, JSON, line-delimited JSON, CSV, TSV, Excel and HTML. Each format may be provided with different encodings including, but not limited to, UTF-8, BIG5, Latin-1 and GBK.

Each platform's data may be in a different language (Chinese, English, Hindi, French, Spanish, etc.). Any numeric value may be denominated in different units, and these units may be in various currencies (e.g., US Dollar, Renminbi, Euro, British Pound Sterling, Rupee, etc.) and have a different numeric range. The numeric range may include salaries (e.g., 0-1 million versus 0-1 thousand).

The problem arises when an entity (e.g., automated or human) desires to comprehend this data at a consolidated macro to micro level, across all platforms of the peer-to-peer lending financial industry (e.g., regulators, investors). In this instance, "comprehend" refers to the generation of statistics and a high degree of qualitative and quantitative comparability across platforms' data. The heretofore described complexities complicate attempts to analyze risk management in crowdfunding platforms.

The solution to address this problem comprises three layered components working together. Illustrated in FIG. 1 is the solution for the collection, consolidation and unification.

According to an embodiment, the data collection component 105 includes a set of custom-made scripts that connect to an individual lending platform and retrieve its loan book data. Each script complies with and follows the peer-to-peer lending platform's data release schedule, medium, and format 110. Once the data from each platform is received, it is stored (archived) in its natural state, in real time, along with metadata generated in a data collection SQL database 115. According to an embodiment, the metadata includes: a timestamp for when the data has been received; the name of the platform; and the list of its data attributes for each borrower listing and subsequent loan origination. According to an embodiment, each borrower listing and/or loan origination is associated with a primary platform listed and identified across other platforms. At this stage, all platform data are saved using the same encoding (e.g., UTF-8) and the same format (e.g., JSON), but each retains its unique and verifiable data attribute keys (e.g., loan interest may be represented as "LoanInterest" or "loan_itrst"). This archival step of the data collection component 105 allows for auditing of the original data footprint for compliance purposes prior to sanitization.
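As a minimal sketch of this archival step, the following stores each received batch in its natural state together with the described metadata (timestamp, platform name, attribute list); the table name and schema are illustrative assumptions, not the actual data collection SQL database 115:

    import json
    import sqlite3
    from datetime import datetime, timezone

    # Illustrative schema for archiving raw loan book data plus its metadata.
    conn = sqlite3.connect("data_collection.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_loan_book (
               received_at    TEXT,  -- timestamp for when the data has been received
               platform_name  TEXT,  -- name of the platform
               attribute_keys TEXT,  -- the platform's own data attribute keys
               payload        TEXT   -- loan book data archived in its natural state, as UTF-8 JSON
           )"""
    )

    def archive_loan_book(platform_name, records):
        """Store one batch of loan book records along with the metadata described above."""
        attribute_keys = sorted({key for record in records for key in record})
        conn.execute(
            "INSERT INTO raw_loan_book VALUES (?, ?, ?, ?)",
            (
                datetime.now(timezone.utc).isoformat(),
                platform_name,
                json.dumps(attribute_keys),
                json.dumps(records, ensure_ascii=False),  # preserves the original data footprint
            ),
        )
        conn.commit()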

According to an embodiment, the data consolidation component 120 addresses the need for transforming the data to use a common language, currency, time zone, and common units and numeric ranges. The data consolidation component pulls data from the data collection component 105, reads it and, during a data consolidation process 125, applies various transformations, such as those in the list of examples below:

1. Data in natural language (e.g., loan type/usage, interest rate, loan amount, repayment terms, and more) is captured in the first instance in the native language, archived for audit purposes, and then translated 130 into English. Monetary denomination type data such as loan amounts, premiums and other data are captured in the native language and remain in the native language for research reports and benchmarks due to currency fluctuations. Typically, these data are not converted 135 into US Dollars unless required, in which case both denominations are presented with a date/time stamp for back testing.

2. Time zones are converted 140 to the UTC time zone.

3. Borrowers' income information, interest rates and other numeric information are converted 145 to use a single floating-point format (e.g., "18K" to "18000.00", "10%" to "0.1"). At this stage all data has been converted to a common format, but each platform still retains its original and unique set of data attribute keys. A minimal sketch of such conversions is given below.
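The sketch below illustrates the kind of conversions described in items 2 and 3 above; the accepted input spellings follow the examples given (e.g., "18K", "10%"), while the function names and parsing rules are assumptions:

    import re
    from datetime import timezone
    from zoneinfo import ZoneInfo

    def to_utc(local_timestamp, source_tz_name):
        """Convert 140 a platform-local datetime to the UTC time zone."""
        return local_timestamp.replace(tzinfo=ZoneInfo(source_tz_name)).astimezone(timezone.utc)

    def to_float(value):
        """Convert 145 values such as "18K" or "10%" to a single floating-point format."""
        text = str(value).strip().replace(",", "")
        if text.endswith("%"):
            return float(text[:-1]) / 100.0  # "10%" -> 0.1
        match = re.fullmatch(r"([\d.]+)\s*([KkMm]?)", text)
        if not match:
            raise ValueError("unrecognized numeric value: " + repr(value))
        number, suffix = float(match.group(1)), match.group(2).upper()
        return number * {"": 1, "K": 1_000, "M": 1_000_000}[suffix]  # "18K" -> 18000.0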

According to an embodiment, all of this data is pushed and stored into a queue, to be consumed by the last component, the data unification component 150.

According to an embodiment, the data unification component 150 reads data from a queue populated by the data collection component 105. Based on a mapping table documenting, for each different platform data attribute pair 155 (e.g., platform A/attribute Y), its destination unified data attribute, the data unification component 150 populates a central Structured Query Language (SQL) database 160 for all platform/attribute pairs 155. This results in a central database 160 storing the different platforms' data in a new unified format, from which macro-level statistics and comparison analysis can be produced with an error rate of less than 1 percent.
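A minimal sketch of this unification step follows; the platform names, source keys, destination attribute names, and table schema are illustrative stand-ins for the mapping table and the central SQL database 160:

    import sqlite3

    # Illustrative mapping table: (platform, source attribute key) -> destination unified attribute.
    MAPPING_TABLE = {
        ("PlatformA", "LoanInterest"): "interest_rate",
        ("PlatformA", "AmountAsked"): "loan_amount",
        ("PlatformB", "loan_itrst"): "interest_rate",
        ("PlatformB", "loan_amt"): "loan_amount",
    }

    central_db = sqlite3.connect("central_unified.db")
    central_db.execute(
        "CREATE TABLE IF NOT EXISTS unified_loans (platform TEXT, interest_rate REAL, loan_amount REAL)"
    )

    def unify(platform, consolidated_record):
        """Translate one consolidated record into unified attribute names and store it centrally."""
        unified = {}
        for source_key, value in consolidated_record.items():
            destination = MAPPING_TABLE.get((platform, source_key))
            if destination is not None:
                unified[destination] = value
        central_db.execute(
            "INSERT INTO unified_loans (platform, interest_rate, loan_amount) VALUES (?, ?, ?)",
            (platform, unified.get("interest_rate"), unified.get("loan_amount")),
        )
        central_db.commit()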

Such a solution as that shown in FIG. 1 allows for transparency at the transaction level for loan data in near real-time, and the normalization and standardization of data, thus allowing for the creation of industry-wide comparisons, valuations, pricing activities, and statistics generation across platforms, across jurisdictions, and across regional settings. For further illustration, please refer to Appendix I.

According to an embodiment, the present method/system 100 includes, e.g., comparing the average interest rate of a platform A in jurisdiction Y to the average interest rate of another platform B in a jurisdiction Z, and averaging all platform loan default rates for an entire jurisdiction or region.
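Once the central database is populated, such comparisons reduce to simple aggregate queries; the sketch below assumes an illustrative unified_loans table that also carries jurisdiction and defaulted (0/1) columns, which are assumptions beyond the schema sketched earlier:

    import sqlite3

    # Assumes the illustrative central database sketched above, extended with
    # jurisdiction and defaulted (0 or 1) columns for each unified loan record.
    central_db = sqlite3.connect("central_unified.db")

    average_interest_by_platform = central_db.execute(
        """SELECT platform, jurisdiction, AVG(interest_rate)
           FROM unified_loans
           GROUP BY platform, jurisdiction"""
    ).fetchall()

    average_default_rate_by_jurisdiction = central_db.execute(
        """SELECT jurisdiction, AVG(defaulted)
           FROM unified_loans
           GROUP BY jurisdiction"""
    ).fetchall()

    print(average_interest_by_platform)
    print(average_default_rate_by_jurisdiction)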

According to an embodiment, the present method/system 100 includes, e.g., the feasibility and value of using social media in traditional company (public and private)-specific ratings models and investor-specific ratings.

According to an embodiment, the present method/system 100 includes, e.g., creating an industry-wide standard weighted credit risk model to underwrite loans and track performance.

According to an embodiment, the present method/system 100 includes, e.g., the ability to identify when a borrower exceeds borrower limits on one or more platforms.

According to an embodiment, the present method/system 100 collects, consolidates, and unifies data from a plurality of separate peer-to-peer lending platforms, such as, e.g., those from China, USA, and Europe, covering consumer loans, real estate, student loans, automobiles, agribusiness, renewable energy/solar, and lifestyles, among others. According to an embodiment, the present invention provides the following:

1. Stable automation of an API and/or web-scraping technology for each platform.

2. Collection/capture of newly incoming Loans per hour by platform.

3. Collection/capture of any update event for any Loan per hour and platform.

4. For originated loans, the loan's performance status can be followed.

5. Distinction of Loans having:

a. Loan Progress smaller than 100% - Loan is in the 'ask' phase, no binding contract between two parties => indicates the 'ask' volume in the market.

b. Loan Progress equals 100% - Loan is an active, binding legal contract between two parties => gives the loan/credit volume in the market. A sketch of this distinction is given below.
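The following minimal sketch splits collected loans into "ask" volume and loan/credit volume as described in item 5; the field names (progress, amount) are illustrative assumptions:

    # Split collected loans into "ask" volume (progress < 100%) and loan/credit volume (progress = 100%).
    def split_volumes(loans):
        ask_volume = 0.0     # loans still in the "ask" phase, no binding contract
        credit_volume = 0.0  # fully funded loans, i.e., active binding legal contracts
        for loan in loans:
            if loan["progress"] < 1.0:
                ask_volume += loan["amount"]
            else:
                credit_volume += loan["amount"]
        return ask_volume, credit_volume

    sample = [{"amount": 5000.0, "progress": 0.4}, {"amount": 12000.0, "progress": 1.0}]
    print(split_volumes(sample))  # -> (5000.0, 12000.0)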

According to an embodiment, the present method/system 100 includes incorporating a method for identifying credit risk having the objective of identifying explanatory variables for predicting if a loan is likely to be repaid or not.

The following Data and Subsets provide an example of the method used for predicting if a loan is likely to be repaid or not, according to an embodiment of the present invention.

Data: XYZ Platform originated loan data, containing all loans issued between January 2010 and September 2016, with the latest loan status as of a publication date. Two subsets of loans, all having completed their life cycles with a loan status of either "Fully Paid" or "Charged Off", have been analysed.

Subset 1: Three- and five-year loans issued between January 2010 and November 2011 (30,986 loans, 15% defaulted).

Subset 2: Three-year loans issued between January 2010 and December 2013 (166,267 loans, 12% defaulted).

Model: A logistic regression model with loan status as the dependent variable. Different subsets of independent variables have been built from the following attributes (as shown in Table 1):

Table 1

Result: So far, no attribute subset has resulted in a model that calculates default probabilities matching the observed defaults in the originated loan data. None of the attributes seems to have much influence on loan status. To further analyse this issue, correlations between loan status and several attributes, such as "dti" (debt-to-income ratio), have been calculated. For example, the correlation between dti and loan status in data subset 2 is only 0.09, which is very low.
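For illustration, a minimal sketch of this kind of analysis is given below in Python (the analysis described here used R); the toy data are synthetic stand-ins for the originated loan attributes such as dti and do not reproduce the reported results:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in data: dti capped at 35% as in the originated loans, roughly 15% default rate.
    rng = np.random.default_rng(0)
    n = 1000
    dti = rng.uniform(0, 35, n)                  # debt-to-income ratio
    interest_rate = rng.uniform(0.05, 0.25, n)
    X = np.column_stack([dti, interest_rate])
    status = (rng.random(n) < 0.15).astype(int)  # 1 = "Charged Off", 0 = "Fully Paid"

    # Logistic regression with loan status as the dependent variable.
    model = LogisticRegression().fit(X, status)
    default_probability = model.predict_proba(X)[:, 1]

    # Correlation between dti and loan status (reported as only 0.09 on data subset 2).
    correlation = np.corrcoef(dti, status)[0, 1]
    print("dti/status correlation:", round(correlation, 2))
    print("mean predicted default probability:", round(default_probability.mean(), 3))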

Explanation: XYZ Platform has already used these attributes to differentiate between "good" and "bad" loans, where "good" loans are the ones they originated; they declined about 90% of all loan applications. So the data we analyse contain only the "Top 10%", exemplified by the debt-to-income ratio that remains below 35% in all 2010 to 2013 loans. In the declined loans data, for the same time interval, we find more than 200,000 dti values higher than 40% and up to 1000%. (Debt-to-income ratio is the only attribute in the declined loans data set that can be compared with originated loans.)

So it seems that other attributes would be needed to explain the defaults in the originated loans. These could be, for example, indicators related to health or unemployment risks. Another example is provided below:

Estimators for a parameter subset, using a sample of 5000 loans (4300 fully paid, 700 charged off), using R (as shown in Table 2):

Table 2

Resulting default probabilities by quartiles and corresponding observed defaults, applied to the complete dataset of 30,986 loans (26,636 fully paid / 4,350 charged off) (as shown in Table 3):

For comparison: Default probabilities by quartiles and corresponding observed defaults from a bank's dataset of 300 loans (255 fully paid / 45 charged off) (as shown in Table 4):

Referring now to FIG. 2, a screenshot of a login screen for a digital application for analyzing crowdfunding platforms is illustratively depicted, in accordance with an embodiment of the present invention.

According to an embodiment, one or more of the steps and/or functions as shown and described in FIG. 1 may be completed using the digital application. According to an embodiment, the digital application is capable of being run on an electronic device such as, but not limited to, a desktop computer, a laptop computer, a tablet computer, a smartphone, and/or any other suitable electronic device. According to an embodiment, one or more electronic devices are connected via a server over a wired and/or wireless connection. According to an embodiment, a memory may be coupled to the electronic device and/or the server for storing one or more pieces of data and/or the digital application.

According to an embodiment, the login screen for the digital application enables users to input login credentials (e.g., a username, a password, etc.) and a specific technology platform.

Referring now to FIG. 3, a screenshot of an alert system configuration screen for the digital application for analyzing crowdfunding platforms is illustratively depicted, in accordance with an embodiment of the present invention.

According to an embodiment, the user is able to configure the digital application to send alerts to the user. According to an embodiment, the configuration includes inputting information for the platform. This information may include, e.g., an address, a region, a range of loans outstanding, a legal maximum borrowing limit, an address (digital or physical) to which to send alerts, and/or any other suitable information.

Referring now to FIG. 4, a screenshot for setting up a user account for the digital application for analyzing crowdfunding platforms is illustratively depicted, in accordance with an embodiment of the present invention.

According to an embodiment, the user account configuration includes inputting identifiable information including, e.g., name, login credentials, e-mail address, and/or any other suitable information. According to an embodiment, more than one user account can be configured.

Referring now to FIGs. 5-6, screenshots for configuring alerts for the digital application for analyzing crowdfunding platforms is illustratively depicted, in accordance with various embodiments of the present invention.

According to an embodiment, users are able to configure alerts for a particular platform (FIG. 5) or for all platforms (FIG. 6). According to an embodiment, the configuration includes setting a legal maximum borrowing limit, setting up alerts to be received when actual borrowing reaches a certain amount or percentage of the maximum borrowing amount, setting up alerts to be received when potential borrowing reaches a certain amount or a certain percentage of the maximum borrowing amount, and setting at what interval alerts are to be received. According to an embodiment, the user may also configure alerts such that the user stops receiving alerts for a customer the platform has lent money to until the customer requests a new loan.
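As an illustration of these thresholds, the following is a minimal sketch; the limit, percentages, and field names are assumptions rather than values taken from the application:

    # Fire alerts when actual or potential borrowing reaches a configured percentage
    # of the legal maximum borrowing limit. Numbers and names are illustrative.
    def check_alerts(actual_borrowing, potential_borrowing, legal_maximum,
                     actual_threshold_pct=80.0, potential_threshold_pct=90.0):
        alerts = []
        if actual_borrowing >= legal_maximum * actual_threshold_pct / 100.0:
            alerts.append("actual borrowing has reached "
                          + str(round(100.0 * actual_borrowing / legal_maximum)) + "% of the legal maximum")
        if potential_borrowing >= legal_maximum * potential_threshold_pct / 100.0:
            alerts.append("potential borrowing has reached "
                          + str(round(100.0 * potential_borrowing / legal_maximum)) + "% of the legal maximum")
        return alerts

    print(check_alerts(actual_borrowing=85000, potential_borrowing=95000, legal_maximum=100000))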

Referring now to FIG. 7, a screenshot of a profile in a platform, using the digital application for analyzing crowdfunding platforms is illustratively depicted, in accordance with an embodiment of the present invention.

According to an embodiment, the profile includes identifiable information pertaining to the identity associated with the profile, such as, e.g., name, address, region, range of loans outstanding, legal maximum borrowing limit, and an address (digital or physical) to which alerts are to be sent.

Referring now to FIG. 8, a screenshot of alerts for a platform, using the digital application for analyzing crowdfunding platforms is illustratively depicted, in accordance with an embodiment of the present invention.

According to an embodiment, alerts are organized by the date received and are listed with the borrower's unique identifier included in each alert. According to an embodiment, the user is able to search for alerts within a specific timeframe.

Systems, Devices and Operating Systems

Typically, a user or users, which may be people or groups of users and/or other systems, may engage information technology systems (e.g., computers) to facilitate operation of the system and information processing. In turn, computers employ processors to process information and such processors may be referred to as central processing units (CPU). One form of processor is referred to as a microprocessor. CPUs use communicative circuits to pass binary encoded signals acting as instructions to enable various operations. These instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory (e.g., registers, cache memory, random access memory, etc.). Such communicative instructions may be stored and/or transmitted in batches (e.g., batches of instructions) as programs and/or data components to facilitate desired operations. These stored instruction codes, e.g., programs, may engage the CPU circuit components and other motherboard and/or system components to perform desired operations. One type of program is a computer operating system, which may be executed by the CPU on a computer; the operating system enables and facilitates users' access to and operation of computer information technology and resources. Some resources that may be employed in information technology systems include: input and output mechanisms through which data may pass into and out of a computer; memory storage into which data may be saved; and processors by which information may be processed. These information technology systems may be used to collect data for later retrieval, analysis, and manipulation, which may be facilitated through a database program. These information technology systems provide interfaces that allow users to access and operate various system components.

In one embodiment, the present invention may be connected to and/or communicate with entities such as, but not limited to: one or more users from user input devices; peripheral devices; an optional cryptographic processor device; and/or a communications network. For example, the present invention may be connected to and/or communicate with users, operating client device(s), including, but not limited to, personal computer(s), server(s) and/or various mobile device(s) including, but not limited to, cellular telephone(s), smartphone(s) (e.g., iPhone®, Blackberry®, Android OS-based phones etc.), tablet computer(s) (e.g., Apple iPad™, HP Slate™, Motorola Xoom™, etc.), eBook reader(s) (e.g., Amazon Kindle™, Barnes and Noble's Nook™ eReader, etc.), laptop computer(s), notebook(s), netbook(s), gaming console(s) (e.g., XBOX Live™, Nintendo® DS, Sony PlayStation® Portable, etc.), portable scanner(s) and/or the like.

Networks are commonly thought to comprise the interconnection and interoperation of clients, servers, and intermediary nodes in a graph topology. It should be noted that the term "server" as used throughout this application refers generally to a computer, other device, program, or combination thereof that processes and responds to the requests of remote users across a communications network. Servers serve their information to requesting "clients." The term "client" as used herein refers generally to a computer, program, other device, user and/or combination thereof that is capable of processing and making requests and obtaining and processing any responses from servers across a communications network. A computer, other device, program, or combination thereof that facilitates, processes information and requests, and/or furthers the passage of information from a source user to a destination user is commonly referred to as a "node." Networks are generally thought to facilitate the transfer of information from source points to destinations. A node specifically tasked with furthering the passage of information from a source to a destination is commonly called a "router." There are many forms of networks such as Local Area Networks (LANs), Pico networks, Wide Area Networks (WANs), Wireless Networks (WLANs), etc. For example, the Internet is generally accepted as being an interconnection of a multitude of networks whereby remote clients and servers may access and interoperate with one another.

The present invention may be based on computer systems that may comprise, but are not limited to, components such as: a computer systemization connected to memory.

Computer Systemization

A computer systemization may comprise a clock, central processing unit ("CPU(s)" and/or "processor(s)" (these terms are used interchangeably throughout the disclosure unless noted to the contrary)), a memory (e.g., a read only memory (ROM), a random access memory (RAM), etc.), and/or an interface bus, and most frequently, although not necessarily, these are all interconnected and/or communicating through a system bus on one or more (mother)board(s) having conductive and/or otherwise transportive circuit pathways through which instructions (e.g., binary encoded signals) may travel to effect communications, operations, storage, etc. Optionally, the computer systemization may be connected to an internal power source; e.g., optionally the power source may be internal. Optionally, a cryptographic processor and/or transceivers (e.g., ICs) may be connected to the system bus. In another embodiment, the cryptographic processor and/or transceivers may be connected as either internal and/or external peripheral devices via the interface bus I/O. In turn, the transceivers may be connected to antenna(s), thereby effectuating wireless transmission and reception of various communication and/or sensor protocols; for example the antenna(s) may connect to: a Texas Instruments WiLink WL1283 transceiver chip (e.g., providing 802.11n, Bluetooth 3.0, FM, global positioning system (GPS) (thereby allowing the controller of the present invention to determine its location)); a Broadcom BCM4329FKUBG transceiver chip (e.g., providing 802.11n, Bluetooth 2.1 + EDR, FM, etc.); a Broadcom BCM4750IUB8 receiver chip (e.g., GPS); an Infineon Technologies X-Gold 618-PMB9800 (e.g., providing 2G/3G HSDPA/HSUPA communications); and/or the like. The system clock typically has a crystal oscillator and generates a base signal through the computer systemization's circuit pathways. The clock is typically coupled to the system bus and various clock multipliers that will increase or decrease the base operating frequency for other components interconnected in the computer systemization. The clock and various components in a computer systemization drive signals embodying information throughout the system. Such transmission and reception of instructions embodying information throughout a computer systemization may be commonly referred to as communications. These communicative instructions may further be transmitted, received, and the cause of return and/or reply communications beyond the instant computer systemization to: communications networks, input devices, other computer systemizations, peripheral devices, and/or the like. Of course, any of the above components may be connected directly to one another, connected to the CPU, and/or organized in numerous variations employed as exemplified by various computer systems.

The CPU comprises at least one high-speed data processor adequate to execute program components for executing user and/or system-generated requests. Often, the processors themselves will incorporate various specialized processing units, such as, but not limited to: integrated system (bus) controllers, memory management control units, floating point units, and even specialized processing sub-units like graphics processing units, digital signal processing units, and/or the like. Additionally, processors may include internal fast access addressable memory, and be capable of mapping and addressing memory beyond the processor itself; internal memory may include, but is not limited to: fast registers, various levels of cache memory (e.g., level 1, 2, 3, etc.), RAM, etc. The processor may access this memory through the use of a memory address space that is accessible via instruction address, which the processor can construct and decode allowing it to access a circuit path to a specific memory address space having a memory state. The CPU may be a microprocessor such as: AMD's Athlon, Duron and/or Opteron; ARM's application, embedded and secure processors; IBM and/or Motorola's DragonBall and PowerPC; IBM's and Sony's Cell processor; Intel's Celeron, Core (2) Duo, Itanium, Pentium, Xeon, and/or XScale; and/or the like processor(s). The CPU interacts with memory through instruction passing through conductive and/or transportive conduits (e.g., (printed) electronic and/or optic circuits) to execute stored instructions (i.e., program code) according to conventional data processing techniques. Such instruction passing facilitates communication within the present invention and beyond through various interfaces. Should processing requirements dictate a greater amount of speed and/or capacity, distributed processors (e.g., Distributed embodiments of the present invention), mainframe, multi-core, parallel, and/or super-computer architectures may similarly be employed. Alternatively, should deployment requirements dictate greater portability, smaller Personal Digital Assistants (PDAs) may be employed.

Depending on the particular implementation, features of the present invention may be achieved by implementing a microcontroller such as CAST's R8051XC2 microcontroller; Intel's MCS 51 (i.e., 8051 microcontroller); and/or the like. Also, to implement certain features of the various embodiments, some feature implementations may rely on embedded components, such as: Application-Specific Integrated Circuit ("ASIC"), Digital Signal Processing ("DSP"), Field Programmable Gate Array ("FPGA"), and/or the like embedded technology. For example, any of the component collection (distributed or otherwise) and/or features of the present invention may be implemented via the microprocessor and/or via embedded components; e.g., via ASIC, coprocessor, DSP, FPGA, and/or the like. Alternately, some implementations of the present invention may be implemented with embedded components that are configured and used to achieve a variety of features or signal processing.

Depending on the particular implementation, the embedded components may include software solutions, hardware solutions, and/or some combination of both hardware/software solutions. For example, features of the present invention discussed herein may be achieved through implementing FPGAs, which are semiconductor devices containing programmable logic components called "logic blocks", and programmable interconnects, such as the high performance FPGA Virtex series and/or the low cost Spartan series manufactured by Xilinx. Logic blocks and interconnects can be programmed by the customer or designer, after the FPGA is manufactured, to implement any of the features of the present invention. A hierarchy of programmable interconnects allows logic blocks to be interconnected as needed by the system designer/administrator of the present invention, somewhat like a one-chip programmable breadboard. An FPGA's logic blocks can be programmed to perform the function of basic logic gates such as AND and XOR, or more complex combinational functions such as decoders or simple mathematical functions. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory. In some circumstances, the present invention may be developed on regular FPGAs and then migrated into a fixed version that more resembles ASIC implementations. Alternate or coordinating implementations may migrate features of the controller of the present invention to a final ASIC instead of or in addition to FPGAs. Depending on the implementation, all of the aforementioned embedded components and microprocessors may be considered the "CPU" and/or "processor" for the present invention.

Power Source

The power source may be of any standard form for powering small electronic circuit board devices such as the following power cells: alkaline, lithium hydride, lithium ion, lithium polymer, nickel cadmium, solar cells, and/or the like. Other types of AC or DC power sources may be used as well. In the case of solar cells, in one embodiment, the case provides an aperture through which the solar cell may capture photonic energy. The power cell is connected to at least one of the interconnected subsequent components of the present invention thereby providing an electric current to all subsequent components. In one example, the power source is connected to the system bus component. In an alternative embodiment, an outside power source is provided through a connection across the I/O interface. For example, a USB and/or IEEE 1394 connection carries both data and power across the connection and is therefore a suitable source of power.

Interface Adapters

Interface bus(ses) may accept, connect, and/or communicate to a number of interface adapters, conventionally although not necessarily in the form of adapter cards, such as but not limited to: input output interfaces (I/O), storage interfaces, network interfaces, and/or the like. Optionally, cryptographic processor interfaces similarly may be connected to the interface bus. The interface bus provides for the communications of interface adapters with one another as well as with other components of the computer systemization. Interface adapters are adapted for a compatible interface bus. Interface adapters conventionally connect to the interface bus via a slot architecture. Conventional slot architectures may be employed, such as, but not limited to: Accelerated Graphics Port (AGP), Card Bus, (Extended) Industry Standard Architecture ((E)ISA), Micro Channel Architecture (MCA), NuBus, Peripheral Component Interconnect (Extended) (PCI(X)), PCI Express, Personal Computer Memory Card International Association (PCMCIA), and/or the like.

Storage interfaces may accept, communicate, and/or connect to a number of storage devices such as, but not limited to: storage devices, removable disc devices, and/or the like. Storage interfaces may employ connection protocols such as, but not limited to: (Ultra) (Serial) Advanced Technology Attachment (Packet Interface) ((Ultra) (Serial) ATA(PI)), (Enhanced) Integrated Drive Electronics ((E)IDE), Institute of Electrical and Electronics Engineers (IEEE) 1394, fiber channel, Small Computer Systems Interface (SCSI), Universal Serial Bus (USB), and/or the like.

Network interfaces may accept, communicate, and/or connect to a communications network. Through a communications network, the controller of the present invention is accessible through remote clients (e.g., computers with web browsers) by users. Network interfaces may employ connection protocols such as, but not limited to: direct connect, Ethernet (thick, thin, twisted pair 10/100/1000 Base T, and/or the like), Token Ring, wireless connection such as IEEE 802.11a-x, and/or the like. Should processing requirements dictate a greater amount of speed and/or capacity, distributed network controller (e.g., Distributed embodiments of the present invention) architectures may similarly be employed to pool, load balance, and/or otherwise increase the communicative bandwidth required by the controller of the present invention. A communications network may be any one and/or the combination of the following: a direct interconnection; the Internet; a Local Area Network (LAN); a Metropolitan Area Network (MAN); an Operating Missions as Nodes on the Internet (OMNI); a secured custom connection; a Wide Area Network (WAN); a wireless network (e.g., employing protocols such as, but not limited to a Wireless Application Protocol (WAP), I-mode, and/or the like); and/or the like. A network interface may be regarded as a specialized form of an input output interface. Further, multiple network interfaces may be used to engage with various communications network types. For example, multiple network interfaces may be employed to allow for the communication over broadcast, multicast, and/or unicast networks.

Input Output interfaces (I/O) may accept, communicate, and/or connect to user input devices, peripheral devices, cryptographic processor devices, and/or the like. I/O may employ connection protocols such as, but not limited to: audio: analog, digital, monaural, RCA, stereo, and/or the like; data: Apple Desktop Bus (ADB), IEEE 1394a-b, serial, universal serial bus (USB); infrared; joystick; keyboard; midi; optical; PC AT; PS/2; parallel; radio; video interface: Apple Desktop Connector (ADC), BNC, coaxial, component, composite, digital, Digital Visual Interface (DVI), high-definition multimedia interface (HDMI), RCA, RF antennae, S-Video, VGA, and/or the like; wireless transceivers: 802.11a/b/g/n/x; Bluetooth; cellular (e.g., code division multiple access (CDMA), high speed packet access (HSPA(+)), high-speed downlink packet access (HSDPA), global system for mobile communications (GSM), long term evolution (LTE), WiMax, etc.); and/or the like. One typical output device is a video display, which typically comprises a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) based monitor with an interface (e.g., DVI circuitry and cable) that accepts signals from a video interface. The video interface composites information generated by a computer systemization and generates video signals based on the composited information in a video memory frame. Another output device is a television set, which accepts signals from a video interface. Typically, the video interface provides the composited video information through a video connection interface that accepts a video display interface (e.g., an RCA composite video connector accepting an RCA composite video cable; a DVI connector accepting a DVI display cable, etc.).

User input devices often are a type of peripheral device (see below) and may include: card readers, dongles, finger print readers, gloves, graphics tablets, joysticks, keyboards, microphones, mouse (mice), remote controls, retina readers, touch screens (e.g., capacitive, resistive, etc.), trackballs, trackpads, sensors (e.g., accelerometers, ambient light, GPS, gyroscopes, proximity, etc.), styluses, and/or the like.

Peripheral devices may be external, internal and/or part of the controller of the present invention. Peripheral devices may also include, for example, an antenna, audio devices (e.g., line-in, line-out, microphone input, speakers, etc.), cameras (e.g., still, video, webcam, etc.), drive motors, lighting, video monitors and/or the like.

Cryptographic units such as, but not limited to, microcontrollers, processors, interfaces, and/or devices may be attached to, and/or communicate with, the controller of the present invention. An MC68HC16 microcontroller, manufactured by Motorola Inc., may be used for and/or within cryptographic units. The MC68HC16 microcontroller utilizes a 16-bit multiply-and-accumulate instruction in the 16 MHz configuration and requires less than one second to perform a 512-bit RSA private key operation. Cryptographic units support the authentication of communications from interacting agents, as well as allowing for anonymous transactions. Cryptographic units may also be configured as part of the CPU. Equivalent microcontrollers and/or processors may also be used. Other commercially available specialized cryptographic processors include: Broadcom's CryptoNetX and other Security Processors; nCipher's nShield; SafeNet's Luna PCI (e.g., 7100) series; Semaphore Communications' 40 MHz Roadrunner 184; Sun's Cryptographic Accelerators (e.g., Accelerator 6000 PCIe Board, Accelerator 500 Daughtercard); Via Nano Processor (e.g., L2100, L2200, U2400) line, which is capable of performing 500+ MB/s of cryptographic instructions; VLSI Technology's 33 MHz 6868; and/or the like.

Memory

Generally, any mechanization and/or embodiment allowing a processor to affect the storage and/or retrieval of information is regarded as memory. However, memory is a fungible technology and resource; thus, any number of memory embodiments may be employed in lieu of or in concert with one another. It is to be understood that the controller of the present invention and/or a computer systemization may employ various forms of memory. For example, a computer systemization may be configured wherein the functionality of on-chip CPU memory (e.g., registers), RAM, ROM, and any other storage devices are provided by a paper punch tape or paper punch card mechanism; of course, such an embodiment would result in an extremely slow rate of operation. In a typical configuration, memory will include ROM, RAM, and a storage device. A storage device may be any conventional computer system storage. Storage devices may include a drum; a (fixed and/or removable) magnetic disk drive; a magneto-optical drive; an optical drive (e.g., Blu-ray, CD ROM/RAM/Recordable (R)/ReWritable (RW), DVD R/RW, HD DVD R/RW, etc.); an array of devices (e.g., Redundant Array of Independent Disks (RAID)); solid state memory devices (USB memory, solid state drives (SSD), etc.); other processor-readable storage mediums; and/or other devices of the like. Thus, a computer systemization generally requires and makes use of memory.

Component Collection

The memory may contain a collection of program and/or database components and/or data such as, but not limited to: operating system component(s) (operating system); information server component(s) (information server); user interface component(s) (user interface); Web browser component(s) (Web browser); database(s); mail server component(s); mail client component(s); cryptographic server component(s) (cryptographic server); and/or the like (i.e., collectively a component collection). These components may be stored and accessed from the storage devices and/or from storage devices accessible through an interface bus. Although non-conventional program components such as those in the component collection typically are stored in a local storage device, they may also be loaded and/or stored in memory such as: peripheral devices, RAM, remote storage facilities through a communications network, ROM, various forms of memory, and/or the like.

Operating System

The operating system component is an executable program component facilitating the operation of the controller of the present invention. Typically, the operating system facilitates access of I/O, network interfaces, peripheral devices, storage devices, and/or the like. The operating system may be a highly fault tolerant, scalable, and secure system such as: Apple Macintosh OS X (Server); AT&T Plan 9; Be OS; Unix and Unix-like system distributions (such as AT&T's UNIX; Berkeley Software Distribution (BSD) variations such as FreeBSD, NetBSD, OpenBSD, and/or the like; Linux distributions such as Red Hat, Ubuntu, and/or the like); and/or the like operating systems. However, more limited and/or less secure operating systems also may be employed such as Apple Macintosh OS, IBM OS/2, Microsoft DOS, Microsoft Windows 2000/2003/3.1/95/98/CE/Millennium/NT/Vista/XP (Server), Palm OS, and/or the like. The operating system may be one specifically optimized to be run on a mobile computing device, such as iOS, Android, Windows Phone, Tizen, Symbian, and/or the like. An operating system may communicate to and/or with other components in a component collection, including itself, and/or the like. Most frequently, the operating system communicates with other program components, user interfaces, and/or the like. For example, the operating system may contain, communicate, generate, obtain, and/or provide program component, system, user, and/or data communications, requests, and/or responses. The operating system, once executed by the CPU, may enable the interaction with communications networks, data, I/O, peripheral devices, program components, memory, user input devices, and/or the like. The operating system may provide communications protocols that allow the controller of the present invention to communicate with other entities through a communications network. Various communication protocols may be used by the controller of the present invention as a subcarrier transport mechanism for interaction, such as, but not limited to: multicast, TCP/IP, UDP, unicast, and/or the like.

Information Server

An information server component is a stored program component that is executed by a CPU. The information server may be a conventional Internet information server such as, but not limited to, Apache Software Foundation's Apache, Microsoft's Internet Information Server, and/or the like. The information server may allow for the execution of program components through facilities such as Active Server Page (ASP), ActiveX, (ANSI) (Objective-) C (++), C# and/or .NET, Common Gateway Interface (CGI) scripts, dynamic (D) hypertext markup language (HTML), FLASH, Java, JavaScript, Practical Extraction Report Language (PERL), Hypertext Pre-Processor (PHP), pipes, Python, wireless application protocol (WAP), WebObjects, and/or the like. The information server may support secure communications protocols such as, but not limited to, File Transfer Protocol (FTP); HyperText Transfer Protocol (HTTP); Secure Hypertext Transfer Protocol (HTTPS); Secure Socket Layer (SSL); messaging protocols (e.g., America Online (AOL) Instant Messenger (AIM), Application Exchange (APEX), ICQ, Internet Relay Chat (IRC), Microsoft Network (MSN) Messenger Service, Presence and Instant Messaging Protocol (PRIM), the Internet Engineering Task Force's (IETF's) Session Initiation Protocol (SIP), SIP for Instant Messaging and Presence Leveraging Extensions (SIMPLE), the open XML-based Extensible Messaging and Presence Protocol (XMPP) (i.e., Jabber or the Open Mobile Alliance's (OMA's) Instant Messaging and Presence Service (IMPS)), Yahoo! Instant Messenger Service), and/or the like. The information server provides results in the form of Web pages to Web browsers, and allows for the manipulated generation of the Web pages through interaction with other program components. After a Domain Name System (DNS) resolution portion of an HTTP request is resolved to a particular information server, the information server resolves requests for information at specified locations on the controller of the present invention based on the remainder of the HTTP request. For example, a request such as http://123.124.125.126/myInformation.html might have the IP portion of the request, "123.124.125.126", resolved by a DNS server to an information server at that IP address; that information server might in turn further parse the HTTP request for the "/myInformation.html" portion of the request and resolve it to a location in memory containing the information "myInformation.html." Additionally, other information serving protocols may be employed across various ports, e.g., FTP communications across a port, and/or the like. An information server may communicate to and/or with other components in a component collection, including itself, and/or facilities of the like. Most frequently, the information server communicates with the database of the present invention, operating systems, other program components, user interfaces, Web browsers, and/or the like.

Access to the database of the present invention may be achieved through a number of database bridge mechanisms such as through scripting languages as enumerated below (e.g., CGI) and through inter-application communication channels as enumerated below (e.g., CORBA, WebObjects, etc.). Any data requests through a Web browser are parsed through the bridge mechanism into appropriate grammars as required by the present invention. In one embodiment, the information server would provide a Web form accessible by a Web browser. Entries made into supplied fields in the Web form are tagged as having been entered into the particular fields, and parsed as such. The entered terms are then passed along with the field tags, which act to instruct the parser to generate queries directed to appropriate tables and/or fields. In one embodiment, the parser may generate queries in standard SQL by instantiating a search string with the proper join/select commands based on the tagged text entries, wherein the resulting command is provided over the bridge mechanism to the present invention as a query. Upon generating query results from the query, the results are passed over the bridge mechanism, and may be parsed for formatting and generation of a new results Web page by the bridge mechanism. Such a new results Web page is then provided to the information server, which may supply it to the requesting Web browser. Also, an information server may contain, communicate, generate, obtain, and/or provide program component, system, user, and/or data communications, requests, and/or responses.
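By way of non-limiting illustration, the following R sketch shows how tagged Web-form entries could be turned into a parameterized SQL query over the bridge mechanism described above. The table, column, and field-tag names are hypothetical, and SQLite stands in for the database of the present invention.

library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Toy loans table standing in for the database of the present invention.
dbWriteTable(con, "loans", data.frame(
  platform_name = c("Platform A", "Platform B"),
  loan_status   = c("Fully Paid", "Charged Off"),
  interest_rate = c(0.10, 0.18)
))

# Tagged Web-form entries arrive as a named list: field tag -> entered term.
form <- list(platform_name = "Platform A", loan_status = "Fully Paid")

# Field tags (assumed to be validated against a whitelist of columns) become
# column names; the entered terms are bound as parameters, not pasted into the string.
where <- paste(sprintf("%s = ?", names(form)), collapse = " AND ")
rows  <- dbGetQuery(con, paste("SELECT * FROM loans WHERE", where),
                    params = unname(form))
dbDisconnect(con)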

User Interface

Computer interfaces in some respects are similar to automobile operation interfaces. Automobile operation interface elements such as steering wheels, gearshifts, and speedometers facilitate the access, operation, and display of automobile resources, and status. Computer interaction interface elements such as check boxes, cursors, menus, scrollers, and windows (collectively and commonly referred to as widgets) similarly facilitate the access, capabilities, operation, and display of data and computer hardware and operating system resources, and status. Operation interfaces are commonly called user interfaces. Graphical user interfaces (GUIs) such as the Apple Macintosh Operating System's Aqua, IBM's OS/2, Microsoft's Windows 2000/2003/3.1/95/98/CE/Millennium/NT/XP/Vista/7 (i.e., Aero), Unix's X-Windows (e.g., which may include additional Unix graphic interface libraries and layers such as K Desktop Environment (KDE), mythTV, and GNU Network Object Model Environment (GNOME)), and web interface libraries (e.g., ActiveX, AJAX, (D)HTML, FLASH, Java, JavaScript, etc. interface libraries such as, but not limited to, Dojo, jQuery(UI), MooTools, Prototype, script.aculo.us, SWFObject, Yahoo! User Interface, any of which may be used) provide a baseline and means of accessing and displaying information graphically to users.

A user interface component is a stored program component that is executed by a CPU. The user interface may be a conventional graphic user interface as provided by, with, and/or atop operating systems and/or operating environments such as already discussed. The user interface may allow for the display, execution, interaction, manipulation, and/or operation of program components and/or system facilities through textual and/or graphical facilities. The user interface provides a facility through which users may affect, interact, and/or operate a computer system. A user interface may communicate to and/or with other components in a component collection, including itself, and/or facilities of the like. Most frequently, the user interface communicates with operating systems, other program components, and/or the like. The user interface may contain, communicate, generate, obtain, and/or provide program component, system, user, and/or data communications, requests, and/or responses.

Web Browser

A Web browser component is a stored program component that is executed by a CPU. The Web browser may be a conventional hypertext viewing application such as Microsoft Internet Explorer or Netscape Navigator. Secure Web browsing may be supplied with 128-bit (or greater) encryption by way of HTTPS, SSL, and/or the like. Web browsers allow for the execution of program components through facilities such as ActiveX, AJAX, (D)HTML, FLASH, Java, JavaScript, web browser plug-in APIs (e.g., Firefox, Safari plug-in, and/or the like APIs), and/or the like. Web browsers and like information access tools may be integrated into PDAs, cellular telephones, and/or other mobile devices. A Web browser may communicate to and/or with other components in a component collection, including itself, and/or facilities of the like. Most frequently, the Web browser communicates with information servers, operating systems, integrated program components (e.g., plug-ins), and/or the like; e.g., it may contain, communicate, generate, obtain, and/or provide program component, system, user, and/or data communications, requests, and/or responses. Of course, in place of a Web browser and information server, a combined application may be developed to perform similar functions of both. The combined application would similarly affect the obtaining and the provision of information to users, user agents, and/or the like from the enabled nodes of the present invention. The combined application may be nugatory on systems employing standard Web browsers.

Mail Server

A mail server component is a stored program component that is executed by a CPU. The mail server may be a conventional Internet mail server such as, but not limited to, sendmail, Microsoft Exchange, and/or the like. The mail server may allow for the execution of program components through facilities such as ASP, ActiveX, (ANSI) (Objective-) C (++), C# and/or .NET, CGI scripts, Java, JavaScript, PERL, PHP, pipes, Python, WebObjects, and/or the like. The mail server may support communications protocols such as, but not limited to: Internet message access protocol (IMAP), Messaging Application Programming Interface (MAPI)/Microsoft Exchange, post office protocol (POP3), simple mail transfer protocol (SMTP), and/or the like. The mail server can route, forward, and process incoming and outgoing mail messages that have been sent, relayed, and/or otherwise traversed through and/or to the present invention.

Access to the mail of the present invention may be achieved through a number of APIs offered by the individual Web server components and/or the operating system.

Also, a mail server may contain, communicate, generate, obtain, and/or provide program component, system, user, and/or data communications, requests, information, and/or responses.

Mail Client

A mail client component is a stored program component that is executed by a CPU. The mail client may be a conventional mail viewing application such as Apple Mail, Microsoft Entourage, Microsoft Outlook, Microsoft Outlook Express, Mozilla, Thunderbird, and/or the like. Mail clients may support a number of transfer protocols, such as: IMAP, Microsoft Exchange, POP3, SMTP, and/or the like. A mail client may communicate to and/or with other components in a component collection, including itself, and/or facilities of the like. Most frequently, the mail client communicates with mail servers, operating systems, other mail clients, and/or the like; e.g., it may contain, communicate, generate, obtain, and/or provide program component, system, user, and/or data communications, requests, information, and/or responses. Generally, the mail client provides a facility to compose and transmit electronic mail messages.

Cryptographic Server

A cryptographic server component is a stored program component that is executed by a CPU, cryptographic processor, cryptographic processor interface, cryptographic processor device, and/or the like. Cryptographic processor interfaces will allow for expedition of encryption and/or decryption requests by the cryptographic component; however, the cryptographic component, alternatively, may run on a conventional CPU. The cryptographic component allows for the encryption and/or decryption of provided data. The cryptographic component allows for both symmetric and asymmetric (e.g., Pretty Good Privacy (PGP)) encryption and/or decryption. The cryptographic component may employ cryptographic techniques such as, but not limited to: digital certificates (e.g., X.509 authentication framework), digital signatures, dual signatures, enveloping, password access protection, public key management, and/or the like. The cryptographic component will facilitate numerous (encryption and/or decryption) security protocols such as, but not limited to: checksum, Data Encryption Standard (DES), Elliptic Curve Cryptography (ECC), International Data Encryption Algorithm (IDEA), Message Digest 5 (MD5, which is a one-way hash function), passwords, Rivest Cipher (RC5), Rijndael, RSA (which is an Internet encryption and authentication system that uses an algorithm developed in 1977 by Ron Rivest, Adi Shamir, and Leonard Adleman), Secure Hash Algorithm (SHA), Secure Socket Layer (SSL), Secure Hypertext Transfer Protocol (HTTPS), and/or the like. Employing such encryption security protocols, the present invention may encrypt all incoming and/or outgoing communications and may serve as a node within a virtual private network (VPN) within a wider communications network. The cryptographic component facilitates the process of "security authorization" whereby access to a resource is inhibited by a security protocol wherein the cryptographic component effects authorized access to the secured resource. In addition, the cryptographic component may provide unique identifiers of content, e.g., employing an MD5 hash to obtain a unique signature for a digital audio file. A cryptographic component may communicate to and/or with other components in a component collection, including itself, and/or facilities of the like. The cryptographic component supports encryption schemes allowing for the secure transmission of information across a communications network to enable the component of the present invention to engage in secure transactions if so desired. The cryptographic component facilitates the secure accessing of resources on the present invention and facilitates the access of secured resources on remote systems; i.e., it may act as a client and/or server of secured resources. Most frequently, the cryptographic component communicates with information servers, operating systems, other program components, and/or the like. The cryptographic component may contain, communicate, generate, obtain, and/or provide program component, system, user, and/or data communications, requests, and/or responses.

The Database of the Present Invention

The database component of the present invention may be embodied in a database and its stored data. The database is a stored program component, which is executed by the CPU; the stored program component portion configuring the CPU to process the stored data. The database may be a conventional, fault tolerant, relational, scalable, secure database such as Oracle or Sybase. Relational databases are an extension of a flat file. Relational databases consist of a series of related tables. The tables are interconnected via a key field. Use of the key field allows the combination of the tables by indexing against the key field; i.e., the key fields act as dimensional pivot points for combining information from various tables. Relationships generally identify links maintained between tables by matching primary keys. Primary keys represent fields that uniquely identify the rows of a table in a relational database. More precisely, they uniquely identify rows of a table on the "one" side of a one-to-many relationship.

Alternatively, the database of the present invention may be implemented using various standard data structures, such as an array, hash, (linked) list, struct, structured text file (e.g., XML), table, and/or the like. Such data structures may be stored in memory and/or in (structured) files. In another alternative, an object-oriented database may be used, such as Frontier, ObjectStore, Poet, Zope, and/or the like. Object databases can include a number of object collections that are grouped and/or linked together by common attributes; they may be related to other object collections by some common attributes. Object-oriented databases perform similarly to relational databases with the exception that objects are not just pieces of data but may have other types of functionality encapsulated within a given object. If the database of the present invention is implemented as a data structure, the use of the database of the present invention may be integrated into another component such as the component of the present invention. Also, the database may be implemented as a mix of data structures, objects, and relational structures. Databases may be consolidated and/or distributed in countless variations through standard data processing techniques. Portions of databases, e.g., tables, may be exported and/or imported and thus decentralized and/or integrated. In one embodiment, the database component includes several tables. A Users (e.g., operators and physicians) table may include fields such as, but not limited to: user_id, ssn, dob, first_name, last_name, age, state, address_firstline, address_secondline, zipcode, devices_list, contact_info, contact_type, alt_contact_info, alt_contact_type, and/or the like to refer to any type of enterable data or selections discussed herein. The Users table may support and/or track multiple entity accounts. A Clients table may include fields such as, but not limited to: user_id, client_id, client_ip, client_type, client_model, operating_system, os_version, app_installed_flag, and/or the like. An Apps table may include fields such as, but not limited to: app_ID, app_name, app_type, OS_compatibilities_list, version, timestamp, developer_ID, and/or the like. A Beverages table may include, for example, heat capacities and other useful parameters of different beverages, with fields (depending on size) such as beverage_name, beverage_size, desired_coolingtemp, cooling_time, favorite_drinker, number_of_beverages, current_beverage_temperature, current_ambient_temperature, and/or the like. A Parameters table may include the foregoing fields, or additional ones such as cool_start_time, cool_preset, cooling_rate, and/or the like. A Cool Routines table may include a plurality of cooling sequences and may include fields such as, but not limited to: sequence_type, sequence_id, flow_rate, avg_water_temp, cooling_time, pump_setting, pump_speed, pump_pressure, power_level, temperature_sensor_id_number, temperature_sensor_location, and/or the like.

In one embodiment, user programs may contain various user interface primitives, which may serve to update the platform of the present invention. Also, various accounts may require custom database tables depending upon the environments and the types of clients the system of the present invention may need to serve. It should be noted that any unique fields may be designated as a key field throughout. In an alternative embodiment, these tables have been decentralized into their own databases and their respective database controllers (i.e., individual database controllers for each of the above tables). Employing standard data processing techniques, one may further distribute the databases over several computer systemizations and/or storage devices. Similarly, configurations of the decentralized database controllers may be varied by consolidating and/or distributing the various database components. The system of the present invention may be configured to keep track of various settings, inputs, and parameters via database controllers.
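By way of non-limiting illustration, the following R sketch creates two of the tables described above with a subset of the listed fields. SQLite stands in for the database component, and the chosen field subset is hypothetical.

library(DBI)

con <- dbConnect(RSQLite::SQLite(), "crowdbureau.db")
dbExecute(con, "
  CREATE TABLE IF NOT EXISTS users (
    user_id      INTEGER PRIMARY KEY,
    first_name   TEXT, last_name TEXT, state TEXT, zipcode TEXT,
    contact_info TEXT, contact_type TEXT
  )")
dbExecute(con, "
  CREATE TABLE IF NOT EXISTS clients (
    client_id          INTEGER PRIMARY KEY,
    user_id            INTEGER REFERENCES users(user_id),
    client_ip          TEXT, client_type TEXT,
    operating_system   TEXT, os_version TEXT,
    app_installed_flag INTEGER
  )")
dbDisconnect(con)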

When introducing elements of the present disclosure or the embodiment(s) thereof, the articles "a," "an," and "the" are intended to mean that there are one or more of the elements. Similarly, the adjective "another," when used to introduce an element, is intended to mean one or more elements. The terms "including" and "having" are intended to be inclusive such that there may be additional elements other than the listed elements.

Although this invention has been described with a certain degree of particularity, it is to be understood that the present disclosure has been made only by way of illustration and that numerous changes in the details of construction and arrangement of parts may be resorted to without departing from the spirit and the scope of the invention.

APPENDIX I Abstract

Recent legislative changes have made it possible for companies in the United States to raise the capital they need by means of peer-to-peer marketplace lending and securities (equity and debt (e.g., peer-to-peer lending)) crowdfunding, and allow accredited and non-accredited investors to buy and sell securities in Small Cap Private Companies and Non-Publicly Traded Funds. CrowdBureau is developing an integrated approach for addressing the challenges of this market, including the development of ratings and the creation of an online financial technology platform that provides a transparent framework for investors and creates the mechanisms for the market participants to comply with regulations and benchmark their performance. The design of the rating framework starts with data collection, consolidation, and unification of the peer-to-peer marketplace lending and securities (equity and debt) crowdfunding market.

APPENDIX I

CrowdBureau, LLC

IP Patent

Application

From Main Street storefronts to high tech startups, American "small and medium businesses have been responsible for creating two out of every three net new jobs over the last two decades." 1 The ability for individuals to pursue an idea, to start a company, and to grow a business is the foundation of the U.S. economy.

The Obama Administration sought to ensure the benefits of our continuing economic recovery reach all Americans through the Jumpstart Our Business Startups (JOBS) Act of 2012, which allows for securities crowdfunding (equity and debt) online through an intermediary (a broker-dealer or registered funding platform). It is important that consumers and small businesses have broad access to safe and affordable credit and equity facilities. Without capital formation, entrepreneurs cannot put innovative ideas into action. Without sufficient funding, Americans cannot grow their businesses to create new jobs and opportunities for the next generation.

Since the launch of Kickstarter in 2009, crowdfunding has become exceedingly popular. This "democratization of fundraising" allows entrepreneurs and innovators to raise vital capital from strangers around the world, bypassing the traditional methods of fundraising from friends, family, and investors. Kickstarter, Indiegogo, and GoFundMe are familiar names accounting for billions in rewards and donations. These crowdfunding platforms are only a small portion of a rapidly growing industry. If someone were to plan a crowdfunding campaign, they would likely turn to one of these platforms first.

Staff, alumni and students at universities across the United States in turn are beginning to leverage these new mechanisms to fund tuition, projects and businesses through exclusive crowdfunding platforms hosted by their schools.

Four Types of Crowdfunding

Most crowdfunding platforms can be assigned to the four crowdfunding categories presented below, even though the business models sometimes differ strongly within these groups; an overview of each is given below. In the crowd investing category, for example, there are huge differences between the business models depending on which part of the JOBS Act is being leveraged. Note that one or more models can be adopted in order to create a "graduation" model that serves as an incubator throughout the lifecycle of a project or business.

Crowdfunding Definitions

Crowd Donation: Funding contributions are donations that do not earn any direct measurable compensation or perk. Examples include social, charitable and cultural projects. Crowd donating can also be used to raise funds for political campaigns. For crowd donating to be successful, it is imperative that an emotional bond be established and maintained between the providers of capital and the recipients.

Crowd Rewards: Crowd rewards include creative and cultural projects as well as sport projects. However, commercial projects can also be subsumed under this category. With this type of financing, the contributor receives a perk (e.g., reward) in the form of products, works of art, or services. There are no limits to the creativity of the parties looking for funding.

Crowd Investing (Equity/ Debt): Instead of financing a project, crowd investing focuses on purchasing equity (common shares) or debt (e.g., convertible notes, mini-bonds) in a company. Crowd investing also offers investors with only limited amounts to invest the opportunity to support the growth of a young company. In return, these investors receive shares in the company. These are often silent partnerships where the investors only have limited voting rights.

Crowd Lending/Peer-to-Peer Lending: Crowd lending mainly refers to the financing of companies or private individuals (e.g., lifestyles, student loans, real estate, cars, etc.) with loans (borrowed capital). In return for their loan, lenders expect a risk-adjusted return on their investment. As products and business models have evolved, the investor base for online marketplace lenders has expanded to institutional investors, hedge funds, and financial institutions.

Types of Business Models

Since securities-based crowdfunding (e.g., equity) is the selling of shares (common stock), and depending on the part of the JOBS Act being implemented, a platform per se is not required to conduct Title II and Title IV offerings, though many platforms are in place to streamline and manage the process. This could be an extension (deal room) built into the overall platform, as illustrated in the peer-to-peer lending models. Therefore, this section provides an overview of the primary business models in online peer-to-peer lending as well as the structures used to fund this activity.

Companies in this industry have developed two primary business models: (1) direct lenders that originate loans to hold in their own portfolios, commonly referred to as balance sheet lenders (FIG. 9); and (2) platform lenders that partner with an issuing depository institution to originate loans and then purchase the loans for sale to investors as whole loans or by issuing securities such as member-dependent notes (FIG. 10). A third business model (FIG. 11) is intended to illustrate the transfer of rights and obligations in securitization.

Direct lenders that do not rely on depository institutions to originate loans are generally required to obtain licenses from each state in which they lend. Direct lenders that use state lending licenses to originate loans directly are not subject to a federal banking regulator's supervisory authority, except to the extent the lenders may be subject to CFPB supervision.

FIG. 9: Direct "Simple" Model

This model can be used for donations, rewards, and equity and debt crowdfunding. The platform would be flexible enough to allow for more than one model, leveraging the results from campaign A to campaign B for the same issuer.


FIG. 10: Platform Lender Model

This model uses a partner bank to originate loans that are subsequently purchased by the platform.

FIG. 11: Transfer of rights and obligations in securitizations

This diagram is intended to illustrate only the direction of rights and obligations within the process. Many details of the securitization process, such as tranching securities, creating liquidity, and more, are not covered below.

The basic principle of crowdfunding (debt or equity) is to match borrowers who require capital with investors/lenders who have spare capital, bypassing the role traditionally played by banks. Leveraging these developments, a lender can offer faster credit to consumers (e.g., students) and all types of emerging growth companies. Over the past ten years online marketplace lending companies have evolved from platforms connecting individual borrowers with individual lenders, to sophisticated networks featuring institutional investors, financial institution partnerships, direct lending, and securitization transactions.


One approach is to attempt a hybrid model of marketplace and balance sheet lending. It seems to us that a company buying loans to hold on its own balance sheet while also selling other loans to investors has incentives to offer weaker loans to investors and keep better loans on its own balance sheet. There are also benefits to the concept of having "skin in the game" to keep both the platform and lenders honest and aligned with each other.

Market Opportunity

Current Regulatory Landscape

Comply with government compliance regimes for the capital markets through initiatives such as The Jumpstart Our Business Startups (JOBS) Act, in the United States.

Over 40 countries globally have reformed and re-regulated capital raising (e.g., the 28 European Member States, China) for retail consumers and small- and medium-sized enterprises (SMEs) using the Internet to buy and sell securities.

New types of regulated intermediaries, called securities crowdfunding platforms, marketplace lenders, and peer-to-peer platforms, are creating new types of data from the buying and selling of securities online (e.g., peer-to-peer loan book data).

China (the People's Bank of China) requires platforms to monitor borrower and issuer limits and the transference of funds.

Issues Affecting Market Transparency

Investors cannot compare loans across platforms (e.g., interest rates)

There is no standard benchmark to assess investors' or borrowers' performance

Risk Assessment

There is no standard rating system for communicating risk

There is no standard system for creating structured loan products

CrowdBureau Structure

CrowdBureau is a financial technology company that is positioned to become the alternative rating agency for peer-to-peer lending and securities crowdfunding.

The team comprises dedicated, experienced financial services/banking, operations, technology, and legal personnel, including a quantitative and qualitative due diligence team that provides daily, weekly, monthly, and quarterly analysis, delivering regulatory compliance, benchmarks, and risk models to assess loans and portfolios.

We will provide research, asset management, and risk management for clients such as banks, peer-to-peer lending platforms, fundamental investors, and money managers.


Technology Development Opportunity

Globally, over 2,500 platforms have started issuing consumer personal loans, small- and medium-sized business loans, real estate loans, student loans, agriculture/agribusiness loans, solar/renewable energy loans, and automobile loans via web-enabled lending platforms. Financial loan data are published by each lending platform as each borrower is listed on the platform seeking funding. Each marketplace lender/peer-to-peer lender updates and releases its data at different time intervals, through different media, in different formats, and across different jurisdictions.

Some platforms provide the data through the WebSocket real-time protocol (essentially pushing new loan data and events to protocol subscribers). Others expose a RESTful API from which a script can pull new loan data on a pre-defined time-series interval (hourly, every three hours, daily, monthly, quarterly). The form of output depends on the age and business model of the peer-to-peer lending platform: some simply update their loan listings on a public web page at the time the borrower "asks" for a loan amount, post an update event when an investor lends money toward the "ask" amount, and post another when the loan originates (e.g., the loan is fully funded); some provide a CSV file for download; and other platforms provide a direct application programming interface (API) for retail and institutional investors and partners.
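By way of non-limiting illustration, the following R sketch shows the pull-based case: a script polling a RESTful endpoint on a pre-defined interval. The endpoint URL, interval, and response structure are hypothetical and would differ for each platform.

library(httr)
library(jsonlite)

# Hypothetical endpoint; each platform publishes its own URL and schedule.
endpoint <- "https://api.example-platform.com/v1/loans?since=latest"

poll_once <- function(url) {
  resp <- GET(url, timeout(30))
  stop_for_status(resp)
  # Parse the JSON payload into a list of loan records.
  fromJSON(content(resp, as = "text", encoding = "UTF-8"), simplifyVector = FALSE)
}

# Pull new loan listings on a pre-defined interval (here: hourly).
repeat {
  loans <- poll_once(endpoint)
  message(length(loans), " new loan records retrieved")
  Sys.sleep(60 * 60)
}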

These peer-to-peer lending platforms offer their data in different formats, including, but not limited to, JSON, line-delimited JSON, CSV, TSV, Excel, and HTML. Each format is provided with different possible encodings, including, but not limited to, UTF-8, BIG5, Latin-1, and GBK.

Each platform's data may be in a different language (Chinese, English, Hindi, French, Spanish, and more). Any numeric value may be denominated in different units (e.g., currencies such as the US Dollar, Renminbi, Euro, British Pound Sterling, or Rupee, and different time zones) and may have a different numeric range. For example, the numeric range for salaries may be 0 to 1 million on one platform versus 0 to 1 thousand on another.

The problem arises when an entity (e.g., automated or human, such as a regulator or investor) wants to comprehend these data at a consolidated macro-to-micro level across all platforms in the peer-to-peer lending financial industry. Here, "comprehend" refers to generating statistics and allowing a high degree of qualitative and quantitative comparability across platforms' data.

Solution to the Problem

The solution to the problem comprises three layered components working together; FIG. 12 illustrates the solution for collection, consolidation, and unification.

The data collection component comprises a set of custom-made scripts that connect to an individual lending platform and retrieve its loan book data. Each script complies with and follows the peer-to-peer lending platform's data release schedule, medium, and format. Once the data from each platform are received, they are stored (archived) in their natural state, in real time, along with metadata generated in a data collection SQL database. Metadata include: a timestamp for when the data have been received, the name of the platform, and the list of its data attributes for each borrower listing and subsequent loan origination. At this stage all platform data are saved using the same encoding (UTF-8) and the same format (JSON), but each retains its unique and verifiable data attribute keys (e.g., loan interest may be represented as "LoanInterest" or "loan_itrst"). This archival step allows for auditing of the original data footprint for compliance purposes prior to sanitization.

The data consolidation component. The data consolidation component addresses the need for transforming the data to use a common language, currency, time zone, and common units and numeric ranges. The data consolidation component pulls data from the data collection component, reads them, and applies various transformations, such as the examples below. Data in natural language (e.g., loan type/usage, interest rate, loan amount, repayment terms, and more) are captured in the first instance in the native language and archived for audit purposes, and then translated into English. Monetary data such as loan amounts, premiums, and other values are captured in the native denomination and remain in the native denomination for research reports and benchmarks because of currency fluctuations. Typically, these values are not converted into US Dollars unless required, in which case both denominations are presented with a date/time stamp for back-testing.

Time zones are converted to the UTC time zone.

Borrowers' income information, interest rates, and other numeric information are converted to use a single floating-point format (e.g., "18K" to "18000.00", "10%" to "0.1"). At this stage all data have been converted to a common format, but each platform still retains its original and unique set of data attribute keys.
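By way of non-limiting illustration, the following R sketch shows two of the transformations described above: normalizing numeric strings to a single floating-point representation and converting timestamps to UTC. The helper names and the source time zone are hypothetical.

# Normalize numeric strings such as "18K" or "10%" to floating-point values.
normalize_numeric <- function(x) {
  x <- gsub(",", "", trimws(x))
  if (grepl("%$", x))    return(as.numeric(sub("%$", "", x)) / 100)
  if (grepl("[Kk]$", x)) return(as.numeric(sub("[Kk]$", "", x)) * 1e3)
  if (grepl("[Mm]$", x)) return(as.numeric(sub("[Mm]$", "", x)) * 1e6)
  as.numeric(x)
}

# Convert a local timestamp to the UTC time zone.
to_utc <- function(ts, tz) {
  format(as.POSIXct(ts, tz = tz), tz = "UTC", usetz = TRUE)
}

normalize_numeric("18K")                        # 18000
normalize_numeric("10%")                        # 0.1
to_utc("2018-10-03 09:30:00", "Asia/Shanghai")  # "2018-10-03 01:30:00 UTC"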

All these data are pushed into a queue, to be consumed by the last component.

The data unification component. The data unification component reads data from the queue populated by the data consolidation component. Based on a mapping table that documents, for each platform/attribute pair (e.g., platform A / attribute Y), its destination unified data attribute, this component populates a central SQL database for all platform/attribute pairs. This results in a central database storing the different platforms' data in a new unified format, from which macro-level statistics and comparison analysis can be achieved with perfect accuracy.

Define "Perfect" - less than 1 per cent error rate.

Beneficial effects of the CrowdBureau Solution

Such a solution allows for transparency at the transaction level for loan data in near real-time. It normalizes and standardizes the data, thus allowing for the creation of industry-wide comparisons, valuations, pricing activity, and statistics generation across platforms, jurisdictions, and regional settings.

Loan Data Examples. These include comparing the average interest rate of platform A in jurisdiction Y to the average interest rate of platform B in jurisdiction Z, and averaging all platforms' loan default rates for an entire jurisdiction or region (benchmarks/indices).

Equity Data Example. This includes assessing the feasibility and value of using social media in traditional company-specific (public and private) ratings models and investor-specific ratings.

Credit Risk Algorithm Example. This includes creating an industry-wide standard weighted credit risk model to underwrite loans and track performance.

Alert System Example. This includes the ability to identify when a borrower exceeds borrowing limits on one or more platforms.

Market Data

Currently CrowdBureau collects, consolidates, and unifies data from over 85 separate peer-to-peer lending platforms in China (83), the USA (2), and Europe (6), covering consumer loans, real estate, student loans, automobiles, agribusiness, renewable energy/solar, and lifestyles.

1. Stable automation of an API and/or web-scraping technology for each platform.

2. Collection/capture of newly incoming Loans per hour by platform.

3. Collection/capture of any update event for any Loan per hour and platform.

4. For originated loans, the loan's performance status can be followed.

5. Distinction of Loans having:

a. Loan progress less than 100% - the loan is in the 'ask' phase, with no binding contract between the two parties; this indicates the 'ask' volume in the market.

b. Loan progress equal to 100% - the loan is active, with a binding legal contract between the two parties; this gives the loan/credit volume in the market.


Derived Messages. We can derive messages for clients and products:

1. Average loan return for loans having loan progress < 100% and = 100%.

2. Estimated market volume for loans having loan progress < 100% and = 100%.

3. Information on repayment terms.

4. Classification of loan usage.

5. Derived data with classified attributes such as loan usage and loan repayment term, and more.

Benchmarks

Benchmarks are currently provided as a quarterly print copy (mailed) for the pilot phase. Digital daily and monthly benchmarks will launch in Q4 2017.

Benchmark Characteristics.

1. Separate P2P Platform Accounts - Custody of all accounts (China regulatory requirement).

2. Loan Style Purity

a. Clustering by "Use of Loan" - Derive the claim, people are asking for X - currency on P2P platforms to by for example cars, real estate, etc. in an overall market manner.

3. Complete Transparency - All loans listed, interval events and originations

4. Total "Ask" Volume for all platforms a. P2P Loans have "an ask" phase before the loan is originated. For ay loan (per day) sum up all still open amounts, [nominal loan * (1-loan percentage)]

b. Summing over all platforms gives, economically, the amount of money the market would want to lend via P2P loans.

5. Performance

a. Interest rate, volume, value, defaults, charge-offs, etc.

6. Risk Controls - daily, monthly, and quarterly quantitative and qualitative review.
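By way of non-limiting illustration, the following R sketch computes the open "ask" amount per loan as nominal loan * (1 - loan percentage) and aggregates it per platform and market-wide. The platform names and figures are hypothetical.

# Toy loan listings across two platforms; loan_progress is the funded fraction.
loans <- data.frame(
  platform      = c("PlatformA", "PlatformA", "PlatformB"),
  nominal_loan  = c(10000, 5000, 8000),
  loan_progress = c(0.40, 1.00, 0.25)
)

# Open "ask" amount per loan; originated loans (progress = 100%) contribute zero.
loans$open_ask <- loans$nominal_loan * (1 - loans$loan_progress)

# Daily "ask" volume per platform and for the whole market.
aggregate(open_ask ~ platform, data = loans, FUN = sum)
sum(loans$open_ask)  # amount the market would want to lend via P2P loans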

Credit Risk Algorithm

Status: Estimating Default Probabilities Based on XYZ Platform Data

Objective: Identify explanatory variables for predicting if a loan is likely to be repaid or not.

Data: XYZ Platform originated loan data, containing all loans issued between January 2010 and September 2016, with the latest loan status as of the publication date. Two subsets of loans - all having completed their life cycles, with loan status either "Fully Paid" or "Charged Off" - have been analysed so far:

Subset 1: Three- and five-year loans issued between January 2010 and November 2011 (30,986 loans, 15% defaulted);

Subset 2: Three-year loans issued between January 2010 and December 2013 (166,267 loans, 12% defaulted).

Model: Logistic regression model with loan status as the dependent variable. Different subsets of independent variables have been built from the attributes available in the originated loan data.
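By way of non-limiting illustration, the following R sketch fits such a logistic regression on simulated stand-in data; the attribute names and the default-generating process are hypothetical, not the XYZ Platform loan book.

set.seed(1)
n <- 5000
loans <- data.frame(
  dti           = runif(n, 0, 35),        # debt-to-income ratio (%)
  annual_income = rlnorm(n, meanlog = 11),
  int_rate      = runif(n, 0.05, 0.25)
)
# Simulate charge-offs at roughly the observed 12-15% default rate.
p_default    <- plogis(-2.2 + 0.02 * loans$dti + 2 * loans$int_rate)
loans$status <- factor(rbinom(n, 1, p_default),
                       levels = c(0, 1), labels = c("FullyPaid", "ChargedOff"))

# Logistic regression with loan status as the dependent variable.
fit <- glm(status ~ dti + annual_income + int_rate, data = loans, family = binomial)
summary(fit)

# Predicted default probabilities, e.g., for comparison by quartile.
loans$p_hat <- predict(fit, type = "response")
quantile(loans$p_hat, probs = c(0.25, 0.50, 0.75))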


Result: So far no attribute subset resulted in a model that calculates default probabilities matching the observed defaults in the originated loan data. None of the attributes seems to have much influence on loan status. To further analyse this issue, correlations between loan status and several attributes - such as "dti" (debt-to-income ratio) - have been calculated. For example, the correlation between dti and loan status in data subset 2 is only 0.09, which is very low.

Explanation: XYZ Platform has already used these attributes to differentiate between "good" and "bad" loans, where "good" loans are the ones they originated; they declined about 90% of all loan applications. So the data we analyse contain only the "Top 10%", exemplified by the debt-to-income ratio that remains below 35% in all 2010 to 2013 loans. In the declined loans data, for the same time interval, we find more than 200,000 dti values higher than 40% and up to 1000%. (Debt-to-income ratio is the only attribute in the declined loans data set that can be compared with originated loans.)

So it seems that other attributes would be needed to explain the defaults in originated loans. This could be, for example, indicators related to health or unemployment risks.

Remark: The methodologies used in the Lending Club data analysis have been tested by using another set of credit data (from a bank). With these data, parameter estimations resulted in default probabilities that predicted the observed defaults and non-defaults quite well.

Appendix: Example

Estimators for a parameter subset were computed in R, using a sample of 5,000 loans (4,300 fully paid, 700 charged off).

Resulting default probabilities by quartile and the corresponding observed defaults were computed for the complete dataset of 30,086 loans (26,636 fully paid / 4,350 charged off).


For comparison, default probabilities by quartile and corresponding observed defaults were computed from a bank's dataset of 300 loans (255 fully paid / 45 charged off).


Valuation Models Using Social Data for Crowdfunding

This section discusses the feasibility and value of using social media in traditional company-specific (public and private) ratings models and in investor-specific ratings, and indicates that certain social media attributes have the potential to forecast crowdfunding success. For company-specific models, our results support the usage of social media data in the Solvency, Z-score, and Moat models; additionally, the data indicate that social media may offer marginal improvements to traditional models and has the potential to forecast outcomes such as bankruptcy when used alone.

General approach and workflow

When evaluating the value that social equity data can add to company ratings it is critical to first consider the existing methodologies and approaches to rating the financial health and growth of companies. Traditional models based on financial metrics have been used by many to forecast a variety of company outcomes ranging from bankruptcy to the possession of an economic moat. From an investment standpoint, ratings systems that can predict these outcomes with high accuracy could be especially valuable in choosing a portfolio to maximize one's investment returns.

Since the beginning of the Digital Age, we have been able to acquire an unprecedented amount of data relative to the eras preceding this period. Individuals are now more connected than ever and information and events can be communicated worldwide in a matter of seconds.

Moreover, individuals are increasingly turning to social media in order to connect with others to develop relationships and to rapidly share news, data, and ideas. In turn, these relationships and ideas have the power to influence the decisions that individuals make.

With modern technology, it is possible to collect a wealth of data from the connections and exchanges between individuals on social media, but can these data be used to better measure a given company's future health and success? The general approach identified the Economic Moat, Fair Value Price, Altman Z-score, Solvency score, and Earnings per Share valuation methodologies as a starting point for our analyses of social media in company-specific ratings. For investor-specific ratings, we focused our attention on using social media to forecast the probability of crowdfunding success.

After identifying the models that would serve as a baseline for social media overlays, we next determined the mathematical basis of each model as well as the input and dependent variables that each model required. Once the variables needed for modeling were determined, we obtained historical financial data points for several companies for each model in order to forecast the outcome that each model aims to predict (e.g., the Solvency score model aims to forecast bankruptcy). With more time, we would perform more comprehensive analyses and restrict our acquisition of financial (and social) variables to the period ranging from 2007 to 2016, as the beginning of this period is one year after the founding of Twitter. In practice, our models used data from a narrower time window (2009 to 2015), and we relied on QuoteMedia and Gurufocus.com to acquire all of the financial data used in our modeling. Once obtained, we used the financial data along with the appropriate mathematical model to construct baseline models. We then acquired social media data using Crimson Hexagon, the Internet Archive, Crowdfunder.com, and secondary sources. Lastly, we combined the social media data with the traditional financial variables in different combinations to determine whether these data improved the forecasting abilities of traditional models, and we also evaluated whether social media data alone had any predictive power in terms of forecasting a company's financial health, earnings, or economic moat. The average accuracy of 100 iterations of each model that incorporated social media was compared to the average accuracy of baseline models and the accuracy that one would achieve if guessing at random to assess the predictive power of social media in ratings.


Quantitative Moat Social Data Forecast

Model overview. When considering investments, companies that have an economic moat are attractive as they tend to be less risky investments and offer more stable returns. We based our baseline Moat model on the 2013 Morningstar Methodology Paper written by Warren Miller, and we implemented all of our modeling using the caret package within R. The moats that were assigned to the companies in our analyses were obtained from the Morningstar website in January 2016 (see the "CrowdBureau Social Data Moat Focus Benchmark" document for more information on the companies used in our analyses). We make the assumption that these companies held these moat designations as of the end of 2015.

We acquired the same 12 financial variables that Morningstar uses in its Quantitative Moat Ratings (described below) for the period ranging from 2013 to 2014, as this would provide us with the data to forecast moat type at least one year before a company received its 2015 moat designation. Likewise, we applied two different random forests models in order to distinguish companies that have an economic moat from those that do not have a moat, and to distinguish companies that have a narrow moat from those that have a wide moat. The predictions of our models are based on 500 regression trees (details on the random forests model can be found in the Morningstar report).


For testing the accuracy of each model, we randomly split our data into a training dataset (60% of companies) and a test dataset (40% of companies) (FIG. 13). After training the model with data from 60% of companies, we use the model to classify the remaining 40% of companies and calculate the accuracy. Because the model's accuracy can vary due to randomly selecting training and testing data, we performed the above sequence of steps 100 times and took the average and standard deviations of the 100 trials as our final accuracy score. To generate a final moat score, we train each of our random forests models with all of the data in our matrix and then use the probability outputs of the random forests models to assess the probability of a company having both a moat and a wide moat. This approach is the same as that used by Morningstar and is calculated as follows:

Moat Score = [(1 - Probability of No Moat) + (Probability of Wide Moat)] / 2

In the above equation, the "(1 - Probability of No Moat)" and "(Probability of Wide Moat)" terms can be obtained directly from the caret package within R.
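By way of non-limiting illustration, the following R sketch reproduces the workflow described above with simulated stand-in data and hypothetical column names (not the actual 12 Morningstar variables): a 60/40 split, two random forests models trained with the caret package, and the moat score computed from their probability outputs.

library(caret)  # also requires the randomForest package

set.seed(1)
n <- 300
X <- data.frame(
  roa = rnorm(n), earnings_yield = rnorm(n), equity_volatility = rnorm(n),
  max_drawdown = rnorm(n), total_revenue = rnorm(n), market_cap = rnorm(n)
)
has_moat  <- factor(ifelse(X$roa + X$earnings_yield + rnorm(n) > 0, "Moat", "NoMoat"))
wide_moat <- factor(ifelse(X$roa - X$equity_volatility + rnorm(n) > 0, "Wide", "Narrow"))

# 60% training / 40% test split.
idx <- createDataPartition(has_moat, p = 0.6, list = FALSE)

# Random forests with 500 trees: moat vs. no moat, and wide vs. narrow moat.
fit_moat <- train(x = X[idx, ], y = has_moat[idx],  method = "rf", ntree = 500)
fit_wide <- train(x = X[idx, ], y = wide_moat[idx], method = "rf", ntree = 500)

p_no_moat <- predict(fit_moat, newdata = X[-idx, ], type = "prob")[, "NoMoat"]
p_wide    <- predict(fit_wide, newdata = X[-idx, ], type = "prob")[, "Wide"]

# Moat score as described above.
moat_score <- ((1 - p_no_moat) + p_wide) / 2
head(moat_score)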


When overlaying social media variables, we use the same approach as described above. The main difference between the baseline models and social media overlay models is the matrix that we provide to the random forests models. In total, we constructed 23 different models (model descriptions provided separately) that consisted of the baseline model with different combinations of social media variables as well as a few models that consisted entirely of social media variables (described below). Due to time constraints, these models were not exhaustive in terms of the number of combinations that can be created, but they do serve as a substantial starting point for analyses.

The QuoteMedia API and the Morningstar website were used to acquire the financial information needed for our analyses. Crimson Hexagon and company websites were used to acquire all of the social media variables in our analyses.

Model variables (Financial and Social)

In total, we collected data on 17 different variables (12 financial and 5 social) for the companies in our analyses. The financial variables, along with a description of how we acquired/calculated these variables, are:

Return on Assets (ROA) - We calculated ROA as: Net Income/Total Assets

These data were obtained using companies' annual report data from QuoteMedia.

Earnings Yield - We calculated Earnings Yield as:

Basic Earnings per Share/unadjusted closing stock price for a company on the day of its report date

Basic Earnings per Share data were obtained using companies' annual report data from QuoteMedia. Unadjusted closing stock price was also obtained from QuoteMedia.

Book Value Yield - We calculated Book Value Yield as: 1/Price to Book Ratio

These data were obtained using companies' annual report data from QuoteMedia.

Sales Yield - We calculated Sales Yield as:

Total Revenue/ (Total common shares outstanding x unadjusted closing stock price on the financial report date)

Basic Earnings per Share data were obtained using companies' annual report data from QuoteMedia. Unadjusted closing stock price was also obtained from QuoteMedia.

Equity Volatility - We calculated Equity Volatility as follows:

First, we gathered the unadjusted closing stock prices for a given company for the 365 days leading up to and including its report date. Next, we calculated the difference between the closing price for a given day and the day preceding it and then divided the difference by the closing price on the preceding day (i.e., (ClosingPrice_(i+1) - ClosingPrice_i) / ClosingPrice_i, where i = 0 to 364). We did this for the 365 days up to the report date and took the standard deviation of these values. In summary, this can be described by the following equation:

Equity Volatility = StandardDeviation((ClosingPrice_(i+1) - ClosingPrice_i) / ClosingPrice_i)

Unadjusted closing stock prices were obtained from QuoteMedia.

Maximum Drawdown - We calculated Maximum Drawdown as follows:

First, we gathered the unadjusted closing stock prices for a given company for the 365 days leading up to and including its report date. We then subtracted the highest closing price from the lowest closing price and divided the difference by the highest closing price. In summary,

Maximum Drawdown = (Minimum Closing Price - Maximum Closing Price) / Maximum Closing Price

Unadjusted closing stock prices were obtained fromQuoteMedia.
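A short sketch of this definition in Python, again assuming the closing prices for the 365-day window are supplied as a simple list; note that this is the window's (minimum - maximum)/maximum as defined above, not a path-dependent drawdown.

import numpy as np

def maximum_drawdown(closing_prices):
    # Implements the definition above: (minimum close - maximum close) / maximum close
    # over the 365-day window ending on the report date
    prices = np.asarray(closing_prices, dtype=float)
    return (prices.min() - prices.max()) / prices.max()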

Average Daily Volume - We calculated Average Daily Volume as the average of the unadjusted share volume for each day for the 365 days up to and including the annual report date. Unadjusted share volumes were obtained from QuoteMedia.

Total Revenue - Total Revenue for each company was obtained directly from the QuoteMedia API output based on each company's annual report data.

Market Capitalization - We calculated Market Capitalization as: Total common shares outstanding x unadjusted closing price.

QuoteMedia was used to obtain these values on the date that each company filed its annual report.

Enterprise Value - We calculated Enterprise Value as:

Market Capitalization + Preferred Stock + Long-term Debt + Current Debt + Minority Interest - Cash and Equivalents

These values were obtained from companies' annual reports using QuoteMedia.

Enterprise Value/Market Capitalization - We calculated this value by dividing the Enterprise Value calculated above by the Market Capitalization calculated above.

Sector ID - We obtained the Sector ID directly from the QuoteMedia API.

The social media variables, along with a description of how we acquired/calculated these variables, are:

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on a company's website as of February 2016 under the assumption that the companies have not added a significant number of social media links to their website since 2013. Ideally, we would use the Internet Archive to fetch historical scores. Lastly, our search through the websites typically included the "Home Page", "Media page" (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Identity block.
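A hypothetical sketch of how such a link count could be automated is shown below, using requests and BeautifulSoup; the page URLs, the domain list, and the one-point-per-platform rule are assumptions for illustration, not the authors' actual (manual) procedure.

# Hypothetical automation of the link count using requests and BeautifulSoup;
# the URLs and the one-point-per-platform rule are assumptions, not the authors' manual procedure.
import requests
from bs4 import BeautifulSoup

SOCIAL_DOMAINS = ("facebook.com", "twitter.com", "tumblr.com", "linkedin.com",
                  "plus.google.com", "pinterest.com", "instagram.com")

def identity_score(page_urls):
    # page_urls: e.g. a company's home, media, and contact pages (hypothetical URLs)
    platforms_found = set()
    for url in page_urls:
        html = requests.get(url, timeout=10).text
        for link in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            for domain in SOCIAL_DOMAINS:
                if domain in link["href"]:
                    platforms_found.add(domain)
    return len(platforms_found)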

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("Moat; 2014 Data") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data were collected from the twelve months leading up to the company's annual report date. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Conversation block.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. Data were obtained from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. Data were obtained from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 authors during the time range, then we manually set this value to 0 to avoid dividing by 0.

Impressions per Post - We calculated this as:

Total Potential Impressions (see above description) / Total Posts (see above description)

Data were obtained from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 posts during the time range, then we manually set this value to 0 to avoid dividing by 0.

Company inclusion criteria

This section is meant to give an overview of our selection process for the companies included in our moat analyses. Specific information on the companies themselves can be obtained from the "CrowdBureau Social Data Moat Focus Benchmark" Word document and the "Moat_2014Data_Master_Matrix" Excel document.

In January of 2016, we used the Morningstar website to obtain a list of approximately 120 companies that were identified by Morningstar as having either a wide moat, a narrow moat, or no moat. We then used the QuoteMedia API to either directly acquire the 12 financial variables described above (e.g. Total Revenue) or to acquire the variables needed to calculate these attributes (e.g. we acquired total common shares outstanding and unadjusted close and then calculated Market Capitalization from these values). To remain in our analysis, a company must have reported all variables necessary to obtain the 12 financial input attributes for the moat model for their 2014 annual report. Companies for which we could not acquire all 12 attributes from their 2014 annual report were expunged from our analyses. Following filtering, a total of 59 companies were used in our final analysis. Of these companies, 23 were listed as having wide moats, 19 were listed as having narrow moats, and 17 were listed as having no moat.

Data acquisition

To acquire the 12 financial variables (or the components that make up these variables), we developed several in-house Python scripts to download these data using the QuoteMedia API. These scripts are summarized below. These scripts were uploaded to Git within the "Model_Code_2_24_16" zipped folder. Within this folder, they can be found in the subdirectory entitled "Moat_Model". All scripts are set up to obtain data from the 10 most recent annual reports for a given company, but a simple modification of the API call will allow one to acquire more reports if needed. Prior to running these scripts, one must have Python installed on his or her computer. In theory, Python 2 or Python 3 should work, but the data were acquired on a machine running Python 2.7. Moreover, these scripts rely on the import of several Python modules. To view the modules needed for each script, open the scripts with a text editor and look at the first few lines of code.

The scripts and their purpose are as follows:

getHistoricalROA.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Net Income, Total Assets, and the Report Date for which Net Income and Total Assets were acquired in tab delimited format. Return on Assets can be calculated in Excel using Net Income and Total Assets as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalROA.py > HistoricalROA.txt (Note: "HistoricalROA.txt" can be changed to whatever filename is desired)
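As an alternative to Excel, the tab-delimited output could be post-processed with a short pandas sketch such as the one below; the column order and date format are assumptions based on the description above.

import pandas as pd

# Assumed column order based on the description above; the script output is read as tab-delimited text
columns = ["Ticker", "NetIncome", "TotalAssets", "ReportDate"]
df = pd.read_csv("HistoricalROA.txt", sep="\t", names=columns)

# Keep the 2014 annual reports (assumes the report date string begins with the year)
df_2014 = df[df["ReportDate"].astype(str).str.startswith("2014")].copy()
df_2014["ROA"] = df_2014["NetIncome"] / df_2014["TotalAssets"]  # Return on Assets = Net Income / Total Assets
print(df_2014[["Ticker", "ReportDate", "ROA"]])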

getHistoricalEarningsYield.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Earnings per Share, Unadjusted Closing Price, and the Report Date in tab delimited format. Earnings Yield can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be:

['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalEarningsYield.py > HistoricalEY.txt (Note: "HistoricalEY.txt" can be changed to whatever filename is desired)

getHistoricalBookValueYield.py - This script takes a list of company ticker symbols and returns Ticker Symbol, the Price to Book Ratio, and the Report Date in tab delimited format. Book Value Yield can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 28 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalBookValueYield.py > HistoricalBVY.txt (Note: "HistoricalBVY.txt" can be changed to whatever filename is desired)

getHistoricalSalesYield.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Total Revenue, Total Common Shares Outstanding, Unadjusted Closing Price, and the Report Date in tab delimited format. Sales Yield can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalSalesYield.py > HistoricalSY.txt (Note: "HistoricalSY.txt" can be changed to whatever filename is desired)

getHistoricalVolatility_MaximumDrawdown_AverageVolume.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Equity Volatility, Maximum Drawdown, Average Volume, and the Report Date in tab delimited format. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 40 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel. Further, this code will exit if it encounters an error while calculating Equity Volatility. The overwhelming majority of companies do not cause this code to encounter an error, but a few companies occasionally do. Although this bug still needs more troubleshooting, we suspect that the error is due to missing data. The easiest fix is to find the company that is causing the error and delete it from the list of companies pasted in on Line 40. Due to time constraints, we were unable to provide a patch for this bug. Currently, the code is set up to first identify companies that give an error while the code is running. We first suggest using this code by running "python getHistoricalVolatility_MaximumDrawdown_AverageVolume.py". This will print the companies and their data to the terminal. If the code exits, then one can see which company the code was working on prior to exiting and delete whichever company is giving the error. Once all companies that give an error have been removed, add the "#" in front of the code on line 117, and then remove the "#" in front of the code on line 123. Then one can use the code as follows:

python getHistoricalVolatility_MaximumDrawdown_AverageVolume.py > HistoricalV_MD_AV.txt (Note: "HistoricalV_MD_AV.txt" can be changed to whatever filename is desired)

getHistoricalTotalRevenue.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Total Revenue, and the Report Date in tab delimited format. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be:

['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years

(including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalTotalRevenue.py > HistoricalTR.txt (Note: "HistoricalTR.txt" can be changed to whatever filename is desired)

Note: This is technically redundant with the script for Historical Sales Yield.

getHistoricalMarketCap.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Total Common Shares Outstanding, Unadjusted Closing Price, and the Report Date in tab delimited format. Market Capitalization can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalMarketCap.py > HistoricalMC.txt (Note: "HistoricalMC.txt" can be changed to whatever filename is desired)

Note: This script has redundancy with the script for obtaining Historical Sales Yield.

getHistoricalEnterpriseValue.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Total Common Shares Outstanding, Unadjusted Closing Price, Current Debt, Long-term Debt, Cash and Equivalents, Preferred Stock, Minority Interest, and the Report Date in tab delimited format. Enterprise Value can be calculated using these variables in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 28 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalEnterpriseValue.py > HistoricalEV.txt (Note: "HistoricalEV.txt" can be changed to whatever filename is desired)

getSector.py - This script takes a list of company ticker symbols and returns Ticker Symbol and Sector ID in tab delimited format. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 29 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly.

Usage: python getSector.py > HistoricalSector.txt (Note: "HistoricalSector.txt" can be changed to whatever filename is desired)

Although touched upon in the "Model Variables (Financial and Social)" section, we took the following approach to acquire the social media variables in our analyses.

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on a company's website as of February 2016 under the assumption that the companies have not added a significant number of social media links to their website since 2013. Ideally, we would use the Internet Archive to fetch historical scores. Lastly, our search through the websites typically included the "Home Page", "Media page" (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive.

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("Moat; 2014 Data") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2014, then we acquired Total Posts from December 31st, 2013 to December 31st, 2014. The number of Total Posts was acquired from the monitor screen online.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. We created a Buzz Monitor on Crimson Hexagon ("Moat; 2014 Data") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2014, then we acquired Total Potential Impressions from December 31st, 2013 to December 31st, 2014. We downloaded an Excel file from Crimson Hexagon that contained data on Total Potential Impressions, as the website interface rounded this number. Within the Excel file, we summed the number of potential impressions each day in order to arrive at Total Potential Impressions for the time period.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. We created a Buzz Monitor on Crimson Hexagon ("Moat; 2014 Data") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2014, then we acquired these data from December 31st, 2013 to December 31st, 2014. We downloaded an Excel file from Crimson Hexagon that contained data on Total Number of Twitter Authors and Average Posts per Author on a given day. Within the Excel file, we first multiplied the number of Twitter Authors posting on a given day by the average posts per author for that day in order to get the number of posts for each day. We then summed the total number of posts across the time period and divided this by the sum of Twitter Authors for the time period in order to arrive at Posts per Author.

Impressions per Post - We calculated this in Excel by dividing Total Potential Impressions by Total Posts (after acquiring them as described above).
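A minimal pandas sketch of these two ratios, assuming a hypothetical daily export with author, posts-per-author, and impressions columns (the file name and column names are assumptions, not the actual Crimson Hexagon schema):

import pandas as pd

# Hypothetical daily export; the file name and column names are assumptions
daily = pd.read_excel("crimson_hexagon_daily_export.xlsx")
daily["Posts"] = daily["TwitterAuthors"] * daily["AvgPostsPerAuthor"]

total_posts = daily["Posts"].sum()
total_authors = daily["TwitterAuthors"].sum()
total_impressions = daily["PotentialImpressions"].sum()

# Manually set the ratios to 0 when the denominator is 0, as described above
posts_per_author = total_posts / total_authors if total_authors else 0
impressions_per_post = total_impressions / total_posts if total_posts else 0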

Model testing and results.

Once we acquired all of the financial and social media data described above for the 59 companies in our analysis, we generated the "Moat_2014Data_Master_Matrix" Excel spreadsheet, which can be found on Confluence. This spreadsheet was too large to include in the report, but it contains all of the data along with other details (e.g. cashtags, report dates, social data date ranges, etc.) that are useful for acquiring further information about the companies used in the modeling process. After generating this data matrix, we next created the baseline data matrices for the "No Moat vs. Moat" random forests model (Table 1) and the "Narrow Moat vs. Wide Moat" random forests model (Table 2).

Table 1. Snapshot of example matrix with variables that are input into the "No Moat vs. Moat" model. Input variables are abbreviated for brevity. The variable we are trying to predict ("Moat") is highlighted in green. Although the company names are not shown, each row corresponds to a specific company. ROA = Return on Assets; EY = Earnings Yield; SY = Sales Yield; BVY = Book Value Yield; EqVol = Equity Volatility; MD = Maximum Drawdown; AV = Average Volume; TR = Total Revenue; MC = Market Capitalization; EV = Enterprise Value; EV_MC = Enterprise Value/Market Capitalization. TR, MC, and EV are measured in United States Dollars.

Table 2. Snapshot of example matrix with variables that are input into the "Narrow Moat vs. Wide Moat" model. Input variables are abbreviated for brevity. The variable we are trying to predict ("Moat") is highlighted in green. Although the company names are not shown, each row corresponds to a specific company. ROA = Return on Assets; EY = Earnings Yield; SY = Sales Yield; BVY = Book Value Yield; EqVol = Equity Volatility; MD = Maximum Drawdown; AV = Average Volume; TR = Total Revenue; MC = Market Capitalization; EV = Enterprise Value; EV_MC = Enterprise Value/Market Capitalization. TR, MC, and EV are measured in United States Dollars.

After setting up the baseline matrices, we proceeded to run a random forests model on each baseline matrix to calculate the average accuracy of each of our baseline models. To do this, we developed an R script entitled "Script for Running Models.r". Although we aim to describe this script in detail separately, we will provide a brief overview of how this script determines the mean accuracy and standard deviation of model accuracy. This script was uploaded to Git in the "Model_Code_2_24_16" zipped folder and is contained within the "Modeling_Script" subdirectory of this archive.

The first step of this script involves importing the baseline data matrix (see Table 1 and Table 2 for examples). After loading the matrix, the code randomly selects 60% of the data for training and 40% of the data for testing. As an example, if we loaded Table 1 (this table has 10 lines of data) into the code, then 6 lines of data would be randomly selected to train a random forests model and 4 lines of data would be randomly selected for testing purposes. After training, the code predicts which category each data point in the test data falls under and then compares the predictions to the actual category of each data point. The accuracy of the model is then stored in a list, and the steps described above are repeated 99 more times for a total of 100 iterations.


After 100 iterations, the code prints out the average accuracy and the standard deviation of the accuracy. We plotted and compared the average accuracy along with standard error of the mean (standard deviation/square root of sample size) of the models that we tested.
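A rough Python analogue of this evaluation loop is sketched below using scikit-learn; the actual analysis used the R script and the caret package described above, and the 500-tree setting is an assumption carried over from the regression model described later.

# Rough Python analogue of the evaluation loop (the report itself used R and the caret package)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def average_accuracy(X, y, n_trials=100):
    accuracies = []
    for _ in range(n_trials):
        # Randomly select 60% of companies for training and 40% for testing
        X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6)
        model = RandomForestClassifier(n_estimators=500).fit(X_train, y_train)
        accuracies.append(accuracy_score(y_test, model.predict(X_test)))
    return np.mean(accuracies), np.std(accuracies)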

After implementing the modeling code described above, we found that our baseline model for the "No Moat vs. Moat" model was 83.6% accurate on average with a standard deviation of 6.7%, and our baseline model for the "Narrow Moat vs. Wide Moat" model was 71.9% accurate on average with a standard deviation of 11.5%. Note that these were the accuracies at the time we implemented the models. If run again, one is likely to get highly similar but not exact results due to the random selection of the training and testing data sets. The No Information Rate (NIR; Random Forecasting) for the "No Moat vs. Moat" model was 71.2%, and the NIR for the "Narrow Moat vs. Wide Moat" model was 54.8%. These rates are what one would obtain if he or she were guessing the nature of a given company's moat at random.

After generating our baseline models, we added social media variables to our baseline matrices in different combinations (detailed in the "Narrow_v_Wide_Moat_Model_Descriptions" and "No_Moat_v_Moat_Model_Descriptions" documents). We generated a total of 23 different overlay models that either included baseline data plus various combinations of social media data or social media data alone. The number of models tested was far from exhaustive, as other combinations of social media and baseline variables exist that we did not test. We also did not explore combining subsets of baseline variables with social media variables. Thus, our conclusions below are based on a limited subset of combinations derived from the universe of combinations of baseline variables (viewed as "one" variable) and social media variables.

Using the code described above, we calculated the average accuracies and standard deviations of accuracies for each model and compared them to our baseline matrices. Of our 23 models, Model 8 (M8) appears to have a marginal accuracy increase (85.5% accuracy) when forecasting whether companies will have either "no moat" or a "moat" relative to baseline (83.6% accuracy). Model 8 includes baseline data plus total potential impressions and identity score. Several models constructed using social media alone do not appear able to distinguish companies with "no moat" from companies that have a "moat". None of the models we overlaid with social media forecast narrow and wide moats better than the baseline model. However, several models constructed using social media alone appear to forecast "narrow" vs. "wide" moat better than random. Given that we only tested a fraction of the different possible combinations in our analyses, these results suggest that further examination of the Quantitative Moat Social Data Forecast by CrowdBureau is warranted.


Although the baseline and social media overlay models showed promise in forecasting moat type, they were still separated at this point in our analysis. To combine the predictions of each model into a single rating, we trained our "No Moat vs. Moat" random forests model and "Narrow Moat vs. Wide Moat" random forests model using all of the data, and then used these models to predict each company's probability of having no moat and probability of having a wide moat. We then used the approach developed by Morningstar to calculate moat scores as follows:

Moat Score = ((1 - Probability of No Moat) + (Probability of Wide Moat)) / 2

The R modeling script that we used for the accuracy analyses can be modified to yield the probabilities for each company in the above equation. We first generated moat scores for each of the companies in our analysis based solely on the baseline data, and these scores can be found in the "CrowdBureau Social Data Moat Focus Benchmark" document. Because we observed that Model 8 marginally improved forecasting accuracy for distinguishing companies that have an economic moat from companies that do not have an economic moat, we generated probabilities of having no moat for each of the companies using Model 8. We then combined these probabilities with the wide moat probability generated by the baseline "Narrow Moat vs. Wide Moat" model to obtain moat scores for each company that take into account their social media Identity score and total potential impressions. These scores can be viewed for each company in the "CrowdBureau Social Data Moat Focus Benchmark" document.

Lastly, we asked how well our baseline quantitative moat score could segregate Wide, Narrow, and No Moat companies. Using Excel, we calculated the percentile rank of each company based on its moat score. With this approach, the top 23 ranked companies should have wide moats, the 17 lowest ranked companies should have no moats, and the middle 19 ranked companies should have narrow moats. As shown in Table 3, our baseline model performed well in terms of segregating the different moat types. More data and further analyses are needed in order to determine whether social overlay models can improve the forecasting abilities of the traditional model using the moat score approach.
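A small illustrative sketch of this ranking step in pandas (instead of Excel); the tickers and scores below are made-up placeholder values, not results from the analysis.

import pandas as pd

# Made-up placeholder scores keyed by ticker, purely to illustrate the ranking step
scores = pd.Series({"AMGN": 0.81, "BIIB": 0.47, "BXLT": 0.22})
percentile_rank = scores.rank(pct=True).sort_values(ascending=False)

# With 59 companies, the top 23 would be expected to be wide-moat names,
# the bottom 17 no-moat names, and the middle 19 narrow-moat names
print(percentile_rank)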

Table 3. Evaluation of moat forecasting power of baseline moat scores


Quantitative Fair Value Price Social Data Forecast Model Overview.

Knowing the current and future cash flows of a company is essential for investors looking to maximize return on their investments. One approach to estimating the future value of an investment involves calculating the fair value price of a stock. When considering various stocks, it is advantageous to invest in stocks that are undervalued. That is, if a stock's current price is lower than its fair value estimate, then it would be a good candidate for inclusion in one's investment portfolio. Due to time and resource constraints, we were unable to fully implement the Fair Value Social Data Forecast model. However, we provide a synopsis of our work to date and note steps that need to be taken in order to move forward with this model from its current stage.

We based our Fair Value model methodology on the 2013 Morningstar Methodology Paper written by Warren Miller 4 , and we would have implemented all of our modeling using the caret package within R 5 if time had permitted us to finish the construction of this model. The companies that we planned to use in our analyses were the same as those used in our Quantitative Moat Social Data Forecast. We acquired the same 12 financial variables that Morningstar uses in its 2013 Methodology Paper 3 (described below) for the period ranging from 2013 to 2014. These are also the same inputs for our Quantitative Moat Social Data Forecast methodology. We acquired social media data (described below) during this time as well. In preparation for forecasting more recent fair value prices, we also collected social media data from 2014 to 2015 and constructed code to acquire the 12 financial input variables most recently available from QuoteMedia. APPENDIX I

We also constructed a theoretical Discounted Cash Flow (DCF) model, which would allow us to calculate a company's fair value price. Unfortunately, we were not able to acquire all of the variables (both historical and current) needed to implement this model during the allotted time for the engagement. Similar to our Quantitative Moat Social Data Forecast, we would have applied a random forests model using 500 regression trees (details on the random forests model can be found in the Morningstar report 3) to predict the fair value price of the companies in our analyses. Specifically, we aimed to use the 12 financial variables to predict Fair Value Price (FVP), which we would have calculated as:

FVP = log(0.0001 + DCF-based Fair Value Estimate / Current Closing Price)
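A brief sketch of this intended step, using scikit-learn in Python as a stand-in for the planned caret implementation in R; the function names are illustrative and the DCF-based estimates are assumed to be available.

# Sketch of the planned step using scikit-learn as a stand-in for the intended caret implementation;
# the function names are illustrative and the DCF estimates are assumed to be available.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fair_value_price(dcf_fair_value, current_close):
    # FVP = log(0.0001 + DCF-based Fair Value Estimate / Current Closing Price)
    return np.log(0.0001 + dcf_fair_value / current_close)

def fit_fvp_model(X_financial, fvp_targets):
    # X_financial: matrix of the 12 financial input variables; 500 regression trees as described above
    return RandomForestRegressor(n_estimators=500).fit(X_financial, fvp_targets)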

After acquiring the 12 financial variables and fair value estimates based on our DCF model, we would have constructed a matrix similar to the one displayed in Table 4.

Table 4. Example baseline data matrix that would have been used in the Fair Value Price Social Data Forecast Model. In the table, "x" is calculated by dividing the fair value estimate of a company's stock (which would have been obtained with the DCF model) by the closing price of a company on their annual report date. The variable we are trying to predict ("FVP") is highlighted in green. FVP = Fair Value Price; ROA = Return on Assets; EY = Earnings Yield; SY = Sales Yield; BVY = Book Value Yield; EqVol = Equity Volatility; MD = Maximum Drawdown; AV = Average Volume; TR = Total Revenue; MC = Market Capitalization; EV = Enterprise Value; EV_MC = Enterprise Value/Market Capitalization. TR, MC, and EV are measured in United States Dollars.

For testing the accuracy of each model, we would have randomly split our data into a training dataset (60% of companies) and a test dataset (40% of companies) (FIG. 14). After training the model with data from 60% of companies, we would have used the model to calculate Fair Value Price. After calculating this price, we would have taken the absolute difference between the model's estimate of the Fair Value Price and the actual Fair Value Price generated by our DCF model. Because the model's accuracy can vary due to randomly selecting training and testing data, we would have performed the above sequence of steps 100 times and taken the average and standard deviations of the differences across the 100 trials. These values would have allowed us to evaluate whether the social overlay models predicted values closer to the DCF-generated values relative to a baseline model that only incorporates financial information. For a final rating, we would have reported the Fair Value Price given by our DCF model and the Fair Value Price generated by our random forests model.


The QuoteMedia API and Morningstar website were used to acquire the financial input variables and company names in our analyses, respectively. We also used the QuoteMedia API to acquire several of the variables needed to calculate the output of our DCF model, and we aimed to use the QuoteMedia API to acquire the rest of the variables needed to implement the DCF model. Crimson Hexagon and company websites were used to acquire all of the social media variables in our analyses.


DCF Model Description

A discounted cash flow (DCF) analysis is a valuation method used to estimate the fair value of an investment, or of a company in our case. DCF analysis projects future free cash flows and discounts them to arrive at a present value estimate.

We developed a two-stage DCF. We assumed that, for the next 5 years, the company's cash flows will grow at the same rate as its earnings per share grew over the past 3 years, and that after that the company's cash flows become a perpetuity with a growth rate of 3%, which is roughly the long-term growth rate of the U.S. economy.

The cash flows

The present free cash flow of a company, FCF in our model, is calculated by taking the operating cash flow of the present period and subtracting capital expenditures.

FCF = Cash from operating activities - CapEx = Cash from operating activities - Purchase of PPE - Purchase of intangibles

G = growth rate of basic EPS

We would run a linear regression of the log of the past 3 years' EPS (the 3 years prior to the date for which we are predicting fair value) on time, and G is the coefficient. With this growth rate we would obtain the cash flows for the next 5 years. After five years, the cash flows were assumed to grow at a rate of 3% per year perpetually.
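A minimal sketch of this growth-rate estimate, using made-up EPS values; per the description above, G is taken as the regression coefficient of log(EPS) on time.

import numpy as np

# Hypothetical basic EPS for the past 3 years, ordered oldest to newest
eps = np.array([3.10, 3.55, 4.02])
years = np.arange(len(eps))                # time index: 0, 1, 2
G = np.polyfit(years, np.log(eps), 1)[0]   # G = regression coefficient of log(EPS) on time

# Project free cash flow forward 5 years at this growth rate (the FCF value is hypothetical)
fcf_0 = 1000000.0
fcf_projection = [fcf_0 * (1 + G) ** t for t in range(1, 6)]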

The discount rate

Here, we would use the WACC (weighted average cost of capital) as the discount rate, which is the average of the cost of debt and cost of equity weighted by the proportion of debt and equity. Roughly, the cost of debt is calculated by dividing interest expenses by the average of a given year's and the prior year's total debt, where:


Total debt = current debt + long-term debt + commercial paper

The cost of equity is the expected return of the company's stock calculated by the CAPM model. Here, we would use 2% as the risk-free rate and 7.5% as the market excess return:

Ce = 2% + beta x 7.5%

Discount rate = WACC = (E / (D + E)) x Ce + (D / (D + E)) x Cd, where E is the value of equity, D is total debt, Ce is the cost of equity, and Cd is the cost of debt
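An illustrative calculation of the discount rate following the description above; all input values below are hypothetical.

# Illustrative discount-rate calculation following the description above; all inputs are hypothetical
def wacc(equity_value, total_debt, cost_of_equity, cost_of_debt):
    total_capital = equity_value + total_debt
    return (equity_value / total_capital) * cost_of_equity + (total_debt / total_capital) * cost_of_debt

beta = 1.1
cost_of_equity = 0.02 + beta * 0.075   # CAPM: 2% risk-free rate + beta x 7.5% market excess return
cost_of_debt = 0.04                    # hypothetical: interest expense / average total debt
discount_rate = wacc(equity_value=50000000.0, total_debt=20000000.0,
                     cost_of_equity=cost_of_equity, cost_of_debt=cost_of_debt)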

Discounting the cash flows

We would first calculate the perpetuity value and add this perpetuity value to the value of the fifth year. Then, we would discount all five years to the time of interest.

Perpetuity Value = FCF_5 x (1 + g) / (WACC - g), where FCF_5 is the projected free cash flow in year five and g is the estimated long-term growth rate (here we use 3%)

Intrinsic Value = Discounted Value of the projected cash flows + Discounted Perpetuity Value = sum over t = 1 to 5 of FCF_t / (1 + WACC)^t + Perpetuity Value / (1 + WACC)^5, where FCF_t is the cash flow in year t projected at the EPS growth rate G
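A compact sketch of this two-stage DCF as described above (the report's own implementation was not completed); every input value below is hypothetical.

# Compact two-stage DCF following the description above; the report's implementation was not
# completed, and every input value below is hypothetical.
def dcf_fair_value(fcf_0, growth_rate, wacc, terminal_growth=0.03, horizon=5):
    # Stage 1: project free cash flows for five years at the EPS growth rate G
    projected = [fcf_0 * (1 + growth_rate) ** t for t in range(1, horizon + 1)]
    # Stage 2: perpetuity value of the cash flows beyond year five
    perpetuity = projected[-1] * (1 + terminal_growth) / (wacc - terminal_growth)
    # Discount the five yearly cash flows and the perpetuity back to the valuation date
    present_value = sum(cf / (1 + wacc) ** t for t, cf in enumerate(projected, start=1))
    present_value += perpetuity / (1 + wacc) ** horizon
    return present_value

intrinsic_value = dcf_fair_value(fcf_0=1000000.0, growth_rate=0.12, wacc=0.08)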

Current development stage

Currently, we have developed the code to acquire most of the input variables needed for the DCF model from QuoteMedia. In order to complete the DCF model, we need to fully develop code to acquire the variables necessary for the growth rate of basic EPS and use these variables to calculate the growth rate G.

Future directions

To complete the DCF model and implement the Fair Value Social Data Forecast, CrowdBureau would need to calculate the EPS growth rate (G) for each of the companies used in these models. Next, CrowdBureau would need to construct the baseline Fair Value Price matrix (output of DCF plus 12 financial input variables). Finally, CrowdBureau would overlay social media variables onto the baseline matrix to determine whether the addition of social equity data improves the accuracy of the baseline model by reducing the absolute difference between the random forests model's Fair Value Price prediction and the DCF model's Fair Value Price prediction.

Fair Value Model input variables (Financial and Social)

In total, we collected data on 17 different input variables (12 financial and 5 social) for the companies in our analyses. These are the exact same input values as those used in our Quantitative Moat Social Data Forecast model. The financial variables, along with a description of how we acquired/calculated these variables, are:

Return on Assets (ROA) - We calculated ROA as: Net Income/Total Assets

These data were obtained using companies' annual report data from QuoteMedia.

Earnings Yield - We calculated Earnings Yield as:

Basic Earnings per Share / unadjusted closing stock price for a company on the day of its report date

Basic Earnings per Share data were obtained using companies' annual report data from QuoteMedia. Unadjusted closing stock price was also obtained from QuoteMedia.

Book Value Yield - We calculated Book Value Yield as: 1/Price to Book Ratio

These data were obtained using companies' annual report data from QuoteMedia.

Sales Yield - We calculated Sales Yield as:

Total Revenue/ (Total common shares outstanding x unadjusted closing stock price on the financial report date)

Total Revenue and total common shares outstanding were obtained using companies' annual report data from QuoteMedia. Unadjusted closing stock price was also obtained from QuoteMedia.

Equity Volatility - We calculated Equity Volatility as follows:

First, we gathered the unadjusted closing stock prices for a given company for the 365 days leading up to and including its report date. Next, we calculated the difference between the closing price for a given day and the day preceding it and then divided the difference by the closing price on the preceding day (i.e., (ClosingPrice_(i+1) - ClosingPrice_i) / ClosingPrice_i, where i = 0 to 364). We did this for the 365 days up to the report date and took the standard deviation of these values. In summary, this can be described by the following equation:

Equity Volatility = StandardDeviation((ClosingPrice_(i+1) - ClosingPrice_i) / ClosingPrice_i), where i = 0 to 364

Unadjusted closing stock prices were obtained from QuoteMedia.

Maximum Drawdown - We calculated Maximum Drawdown as follows:

First, we gathered the unadjusted closing stock prices for a given company for the 365 days leading up to and including its report date. We then subtracted the highest closing price from the lowest closing price and divided the difference by the highest closing price. In summary,

Maximum Drawdown = (Minimum Closing Price - Maximum Closing Price) / Maximum Closing Price

Unadjusted closing stock prices were obtained from QuoteMedia.

Average Daily Volume - We calculated Average Daily Volume as the average of the unadjusted share volume for each day for the 365 days up to and including the annual report date. Unadjusted share volumes were obtained from QuoteMedia.

Total Revenue - Total Revenue for each company was obtained directly from the QuoteMedia API output based on each company's annual report data.

Market Capitalization - We calculated Market Capitalization as: Total common shares outstanding x unadjusted closing price.

QuoteMedia was used to obtain these values on the date that each company filed its annual report.

Enterprise Value - We calculated Enterprise Value as:

Market Capitalization + Preferred Stock + Long-term Debt + Current Debt + Minority Interest - Cash and Equivalents

These values were obtained from companies' annual reports using QuoteMedia.

Enterprise Value/Market Capitalization - We calculated this value by dividing the Enterprise Value calculated above by the Market Capitalization calculated above.

Sector ID - We obtained the Sector ID directly from the QuoteMedia API.

The social media variables, along with a description of how we acquired/calculated these variables, are:

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on a company's website as of February 2016 under the assumption that the companies have not added a significant number of social media links to their website since 2013. Ideally, we would use the Internet Archive to fetch historical scores. Lastly, our search through the websites typically included the "Home Page", "Media page" (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Identity block.

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("Moat; 2014 Data") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data were collected from the twelve months leading up to the company's annual report date. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Conversation block.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. Data were obtained from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. Data were obtained from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 authors during the time range, then we manually set this value to 0 to avoid dividing by 0.

Impressions per Post - We calculated this as:

Total Potential Impressions (see above description) / Total Posts (see above description)

Data were obtained from the "Moat; 2014 Data" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 posts during the time range, then we manually set this value to 0 to avoid dividing by 0.


Company inclusion criteria.

This section is meant to give an overview of our selection process for the companies included in our Fair Value Price analyses. Specific information on the companies themselves can be obtained from the "CrowdBureau Social Data Fair Value Focus Benchmark" document. In January of 2016, we used the Morningstar website to obtain a list of approximately 120 companies that were identified by Morningstar as having either a wide, narrow, or no moat. We used QuoteMedia to either directly acquire the 12 financial input variables (e.g. Total Revenue) or the input variables needed to calculate these attributes (e.g. We acquired total common shares outstanding and unadjusted close and then calculated Market Capitalization from these values). To remain in our analysis, a company must have reported all variables necessary to obtain the 12 financial input attributes for their 2014 annual report. Companies for which we could not acquire all 12 attributes from their 2014 annual report were expunged from our analyses.

Following this filtering step, 59 companies remained. We would have further filtered companies based on whether we could acquire all variables needed for the DCF analysis. Companies lacking any variables would have been expunged from our analyses. Thus, it is possible that the list of 59 companies will get smaller.

Data acquisition.

To acquire the 12 financial input variables (or the components that make up these variables), we developed several in-house Python scripts to download these data using the QuoteMedia API. These scripts are summarized below. These scripts were uploaded to Git within the "Model_Code_2_24_16" zipped folder. Within this folder, they can be found in the subdirectory entitled "Moat_Model". All scripts are set up to obtain data from the 10 most recent annual reports for a given company, but a simple modification of the API call will allow one to acquire more reports if needed. We also developed Python scripts to acquire several of the variables needed for the DCF model, although this script needs further development in order to capture historical information if usage of historical data (e.g. data from 2013 to 2014) is desired. This code is entitled "getFairValueVars.py" and was uploaded to Git. Prior to running these scripts, one must have Python installed on his or her computer. In theory, Python 2 or Python 3 should work, but the data were acquired on a machine running Python 2.7. Moreover, these scripts rely on the import of several Python modules. To view the modules needed for each script, open the scripts with a text editor and look at the first few lines of code. The scripts and their purpose are as follows:

getFairValueVars.py - This script takes a list of company ticker symbols and returns several variables in tab delimited format. Although the first several variables returned by the code are theoretically the most recent financial input variables (i.e. "Return on Assets", "Earnings Yield", "Sales Yield", "Book Value Yield", "Total Revenue", "Market Cap", "Enterprise Value", "Average Daily Volume", "Equity Volatility", and "Maximum Drawdown"), it may be more advisable to use the other codes listed below to obtain the 12 financial input variables, as this code has not been thoroughly reviewed due to being incomplete. This also applies to the other variables returned by the code ("Free Cash Flow", "Total Debt", "Tax Rate", "Cost of Equity", "Cost of Debt"). The code for these variables also needs to be modified in order to capture historical data for these variables. The variables that this code returns are: "Ticker Symbol", "Sector ID" (Note: although this says "Sector ID", this returns the type of "template" that QuoteMedia uses for its company type and NOT the actual sector. This is something that needs to be corrected in the code.), "Return on Assets", "Earnings Yield", "Sales Yield", "Book Value Yield", "Total Revenues", "Market Capitalization", "Enterprise Value", "Average Daily Volume", "Equity Volatility", "Maximum Drawdown", "Free Cash Flow", "Total Debt", "Tax Rate", "Cost of Equity", and "Cost of Debt".

To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 43 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code does not return historical data.

Usage: python getFairValueVars.py (Note: 'QuoteMedia_FairValue_variables.tsv' on Line 64 can be changed to whatever filename is desired)

getHistoricalROA.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Net Income, Total Assets, and the Report Date for which Net Income and Total Assets were acquired in tab delimited format. Return on Assets can be calculated in Excel using Net Income and Total Assets as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalROA.py > HistoricalROA.txt (Note: "HistoricalROA.txt" can be changed to whatever filename is desired)

getHistoricalEarningsYield.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Earnings per Share, Unadjusted Closing Price, and the Report Date in tab delimited format. Earnings Yield can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be:

['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years

(including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalEarningsYield.py > HistoricalEY.txt (Note: "HistoricalEY.txt" can be changed to whatever filename is desired)

getHistoricalBookValueYield.py - This script takes a list of company ticker symbols and returns Ticker Symbol, the Price to Book Ratio, and the Report Date in tab delimited format. Book Value Yield can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 28 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalBookValueYield.py > HistoricalBVY.txt (Note: "HistoricalBVY.txt" can be changed to whatever filename is desired)

getHistoricalSalesYield.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Total Revenue, Total Common Shares Outstanding, Unadjusted Closing Price, and the Report Date in tab delimited format. Sales Yield can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalSalesYield.py > HistoricalSY.txt (Note: "HistoricalSY.txt" can be changed to whatever filename is desired)

getHistoricalVolatility_MaximumDrawdown_AverageVolume.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Equity Volatility, Maximum Drawdown, Average Volume, and the Report Date in tab delimited format. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 40 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel. Further, this code will exit if it encounters an error while calculating Equity Volatility. The overwhelming majority of companies do not cause this code to encounter an error, but a few companies occasionally do. Although this bug still needs more troubleshooting, we suspect that the error is due to missing data. The easiest fix is to find the company that is causing the error and delete it from the list of companies pasted in on Line 40. Due to time constraints, we were unable to provide a patch for this bug. Currently, the code is set up to first identify companies that give an error while the code is running. We first suggest using this code by running "python getHistoricalVolatility_MaximumDrawdown_AverageVolume.py". This will print the companies and their data to the terminal. If the code exits, then one can see which company the code was working on prior to exiting and delete whichever company is giving the error. Once all companies that give an error have been removed, add the "#" in front of the code on line 117, and then remove the "#" in front of the code on line 123. Then one can use the code as follows:

python getHistoricalVolatility_MaximumDrawdown_AverageVolume.py > HistoricalV_MD_AV.txt (Note: "HistoricalV_MD_AV.txt" can be changed to whatever filename is desired)

getHistoricalTotalRevenue.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Total Revenue, and the Report Date in tab delimited format. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be:

['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years

(including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalTotalRevenue.py > HistoricalTR.txt (Note: "HistoricalTR.txt" can be changed to whatever filename is desired)

Note: This is technically redundant with the script for Historical Sales Yield.

getHistoricalMarketCap.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Total Common Shares Outstanding, Unadjusted Closing Price, and the Report Date in tab delimited format. Market Capitalization can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalMarketCap.py > HistoricalMC.txt (Note: "HistoricalMC.txt" can be changed to whatever filename is desired)

Note: This script has redundancy with the script for obtaining Historical Sales Yield.

getHistoricalEnterpriseValue.py - This script takes a list of company ticker symbols and returns Ticker Symbol, Total Common Shares Outstanding, Unadjusted Closing Price, Current Debt, Long-term Debt, Cash and Equivalents, Preferred Stock, Minority Interest, and the Report Date in tab delimited format. Enterprise Value can be calculated using these variables in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 28 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalEnterpriseValue.py > HistoricalEV.txt (Note: "HistoricalEV.txt" can be changed to whatever filename is desired)

getSector.py - This script takes a list of company ticker symbols and returns Ticker Symbol and Sector ID in tab delimited format. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 29 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly.

Usage: python getSector.py > HistoricalSector.txt (Note: "HistoricalSector.txt" can be changed to whatever filename is desired)

Although touched upon in the "Model Variables (Financial and Social)" section, we took the following approach to acquire the social media variables in our analyses.

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on a company's website as of February 2016 under the assumption that the companies have not added a significant number of social media links to their website since 2013. Ideally, we would use the Internet Archive to fetch historical scores. Lastly, our search through the websites typically included the "Home Page", "Media page" (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive.
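For illustration, the following hedged sketch shows how this link counting could be automated for a current website; the delivered analysis counted the links manually. The page URLs, the use of the requests and BeautifulSoup libraries, and the decision to count each platform once are all assumptions.

# Hypothetical sketch of the identity score: count social media platforms linked
# from a company's pages. Page list, libraries, and de-duplication are assumptions.
import requests
from bs4 import BeautifulSoup

SOCIAL_DOMAINS = ("facebook.com", "twitter.com", "tumblr.com", "linkedin.com",
                  "plus.google.com", "pinterest.com", "instagram.com")

def identity_score(page_urls):
    found = set()
    for url in page_urls:
        html = requests.get(url, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        for a in soup.find_all("a", href=True):
            for domain in SOCIAL_DOMAINS:
                if domain in a["href"]:
                    found.add(domain)
    # Each platform is counted once, even if linked from several pages (an assumption)
    return len(found)

# Example: home, media, and contact pages for a hypothetical company site
print(identity_score(["http://www.example.com/",
                      "http://www.example.com/media",
                      "http://www.example.com/contact"]))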

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("Moat; 2014 Data") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2014, then we acquired Total Posts from December 31st, 2013 to December 31st, 2014. The number of Total Posts was acquired from the monitor screen online.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. We created a Buzz Monitor on Crimson Hexagon ("Moat; 2014 Data") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2014, then we acquired Total Potential Impressions from December 31st, 2013 to December 31st, 2014. We downloaded an Excel file from Crimson Hexagon that contained data on Total Potential Impressions, as the website interface rounded this number. Within the Excel file, we summed the number of potential impressions each day in order to arrive at Total Potential Impressions for the time period.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. We created a Buzz Monitor on Crimson Hexagon ("Moat; 2014 Data") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2013 to December 31, 2014. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2014, then we acquired the data from December 31st, 2013 to December 31st, 2014. We downloaded an Excel file from Crimson Hexagon that contained data on Total Number of Twitter Authors and Average Posts per Author on a given day. Within the Excel file, we first multiplied the number of Twitter Authors posting on a given day by the average posts per author for that day in order to get the number of posts for each day. We then summed the total number of posts across the time period and divided this by the sum of Twitter Authors for the time period in order to arrive at Posts per Author.
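The Posts per Author arithmetic described above can also be expressed as a short Python sketch. The export filename and column names below are assumptions; the actual Crimson Hexagon download may be laid out differently, and the delivered analysis performed these steps in Excel.

# Hypothetical sketch of the Posts per Author arithmetic using a daily export.
# Column names ("Date", "TwitterAuthors", "AvgPostsPerAuthor") are assumptions.
import pandas as pd

daily = pd.read_excel("moat_2014_daily_export.xlsx")  # hypothetical filename
window = daily[(daily["Date"] >= "2013-12-31") & (daily["Date"] <= "2014-12-31")]

# Posts for each day = Twitter Authors that day x Average Posts per Author that day
daily_posts = window["TwitterAuthors"] * window["AvgPostsPerAuthor"]
total_posts = daily_posts.sum()
total_authors = window["TwitterAuthors"].sum()

# Guard against the zero-author case noted elsewhere in this appendix
posts_per_author = total_posts / total_authors if total_authors else 0
print(posts_per_author)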

Impressions per Post - We calculated this in Excel by dividing Total Potential Impressions by Total Posts (after acquiring them as described above).


Z Model Social Data Forecast

Model overview

A company's financial health is of utmost importance when constructing an investment portfolio. A particularly troubling scenario for both investors and companies alike is financial bankruptcy. For the investor, poor financial health and bankruptcies can result in substantial losses if the investor is expecting a company to grow and does not anticipate a decline in company health. In the world of startups, investors should be especially concerned about the financial health of their investments because 55% of startups fail within the first 5 years of operation 6 .

The Z Model analysis was used to determine whether social equity data could be used either alone or in conjunction with existing models to better forecast the solvency risk (whether or not a company will file for bankruptcy) of companies. Our approach is based on Edward Altman's 1968 study, and it uses linear discriminant analysis to evaluate the health status of companies 7 . All of our model testing was conducted using the caret package within R 8 .

Although we used Compustat to identify companies that filed for bankruptcy between 2007 and 2014 (see "CrowdBureau Z Model Focus Benchmark" document for more information on the companies used in our analyses), our final analyses included companies that filed for bankruptcy between 2011 and 2014 due to time and resource constraints. In total, our Z Model analysis included 50 companies (24 bankrupt and 26 non-bankrupt).


We acquired the 5 financial ratios (described in detail later) that Edward Altman identified as being predictive of bankruptcy in his 1968 study using a combination of the QuoteMedia API and Gurufocus.com. Because our goal was to forecast bankruptcy, we restricted our analyses to data acquired from the annual fiscal report in the year prior to the calendar year in which a company filed bankruptcy. In other words, if a company filed bankruptcy in 2014, then we acquired financial and social variables from 2012 to 2013. Data were captured for the 12 months leading up to and including a company's annual report date (e.g. financial and social media data were gathered from 12/31/2012 to 12/31/2013 if a company filed its annual report on 12/31/2013).

After collecting the data and organizing it into baseline (Financial ratios only), overlay (Financial ratios plus social media data), and social media (social media only) matrices, we applied linear discriminant analysis to each of the models we created (FIG. 15). For testing the accuracy of each model, we randomly split our data into a training dataset (60% of companies) and a test dataset (40% of companies). Once we trained our linear discriminant models, we used the model to classify the remaining 40% of companies and calculated the resulting accuracy of the model's predictions. Because the model's accuracy can vary due to randomly selecting training and testing data, we performed the above sequence of steps 100 times and took the average and standard deviations of the 100 trials as our final accuracy score. We trained our discriminant model with all of the data in our baseline matrix and then used the coefficients given by the model to generate a Z score for each company. This calculation can be summed up as:

where "C" corresponds to a coefficient given by our model, and "R" corresponds to 1 of 5 Altaian ratios (described later). In the above equation, the coefficients can be obtained directly APPENDIX I from the caret package within R.

When overlaying social media variables, we used the same approach as described above. The main difference between the baseline models and social media overlay models is the matrix that we provide to the linear discriminant analysis function. In total, we constructed 24 different models (model descriptions provided separately) that consisted of the baseline model with different combinations of social media variables as well as a few models that consisted entirely of social media variables (described below). Due to time constraints, these models were not exhaustive in terms of the number of combinations that can be created, but they do serve as a substantial starting point for analyses.

The QuoteMedia API and Gurufocus.com website were used to acquire the financial information needed for our analyses. Crimson Hexagon and the Internet Archive were used to acquire all of the social media variables in our analyses.

Model variables (Financial and Social)

In total, we collected data on 10 different variables (5 financial and 5 social) for the companies in our analyses. The financial variables, along with a description of how we acquired/calculated these variables, are:

Working Capital/Total Assets

These data were obtained using companies' annual report data from QuoteMedia.

Retained Earnings/Total Assets

These data were obtained using companies' annual report data from QuoteMedia.

Earnings Before Interest and Taxes/Total Assets

These data were obtained using companies' annual report data from QuoteMedia. Note: If Earnings Before Interest and Taxes were not available, then our code (described later) attempts to calculate this ratio using Earnings Before Interest, Taxes, Depreciation, and Amortization (EBITDA).

Market Value of Equity/Total Liabilities - Although we later developed code to download the variables necessary to calculate Market Value of Equity (i.e. Market Capitalization = Total Common Shares Outstanding x unadjusted closing price on annual report date) using annual report data from QuoteMedia (see "getHistoricalMarketCap.py" code description), our initial and final Altman demos used the ratio provided by Gurufocus.com.

Sales/Total Assets

These data were obtained using companies' annual report data from QuoteMedia.

The social media variables, along with a description of how we acquired/calculated these variables, are:

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Identity block.

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("Solvency and Z") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 onward. For a given company, data were collected from the twelve months leading up to the company's annual report date. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Conversation block.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. Data were obtained from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. Data were obtained from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 authors during the time range, then we manually set this value to 0 to avoid dividing by 0.

Impressions per Post - We calculated this as Total Potential Impressions (see above description)/Total Posts (see above description). Data were obtained from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 posts during the time range, then we manually set this value to 0 to avoid dividing by 0.

Company inclusion criteria.

This section is meant to give an overview of our selection process for the companies included in our Z Model analyses. Specific information on the companies themselves can be obtained from the "CrowdBureau Social Data Z Model Focus Benchmark" Word document and the "Z_Model_MasterMatrix" Excel document. We used Compustat to identify companies that filed for bankruptcy between 2007 and 2014. We then used the QuoteMedia API to

calculate/acquire 4 of the 5 financial variables described above (e.g. Working Capital/Total Assets), and we used Gurufocus.com to specifically obtain Market Value of Equity/Total Liabilities. To remain in our analysis, a company must be classified as having a template type of "N" by QuoteMedia. This allowed us to filter out financial institutions, to which the Altman Z ratios do not apply. Further, a company must have had data for all 5 financial ratios in the year prior to its bankruptcy in order to remain in the analysis. Companies for which we could not acquire all ratios from the year prior to bankruptcy or which did not have a QuoteMedia API template type of "N" were expunged from our analyses. For efficiency, we used companies from our Quantitative Moat Social Data Forecast model (company list obtained via Morningstar website) as healthy controls. As before, companies must have satisfied the template and financial ratio requirements to remain in our analyses. Following filtering, a total of 50 companies were used in our final analysis. Of these companies, 24 filed for bankruptcy between 2011 and 2014, and 26 companies were not bankrupt.

Data acquisition.

To acquire the 5 financial variables (or the components that make up these variables) we developed several in-house Python scripts to download these data using the QuoteMedia API. These scripts are summarized below. These scripts were uploaded to Git within the

"Model_Code_2_24_16" zipped folder. Within this folder, they can be found in the subdirectory titled "Z_Model". All scripts are set up to obtain data from the 10 most recent annual reports for a given company, but a simple modification of the API call will allow one to acquire more reports if needed. Prior to running these scripts, one must have Python installed on his or her computer. In theory, Python2 or Python3 should work, but the data were acquired on a machine running Python 2.7. Moreover, these codes rely on the import of several python modules. To view the modules needed for each code, open the scripts with a text editor and look at the first few lines of code. APPENDIX I

The scripts and their purpose are as follows:

get_Altman_WC_TA_RE_TA_EBIT_TA_TotalLiabilites_Sales_TA.py - This script takes a list of company ticker symbols and returns Company Name, Ticker Symbol, Working Capital/Total Assets, Retained Earnings/Total Assets, Earnings Before Interest and Taxes/Total Assets, Total Liabilities, Sales/Total Assets, and the Report Date for which these ratios were acquired. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 52 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python get_Altman_WC_TA_RE_TA_EBIT_TA_TotalLiabilites_Sales_TA.py (Note: The default output file name "altman_ratios.tsv" can be changed to whatever filename is desired by adding "-o" followed by the intended output filename)

getHistoricalMarketCap.py - Although we used Gurufocus.com to acquire the Market Value of Equity (Market Capitalization)/Total Liabilities ratio, we later developed this script to obtain these data from the QuoteMedia API. This script takes a list of company ticker symbols and returns Ticker Symbol, Total Common Shares Outstanding, Unadjusted Closing Price, and the Report Date in tab delimited format. Market Capitalization can be calculated in Excel by multiplying Total Common Shares Outstanding by the Unadjusted Closing Price. The Market Capitalization can then be divided by Total Liabilities to obtain the fourth Altman ratio. With more work, it would be possible to integrate this script into the script described immediately above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 27 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalMarketCap.py > HistoricalMC.txt (Note: "HistoricalMC.txt" can be changed to whatever filename is desired)

Although touched upon in the "Model Variables (Financial and Social)" section, we took the following approach to acquire the social media variables in our analyses.

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. For this model, we used the Internet Archive 9 to view historical snapshots of company websites (websites were found through secondary research) in order to find historical identity scores. Specifically, if a company filed for bankruptcy in 2014 and filed its annual report on 12/31/2013, then we attempted to find a snapshot of that company's website that was as close to 12/31/2013 as possible. If we were unable to find an adequate snapshot for the month that a company filed its annual report, then we moved to dates that were closer to the present. We did this as the archive tends to have more snapshots the closer that one gets to the present. If we were unable to find links on a company's webpage, or if the company did not have a webpage any time near the filing date of the report (more or less within 1-2 years), then that company was assigned a score of 0. Lastly, our search through the websites typically included the "Home Page", "Media page" (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Identity block.

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("Solvency and Z") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present day. One critical note worth mentioning here is that it was necessary to take into account whether a bankrupt company underwent a change in ticker symbol (e.g. ALCS to ALCSQ) due to poor financial health. Thus, we often used two cashtags to capture data for bankrupt companies (e.g. $ALCS and $ALCSQ for Alco Stores Inc.). During the project, we used secondary research to determine whether a ticker symbol change occurred for a given company. During the course of the project, however, QuoteMedia made it possible to use their API to determine whether a given company changed ticker symbol during a given year.


Due to time and resource constraints, we were not able to incorporate this new call into existing codes, but CrowdBureau may want to explore this possibility for future modeling.

For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag (typically two cashtags for bankrupt companies) within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2013 (bankruptcy in 2014), then we acquired Total Posts from December 31st, 2012 to December 31st, 2013. The number of Total Posts was acquired from the monitor screen online.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. We created a Buzz Monitor on Crimson Hexagon ("Solvency and Z") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present day. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2013, then we acquired Total Potential Impressions from December 31st, 2012 to December 31st, 2013. We downloaded an Excel file from Crimson Hexagon that contained data on Total Potential Impressions, as the website interface rounded this number. Within the Excel file, we summed the number of potential impressions each day in order to arrive at Total Potential Impressions for the time period.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. We created a Buzz Monitor on Crimson Hexagon ("Solvency and Z") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present day. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2013, then we acquired the data from December 31st, 2012 to December 31st, 2013. We downloaded an Excel file from Crimson Hexagon that contained data on Total Number of Twitter Authors and Average Posts per Author on a given day. Within the Excel file, we first multiplied the number of Twitter Authors posting on a given day by the average posts per author for that day in order to get the number of posts for each day. We then summed the total number of posts across the time period and divided this by the sum of Twitter Authors for the time period in order to arrive at Posts per Author. If a company had 0 authors for the time period for which data were collected, then this value was set to 0 in order to avoid division by 0.

Impressions per Post - We calculated this in Excel by dividing Total Potential Impressions by Total Posts (after acquiring them as described above). If a company had 0 posts for the time period for which data were collected, then this value was set to 0 in order to avoid division by 0.

Model testing and results

Once we acquired all of the financial and social media data described above for the 50 companies in our analysis, we generated the "Z_Model_MasterMatrix" Excel spreadsheet, which can be found on Confluence. This spreadsheet was too large to include in the report, but it contains all of the data along with other details (e.g. cashtags, report dates, social data date ranges, etc) that are useful for acquiring further information about the companies used in the modeling process. After generating this data matrix, we created the baseline data matrix for the Z model (Table 5).

Table 5. Snapshot of the Z model baseline matrix. Input variables are abbreviated for brevity. The variable we are trying to predict ("Bankruptcy") is highlighted in green. Although the company names are not shown, each row corresponds to a specific company. WC_TA = Working Capital/Total Assets; RE_TA = Retained Earnings/Total Assets; EBIT_TA = Earnings Before Interest and Taxes/Total Assets; MVE_TL = Market Value of Equity/Total Liabilities; SA_TA = Sales/Total Assets.


After setting up the baseline matrix, we performed a linear discriminant analysis on the baseline matrix to calculate the average accuracy of our baseline model. To do this, we developed an R script entitled "Script for Running Models.r". Although we aim to describe this script in detail separately, we will provide a brief overview of how this script determines the mean accuracy and standard deviation of model accuracy. This script was uploaded to Git in the "Model_Code_2_24_16" zipped folder and is contained within the "Modeling_Script" subdirectory of this archive.

The first step of this script involves importing the baseline data matrix (see Table 5 for example). After loading the matrix, the code randomly selects 60% of the data for training and 40% of the data for testing. As an example, if we loaded Table 5 (this table has 10 lines of data) into the code, then 6 lines of data would be randomly selected to train a linear discriminant model and 4 lines of data would be randomly selected for testing purposes. After training, the code predicts which category each data point in the test data falls under and then compares the predictions to the actual category of each data point. The accuracy of the model is then stored in a list, and the steps described above are repeated 99 more times for a total of 100 iterations.

After 100 iterations, the code prints out the average accuracy and the standard deviation of the accuracy. We plotted and compared the average accuracy along with the standard error of the mean (standard deviation/square root of sample size) of the models that we tested.
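The delivered modeling code is written in R using the caret package; for readers who prefer Python, the following sketch reproduces the same resampling logic (random 60/40 split, linear discriminant analysis, 100 repetitions) with scikit-learn. The matrix filename and column names are assumptions based on Table 5.

# Illustrative Python analogue of "Script for Running Models.r" (the delivered
# code is R/caret; this scikit-learn version is a sketch, not the delivered script).
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

matrix = pd.read_csv("Z_Model_baseline_matrix.csv")  # hypothetical export of Table 5
X = matrix[["WC_TA", "RE_TA", "EBIT_TA", "MVE_TL", "SA_TA"]]
y = matrix["Bankruptcy"]

accuracies = []
for _ in range(100):
    # 60% of companies for training, 40% for testing, resampled each iteration
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6)
    model = LinearDiscriminantAnalysis().fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))

print("mean accuracy:", np.mean(accuracies))
print("standard deviation:", np.std(accuracies))
# Standard error of the mean as described in the text (SD / square root of sample size)
print("standard error of the mean:", np.std(accuracies) / np.sqrt(len(accuracies)))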

After implementing the modeling code described above, we found that our baseline model was 84.5% accurate on average with a standard deviation of 8.4%. Note that these were the accuracies at the time we implemented the models. If run again, one is likely to get highly similar but not exact results due to the random selection of the training and testing data sets. The No Information Rate (NIR; Random Forecasting) for the Z model was 52%. These rates are what one would obtain if he or she were guessing at random whether a given company will file for bankruptcy.

After generating our baseline models, we added social media variables to our baseline matrices in different combinations (detailed in the "Z_Model_Model_Descriptions" document). We generated a total of 24 different overlay models that either included baseline data plus various combinations of social media data or social media data alone. The number of models tested was far from exhaustive as other combinations of social media and baseline variables exist that we did not test. We also did not explore combining subsets of baseline variables with social media variables. Thus, our conclusions below are based on a limited subset of combinations derived from the universe of combinations of baseline variables (viewed as "one" variable) and social media variables.

Using the code described above, we calculated the average accuracies and standard deviations of accuracies for each model and compared them to our baseline matrices. Of our 24 models, Model 15 (M15) appears to have a marginal accuracy increase (88.2% accuracy; 8.0% standard deviation) when forecasting one year in advance whether companies will file for bankruptcy relative to baseline (84.5% accuracy). Model 15 includes baseline data plus total potential impressions and total posts. Several models constructed using social media alone appear to forecast bankruptcy better than random. Given these results, along with the fact that we only tested a fraction of the different possible combinations in our analyses, our data suggest that further examination of the Z Model Social Data Forecast by CrowdBureau is warranted.


Although the baseline and social media overlay models showed promise in forecasting bankruptcy, we compared our baseline Z model to what Edward Altman's model would have predicted for our dataset. To do this, we first calculated the Z-score of each company using the coefficients provided by our model and the coefficients given by the Altman Z model. The coefficients given by our model are: 0.782 (Working Capital/Total Assets), -0.129 (Retained Earnings/Total Assets), 2.396 (EBIT/Total Assets), 0.169 (Market Value of Equity/Total Liabilities), and 0.0114 (Sales/Total Assets). The coefficients for Altman's Z model are: 1.2 (Working Capital/Total Assets), 1.4 (Retained Earnings/Total Assets), 3.3 (EBIT/Total Assets), 0.6 (Market Value of Equity/Total Liabilities), and 1 (Sales/Total Assets). After calculating Z-scores, we then transformed the Z-score for each company into a percentile score (high scoring companies received a lower percentile score while low scoring companies received a higher percentile score) based on Morningstar's method of calculating percentile rank 10 . Briefly:

Percentile Rank = RoundDown(99 x (i - 1)/(n - 1) + 1),

where "RoundDown" refers to Microsoft Excel's function for rounding values down to the nearest integer, "n" is the total number of observations (i.e. total number of companies in the analysis), and "i" is the absolute rank of each observation (obtainable via Excel's "rank" function). After obtaining the percentile ranks of each company, we then calculated the cumulative bankruptcy frequency across all percentile ranks and plotted cumulative bankruptcy frequency against percentile ranks. We found that our model is similar to Altman's model. However, CrowdBureau may want to consider calculating accuracy ratios for each model as described in Morningstar's December 2009 report by Warren Miller for a more quantitative comparison of our baseline Z model to the Altman Z model 11 .
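To make the comparison concrete, the sketch below applies both sets of coefficients quoted above to a single set of ratios and converts a rank into a percentile rank using the RoundDown formula. The example ratio values are invented for illustration and are not taken from our dataset.

# Worked sketch of the Z-score weighted sum and the percentile-rank formula.
# The coefficient values come from the text above; the example ratios are made up.
import math

# Coefficient order: WC/TA, RE/TA, EBIT/TA, MVE/TL, Sales/TA
OUR_MODEL = [0.782, -0.129, 2.396, 0.169, 0.0114]
ALTMAN_1968 = [1.2, 1.4, 3.3, 0.6, 1.0]

def z_score(ratios, coefficients):
    # Z = C1*R1 + C2*R2 + C3*R3 + C4*R4 + C5*R5
    return sum(c * r for c, r in zip(coefficients, ratios))

def percentile_rank(i, n):
    # Percentile Rank = RoundDown(99 x (i - 1)/(n - 1) + 1)
    return math.floor(99 * (i - 1) / (n - 1) + 1)

example_ratios = [0.10, 0.05, 0.08, 1.2, 0.9]  # made-up ratio values
print("our model:", z_score(example_ratios, OUR_MODEL))
print("Altman 1968:", z_score(example_ratios, ALTMAN_1968))
# e.g. the company ranked 10th out of 50 falls in percentile rank:
print("percentile rank:", percentile_rank(i=10, n=50))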


Solvency Score Social Data Forecast

Model overview

Company financial health is a critical factor in designing an investment portfolio to maximize returns. For the investor, poor financial health and bankruptcies can result in substantial losses if the investor is expecting a company to grow and does not anticipate a decline in company health. Regardless of one's investment strategy, the ability to forecast whether a company will enter into bankruptcy would be a valuable asset for an investor. This is especially pertinent in the world of startups where around 55% of startups fail within the first 5 years of operation 12 .

In addition to its Z model analyses, a Solvency Score Social Data Forecast model was created to determine whether social equity data could be used either alone or in conjunction with existing models to better forecast the solvency risk (whether or not a company will file for bankruptcy) of companies. Our approach is based on the Morningstar Solvency Score described in Morningstar's December 2009 Methodology Paper authored by Warren Miller, and all of our model testing was conducted using the caret package within R 14 . Although we used Compustat to identify companies that filed for bankruptcy between 2007 and 2014 (see "CrowdBureau Solvency Score Focus Benchmark" document for more information on the companies used in our analyses), our final analyses included companies that filed for bankruptcy between 2011 and 2014 due to time and resource constraints. In total, our Solvency Score analysis included 49 companies (23 bankrupt and 26 non-bankrupt). These 49 companies were also used in our Z Model Social Data Forecast.

We acquired the 3 financial variables based on 4 financial ratios (described in detail later) that Morningstar used in their 2009 Solvency Score methodology 12 . Because our goal was to forecast bankruptcy, we restricted our analyses to data acquired from the annual fiscal report in the year prior to the calendar year in which a company filed bankruptcy. In other words, if a company filed bankruptcy in 2014, then we acquired financial and social variables from 2012 to 2013. Data were captured for the 12 months leading up to and including a company's annual report date (e.g. financial and social media data were gathered from 12/31/2012 to 12/31/2013 if a company filed its annual report on 12/31/2013).


After collecting the data and organizing it into baseline (Financial ratios only), overlay (Financial ratios plus social media data), and social media (social media only) matrices, we applied a logistic regression analysis to each of the models we created. For testing the accuracy of each model, we randomly split our data into a training dataset (60% of companies) and a test dataset (40% of companies). Once we trained our logistic regression models, we used the model to classify the remaining 40% of companies and calculated the resulting accuracy of the model's predictions. Because the model's accuracy can vary due to randomly selecting training and testing data, we performed the above sequence of steps 100 times and took the average and standard deviations of the 100 trials as our final accuracy score. We trained our logistic regression model with all of the data in our baseline matrix and then used the coefficients given by the model to generate a Solvency Score for each company. This calculation can be summed up as:

Solvency Score = (C1 x V1) + (C2 x V2) + (C3 x V3) + Y,

where "C" corresponds to a coefficient given by our model, "V" corresponds to 1 of 3 variables derived from the 4 ratios mentioned above (described in detail later), and "Y" corresponds to the y-intercept. In the above equation, the coefficients can be obtained directly from the caret package within R.

When overlaying social media variables, we used the same approach as described above. The main difference between the baseline models and social media overlay models is the matrix that we provide to the logistic regression function. In total, we constructed 23 different models (model descriptions provided separately) that consisted of the baseline model with different combinations of social media variables as well as a few models that consisted entirely of social media variables (described below). Due to time constraints, these models were not exhaustive in terms of the number of combinations that can be created, but they do serve as a substantial starting point for analyses.

The QuoteMedia API was used to acquire the financial information needed for our analyses. Crimson Hexagon and the Internet Archive were used to acquire all of the social media variables in our analyses.

Model variables (Financial and Social).

In total, we collected data on 8 different variables (3 financial and 5 social) for the companies in our analyses. The financial variables, along with a description of how we acquired/calculated these variables, are:

SquareRoot(TLTA_p x EBIE_p) - We calculated TLTA_p as: the percentile score of a company's Total Liabilities/Total Assets (PercentileScore(Total Liabilities/Total Assets)).

We calculated EBIE_p as: 101 - the percentile score of a company's Earnings Before Interest, Taxes, Depreciation, and Amortization/Interest Expense (101 - PercentileScore(EBITDA/Interest Expense)).

Percentile Score is calculated as:

PercentileScore = RoundDown(99 x (i - 1)/(n - 1) + 1),

where "RoundDown" refers to Microsoft Excel's function for rounding values down to the nearest integer, "n" is the total number of observations (i.e. total number of companies in the analysis), and "i" is the absolute rank of each observation (obtainable via Excel's "rank" function).

These data were obtained using companies' annual report data from QuoteMedia and further processed in Excel.

QR_p - We calculated QR_p as:

101 - Percentile score of the Quick Ratio.

We calculated the Quick Ratio as:

Quick ratio = (Current Assets - Inventories)/ Current Liabilities

These data were obtained using companies' annual report data from QuoteMedia and further processed in Excel.

ROIC_p - We calculated ROIC_p as:

101 - Percentile score of the Return on Invested Capital.

We calculated the Return on Invested Capital as:

Return on Invested Capital = (Net Income - Dividends)/Total Capitalization

These data were obtained using companies' annual report data from QuoteMedia and further processed in Excel.
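Because the three model inputs are built from percentile scores of the underlying ratios, the order of operations can be summarized in a short sketch. The helper below mirrors the RoundDown formula given above; the function names and example ranks are our own.

# Sketch of the three Solvency Score inputs for one company, given its ratio ranks.
# Function names and the example ranks are illustrative, not the delivered code.
import math

def percentile_score(rank, n):
    # PercentileScore = RoundDown(99 x (i - 1)/(n - 1) + 1)
    return math.floor(99 * (rank - 1) / (n - 1) + 1)

def solvency_inputs(tlta_rank, ebie_rank, qr_rank, roic_rank, n):
    # tlta_rank: rank of Total Liabilities/Total Assets
    # ebie_rank: rank of EBITDA/Interest Expense
    # qr_rank:   rank of the Quick Ratio
    # roic_rank: rank of Return on Invested Capital
    tlta_p = percentile_score(tlta_rank, n)
    ebie_p = 101 - percentile_score(ebie_rank, n)
    qr_p = 101 - percentile_score(qr_rank, n)
    roic_p = 101 - percentile_score(roic_rank, n)
    return math.sqrt(tlta_p * ebie_p), qr_p, roic_p

# Example: a company ranked 10th, 25th, 30th, and 12th out of 49 on the four ratios
print(solvency_inputs(10, 25, 30, 12, n=49))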

The social media variables, along with a description of how we acquired/calculated these variables, are:

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Identity block.

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("Solvency and Z") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 onward. For a given company, data were collected from the twelve months leading up to the company's annual report date. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Conversation block.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. Data were obtained from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. Data were obtained from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 authors during the time range, then we manually set this value to 0 to avoid dividing by 0.

Impressions per Post - We calculated this as Total Potential Impressions (see above description)/Total Posts (see above description). Data were obtained from the "Solvency and Z" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 posts during the time range, then we manually set this value to 0 to avoid dividing by 0.

Company inclusion criteria


This section is meant to give an overview of our selection process for the companies included in our Solvency Score analyses. Specific information on the companies themselves can be obtained from the "CrowdBureau Social Data Solvency Score Focus Benchmark" Word document and the "Solvency_Score_Master_Matrix_Final" document. We used Compustat to identify companies that filed for bankruptcy between 2007 and 2014. We then used the QuoteMedia API to calculate/acquire data for the 3 financial variables described above. Due to time constraints, we decided to use the same companies in our Z Model analyses in our Solvency Score analyses. Consequently, to remain in our analysis a company must be classified as having a template type of "N" by QuoteMedia. This allowed us to filter out financial institutions, to which the Altman Z ratios do not apply. However, Morningstar's December 2009 Solvency Score analyses included financial institutions, and CrowdBureau should be aware of this information (see link in Footnote #12 for further details on the Morningstar Methodology). Further, a company must have had data available through QuoteMedia for all 4 financial ratios in the year prior to its bankruptcy in order to remain in the analysis. Companies for which we could not acquire all ratios from the year prior to bankruptcy or which did not have a QuoteMedia API template type of "N" were expunged from our analyses. For efficiency, we used companies from our Quantitative Moat Social Data Forecast model (company list obtained via Morningstar website) as healthy controls. As before, companies must have satisfied the template and financial ratio requirements to remain in our analyses. Following filtering, a total of 49 companies were used in our final analysis. Of these companies, 23 filed for bankruptcy between 2011 and 2014, and 26 companies were not bankrupt.

Data acquisition


To acquire the data needed to construct the 3 financial variables (or the components that make up these variables), we developed an in-house Python script to download these data using the QuoteMedia API. The script is summarized below. The script was uploaded to Git within the "Model_Code_2_24_16" zipped folder. Within this folder, it can be found in the subdirectory titled "Solvency_Model". The script is set up to obtain data from the 10 most recent annual reports for a given company, but a simple modification of the API call will allow one to acquire more reports if needed. Prior to running this script, one must have Python installed on his or her computer. In theory, Python2 or Python3 should work, but the data were acquired on a machine running Python 2.7. Moreover, this code relies on the import of several Python modules. To view the modules needed, open the script with a text editor and look at the first few lines of code.

The script and its purpose is as follows:

GetHistoricalRawSolvencyScoreVariables.py - This script takes a list of company ticker symbols and returns Company Name, Ticker Symbol, Total Liabilities, Total Assets,

EBITDA, Interest Expense, Current Assets, Inventory, Current Liabilities, Net Income, Cash

Dividends, Total Capitalization, and the Report Date for which these data were acquired. The financial ratios and variables for the Solvency Score model can be calculated in Excel as described above. To run this code for a particular list of companies, open the script using a text editor and paste in a list of companies separated by a comma on Line 70 of this script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past) so the data have to be further processed using a program such as Microsoft Excel.

Usage: python GetHistoricalRawSolvencyScoreVariables.py (Note: The default output file name "QuoteMedia_Solvency_score_healthy.tsv" can be changed to whatever filename is desired by adding "-o" followed by the intended output file name.)

Although touched upon in the "Model Variables (Financial and Social)" section, we took the following approach to acquire the social media variables in our analyses.

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. For this model, we used the Internet Archive 15 to view historical snapshots of company websites (websites were found through secondary research) in order to find historical identity scores. Specifically, if a company filed for bankruptcy in 2014 and filed its annual report on 12/31/2013, then we attempted to find a snapshot of that company's website that was as close to 12/31/2013 as possible. If we were unable to find an adequate snapshot for the month that a company filed its annual report, then we moved to dates that were closer to the present. We did this as the archive tends to have more snapshots the closer that one gets to the present. If we were unable to find links on a company's webpage, or if the company did not have a webpage any time near the filing date of the report (more or less within 1-2 years), then that company was assigned a score of 0. Lastly, our search through the websites typically included the "Home Page", "Media page" (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Identity block.

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("Solvency and Z") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present day. One critical note worth mentioning here is that it was necessary to take into account whether a bankrupt company underwent a change in ticker symbol (e.g. ALCS to ALCSQ) due to poor financial health. Thus, we often used two cashtags to capture data for bankrupt companies (e.g. $ALCS and $ALCSQ for Alco Stores Inc.). During the project, we used secondary research to determine whether a ticker symbol change occurred for a given company. During the course of the project, however, QuoteMedia made it possible to use their API to determine whether a given company changed ticker symbol during a given year. Due to time and resource constraints, we were not able to incorporate this new call into existing codes, but CrowdBureau may want to explore this possibility for future modeling.

For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag (typically two cashtags for bankrupt companies) within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2013 (bankruptcy in 2014), then we acquired Total Posts from December 31st, 2012 to December 31st, 2013. The number of Total Posts was acquired from the monitor screen online.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. We created a Buzz Monitor on Crimson Hexagon ("Solvency and Z") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present day. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2013, then we acquired Total Potential Impressions from December 31st, 2012 to December 31st, 2013. We downloaded an Excel file from Crimson Hexagon that contained data on Total Potential Impressions, as the website interface rounded this number. Within the Excel file, we summed the number of potential impressions each day in order to arrive at Total Potential Impressions for the time period.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. We created a Buzz Monitor on Crimson Hexagon ("Solvency and Z") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from May 23, 2008 to the present day. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed their annual report on December 31st, 2013, then we acquired the data from December 31st, 2012 to December 31st, 2013. We downloaded an Excel file from Crimson Hexagon that contained data on Total Number of Twitter Authors and Average Posts per Author on a given day. Within the Excel file, we first multiplied the number of Twitter Authors posting on a given day by the average posts per author for that day in order to get the number of posts for each day. We then summed the total number of posts across the time period and divided this by the sum of Twitter Authors for the time period in order to arrive at Posts per Author. If a company had 0 authors for the time period for which data were collected, then this value was set to 0 in order to avoid division by 0.

Impressions per Post - We calculated this in Excel by dividing Total Potential Impressions by Total Posts (after acquiring them as described above). If a company had 0 posts for the time period for which data were collected, then this value was set to 0 in order to avoid division by 0.

Model testing and results

Once we acquired all of the financial and social media data described above for the 49 companies in our analysis, we generated the "Solvency_Score_Model_MasterMatrix" Excel spreadsheet, which can be found on Confluence. This spreadsheet was too large to include in the report, but it contains all of the data along with other details (e.g. cashtags, report dates, social data date ranges, etc) that are useful for acquiring further information about the companies used in the modeling process. After generating this data matrix, we created the baseline matrix for the Solvency Score model (Table 6).

Table 6. Snapshot of the Solvency Score model baseline data matrix. Input variables are abbreviated for brevity. The variable we are trying to predict ("Bankruptcy") is highlighted in green. Although the company names are not shown, each row corresponds to a specific company. The input variable abbreviations are defined above.

After setting up the baseline matrix, we performed a logistic regression analysis on the baseline matrix to calculate the average accuracy of our baseline model. To do this, we developed an R script entitled "Script for Running Models.r". Although we aim to describe this script in detail separately, we will provide a brief overview of how this script determines the mean accuracy and standard deviation of model accuracy. This script was uploaded to Git in the "Model_Code_2_24_16" zipped folder and is contained within the "Modeling_Script" subdirectory of this archive.

The first step of this script involves importing the baseline data matrix (see Table 6 for example). After loading the matrix, the code randomly selects 60% of the data for training and 40% of the data for testing. As an example, if we loaded a matrix with 10 lines of data into the code, then 6 lines of data would be randomly selected to train a logistic regression model and 4 lines of data would be randomly selected for testing purposes. After training, the code predicts which category each data point in the test data falls under and then compares the predictions to the actual category of each data point. The accuracy of the model is then stored in a list, and the steps described above are repeated 99 more times for a total of 100 iterations.

After 100 iterations, the code prints out the average accuracy and the standard deviation of the accuracy. We plotted and compared the average accuracy along with the standard error of the mean (standard deviation/square root of sample size) of the models that we tested.

After implementing the modeling code described above, we found that our baseline model was 90.5% accurate on average with a standard deviation of 6.4%. Note that these were the accuracies at the time we implemented the models. If run again, one is likely to get highly similar but not exact results due to the random selection of the training and testing data sets. The No Information Rate (NIR; Random Forecasting) for the Solvency Score model was 53.1%. These rates are what one would obtain if he or she were guessing at random whether a given company will file for bankruptcy.

After generating our baseline models, we added social media variables to our baseline matrices in different combinations (detailed in the "Solvency_Score_Model_Descriptions" document). We generated a total of 23 different overlay models that either included baseline data plus various combinations of social media data or social media data alone. The number of models tested was far from exhaustive as other combinations of social media and baseline variables exist that we did not test. We also did not explore combining subsets of baseline variables with social media variables. Thus, our conclusions below are based on a limited subset of combinations derived from the universe of combinations of baseline variables (viewed as "one" variable) and social media variables.


Using the code described above, we calculated the average accuracies and standard deviations of accuracies for each model and compared them to our baseline matrices. Of our 23 models, Model 4 (M4) appears to have a marginal accuracy increase (92.5% accuracy; 6.3% standard deviation) when forecasting one year in advance whether companies will file for bankruptcy relative to baseline (90.5% accuracy). Model 4 includes baseline data plus total potential impressions. Several models constructed using social media alone appear to forecast bankruptcy better than random. Given these results, along with the fact that we only tested a fraction of the different possible combinations in our analyses, our data suggest that further examination of the Solvency Score Social Data Forecast by CrowdBureau is warranted.

Although the baseline and social media overlay models showed promise in forecasting bankruptcy, we compared our baseline Solvency Score model to what Morningstar's Solvency Score model would have predicted for our dataset. To do this, we first calculated the Solvency Score of each company using the coefficients provided by our model and the coefficients given by the Morningstar Solvency Score model. The coefficients given by our model are: 0.14601 (SQRT(TLTA_p x EBIE_p)), 0.02793 (QR_p), 0.02786 (ROIC_p), and the y-intercept was -9.19726. The Morningstar Solvency Score coefficients are: 5 (SQRT(TLTA_p x EBIE_p)), 4 (QR_p), and 1.5 (ROIC_p). After calculating Solvency Scores, we transformed the Solvency Score for each company into a percentile score (high scoring companies received a lower percentile score while low scoring companies received a higher percentile score) based on Morningstar's method of calculating percentile rank 16 . Briefly:

Percentile Rank = RoundDown(99 x (i - 1)/(n - 1) + 1),

where "RoundDown" refers to Microsoft Excel's function for rounding values down to the nearest integer, "n" is the total number of observations (i.e. total number of companies in the analysis), and "i" is the absolute rank of each observation (obtainable via Excel's "rank" function). After obtaining the percentile ranks of each company, we then calculated the cumulative bankruptcy frequency across all percentile ranks and plotted cumulative bankruptcy frequency against percentile ranks. We found that our model is similar to Morningstar's Solvency Score model with regard to forecasting bankruptcy. However, CrowdBureau may want to consider calculating accuracy ratios for each model as described in Morningstar's December 2009 report by Warren Miller for a more quantitative comparison of our baseline Solvency Score model to the Morningstar Solvency Score model 17 .

Earnings per Share Social Data Forecast

Model overview

When deciding which companies to invest in, profitability is a critical factor to consider; in general, highly profitable companies are often attractive investments. A common indicator of a company's profitability is its Earnings per Share. We asked whether social media data alone could be used to forecast increases or decreases in Diluted Earnings per Share from one year to the next better than one would achieve by forecasting at random.

To answer this question, we constructed several random forests models using 5 social media data points as input variables. We also acquired the Diluted Earnings per Share for 58 companies from 2013 and 2014. To determine whether a company experienced an increase or a decrease in Diluted Earnings per Share (DEPS), we compared the annual DEPS from 2014 for a given company to the annual DEPS for that company in 2013. We then acquired social equity data (described later) from 2012 to 2013 in an effort to forecast the DEPS change from 2013 to 2014.

After acquiring the social media variables and determining the changes in DEPS for the companies in our analysis, we constructed a master data matrix with these data. We then applied a random forests model to several different variations of the matrix in order to distinguish companies that had an increase in DEPS from those that had a decrease in DEPS. The predictions of our models are based on 500 regression trees, and we implemented all of our modeling using the caret package within R [18].

For testing the accuracy of each model, we randomly split our data into a training dataset (60% of companies) and a test dataset (40% of companies) (FIG. 16). After training the model with data from 60% of the companies, we used the model to classify the remaining 40% of companies and calculated the accuracy. Because the model's accuracy can vary due to the random selection of training and testing data, we performed the above sequence of steps 100 times and took the average and standard deviation of the 100 trials as our final accuracy score. Although we did not generate a final quantitative score for companies, given our finding that the models did not predict better than random, it is possible to obtain the probability of a DEPS increase directly from the caret package within R.
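
Because this split-train-score loop is reused by the models that follow, a minimal sketch of it in R is shown below. It assumes the caret package; the data frame "deps", its outcome column "Change", and the single mtry value are illustrative choices rather than details taken from "Script for Running Models.r".

# Minimal sketch (R) of the repeated 60/40 train/test procedure described
# above, using the caret package. Object and column names are illustrative;
# `Change` must be a factor (increase vs. decrease).
library(caret)

run_trials <- function(deps, n_trials = 100) {
  acc <- numeric(n_trials)
  for (k in seq_len(n_trials)) {
    idx <- createDataPartition(deps$Change, p = 0.60, list = FALSE)
    tr  <- deps[idx, ]
    te  <- deps[-idx, ]
    fit <- train(Change ~ ., data = tr, method = "rf", ntree = 500,
                 trControl = trainControl(method = "none"),
                 tuneGrid  = data.frame(mtry = 2))  # single, assumed mtry
    acc[k] <- mean(predict(fit, newdata = te) == te$Change)
  }
  c(mean_accuracy = mean(acc), sd_accuracy = sd(acc))
}

For a single split, caret's confusionMatrix function also reports the No Information Rate alongside the accuracy.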

In total, we constructed 23 different models (model descriptions provided separately) that consisted of different combinations of social media variables (described below). Due to time constraints, these models were not exhaustive in terms of the number of combinations that can be created, but they do serve as a substantial starting point for analyses. The QuoteMedia API and the Morningstar website were used to acquire the financial information needed for our analyses. Crimson Hexagon and company websites were used to acquire all of the social media variables in our analyses.

Model variables (Financial and Social)

In total, we collected data on 6 different variables (1 financial and 5 social) for the companies in our analyses. The financial variable, along with a description of how we acquired/calculated it, is:

Change in Diluted Earnings per Share - We obtained annual Diluted Earnings per Share for companies in 2013 and 2014 directly from the QuoteMedia API. We then compared the DEPS from 2014 to the DEPS from 2013 in order to determine whether the company experienced an increase or a decrease in DEPS.

The social media variables, along with a description of how we acquired/calculated these variables, are:

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on a company's website as of February 2016 under the assumption that the companies have not added a significant number of social media links to their website since 2013. Ideally, we would use the Internet Archive to fetch historical scores. Lastly, our search through the websites typically included the "Home Page", "Media" page (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Identity block.
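
As an illustration only, a count of this kind could be automated with the rvest package, as sketched below; the function name, URL argument, and domain list are assumptions, and the sketch reads a single page rather than the "Home Page", "Media" page, and "Contact Us" page that we reviewed manually.

# Minimal sketch (R): count links on one page that point to the listed
# social media domains. Uses the rvest package; names are illustrative.
library(rvest)

identity_score <- function(page_url) {
  domains <- c("facebook.com", "twitter.com", "tumblr.com", "linkedin.com",
               "plus.google.com", "pinterest.com", "instagram.com")
  page  <- read_html(page_url)
  hrefs <- html_attr(html_nodes(page, "a"), "href")
  hrefs <- hrefs[!is.na(hrefs)]
  sum(grepl(paste(domains, collapse = "|"), hrefs))
}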

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("EPS; Data for 2014 change") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2012 to December 31, 2013. For a given company, data were collected from the twelve months leading up to the company's annual report date. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Conversation block.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. Data were obtained from the "EPS; Data for 2014 change" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. Data were obtained from the "EPS; Data for 2014 change" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 authors during the time range, then we manually set this value to 0 to avoid dividing by 0.

Impressions per Post - We calculated this as Total Potential Impressions (see above description) divided by Total Posts (see above description). Data were obtained from the "EPS; Data for 2014 change" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 posts during the time range, then we manually set this value to 0 to avoid dividing by 0.
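
For clarity, a minimal sketch in R of these two guarded ratios is shown below; the vector names are illustrative.

# Minimal sketch (R) of the ratio variables with the divide-by-zero guards
# described above; inputs are per-company vectors with illustrative names.
posts_per_author <- function(total_posts, total_authors) {
  ifelse(total_authors == 0, 0, total_posts / total_authors)
}

impressions_per_post <- function(total_impressions, total_posts) {
  ifelse(total_posts == 0, 0, total_impressions / total_posts)
}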

Company inclusion criteria.

This section is meant to give an overview of our selection process for the companies included in these analyses. Specific information on the companies themselves can be obtained from the "CrowdBureau Social Data Earnings per Share Focus Benchmark" Word document and the "EPS_changes_2013_to_2014_Master_Matrix" Excel document. In January of 2016, we used the Morningstar website to obtain a list of approximately 120 companies that were identified by Morningstar as having either a wide moat, a narrow moat, or no moat. We then used the QuoteMedia API either to directly acquire the 12 financial variables described above (e.g. Total Revenue) or to acquire the variables needed to calculate these attributes (e.g. we acquired total common shares outstanding and unadjusted close and then calculated Market Capitalization from these values). To remain in our analysis, a company must have reported all variables necessary to obtain the 12 financial input attributes for the moat model in its 2014 annual report. Companies for which we could not acquire all 12 attributes from their 2014 annual report were expunged from our analyses. Following filtering, a total of 59 companies were used in our final analysis. For efficiency, we acquired DEPS only for these 59 companies. After acquiring DEPS and calculating DEPS changes, we expunged companies that had no change in DEPS, as this category was rarely present (1 in 59 companies) and occurred too infrequently to be used in the modeling. Our final analysis included 58 companies.

Data acquisition.

To acquire the Diluted Earnings per Share for the companies in our analysis, we developed a Python script to download these data using the QuoteMedia API. The script is summarized below. The script was uploaded to Git within the "Model_Code_2_24_16" zipped folder; within this folder, it can be found in the subdirectory entitled "EPS Model". The script is set up to obtain data from the 10 most recent annual reports for a given company, but a simple modification of the API call will allow one to acquire more reports if needed. Prior to running this script, one must have Python installed. In theory, Python 2 or Python 3 should work, but the data were acquired on a machine running Python 2.7. Moreover, the code relies on the import of several Python modules; to view the modules needed, open the script with a text editor and look at the first few lines of code.

The script and its purpose are as follows:

getHistoricalEPS.py - This script takes a list of company ticker symbols and returns the Ticker Symbol, Annual Diluted Earnings per Share, and the Report Date for which the data were acquired, in tab-delimited format. To run this code for a particular list of companies, open the script using a text editor and paste in a comma-separated list of companies on Line 27 of the script. An example list would be: ['AMGN','BIIB','BXLT']. The brackets, apostrophes, and commas are all necessary in order for the code to run properly. Be aware that this code returns historical data for several years (including 2014 and further into the past), so the data have to be further processed using a program such as Microsoft Excel.

Usage: python getHistoricalEPS.py > HistoricalEPS.txt (Note: "HistoricalEPS.txt" can be replaced with any output file name.)

Although touched upon in the "Model Variables (Financial and Social)" section, we took the following approach to acquire the social media variables in our analyses.

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on a company's website as of February 2016 under the assumption that the companies have not added a significant number of social media links to their website since 2013. Ideally, we would use the Internet Archive to fetch historical scores. Lastly, our search through the websites typically included the "Home Page", "Media" page (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive.

Total Posts - This is the total number of posts that included a company's cashtag (e.g. $AMGN is the cashtag for Amgen). We created a Buzz Monitor on Crimson Hexagon ("EPS; Data for 2014 change") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2012 to December 31, 2013. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed its annual report on December 31, 2013, then we acquired Total Posts from December 31, 2012 to December 31, 2013. The number of Total Posts was acquired from the monitor screen online.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's cashtag during the twelve months leading up to the company's annual report date. We created a Buzz Monitor on Crimson Hexagon ("EPS; Data for 2014 change") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2012 to December 31, 2013. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed its annual report on December 31, 2013, then we acquired Total Potential Impressions from December 31, 2012 to December 31, 2013. We downloaded an Excel file from Crimson Hexagon that contained data on Total Potential Impressions, because the website interface rounded this number. Within the Excel file, we summed the number of potential impressions for each day in order to arrive at Total Potential Impressions for the time period.

Posts per Author - We calculated this as the total number of posts for the twelve-month period preceding the company's annual report date divided by the total number of Twitter authors that posted during that time. We created a Buzz Monitor on Crimson Hexagon ("EPS; Data for 2014 change") that searched for the use of companies' cashtags on Twitter, Facebook, and Tumblr from January 1, 2012 to December 31, 2013. For a given company, data were collected from the twelve months leading up to the company's annual report date. To collect company and time-specific data, we created a filter using the company's cashtag within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the year leading up to the company's annual report date. For example, if a company filed its annual report on December 31, 2013, then we acquired these data from December 31, 2012 to December 31, 2013. We downloaded an Excel file from Crimson Hexagon that contained data on the Total Number of Twitter Authors and Average Posts per Author on a given day. Within the Excel file, we first multiplied the number of Twitter Authors posting on a given day by the average posts per author for that day in order to get the number of posts for each day. We then summed the total number of posts across the time period and divided this by the sum of Twitter Authors for the time period in order to arrive at Posts per Author.
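
For clarity, a minimal sketch in R of the same daily aggregation (which we performed in Excel) is shown below; the data frame "daily" and its column names are assumptions.

# Minimal sketch (R) of the Posts per Author aggregation described above,
# applied to a daily export. Assumes a data frame `daily` with illustrative
# columns `authors` (Twitter authors posting that day) and
# `avg_posts_per_author` (average posts per author that day).
posts_per_author_from_export <- function(daily) {
  daily_posts   <- daily$authors * daily$avg_posts_per_author
  total_authors <- sum(daily$authors)
  if (total_authors == 0) return(0)  # guard against dividing by 0
  sum(daily_posts) / total_authors
}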

Impressions per Post - We calculated this in Excel by dividing Total Potential Impressions by Total Posts (after acquiring them as described above).

Model testing and results

Once we acquired all of the financial and social media data described above for the 58 companies in our analysis, we generated the "EPS_changes_2013_to_2014_Master_Matrix" Excel spreadsheet, which can be found on Confluence. This spreadsheet was too large to include in the report, but it contains all of the data along with other details (e.g. cashtags, report dates, social data date ranges, etc.) that are useful for acquiring further information about the companies used in the modeling process. After generating this master data matrix, we created the data matrices for our random forests model (Table 7).

Table 7. Snapshot of an example matrix with variables that are input into the DEPS model. Input variables are abbreviated for brevity. The variable we are trying to predict ("Change") is highlighted in green. Although the company names are not shown, each row corresponds to a specific company.

After setting up the baseline matrices, we proceeded to run a random forests model on each matrix to calculate the average accuracy of our social equity models in forecasting DEPS changes. To do this, we developed an R script entitled "Script for Running Models.r".

Although we aim to describe this script in detail separately, we will provide a brief overview of how this script determines the mean accuracy and standard deviation of model accuracy. This script was uploaded to Git in the "Model_Code_2_24_16" zipped folder and is contained within the "Modeling_Script" subdirectory of this archive.

The first step of this script involves importing the baseline data matrix (see Table 7). After loading the matrix, the code randomly selects 60% of the data for training and 40% of the data for testing. As an example, if we loaded Table 7 (this table has 10 lines of data) into the code, then 6 lines of data would be randomly selected to train a random forests model and 4 lines of data would be randomly selected for testing purposes. After training, the code predicts which category each data point in the test data falls under and then compares the predictions to the actual category of each data point. The accuracy of the model is then stored in a list, and the steps described above are repeated 99 more times for a total of 100 iterations. After 100 iterations, the code prints out the average accuracy and the standard deviation of the accuracy. We plotted and compared the average accuracy along with the standard error of the mean (standard deviation/square root of sample size) of the models that we tested.

We generated a total of 23 different models based on social media data alone (detailed in the "Earnings_per_share_changes_model_descriptions" document). The number of models tested was far from exhaustive; thus, our conclusions below are based on a limited subset of the possible combinations of social media variables. After implementing the modeling code described above, we found that none of the models we constructed were able to forecast DEPS changes better than random. In fact, our models often performed worse than random. The No Information Rate (NIR; Random Forecasting) for the DEPS model was 63.8%. This rate is what one would obtain if one were guessing a company's DEPS change at random with no information.

Given these results, our data suggest that CrowdBureau should lessen its focus on using only social equity data to forecast changes in Diluted Earnings per Share.


Investor-specific Crowdfunding Social Data Forecast

Model overview

The enactment of the JOBS Act made it possible for companies in the United States to raise the capital they need by means of crowdfunding and for non-accredited investors to invest in Small Cap Private Companies and Non-Publicly Traded Funds. While this new model of capital investing is an exciting method of connecting the masses with a means of investing in new businesses, it also carries many risks for aspiring investors and requires a new infrastructure for delivering information and complying with new regulations. One such risk stems from "all or nothing" funding scenarios, where companies have to fully meet their fundraising goals in order to access the capital that was raised. For companies, being able to predict the probability of fully meeting a fundraising goal early would be valuable, especially if they are not on track to meet that goal and still have time to change their campaign strategy.

This model analyzes whether social media data have predictive power in forecasting whether a given company will fully achieve its fundraising goal, using data from the first quarter of its fundraising period and using data from the company's full fundraising period. Using Crowdfunder.com, we identified 21 companies that either fully met their fundraising goal during their allotted funding period (n = 11 companies) or did not fully meet their fundraising goal (n = 10). We then constructed several random forests models using different combinations of 5 social equity data points (described in detail later), collected during both the first quarter of the companies' fundraising period and the full fundraising period, as input variables.

After acquiring the social media variables and determining which companies in our analyses fully met their fundraising goal, we constructed a master data matrix with these data. We then applied a random forests model to several different variations of the matrix in order to distinguish fully funded companies from those companies that did not receive full funding. The predictions of our models are based on 500 regression trees, and we implemented all of our modeling using the caret package within R [19].

For testing the accuracy of each model, we randomly split our data into a training dataset (60% of companies) and a test dataset (40% of companies) (FIG. 17). After training the model with data from 60% of the companies, we used the model to classify the remaining 40% of companies and calculated the accuracy. Because the model's accuracy can vary due to the random selection of training and testing data, we performed the above sequence of steps 100 times and took the average and standard deviation of the 100 trials as our final accuracy score. Although we did not generate a final quantitative score for companies in this model, it is possible to obtain the probability of a given company becoming fully funded directly from the caret package within R.

In total, we constructed 23 different models (model descriptions provided separately) that consisted of different combinations of social media variables (described below). Due to time constraints, these models were not exhaustive in terms of the number of combinations that can be created, but they do serve as a substantial starting point for analyses. The Crowdfunder.com website, the Internet Archive, and other secondary research sources were used to acquire the financial information (i.e. funding status) needed for our analyses. Crimson Hexagon and company websites were used to acquire all of the social media variables in our analyses.

Model variables (Financial and Social)

In total, we collected data on 6 different variables (1 financial and 5 social) for the companies in our analyses. The financial variable, along with a description of how we acquired/calculated it, is:

Funding - We used Crowdfunder.com to gather information on companies' fundraising start date, fundraising end date, funding goal, and reservations/funds raised prior to and up to the end of the fundraising period. Companies that met or exceeded their funding goal within the fundraising time period were considered to be "fully funded". Companies that did not meet their funding goal within their fundraising period were classified as "not fully funded".
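
For clarity, a minimal sketch in R of this classification rule is shown below; the vector names and labels are illustrative.

# Minimal sketch (R) of the funding classification described above.
# `raised` is the amount raised by the end of the fundraising period and
# `goal` is the funding goal; both are per-company vectors, names assumed.
classify_funding <- function(raised, goal) {
  ifelse(raised >= goal, "Fully funded", "Not fully funded")
}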

The social media variables, along with a description of how we acquired/calculated these variables, are:

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on a company's website as of February 2016 under the assumption that the companies have not added or subtracted a significant number of social media links to their website recently. Ideally, we would use the Internet Archive to fetch historical scores. Lastly, our search through the websites typically included the "Home Page", "Media" page (if present), and "Contact Us" page. Thus, our search for the links was not exhaustive. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Identity block.

Total Posts - This is the total number of posts that included a company's Twitter handle (e.g. @Trustify is the Twitter handle for Trustify). We created a Buzz Monitor on Crimson Hexagon ("CrowdFunder Companies") that searched for the use of companies' Twitter handles on Twitter, Facebook, and Tumblr from December 31, 2013 to the present day. For a given company, data were collected either during the first fourth of their funding period (e.g. the first 25 days of a 100-day-long fundraising period) or for the entirety of their fundraising period. Under the 7 Building Blocks of Social Media, this would be classified as belonging to the Conversation block.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's Twitter handle either during the first fourth of their funding period or for the entirety of their fundraising period. Data were obtained from the "CrowdFunder Companies" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block.

Posts per Author - We calculated this as the total number of posts for either the first quarter of their crowdfunding period or for the entirety of the fundraising period divided by the total number of Twitter authors that posted during that time. Data were obtained from the "CrowdFunder Companies" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 authors during the time range, then we manually set this value to 0 to avoid dividing by 0.

Impressions per Post - We calculated this as Total Potential Impressions (see above description) divided by Total Posts (see above description). Data were obtained from the "CrowdFunder Companies" Buzz Monitor on Crimson Hexagon. Under the 7 Building Blocks of Social Media, this could be classified as belonging to the Conversation block. Note: If a company had 0 posts during the time range, then we manually set this value to 0 to avoid dividing by 0.

Company inclusion criteria

This section is meant to give an overview of our selection process for the companies included in these analyses. Specific information on the companies themselves can be obtained from the "CrowdBureau Investor-specific Crowdfunding Focus Benchmark" Word document and the "Crowdfunder Data MasterMatrix First Quarter Funding" and "Crowdfunder Data MasterMatrix Full Funding Period" Excel documents. We used the Crowdfunder.com website as our primary source for obtaining fundraising data for specific companies. We mainly excluded companies that had not finished fundraising by February 2016, with the exception of companies that exceeded their funding goal in February 2016 before the end of their fundraising period. (Example: Company A's fundraising end date may have been June 2016, but we would include Company A in our analysis if it had already met or exceeded its funding goal in February 2016.)

Data acquisition.

A majority of the financial information that we used in the investor-specific analyses was obtained directly from the Crowdfunder.com website as of February 2016. However, we occasionally used the Internet Archive and other resources (e.g. Google searches, press releases, etc.) to determine when some companies' fundraising periods concluded, as this information was not always readily available via the website.

We took the following approach to acquire the social media variables in our analyses.

Identity Score - We calculated identity score as the number of links to social media websites that each company displays on its main website. Here, social media websites include Facebook, Twitter, Tumblr, LinkedIn, Google+, Pinterest, and Instagram. Due to time constraints, we used the number of links on a company's website as of February 2016 under the assumption that the companies have not added or subtracted a significant number of social media links to their website since 2013. Ideally, we would have used the Internet Archive to fetch historical scores. Lastly, our search through the websites typically included the "Home Page", "Media" page (if present), and "Contact Us" page. Thus, our search for social media links was not exhaustive.

Total Posts - This is the total number of posts that included a company's Twitter handle. We created a Buzz Monitor on Crimson Hexagon ("CrowdFunder Companies") that searched for the use of companies' Twitter handles on Twitter, Facebook, and Tumblr from December 31, 2013 to the present day. For a given company, data were collected either during the first quarter of the company's crowdfunding period (e.g. the first 25 days of a funding period lasting 100 days) or during the company's full crowdfunding period. To collect company and time-specific data, we created a filter using the company's Twitter handle within the Buzz Monitor. We applied this filter to the monitor and set the time range to the desired window of time. The number of Total Posts was acquired from the monitor screen online.

Total Potential Impressions - This is the total potential impressions made by posts that included a company's Twitter handle either during the first quarter of its crowdfunding period or during the entirety of its fundraising period. We created a Buzz Monitor on Crimson Hexagon ("CrowdFunder Companies") that searched for the use of companies' Twitter handles on Twitter, Facebook, and Tumblr from December 31, 2013 to the present day. For a given company, data were collected either during the first quarter or the entirety of the fundraising period. To collect company and time-specific data, we created filters using the companies' Twitter handles within the Buzz Monitor. We applied these filters to the monitor and set the time range to the desired window of time for which data were needed. We downloaded an Excel file from Crimson Hexagon that contained data on Total Potential Impressions, because the website interface rounded this number. Within the Excel file, we summed the number of potential impressions for each day in order to arrive at Total Potential Impressions for the time period.

Posts per Author - We calculated this as the total number of posts for either the first quarter or the entirety of the funding period divided by the total number of Twitter authors that posted during that time. We created a Buzz Monitor on Crimson Hexagon ("CrowdFunder Companies") that searched posts for the inclusion of companies' Twitter handles on Twitter, Facebook, and Tumblr from December 31, 2013 to the present day. To collect company and time-specific data, we created a filter using the company's Twitter handle within the Buzz Monitor. We applied this filter to the monitor and set the time range to encompass the desired dates. We downloaded an Excel file from Crimson Hexagon that contained data on the Total Number of Twitter Authors and Average Posts per Author on a given day. Within the Excel file, we first multiplied the number of Twitter Authors posting on a given day by the average posts per author for that day in order to get the number of posts for each day. We then summed the total number of posts across the time period and divided this by the sum of Twitter Authors for the time period in order to arrive at Posts per Author.

Impressions per Post - We calculated this in Excel by dividing Total Potential Impressions by Total Posts (after acquiring them as described above).

Model testing and results

Once we acquired all of the financial and social media data described above for the 21 companies in our analysis, we generated the "Crowdfunder Data MasterMatrix Full Funding Period" and "Crowdfunder Data MasterMatrix First Quarter Funding" Excel spreadsheets, which can be found on Confluence. These spreadsheets contain all of the data along with other details (e.g. Twitter handles, report dates, social data date ranges) that are useful for acquiring further information about the companies used in the modeling process. After generating these master data matrices, we created the data matrices for our random forests models for the first quarter of the fundraising period and for the full fundraising period (see Table 8 for an example view of the model matrix).

Table 8. Snapshot of an example matrix with variables that are input into the Investor-specific Crowdfunding Social Data Forecast model. Input variables are abbreviated for brevity. The variable we are trying to predict ("Funding") is highlighted in green. The data below are from the first quarter of companies' fundraising periods. Although the company names are not shown, each row corresponds to one company.

After setting up the matrices, we ran a random forests model on each matrix to calculate the average accuracy of our social equity models in forecasting fundraising success. To do this, we developed an R script entitled "Script for Running Models.r". Although we aim to describe this script in detail separately, we will provide a brief overview of how this script determines the mean accuracy and standard deviation of model accuracy. This script was uploaded to Git in the "Model_Code_2_24_16" zipped folder and is contained within the "Modeling_Script" subdirectory of this archive.

The first step of this script involves importing the baseline data matrix (see Table 8). After loading the matrix, the code randomly selects 60% of the data for training and 40% of the data for testing. As an example, if we loaded Table 8 (this table has 10 lines of data) into the code, then 6 lines of data would be randomly selected to train a random forests model and 4 lines of data would be randomly selected for testing purposes. After training, the code predicts which category each data point in the test data falls under and then compares the predictions to the actual category of each data point. The accuracy of the model is then stored in a list, and the steps described above are repeated 99 more times for a total of 100 iterations. After 100 iterations, the code prints out the average accuracy and the standard deviation of the accuracy. We plotted and compared the average accuracy along with the standard error of the mean (standard deviation/square root of sample size) of the models that we tested.

We generated a total of 23 different models based on social media data alone (detailed in the "Investor_specific_first_quarter_model_descriptions" and "Investor_specific_full_funding_period_model_descriptions" documents). The number of models tested was far from exhaustive; thus, our conclusions below are based on a limited subset of the possible combinations of social media variables.

After implementing the modeling code described above, we found that several of the models we constructed, using either data from the first quarter of the fundraising period or data from the entire fundraising period, were able to forecast a company's probability of becoming fully funded substantially better than random.

In fact, our most accurate model using the first quarter funding data had almost 80% accuracy (Model 5; 79.6% average accuracy with a standard deviation of 6.5%), and our most accurate model using full funding period data (Model 15) was 81.1% accurate on average (with a standard deviation of 13.9%). Both of these values are higher than the No Information Rate (NIR; Random Forecasting), which was 52.4%. A rate of 52.4% accuracy is what one would obtain by guessing at random whether a company would become fully funded. Model 5 was composed of Identity Score and Posts per Author, and Model 15 consisted of total potential impressions, impressions per post, and posts per author. Given these results, along with the fact that we only tested a fraction of all the different models that could be constructed (using only 5 social media variables), these data strongly suggest that social media has predictive power with regard to forecasting crowdfunding success and that CrowdBureau should continue to develop its investor-specific rating using social equity data.