Title:
SYSTEM AND METHOD FOR TALENT INSIGHTS GENERATION AND RECOMMENDATION
Document Type and Number:
WIPO Patent Application WO/2019/142157
Kind Code:
A1
Abstract:
A system and method for talent insight generation and recommendation is disclosed. The method includes obtaining a plurality of technical keywords associated with technological skills and technology vendors from a plurality of technical data sources. The plurality of technical keywords are parsed to obtain a plurality of target entities based on a plurality of rules corresponding to the plurality of technical data sources. The plurality of target entities are associated with a corresponding technical context. The plurality of target entities are classified into a plurality of categories based at least on the corresponding context to obtain a plurality of classified target entities, using a supervised machine learning model. The plurality of classified target entities are linked to generate a talent insight graph.

Inventors:
SANTHANAM JANARDHAN (IN)
GURURAJAN VINAYAKA RAM (IN)
JAGADEESAN DWARAKANATH (IN)
KUMARESSHAN PRABAKARAN (IN)
KRISH ASHOK (IN)
Application Number:
PCT/IB2019/050473
Publication Date:
July 25, 2019
Filing Date:
January 21, 2019
Assignee:
TATA CONSULTANCY SERVICES LTD (IN)
International Classes:
G06F15/18
Foreign References:
US20170091692A1    2017-03-30
US20020147611A1    2002-10-10
US20120290565A1    2012-11-15
Attorney, Agent or Firm:
KOSHAL, Amit et al. (IN)
Claims:
WE CLAIM:

1. A processor-implemented method for talent insight generation and recommendation, comprising:

obtaining, via one or more hardware processors, a plurality of technical keywords associated with technological skills and technology vendors from a plurality of technical data sources, wherein the plurality of technical keywords comprises a first set of technical keywords obtained from a seed data and a second set of technical keywords obtained based on a technological relationship between the first set of technical keywords and the second set of technical keywords;

parsing, via the one or more hardware processors, the plurality of technical keywords to obtain a plurality of target entities based on a plurality of rules corresponding to the plurality of technical data sources, the plurality of target entities associated with a corresponding technical context, wherein parsing the plurality of technical keywords comprises:

obtaining the first set of technical keywords in form of a first set of hypertext markup language (HTML) pages by using a reference application programming interface (API);

obtaining a second set of HTML pages corresponding to the second set of technical keywords based on the first set of HTML pages;

identifying technical data sources of the first and second set of HTML pages; and

converting the first and second set of HTML pages into a format defining a plurality of bidirectional relationships between the two or more technical keywords selected from the first set of technical keywords and the second set of technical keywords;

computing, for each of the plurality of bidirectional relationships, a matching score indicative of closeness centrality of the plurality of target entities forming the bidirectional relationships;

classifying, via the one or more hardware processors, the plurality of target entities into a plurality of categories based at least on the matching score and the corresponding context to obtain a plurality of classified target entities, using a supervised machine learning model; and

linking, via the one or more hardware processors, the plurality of classified target entities to generate a talent insight graph.

2. The method as claimed in claim 1, wherein creating the talent insight graph comprises, iteratively:

determining one or more bidirectional relationships selected from the plurality of bidirectional relationships satisfying at least one predetermined criterion; and

linking, from amongst the plurality of target entities, a set of target entities associated with the one or more bidirectional relationships to form the talent insight graph.

3. The method as claimed in claim 2, wherein the at least one predetermined criteria corresponding to each of the one or more bidirectional relationships comprises: determining whether said one or more bidirectional relationships between the set of target entities are inferred in one or more sections of the first and second set of HTML pages; and

validating, if said one or more bidirectional relationships are determined to be inferred in the one or more sections of the first and second set of HTML pages, by determining whether a text is available in the one or more sections of the first and second set of HTML pages to qualify said one or more bidirectional relationships.

4. The method as claimed in claim 3, further comprising creating a default relationship between the set of target entities on determining that the text is unavailable in the one or more sections of the first and second set of HTML pages for qualifying the one or more bidirectional relationships.

5. The method as claimed in claim 1, further comprising:

assigning an adjacency score to each path of a plurality of paths between the plurality of target entities of the talent insight graph based on a count of traversal of the said path; and

providing one or more recommendations based at least on the adjacency score associated with each path of the plurality of paths.

6. The method as claimed in claim 1, wherein the plurality of categories comprises a categorization of digital and non-digital keywords, a categorization of skill and vendor related keywords, a categorization of product or non-product related keywords, a categorization of meta-entity and non-meta-entity related keywords.

7. A system for talent insight generation and recommendation, the system comprising:

at least one memory storing instructions; and

one or more hardware processors coupled to said at least one memory, wherein said one or more hardware processors are configured by said instructions to: obtain a plurality of technical keywords associated with technological skills and technology vendors from a plurality of technical data sources, wherein the plurality of technical keywords comprises a first set of technical keywords obtained from a seed data and a second set of technical keywords obtained based on a technological relationship between the first set of technical keywords and the second set of technical keywords;

parse the plurality of technical keywords to obtain a plurality of target entities based on a plurality of rules corresponding to the plurality of technical data sources, the plurality of target entities associated with a corresponding technical context, wherein to parse the plurality of technical keywords to obtain the plurality of target entities, the one or more hardware processors are further configured by the instructions to:

obtain the first set of technical keywords in form of a first set of HTML pages by using a reference API;

obtain the second set of HTML pages corresponding to the second set of technical keywords based on the first set of HTML pages;

identify technical data sources of the first and second set of HTML pages; and

convert the first and second set of HTML pages into a format defining a plurality of bidirectional relationships between the two or more technical keywords selected from the first set of technical keywords and the second set of technical keywords;

compute, for each of the plurality of bidirectional relationships, a matching score indicative of closeness centrality of the target entities forming the bidirectional relationships;

classify the plurality of target entities into a plurality of categories based at least on the matching score to obtain a plurality of classified target entities, using a supervised machine learning model; and

link the plurality of classified target entities to generate a talent insight graph.

8. The system as claimed in claim 7, wherein to create the talent insight graph, the one or more hardware processors are further configured by the instructions to iteratively:

determine one or more bidirectional relationships selected from the plurality of bidirectional relationships satisfying at least one predetermined criterion; and link, from amongst the plurality of target entities, a set of target entities associated with the one or more bidirectional relationships to form the talent insight graph.

9. The system as claimed in claim 8, wherein the at least one predetermined criteria corresponding to each of the one or more bidirectional relationships comprises: determining whether said one or more bidirectional relationships between the set of target entities are inferred in one or more sections of the first and second set of HTML pages; and

validating, if said one or more bidirectional relationships are determined to be inferred in the one or more sections of the first and second set of HTML pages, by determining whether a text is available in the one or more sections of the first and second set of HTML pages to qualify said one or more bidirectional relationships.

10. The system as claimed in claim 9, wherein the one or more hardware processors are further configured by the instructions to create a default relationship between the set of target entities on determining that the text is unavailable in the one or more sections of the first and second set of HTML pages for qualifying the one or more bidirectional relationships.

11. The system as claimed in claim 7, wherein the one or more hardware processors are further configured by the instructions to:

assign an adjacency score to each path of a plurality of paths between the plurality of target entities of the talent insight graph based on a count of traversal of the said path; and

provide one or more recommendations based at least on the adjacency score associated with each path of the plurality of paths.

12. The system as claimed in claim 7, wherein the plurality of categories comprises a categorization of digital and non-digital keywords, a categorization of skill and vendor related keywords, a categorization of product or non-product related keywords, a categorization of meta-entity and non-meta-entity related keywords.

Description:
SYSTEM AND METHOD FOR TALENT INSIGHTS GENERATION AND RECOMMENDATION

DESCRIPTION

PRIORITY

[001] The present invention claims priority to India Application (Title: System And Method For Talent Insights Generation And Recommendation) No. 201821001519, filed in India on January 22, 2018.

TECHNICAL FIELD

[002] The present disclosure in general relates to talent insights generation and recommendation, and more particularly, to a system and method for talent insights generation and recommendation using a talent insight engine.

BACKGROUND

[003] Information technology is evolving at a rapid pace. New technologies and their newer versions are being published every day by technology companies and open source forums. The rapid growth of digital technologies has made understanding technological relationships complex for an average learner. The evolving nature of digital technologies and ecosystem also creates technology relationships that either never existed before or are no longer valid. Typically, enterprises utilize systems that facilitate understanding of such skill, vendor and people inter-relationships through spreadsheet-based systems or online lists/hierarchies.

[004] The inventors here have however recognized several technical problems with such conventional systems, as explained below. Such conventional systems manage relationships through rigid hierarchies that face challenges in handling dynamism in the ecosystem on an everyday basis. Moreover, due to the rapidly evolving technologies, enterprises face difficulty in identifying associates who have the right set of skills and, in most cases, associates that have adjacent skills. For example, on searching for 'Python' skills, conventional skill-search systems would identify individuals with declared skills only on Python, but they may not bring up someone who has declared their skills on R, Mahout, Flink, SAS or other such equivalent tools and who can be cross-skilled on Python. In addition, the conventional systems, being in the form of spreadsheets or basic digital systems, do not present an intuitive interface for visualizing these interrelationships that is also interactive and navigational in nature, thereby hindering the skill identification and learning process.

SUMMARY

[005] The following presents a simplified summary of some embodiments of the disclosure in order to provide a basic understanding of the embodiments. This summary is not an extensive overview of the embodiments. It is not intended to identify key/critical elements of the embodiments or to delineate the scope of the embodiments. Its sole purpose is to present some embodiments in a simplified form as a prelude to the more detailed description that is presented below.

[006] In one implementation, a processor-implemented method for talent insight generation and recommendation is provided. The method includes obtaining, via one or more hardware processors, a plurality of technical keywords associated with technological skills and technology vendors from a plurality of technical data sources. Further, the method includes parsing, via the one or more hardware processors, the plurality of technical keywords to obtain a plurality of target entities based on a plurality of rules corresponding to the plurality of technical data sources. The plurality of target entities are associated with a corresponding technical context. Herein, the plurality of technical keywords are parsed by obtaining the first set of technical keywords in form of a first set of hypertext mark-up language (HTML) pages by using a reference application programming interface (API). The second set of HTML pages corresponding to the second set of technical keywords are obtained based on the first set of HTML pages. Further, technical data sources of the first and second set of HTML pages are identified, and the first and second set of HTML pages are converted into a format defining a plurality of bidirectional relationships between the two or more technical keywords selected from the first set of technical keywords and the second set of technical keywords. For each of the plurality of bidirectional relationships, a matching score indicative of closeness centrality of the target entities forming the bidirectional relationships is computed.

[007] Furthermore, the method includes classifying, via the one or more hardware processors, the plurality of target entities into a plurality of categories based at least on the corresponding context to obtain a plurality of classified target entities, using a supervised machine learning model. Moreover, the method includes linking, via the one or more hardware processors, the plurality of classified target entities to generate a talent insight graph.

[008] In another implementation, a system for talent insight generation and recommendation is provided. The system includes at least one memory storing instructions and one or more hardware processors coupled to said at least one memory. The one or more hardware processors are configured by said instructions to obtain a plurality of technical keywords associated with technological skills and technology vendors from a plurality of technical data sources. Further, the one or more hardware processors are configured by said instructions to parse the plurality of technical keywords to obtain a plurality of target entities based on a plurality of rules corresponding to the plurality of technical data sources. The plurality of target entities are associated with a corresponding technical context. The one or more hardware processors are configured by said instructions to parse the plurality of technical keywords by obtaining the first set of technical keywords in form of a first set of HTML pages by using a reference API. The second set of HTML pages corresponding to the second set of technical keywords are obtained based on the first set of HTML pages. Further, technical data sources of the first and second set of HTML pages are identified, and the first and second set of HTML pages are converted into a format defining a plurality of bidirectional relationships between the two or more technical keywords selected from the first set of technical keywords and the second set of technical keywords. For each of the plurality of bidirectional relationships, a matching score indicative of closeness centrality of the target entities forming the bidirectional relationships is computed.

[009] Furthermore, the one or more hardware processors are configured by said instructions to classify the plurality of target entities into a plurality of categories based at least on the corresponding context to obtain a plurality of classified target entities, using a supervised machine learning model. Moreover, the one or more hardware processors are configured by said instructions to link the plurality of classified target entities to generate a talent insight graph.

[0010] In yet another implementation, a non-transitory computer-readable medium having embodied thereon a computer program for executing a method for talent insight generation and recommendation is provided. The method includes obtaining, via one or more hardware processors, a plurality of technical keywords associated with technological skills and technology vendors from a plurality of technical data sources. Further, the method includes parsing, via the one or more hardware processors, the plurality of technical keywords to obtain a plurality of target entities based on a plurality of rules corresponding to the plurality of technical data sources. The plurality of target entities are associated with a corresponding technical context. Herein, the plurality of technical keywords are parsed by obtaining the first set of technical keywords in form of a first set of HTML pages by using a reference API. The second set of HTML pages corresponding to the second set of technical keywords are obtained based on the first set of HTML pages. Further, technical data sources of the first and second set of HTML pages are identified, and the first and second set of HTML pages are converted into a format defining a plurality of bidirectional relationships between the two or more technical keywords selected from the first set of technical keywords and the second set of technical keywords. For each of the plurality of bidirectional relationships, a matching score indicative of closeness centrality of the target entities forming the bidirectional relationships is computed.

[0011] Furthermore, the method includes classifying, via the one or more hardware processors, the plurality of target entities into a plurality of categories based at least on the corresponding context to obtain a plurality of classified target entities, using a supervised machine learning model. Moreover, the method includes linking, via the one or more hardware processors, the plurality of classified target entities to generate a talent insight graph.

BRIEF DESCRIPTION OF THE FIGURES

[0012] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to reference like features and modules.

[0013] FIG. 1 illustrates a network implementation of a system for talent insight generation and recommendation, in accordance with an embodiment of the present disclosure.

[0014] FIG. 2 illustrates a system for talent insight generation and recommendation, in accordance with an embodiment of the present disclosure.

[0015] FIG. 3 illustrates a flowchart for talent insight generation and recommendation, in accordance with an embodiment of the present disclosure.

[0016] FIG. 4 illustrates an example process-flow for talent insight generation and recommendation, in accordance with another embodiment of the present disclosure.

[0017] FIG. 5 illustrates an example of interactive visualization for talent insight generation and recommendation, in accordance with another embodiment of the present disclosure.

[0018] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems and devices embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.

DETAILED DESCRIPTION

[0019] Information technology is growing at a rapid rate; newer technologies and their new versions are being published frequently by technology companies and open source forums. The rapid growth of digital technologies has made understanding technology relationships complex for an average learner. For example, learners or beginners may ask 'How is Cloudera related to Hadoop?', 'How is Hadoop related to MapReduce?', 'How is Python related to Django?', 'If I have learnt R programming, what's the best recommendation for me to learn next?', and so on. Typically, these and other such challenges faced by technology learners have not been solved adequately to date.

[0020] The evolving nature of digital technologies and ecosystem also creates technology relationships that either never existed before (e.g., Mashery was acquired by TIBCO and hence is now a TIBCO product) or are no longer valid (e.g., after Hive and PigLatin were introduced, MapReduce was no longer a prerequisite for building Hadoop-based applications). Currently, the conventional systems handle these relationships through fairly rigid hierarchies that cannot handle such dynamism in the ecosystem on an everyday basis.

[0021] Currently, enterprises face a lot of difficulty in identifying associates who have the right set of skills and, in most cases, associates that have adjacent skills. It will be understood that skills are adjacent when they are engineered in different fashions but are used to achieve the same end goal. For instance, R and Python are adjacent skills, as both are used to achieve the same end goal, i.e., data analysis, even though they were developed separately. Currently, for example, copies of skill lists are maintained using basic tools, and text-based searches are performed using such tools. However, the conventional systems do not take into account the dynamic nature of skill adjacencies. For example, on searching for 'Python' skills, typical skill-search systems would identify individuals with declared skills only on Python, but they may not bring up someone who has declared their skills on R, Mahout, Flink, SAS or other such equivalent tools and who can be cross-skilled on Python.

[0022] Enterprises wish to bridge the gap between an individual's current skill-set and expected requirements such as job demands, examination assessment, and so on by providing relevant training. Since training involves time, effort and money, it should be carefully suggested/recommended so that the goals of training are met with ease. However, it is observed that the candidates attending the training are sometimes at different skill levels, and therefore selecting the same training plan for all the candidates in the same training environment may not be an effective solution.

[0023] In addition, such conventional systems handling skills and people are heavily reliant on individuals and manual processes to create and maintain the content, resulting in high effort, lack of agility and consequently, incorrect data in the system. Further, conventionally such skills, vendor and people inter-relationships are viewed and understood at best through spreadsheet-based systems or online lists/hierarchies. Conventional systems or solutions lack an intuitive representation of these interrelationships that is also interactive and navigational in nature, and thus do not allow learners to explore, self-discover technology adjacencies and continue learning.

[0024] The present disclosure provides a method and system to overcome the above-mentioned technical problems recognized in conventional systems and presents technological improvements as solutions to one or more of the above-mentioned technical problems. Various embodiments presented herein disclose methods and systems for talent insights generation and recommendation. The disclosed method and system facilitate automatically creating and maintaining inter-relationships between various technologies, their versions, software vendor and people information, and skills possessed by individuals. The system is caused to dynamically update the inter-relationships with the rapid technological advancements. Additionally, the system gives visibility to appropriate technology adjacencies to various stakeholders, thereby enabling engagement of the right people with the right skills in the right technological opportunities. Further, the disclosed system provides intuitive visualization and user experience to derive the best out of such an information system.

[0025] While aspects of the described system and method for talent insights generation and recommendation may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.

[0026] Referring now to FIG. 1, a network implementation 100 of a system 102 for talent insights generation and recommendation is illustrated, in accordance with an embodiment of the present disclosure. Although the present disclosure is explained by considering that the system 102 is implemented as a software program on a server, it may be understood that the system 102 may also be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, a server, a network server, cloud, and the like. It will be understood that the system 102 may be accessed by multiple users through one or more user devices 104-1, 104-2... 104-N, collectively referred to as user devices 104 hereinafter, or applications residing on the user devices 104. Examples of the user devices 104 may include, but are not limited to, a portable computer, a personal digital assistant, a hand-held device, and a workstation.

[0027] The user devices 104 are communicatively coupled to the system 102 through a network 106. In one implementation, the network 106 may be a wireless network, a wired network or a combination thereof. The network 106 can be implemented as one of the different types of networks, such as intranet, local area network (LAN), wide area network (WAN), the internet, and the like. The network 106 may either be a dedicated network or a shared network. The shared network represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), and the like, to communicate with one another. Further the network 106 may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, and the like. Various components and functionalities of the system 102 are described further with reference to FIG. 2.

[0028] FIG. 2 illustrates a block diagram of a system 200 for talent insight generation and recommendation, in accordance with an example embodiment. In an example embodiment, the system 200 may be embodied in, or is in direct communication with the system, for example the system 102 (FIG. 1). In an embodiment, the system 200 facilitates the verification/screening of the candidate profile on a crowdsourcing platform. The system 200 includes or is otherwise in communication with at least one hardware processor such as a processor 202, at least one memory such as a memory 204, and an I/O interface 206. The processor 202, the memory 204, and the I/O interface 206 may be coupled by a system bus such as a system bus 208 or a similar mechanism.

[0029] The at least one processor 202 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the at least one processor 202 is configured to fetch and execute computer-readable instructions stored in the memory 204.

[0030] The I/O interface 206 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interface 206 may allow the system 200 to interact with a user directly or through the user devices 104. Further, the I/O interface 206 may enable the system 200 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interface 206 may facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. The I/O interface 206 may include one or more ports for connecting a number of devices to one another or to another server.

[0031] The memory 204 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, the memory 204 includes a plurality of modules 220 and a repository 240 for storing data processed, received, and generated by one or more of the modules 220. The modules 220 may include routines, programs, objects, components, data structures, and so on, which perform particular tasks or implement particular abstract data types. In one implementation, the modules 220 may include various modules 222 comprising instructions for performing particular tasks or implementing particular abstract data types, and other modules 224. The other modules 224 may include programs or coded instructions that supplement applications and functions of the system 200.

[0032] The repository 240, amongst other things, includes a system database 242 and other data 244. The other data 244 may include data generated as a result of the execution of one or more modules in the other modules 224. The repository 240 is further configured to maintain seed data 246 and target entity data 248, as will be explained in detail in the description below.

[0033] In an embodiment, the system 200 may derive the seed data 246 from a plurality of sources. The seed data may include categorization of a plurality of technical keywords associated with technological skills and technology vendors. The plurality of technical keywords includes skills such as Java, Angular2 and so on, and vendors providing solutions for such skills, such as IBM and Cloudera. Herein, for brevity of description, only a few terms have been provided as examples; it will however be understood that the seed data may include an exhaustive database of keywords pertaining to technologies (existing and emerging), technology vendors and people. It will also be understood that the seed data is adaptable and expandable to update skills and vendors. In an example embodiment, the seed data may be obtained from one or more users. Alternatively, the system 200 may embody natural language processing (NLP) capabilities for obtaining the seed data from the plurality of technical data sources. Herein, the plurality of technical data sources may include sources of technical information such as journals, Wikipedia, Stack Overflow, and so on.

[0034] While deriving the seed data, the system 200 is capable of identifying the context of the technical keywords. In an embodiment, the system 200 leverages the features of Topic Modelling to narrow down the context of each technical keyword in the seed data. Topic Modelling is a type of Statistical Modelling in Machine Learning and Natural Language Processing (NLP) to identify the topics that occur in a given text data. In various embodiments disclosed herein, the system 200 confines the topics to terms related to Information Technology (IT).

[0035] The system 200 is configured to logically link the available keywords from the seed data and form a schema free graph, also referred to as talent insight graph. The creation of the talent insight graph includes a two-step process. Initially, the seed data 246 is converted into a knowledge base, and subsequently, the knowledge base is converted into the talent insights graph. The creation of the knowledge base and the talent insights graph by the system 200 is explained further in detail below.

[0036] In order to create a knowledge base from the seed data, the system 200 identifies target entities, which is followed by classification of said target entities. The objective of identification of target entities is to automatically verify the technical keywords generated from the seed data and otherwise and to derive the actual entity names that can become nodes in the talent insights graph. Herein, the plurality of technical keywords includes a first set of technical keywords and a second set of technical keywords. The system 200 derives the first set of technical keywords from the seed data as is explained above. The second set of technical keywords are obtained based on a technological relationship between the first set of technical keywords and the second set of technical keywords.

[0037] In an embodiment, the system 200 parses the plurality of technical keywords into a plurality of target entities based on a plurality of rules corresponding to the plurality of technical data sources. The plurality of target entities are associated with a corresponding context. An example of a rule may be that, in case the technical data source is Wikipedia™, the system 200 can derive technical keywords by accessing a certain section of the Wikipedia™ page. As another example of a rule, if the technical data source includes research papers from specific journals, then the system 200 can derive technical keywords from the ‘word tags’ of said research papers. It will be understood that the above-mentioned rules are provided for the sake of clarity of description; in different implementations and embodiments, various other rules can be determined corresponding to the technical data sources of the technical keywords. Herein, it will be understood that data sources such as Wikipedia™ may be utilized by the system 200 to qualify a node of the talent insight graph. Each technical data source may have a specific template; as such, each template corresponding to a different technical data source has to be read and interpreted in a different manner. In an embodiment, reading and interpreting each template can be done by applying a rule corresponding to said template. In an embodiment, each rule may define the aspects of the page(s) associated with a source that may be read to get information, and accordingly to qualify the entity.
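The source-specific rules described above can be pictured as a lookup from data source to parsing function. The following is a minimal sketch only: the page layouts, the section name "See also", the "Keywords:" line, and the function names are all assumptions made for illustration and are not taken from the disclosure.

```python
import re
from typing import Callable, Dict, List


def wikipedia_rule(page: str) -> List[str]:
    """Hypothetical rule: read only the 'See also' section, one keyword per bullet."""
    match = re.search(r"== See also ==\n(.*?)(?:\n==|\Z)", page, re.S)
    if not match:
        return []
    lines = match.group(1).splitlines()
    return [line.lstrip("* ").strip() for line in lines if line.strip()]


def journal_rule(page: str) -> List[str]:
    """Hypothetical rule: read the comma-separated 'Keywords:' line of a paper."""
    match = re.search(r"^Keywords:\s*(.+)$", page, re.M)
    return [k.strip() for k in match.group(1).split(",")] if match else []


# One rule per technical data source (assumed source names).
RULES: Dict[str, Callable[[str], List[str]]] = {
    "wikipedia": wikipedia_rule,
    "journal": journal_rule,
}


def extract_keywords(source: str, page: str) -> List[str]:
    """Apply the rule registered for the given source to the page text."""
    return RULES[source](page)


wiki_page = (
    "== Overview ==\nAngularJS is a framework.\n"
    "== See also ==\n* ReactJS\n* Vue.js\n== References ==\n"
)
paper = "Title: Graph stores\nKeywords: Neo4j, graph database, Cypher\n"
```

Registering a new data source then amounts to adding one function and one dictionary entry, which mirrors the idea that each template is read by its own rule.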

[0038] The system 200 obtains the seed data, and converts the technical keywords into an HTML version of a webpage. For example, the system 200 may pass the seed data to a web crawler, wherein the web crawler may hit a source API and retrieve the HTML pages. The system 200 may utilize a reference API for converting the keywords into the HTML version of the webpage such that each keyword corresponds to a single HTML page. In an embodiment, certain technical data sources such as Wikipedia and Stack Overflow may be utilized as a reference API.

[0039] When the system 200 retrieves the HTML pages, the system 200 also gets references to other HTML pages that describe a set of keywords not specified in the seed data. The system 200 utilizes the set of keywords to identify other target entities or potential target entities which can be utilized for the talent insight graph. Hence, the other HTML pages are associated with a set of technical keywords technologically linked to the plurality of technical keywords. Herein, the term ‘technologically linked’ refers to linking of the technical terms by a relationship. For example, AngularJS is a framework of JavaScript. ReactJS is another framework of JavaScript. AngularJS and ReactJS are linked to JavaScript through the relation “is a Framework”. Herein, said linking is based on the technical properties of the frameworks, and hence the relationship between AngularJS, ReactJS and JavaScript is referred to as ‘technologically linked’.
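As a rough illustration of how the second set of keywords may be collected from page references, the sketch below replaces the crawler with an in-memory dictionary. The `references` mapping and the function name are hypothetical stand-ins; no real crawler or reference API is called.

```python
from typing import Dict, Iterable, List, Set


def second_keyword_set(seed: Iterable[str], references: Dict[str, List[str]]) -> Set[str]:
    """Return keywords referenced from seed pages that are not themselves in the seed data."""
    seed_set = set(seed)
    discovered: Set[str] = set()
    for keyword in seed_set:
        # 'references' stands in for the links found on the keyword's HTML page.
        discovered.update(references.get(keyword, []))
    return discovered - seed_set


references = {"JavaScript": ["AngularJS", "ReactJS", "JavaScript"]}
second = second_keyword_set(["JavaScript"], references)
```

Here `second` holds AngularJS and ReactJS, the keywords that are technologically linked to the seed keyword but absent from the seed data.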

[0040] In an embodiment, the system 200 identifies sources of the HTML pages associated with the plurality of keywords and the set of keywords. Further, the system 200 converts the HTML pages into a format defining a plurality of bidirectional relationships between two or more of the plurality of keywords and the set of keywords. For instance, the system 200 converts the HTML pages into (entity relationship entity) format. Additionally, the system 200 may tag said HTML pages with a signal. Herein, a ‘tag’ is a label that is given to an HTML page based on its contents. Each HTML page can have multiple tags. For example, a page on AWS may have tags such as Cloud Computing, SAAS, PAAS, IAAS, and so on. Herein, the term ‘signal’ refers to information of interest that can be derived from the HTML pages. HTML is the format in which the page is retrieved.

[0041] The system 200 computes, for each of the plurality of bidirectional relationships, a matching score based on the cosine distance. The matching score may be a number between 0 and 1, and may represent an extent of closeness or matching between the entities. The higher the matching score, the stronger the relationship between two entities. Said matching score may be obtained by determining a distance between the respective entities. The cosine distance may decide whether two entities are subsequently linked in the talent insight graph. In an example embodiment, the target entities with a matching score greater than or equal to a threshold matching score are identified as entities to be linked to each other. For example, ‘Java’ is close to ‘JavaScript’, and ‘JavaScript’ is close to ‘AngularJS’, and accordingly, the cosine distance between ‘Java’, ‘JavaScript’, and ‘AngularJS’ may be a number between 0 and 1. The system 200 may determine the threshold matching score between the entities, and identify said entities as target entities for subsequent formation of the talent insight graph.
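One way to picture the matching score is as cosine similarity (that is, one minus cosine distance) between word-count vectors of the text describing each entity, so that a higher value means a closer match. This is a sketch under that assumption; the disclosure does not state how entities are vectorised, and the threshold value of 0.5 is purely illustrative.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def matching_score(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words vectors; in [0, 1] for count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def linkable_pairs(descriptions: Dict[str, str], threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Return entity pairs whose matching score meets the (assumed) threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(descriptions), 2)
        if matching_score(descriptions[x], descriptions[y]) >= threshold
    ]
```

Pairs returned by `linkable_pairs` correspond to the entities identified as candidates to be linked; pairs below the threshold are left unlinked.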

[0042] Upon identification of said target entities, the system 200 classifies them into a plurality of categories. In an embodiment, the system 200 classifies the target entities based at least on the matching score and the corresponding context to obtain a plurality of classified target entities, using a supervised machine learning model. In an embodiment, the categories may include a categorization of digital and non-digital keywords, a categorization of skill and vendor related keywords, a categorization of product or non-product related keywords, and a categorization of meta-entity and non-meta-entity related keywords. For example, for categorization of skill and vendor related keywords, the system may utilize a predetermined meta-model developed based on the properties and/or features of skills and vendors. The meta-model takes into consideration available features of existing skills and vendors to be classified under a given category. For example, ‘Angular’ is a ‘Framework’ developed in ‘JavaScript’ which is used for developing mobile applications. This is a digital skill. Hence, based on the properties, each of the confirmed target entities can be matched to the property. Each of the confirmed target entities is passed to the meta-model, and the outcome is a prediction that classifies the entities to a skill or a vendor. Similarly, for classifying the target entities into digital and non-digital categories, the system 200 utilizes the features of available NLP libraries to perform the classification process. Said NLP libraries can be used for named entity resolution for identification of target entities.
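The skill-versus-vendor classification can be illustrated with a small supervised model. The disclosure does not name the model behind the meta-model, so the multinomial Naive Bayes classifier below is a stand-in chosen for brevity, and the property names used as features ("framework", "company", and so on) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, samples: Iterable[Tuple[List[str], str]]) -> "NaiveBayes":
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in samples:
            self.label_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features: List[str]) -> str:
        total = sum(self.label_counts.values())

        def log_prob(label: str) -> float:
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for f in features:
                score += math.log((counts[f] + 1) / denom)
            return score

        return max(self.label_counts, key=log_prob)


# Hypothetical labelled properties of known skills and vendors.
training = [
    (["framework", "language", "open-source"], "skill"),
    (["library", "language"], "skill"),
    (["company", "headquarters", "revenue"], "vendor"),
    (["company", "founded"], "vendor"),
]
model = NaiveBayes().fit(training)
```

Passing the properties of a confirmed target entity to `model.predict` yields a skill or vendor label; the same pattern could be repeated with different labels for the digital versus non-digital split.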

[0043] Upon identifying and classifying the target entities, the plurality of classified target entities are logically linked. In an embodiment, the system 200 links the plurality of classified target entities to create a talent insight graph. In an embodiment, to create the talent insight graph, the system 200 is caused to iteratively determine one or more bidirectional relationships selected from the plurality of bidirectional relationships satisfying at least one predetermined criterion. Herein, the at least one predetermined criterion corresponding to each of the one or more bidirectional relationships may include determining whether said one or more bidirectional relationships between the set of target entities are inferred in other sections of the HTML pages by the system 200. Further, the system 200 validates whether said one or more bidirectional relationships are inferred in one or more sections of the HTML pages, by determining whether a text is available in the one or more sections of the HTML pages to qualify said one or more bidirectional relationships.

[0044] The system 200 is caused to link a set of target entities associated with the one or more bidirectional relationships to form the talent insight graph. In an embodiment, the system 200 creates a default relationship between the set of target entities on determining that the text is unavailable in the one or more sections of the HTML pages for qualifying the one or more bidirectional relationships. This step is repeated for all the permutations of the entities.
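The linking step, including the fall-back to a default relationship, might be sketched as follows. The data shapes, the `related_to` default label, and the plain substring test for "text is available" are assumptions for illustration; a substring test is crude (for instance, 'Java' also matches inside 'JavaScript') and a real system would need proper entity matching.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source entity, relationship, target entity)


def link_entities(
    entities: List[str],
    candidates: Dict[Tuple[str, str], str],
    sentences: List[str],
    default_relation: str = "related_to",
) -> List[Edge]:
    """Link every ordered pair, qualifying a candidate relationship only if text supports it."""
    edges: List[Edge] = []
    for a, b in permutations(entities, 2):
        relation = candidates.get((a, b))
        # Validation: is there a sentence mentioning both entities?
        supported = any(a in s and b in s for s in sentences)
        if relation is not None and supported:
            edges.append((a, relation, b))
        else:
            # No qualifying text: create the default relationship.
            edges.append((a, default_relation, b))
    return edges


edges = link_entities(
    ["AngularJS", "JavaScript"],
    {("AngularJS", "JavaScript"): "is a framework of"},
    ["AngularJS is a framework of JavaScript."],
)
```

Using `itertools.permutations` visits both orderings of every pair, matching the statement that the step is repeated for all permutations of the entities.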

[0045] Herein, each target entity of the talent insight graph is related to other target entities in a specific order, which can be referred to as directionality of the graph. In an embodiment, bidirectional relationships exist among different target entities. A bidirectional relationship refers to a two-way relationship in which entities are connected to each other in both directions. For example, Python is a Programming Language for Machine Learning. Machine Learning can be implemented using Python. If the relationship is seen from the perspective of Python, it can be determined where Python can be applied, and also which adjacent entities are similar to Python. If the relationship is seen from the perspective of Machine Learning, the system can determine all programming languages that can be used to implement Machine Learning, for example, R, Python, Matlab, and so on.
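The two perspectives in the Python and Machine Learning example can be illustrated by indexing each relationship in both directions. This is only a sketch; the class name, method names, and relationship label are hypothetical.

```python
from collections import defaultdict
from typing import List


class TalentGraph:
    """Stores each relationship once per direction so it can be read from either end."""

    def __init__(self) -> None:
        self.outgoing = defaultdict(list)  # entity -> [(relation, target)]
        self.incoming = defaultdict(list)  # entity -> [(relation, source)]

    def add(self, source: str, relation: str, target: str) -> None:
        self.outgoing[source].append((relation, target))
        self.incoming[target].append((relation, source))

    def applied_in(self, entity: str) -> List[str]:
        """Perspective of a skill: where can it be applied?"""
        return sorted(target for _, target in self.outgoing[entity])

    def implemented_with(self, entity: str) -> List[str]:
        """Perspective of a field: which skills can implement it?"""
        return sorted(source for _, source in self.incoming[entity])


graph = TalentGraph()
for language in ("Python", "R", "Matlab"):
    graph.add(language, "is a programming language for", "Machine Learning")
```

Calling `graph.applied_in("Python")` reads the relationship from the Python side, while `graph.implemented_with("Machine Learning")` reads it from the Machine Learning side.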

[0046] In an embodiment, by embodying the directional relationships between the target entities, the structure of the talent insight graph shows patterns of how the various target entities are related to each other. The talent insight graph therefore acts as a template for recommendation to various users, as is explained in detail below.

[0047] The talent insight graph includes a plurality of nodes wherein each node represents a target entity. Once the talent insight graph is created with skills and vendors integrated as target entities, the system 200 incorporates a ‘people’ component as an additional dimension to the talent insight graph. The ‘people’ component of the talent insight graph indicates a user that is in possession of one or more technical skills.

[0048] In an embodiment, the system 200 allows users to search for aspirational skills for a variety of reasons such as learning, skill development, and so on. While searching for aspirational skills, the user can traverse one or more paths in the talent insight graph. In an embodiment, the system 200 assigns an adjacency score to each path of a plurality of paths between the plurality of target entities of the talent insight graph based on a count of traversal of the said path. For example, once the talent insights graph is created, the system 200 may be able to capture multiple metrics that can give insights about the usage of the talent insight graph. The system 200 computes multiple metrics, such as an adjacency score for a node in between two nodes of the talent insights graph. An adjacency score is indicative of betweenness centrality of a given entity that lies between target entities connected by more than one entity. The adjacency score measures the extent to which a given node lies in the path between other nodes. For instance, the adjacency score of a node / target entity named ‘Python’ in the context of ‘Data science’ = Number of times the node is traversed / Total available paths in the data science component of the talent insight graph. The adjacency score thus shows the relevance or importance of the given node in the context of two other nodes, and thus can be used for the purpose of recommendations. For instance, an associate working on the AngularJS platform in an organization may be willing to become a Full Stack Developer. Based on the adjacency score associated with the node AngularJS in the talent insight graph, the system may provide recommendations, such as recommending courses related to AngularJS to the associate, to achieve the end goal of Full Stack Developer. AngularJS is an important stop in the Full Stack Development journey. Said recommendation can be shared with the associate.
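A literal reading of the adjacency score formula above can be sketched as follows. The path lists and the traversal log are hypothetical sample data, and the function name is an assumption. Note that under this reading the score is not capped at 1, since the count of traversals can grow beyond the number of available paths.

```python
from typing import List, Sequence


def adjacency_score(
    node: str,
    traversals: Sequence[Sequence[str]],
    available_paths: Sequence[Sequence[str]],
) -> float:
    """Number of logged traversals passing through the node / total available paths."""
    if not available_paths:
        return 0.0
    times_traversed = sum(1 for path in traversals if node in path)
    return times_traversed / len(available_paths)


# Hypothetical 'Data science' component with four available paths.
available: List[List[str]] = [
    ["Statistics", "Python", "Data science"],
    ["Statistics", "R", "Data science"],
    ["SQL", "Python", "Data science"],
    ["SQL", "Data science"],
]
# Hypothetical log of paths that users actually traversed.
log: List[List[str]] = [
    ["Statistics", "Python", "Data science"],
    ["SQL", "Python", "Data science"],
    ["Statistics", "R", "Data science"],
]
python_score = adjacency_score("Python", log, available)
```

With two of the logged traversals passing through ‘Python’ and four available paths, the score is 2 / 4; each further traversal through the node raises it.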

[0049] Accordingly, the more times a path of the graph is traversed, the higher the adjacency score becomes, since the score increases with every traversal. In an embodiment, the system 200 is caused to utilize said adjacency score for the purpose of recommending target entities (or target skills or vendors) to the users of the talent insight graph. In an embodiment, the talent insight graph configured herein is an evolving graph that keeps updating itself based on the usage by multiple users and the evolution of technologies.

[0050] A particular advantage of the system 200 is talent discoverability that can be utilized in organizations or otherwise. For example, users who have traversed a specific / niche path in the talent insight graph can be narrowed down and discovered using the adjacency score assigned to said paths. In addition, an organization's focus can be laid out and it can be matched to each individual's achievements in order to identify gaps in learning for said individual. Another innovative approach is crowd sourcing, wherein the associates can recommend and make suggestions to the structure of the talent insights graph, thereby making the process of generation of the talent insights graph self-evolving.

[0051] FIG. 3 illustrates a flowchart 300 for talent insight generation and recommendation, in accordance with an example embodiment. In an embodiment, a seed data is obtained by a system, for example, the system 200 (FIG. 2) for talent insight generation and recommendation. The seed data includes a plurality of technical keywords which can be logically linked by the system 200 to form a schema free graph, also referred to as talent insight graph. The talent insight graph can be utilized for recommendation purposes, as will be described in detail below.

[0052] At 302, the method includes obtaining a plurality of technical keywords associated with technological skills and technology vendors from a plurality of technical data sources. At 304, the method includes parsing the plurality of technical keywords to obtain a plurality of target entities based on a plurality of rules corresponding to the plurality of technical data sources, the plurality of target entities associated with a corresponding technical context. At 306, the method 300 includes computing a matching score for each of a plurality of bidirectional relationships between two or more technical keywords selected from the first set of technical keywords and the second set of technical keywords. The matching score is indicative of closeness centrality of the target entities forming the bidirectional relationships. At 308, the method 300 includes classifying, via the one or more hardware processors, the plurality of target entities into a plurality of categories based at least on the corresponding context to obtain a plurality of classified target entities, using a supervised machine learning model. At 308, the method 300 includes linking, via the one or more hardware processors, the plurality of classified target entities to create a talent insight graph. A detailed process flow for talent insight generation and recommendation is described further with reference to FIG. 4.

[0053] FIG. 4 illustrates a detailed process flow 400 for talent insight generation and recommendation, in accordance with an example embodiment. In an embodiment, a talent insight graph is generated by a system, for example, the system 200 (FIG. 2). In an embodiment, the talent insight graph is utilized for providing recommendations to users, as is explained below.

[0054] The source for developing the talent insights graph is a seed data that contains a list of technical keywords including those related to skills such as Java, AngularJS, and so on, and vendors such as Oracle, IBM, and so on. The seed data is derived from a plurality of technical data sources. At 402, said technical data sources are identified. Examples of the technical data sources include, but are not limited to, technical journals, conference papers, and online technical data sources such as Wikipedia, Stack Overflow, and so on. Seed data 404 is extracted from the plurality of technical data sources 408. The seed data 404 is passed to a web crawler 406. The web crawler 406 may hit one or more reference APIs to retrieve the HTML pages from the technical keyword data sources 408. The reference API converts the technical keywords into the HTML version of the webpage such that each keyword corresponds to a single HTML page. In an embodiment, certain technical data sources such as Wikipedia™ and Stack Overflow may be utilized as a reference API.

[0055] The web crawler 406, in addition to retrieving HTML pages corresponding to technical keywords obtained from seed data, gets references to other HTML pages that describe a set of keywords not specified in the seed data. The web crawler 406 utilizes the set of keywords derived from the seed data to identify other target entities (or potential target entities which can be utilized for the talent insight graph) from technical data sources 408. Hence, the web crawler facilitates retrieving other HTML pages that are associated with a set of technical keywords technologically linked to the plurality of technical keywords.

[0056] The plurality of keywords including the first set of technical keywords and the second set of technical keywords are together stored in a source data repository 410. The source data parser 412 takes the HTML pages and converts them into a format defining bidirectional relationships. For example, the source data parser 412 may convert the HTML pages into (entity relationship entity) format with a signal tagged. Herein, the source data parser tags each of the bidirectional relationships with a signal which is indicative of sufficient data within the bidirectional relationship. In an embodiment, the source data parser 412 utilizes parsing rules for each technical data source 414. If any of the parsed output does not have a signal tagged, then it is resent to the web crawler for refinement and tagging. This is a constant feedback process. Once the output of the parser is tagged as (entity relationship entity) format with the signal, the target entities that are qualified to be a part of the talent insight graph are considered to be identified at 416.

[0057] Upon identification of target entities at 416, the classification of the target entities is performed at 418 using supervised machine learning. For example, the target entities may be classified into a plurality of categories including, but not limited to, a categorization of digital and non-digital keywords, a categorization of skill and vendor related keywords, a categorization of product or non-product related keywords, a categorization of Meta-entity and non-meta-entity related keywords. In an embodiment, a meta-model may be developed that may include predetermined properties/features of a skill and a vendor. Said meta-model may take into consideration available features of existing skills and vendors to be classified under a given category.

[0058] Based on the properties, each of the confirmed target entities is matched to the property. Herein, properties of an entity refer to inherent traits of said entity. For example, AngularJS is a Full Stack Development Framework, and hence ‘Framework’ is a property of AngularJS. AWS is a Platform, and hence ‘Platform’ is a property of AWS. Each of the target entities may be passed to the meta-model, and the output may be a prediction that classifies the entities to a skill or a vendor. In a similar manner, the target entities may be classified into categories of digital and non-digital. In an embodiment, features of an available NLP library can be leveraged to perform the entity classification process. The target entities with their respective categorization form a knowledge base for the talent insight system, for example, the system 200 (FIG. 2). The talent insights system may include a talent insight data store 422 for storing the talent insights.

[0059] After identifying and classifying the entities, the next step is to link the identified and classified target entities logically. In an embodiment, a set of rules may be utilized for linking the target entities to form a relationship. If the given set of target entities satisfies the rules/criteria, they may be linked in the talent insights graph at 424. For example, if a sentence is retrieved from the text of an HTML page with (entity relationship entity) format, then the entities qualify directly into the graph. If a relationship is inferred in other sections of the HTML file, then the entities are validated to see if any sentence is available to qualify the relationship. This is an inferred relationship. If, however, a relationship between target entities is not clearly determined, then a template is created, and the entities are fit into said template. Said relationship is referred to as a default relationship. Said step is repeated for all the permutations of the target entities. For each of the plurality of bidirectional relationships, an adjacency score is indicative of closeness centrality of the target entities forming the bidirectional relationships. The adjacency score measures the extent to which a given node lies in the path between other nodes. In other words, the adjacency score determines how close a given skill is relative to other available skills, thereby helping in identifying the relative importance of a given skill vis-a-vis other skills in a sub-set of the talent insight graph.

[0060] Once the talent insights graph is created with skill and the vendors integrated, a user component is appended to the talent insights graph, thereby giving a new dimension to the talent insight graph, as is explained below.

[0061] In an embodiment, the users may be learners or those individuals who wish to learn the technology related subject matter. In order to understand the learning path that they should navigate, such learners may access the disclosed system. In an embodiment, to include the user component in the talent insight graph, a database of internal learner profile 428 and online profiles (available from external online sources) 430 may be integrated with the talent insight graph at 434.

[0062] Herein, as various users keep searching for aspirational skills and traverse a given path, the relation along that path gets stronger. This can be used for recommendation purposes. In an embodiment, an adjacency score is assigned to each path of a plurality of paths between the plurality of target entities of the talent insight graph based on a count of traversal of the said path. Hence, the confidence score increases for every traversal. A benefit of such a talent insight system is talent discoverability. Users who have traversed a specific / niche path can be narrowed down and discovered.

[0063] In an example application of the talent insights graph, an organization's focus can be laid out and it can be matched to each individual's achievements in order to identify gaps. For example, the talent insight graph can be embodied in an API, for instance, a talent insight graph API 436. The talent insight graph API 436 may be coupled to an internal recruitment platform 438 of the organization as well as a learner’s platform 440.

[0064] FIG. 5 illustrates an example of a portion of a talent insight graph 500, in accordance with an example embodiment. The talent insight graph 500 is shown to include a plurality of nodes, for example, nodes 502, 504, 506, 508, 510, 512, and 514. Each node represents one of the target entities including, for example, user, skill and vendor. For example, the node 502 represents a user having attributes such as experience and role.

[0065] The user represented by the node 502 may be desirous of learning a programming language represented by the node 508. The disclosed system, for example, the system 200 (FIG. 2) may present the talent insight graph to the user where the talent insight graph may show various learning paths that the user can take to learn the programming language. For example, in order to learn the graphics, the user can follow the path 502-504-508. The library of the programming language is shown by the path through nodes 502-506-508. The web framework of the programming language is shown by the path through nodes 510-508. The UI framework of the programming language is shown by the path through nodes 502-512-508. Herein, based on the user’s attributes, the system 200 may determine the best path that the user can take to learn the programming language, and recommend the same to the user.
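The learning paths of FIG. 5 can be enumerated with a simple depth-first search over the node numbers given above. The disclosure does not say how the best path is chosen from the user's attributes, so the `affinity` weights and the `recommend` function below are purely hypothetical placeholders for that decision.

```python
from typing import Dict, List

# Edges as described for FIG. 5 (direction follows the listed paths).
GRAPH: Dict[int, List[int]] = {
    502: [504, 506, 512],
    504: [508],
    506: [508],
    512: [508],
    510: [508],
}


def all_paths(graph: Dict[int, List[int]], start: int, goal: int) -> List[List[int]]:
    """Enumerate all simple paths from start to goal using an explicit stack."""
    paths: List[List[int]] = []
    stack: List[List[int]] = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # avoid revisiting nodes
                stack.append(path + [nxt])
    return sorted(paths)


def recommend(paths: List[List[int]], affinity: Dict[int, float]) -> List[int]:
    """Pick the path whose intermediate nodes best match the (assumed) user affinity."""
    return max(paths, key=lambda p: sum(affinity.get(n, 0.0) for n in p[1:-1]))


learning_paths = all_paths(GRAPH, 502, 508)
# Hypothetical weights derived from the user's experience and role.
best = recommend(learning_paths, {504: 0.2, 506: 0.9, 512: 0.5})
```

For the user at node 502, three paths to node 508 are found; with the sample weights, the path through node 506 is the one recommended.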

[0066] The foregoing description of the specific implementations and embodiments will so fully reveal the general nature of the implementations and embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.

[0067] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.