COMPUTER-IMPLEMENTED METHOD FOR PROVIDING A RECOMMENDATION TO A TEACHER USER FOR CREATING A PERSONALIZED TEACHING COURSE


Document Type and Number:

WIPO Patent Application WO/2024/033951

Abstract:

A computer-implemented method for providing a recommendation to a teacher user for creating a personalized teaching course by selecting and integrating teaching resources, comprising the following steps: a) in an electronic storage device, storing an ontological database comprising at least category data, resource data, a teacher identification code, a set of teacher user metadata, a set of student user metadata and, for each of said resource data, an evaluation index and a set of resource metadata; b) storing or providing a new resource datum in the electronic storage device, where the resource category datum of the new resource datum indicates that the new resource datum is not associated with any category datum; c) performing the following steps on an electronic processor: c1) processing the resource metadata set of the new resource datum and performing a statistical similarity measurement between the resource metadata set of the new resource datum and all resource metadata sets of the remaining stored resource data or a subset of the remaining stored resource data in the electronic storage device; c2) associating and storing in the new resource datum the category datum associated with the resource datum or resource data which exceed a predetermined threshold value of the statistical similarity measurement calculated in step c1); d) providing a recommendation model; e) applying said recommendation model to said ontological database, to identify and propose to the teacher user one or more resource data based on the teacher user metadata set, the student user metadata set, the evaluation index and the resource metadata set.

Inventors:

EPIFANIA FRANCESCO (IT)
MATAMOROS ARAGON RICARDO ANIBAL (IT)

Application Number:

PCT/IT2022/000043

Publication Date:

February 15, 2024

Filing Date:

August 11, 2022


Assignee:

SOCIAL THINGS S R L (IT)

International Classes:

G09B5/00; G06F16/23; G06F16/25; G06N3/08; G06N20/20; G09B7/02

Foreign References:

US20200302296A1	2020-09-24
CN107085803A	2017-08-22
CN110457283A	2019-11-15
CN103886054B	2017-02-15

Other References:

JEEVAMOL JOY ET AL: "An ontology-based hybrid e-learning content recommender system for alleviating the cold-start problem", EDUCATION AND INFORMATION TECHNOLOGIES, SPRINGER US, BOSTON, vol. 26, no. 4, 30 March 2021 (2021-03-30), pages 4993 - 5022, XP037513228, ISSN: 1360-2357, [retrieved on 20210330], DOI: 10.1007/S10639-021-10508-0

Attorney, Agent or Firm:

DE LORENZO, Danilo et al. (IT)


Claims:

CLAIMS

1. A computer-implemented method for providing a recommendation to a teacher user for creating a personalized teaching course by selecting and integrating teaching resources, said method comprising the following steps:

a) in an electronic storage device, storing an ontological database comprising at least the following stored data:

- category data, where each category datum is an identifier of a specific course or subject category or a vector of course or subject categories;

- resource data, where each resource datum of said resource data comprises notions, for example a link to a document or a presentation or the document or the presentation, and a resource category datum, said resource category datum being an identifier of the related category datum with which the resource datum is associated or being an identifier of a missing association of a resource category datum;

- a teacher identification code, which identifies the teacher user who creates or interacts with the personalized teaching course;

- a set of teacher user metadata for each teacher user comprising one or more of the following data: age, anonymous identifier, subject area, number of courses in which he/she teaches, number of proprietary resources, evaluations provided;

- a set of student user metadata for each student user comprising one or more of the following data: age, anonymous identifier, resource identifiers with which the student user interacted, course year, schooling level, evaluations provided, average grades;

for each of said resource data:

- an evaluation index, identifying an evaluation level attributed to each of said one or more resources and/or a number of views of each of said one or more resources;

- a collection of resource metadata comprising one or more of the following data: representative keywords of the resource, description, language, duration, recommended age, owner;

b) storing or providing a new resource datum in the electronic storage device, wherein the resource category datum associated with said new resource datum identifies a lack of association of a resource category datum, i.e., it does not have a belonging category of the resource datum to a category datum;

c) on an electronic processor, performing the following steps:

c1) processing the resource metadata set of the new resource datum and performing a statistical similarity measurement between the resource metadata set of the new resource datum and all resource metadata sets of the remaining stored resource data or a subset of the remaining stored resource data in the electronic storage device;

c2) associating and storing in the new resource datum the category datum associated with the resource datum or resource data which exceed a predetermined threshold value of the statistical similarity measurement calculated in step c1);

d) providing a recommendation model;

e) applying said recommendation model to said ontological database, also comprising the new resource datum, to identify and propose to the teacher user one or more resource data based on the teacher user metadata set, the student user metadata set, the evaluation index and the resource metadata set.

2. A method according to claim 1, wherein step c1) comprises the following sub-steps of:

c11) processing the resource metadata set of the new resource datum to construct a weighted matrix Mp by means of a TF-IDF coefficient, where said weighted matrix Mp has n rows equal to the number of processed resource data and m columns equal to the number of all the metadata of the resource metadata set;

c12) using the Singular Value Decomposition algorithm to construct latent representations for each stored resource datum and for the new resource datum to be inserted;

c13) identifying the latent representation of the new resource datum as the identification profile thereof and calculating the statistical similarity measurement with the other latent representations of the resource data already stored in the ontological database by means of a cosine similarity algorithm.

3. A method according to claim 2, wherein the predetermined threshold value of the statistical similarity measurement by means of cosine similarity is about 0.7.

4. A method according to any one of the preceding claims, wherein the recommendation model is a hybrid model which uses the methodology of a content-based model and a collaborative filtering type model.

5. A method according to any one of the preceding claims, wherein the recommendation model is a neural collaborative filtering model.

6. A method according to any one of the preceding claims, wherein the recommendation model is a model based on one or more decision trees.

7. A method according to any one of claims 2 to 6, wherein in step c12) a dense matrix is calculated from the weighted matrix (Mp), so that said dense matrix has, for each resource datum, a dense vector.

8. A computing device-executable computer program, comprising code modules for implementing steps c) to e) of the method according to any one of claims 1 to 7.

9. A computer product in which a computer program according to claim 8 is stored, adapted to be executed by a computing device to implement steps c) to e) of the method according to any one of claims 1 to 7.

Description:

"COMPUTER-IMPLEMENTED METHOD FOR PROVIDING A RECOMMENDATION TO A TEACHER USER FOR CREATING A PERSONALIZED TEACHING COURSE"

DESCRIPTION

Field of application

[0001] The present invention relates to a computer-implemented method for providing a recommendation to a teacher user for creating a personalized teaching course by selecting and integrating heterogeneous teaching resources.

Background art

[0002] Recommendation systems attempt to predict elements (e.g., movies, music, books, news, web pages) which a user might be interested in, based on some information on their profile and/or data stored in a database. How much an element interests a user is referred to as relevance. Therefore, a recommendation system estimates the relevance, to a given user, of each element in a collection. Such information is particularly useful in situations where the number of entries is constantly growing (for example, the huge amount of content available on the web) and the amount of relevant data is relatively low.

[0003] These systems advise the user based on a model of the user's profile and the data available.

[0004] Therefore, in the context of movie recommendation systems, if the collection of items consists of movies and the user to advise is a child, the system will propose a list of cartoons because the child's profile somehow corresponds to cartoons and does not correspond to horror or action movies.

[0005] Recommendation systems have become an important area of research since the mid-1990s.

[0006] Much work has been done in the last decade, both in the field and in academia, to develop new approaches to recommendation systems and new fields of application.

[0007] In particular, the problem of generating accurate recommendations for a set of users is present in different application domains, for example e-commerce and multimedia content platforms. Such a problem is addressed according to the data available, also taking into account the space and time complexity allowed for meeting a recommendation request. Therefore, it is always preferable to use a system which requires as little time and space as possible to perform accurate recommendations.

[0008] Disadvantageously, in most of the existing systems the source of the materials to be used is neglected, reducing confidence in the systems and thus limiting the accuracy thereof.

[0009] In some particular fields, such as that of e-learning platforms or of platforms for creating educational courses, either generic open datasets not related to the e-learning domain (e.g., MovieLens, Goodbooks) are used, which are typically known and used for the validation of recommendation systems, or proprietary datasets are used, i.e., a pre-existing knowledge base, which is however often incomplete or unsuitable and lacks some of the requirements, fields or metadata needed to obtain an effective and efficient recommendation system.

[00010] Open datasets are typically used in the academic field, and allow validating the general performance of a recommendation system or of AI algorithms, regardless of the specific domain. Although this methodology is valid for generally testing the quality of the recommendations, in particular for systems based on neural networks, it is not fully effective in a real, highly complex and continuously evolving context such as that of e-learning, where an information structure provided with highly significant educational content responsive to the semantics used by teachers, i.e., by the users of the system, is required.

[00011] The second approach, i.e., that based on proprietary datasets, is used in real contexts; however, it is affected by the general incompleteness and inadequacy of proprietary datasets. In fact, many of the proposed systems integrate processes requiring a significant cognitive effort by users, researchers and developers to compensate for the content missing from the datasets.

Solution of the invention

[00012] In the field of recommendation systems for teaching courses, the need is thus strongly felt to provide a recommendation method capable of overcoming the typical drawbacks of the prior art, and in particular which is capable of recommending heterogeneous resources not previously categorized.

[00013] According to a further aspect, the need is felt for a recommendation method which is capable of recommending resources never seen before to a given user as a function of the past interaction between user and resources, using the evaluations provided to each resource.

[00014] Such needs are met by a computer-implemented method for providing a recommendation to a teacher user for creating a personalized teaching course in accordance with the appended independent claim 1. The dependent claims describe preferred or advantageous embodiments of the invention, involving further advantageous aspects.

Description of the drawings

[00015] The features and advantages of the method according to the present invention will become apparent in any case from the following description of some preferred embodiments thereof, given by way of nonlimiting, indicative example, with reference to the accompanying drawings, in which:

- figure 1 shows a block diagram of the computer-implemented method for providing a recommendation to a teacher user, in accordance with an embodiment of the present invention, with particular reference to steps a) to e) of the method;

- figure 1a shows a block diagram of an ontological database, in accordance with an embodiment of the present invention;

- figure 2 shows an extract of an example of a TF-IDF coefficient matrix, in accordance with an embodiment of the present invention;

- figure 3 shows an output example of a step c12) of the method in accordance with an embodiment of the present invention in which, by way of example, the first five resources and a new associated vector representation are shown;

- figure 4 shows an output example of a step c13) of the method in accordance with an embodiment in which, by way of example, five resources which have been classified using cosine similarity are shown.

Detailed description

[00016] The computer-implemented method according to the present invention is adapted to provide a recommendation to a teacher user for creating a personalized teaching course by selecting and integrating teaching resources, generally heterogeneous teaching resources, such as presentations, documents, web resources, and the like.

[00017] The reference numerals below refer to the aforesaid accompanying drawings.

[00018] The computer-implemented method according to the present invention comprises the following steps:

[00019] a) in an electronic storage device, for example a memory of a computer, storing an ontological database 10 comprising at least the following stored data:

[00020] - category data 11, where each category datum is an identifier of a specific course or subject category or a vector of course or subject categories;

[00021] resource data, where each resource datum of said resource data comprises notions, such as a link to a document or a presentation or the document or presentation, and a resource category datum. The resource category datum is an identifier of the related category datum with which the resource datum is associated or is an identifier of a failure to associate a resource category datum with the associated resource. For example, the resource category datum is a category code related to the math resources category for the first year of college, or medical resources for the fifth year of school, and so on.

[00022] The ontological database further comprises at least the following stored data:

[00023] a teacher identification code 12, which identifies the teacher user who creates or interacts with the personalized teaching course;

[00024] - a set of teacher user metadata 13 for each teacher user, comprising one or more of the following data: age, anonymous identifier, subject area, number of courses in which he/she teaches, number of proprietary resources, evaluations provided;

[00025] - a set of student user metadata for each student user comprising one or more of the following data: age, anonymous identifier, resource identifiers with which the student user interacted, course year, schooling level, evaluations provided, average grades.

[00026] Furthermore, for each of such resource data, the ontological database comprises at least the following stored data:

[00027] an evaluation index, identifying an evaluation level attributed to each of said one or more resources and/or a number of views of each of said one or more resources;

[00028] a collection of resource metadata comprising one or more of the following data: representative keywords of the resource, description, language, duration, recommended age, owner.
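The stored data listed above can be pictured as a small set of record types. The following is only a minimal sketch: every class and field name (`ResourceMetadata`, `Resource`, `OntologicalDatabase`, `category_id`, and so on) is a hypothetical choice made for illustration, since the description lists the kinds of data stored rather than a concrete schema, and the teacher and student metadata sets are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


# All names below are hypothetical; the source does not define a schema.
@dataclass
class ResourceMetadata:
    keywords: list[str] = field(default_factory=list)
    description: str = ""
    language: str = ""
    duration_minutes: Optional[float] = None
    recommended_age: Optional[int] = None
    owner: str = ""


@dataclass
class Resource:
    resource_id: str
    content_link: str                  # link to the document or presentation
    category_id: Optional[str] = None  # None marks a missing category association
    evaluation_index: float = 0.0
    views: int = 0
    metadata: ResourceMetadata = field(default_factory=ResourceMetadata)

    @property
    def is_uncategorized(self) -> bool:
        return self.category_id is None


@dataclass
class OntologicalDatabase:
    categories: dict[str, str] = field(default_factory=dict)  # category id -> label
    resources: dict[str, Resource] = field(default_factory=dict)

    def add_resource(self, resource: Resource) -> None:
        self.resources[resource.resource_id] = resource

    def uncategorized(self) -> list[Resource]:
        """Resources whose resource category datum marks a missing association."""
        return [r for r in self.resources.values() if r.is_uncategorized]
```

In this sketch, a resource stored without a `category_id` plays the role of a resource datum whose resource category datum identifies a missing association.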

[00029] The computer-implemented method according to the present invention further comprises the following steps:

[00030] b) storing or providing a new resource datum in the electronic storage device 20, where the resource category datum associated with said new resource datum identifies a lack of association of a resource category datum, i.e., the new resource datum does not yet belong to any category datum;

[00031] c) performing the following steps on an electronic processor:

[00032] c1) processing the resource metadata set of the new resource datum and performing a statistical similarity measurement 21 between the resource metadata set of the new resource datum and all the resource metadata sets of the remaining stored resource data or a subset of the remaining stored resource data in the electronic storage device;

[00033] c2) associating and storing in the new resource datum 22 the category datum associated with the resource datum or resource data which exceed a predetermined threshold value of the statistical similarity measurement calculated in step c1);

[00034] d) providing a recommendation model;

[00035] e) applying said recommendation model to said ontological database 23, also comprising the new resource datum, to identify and propose to the teacher user one or more resource data based on the teacher user metadata set, the student user metadata set, the evaluation index and the resource metadata set.
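Step c2) can be sketched as a simple threshold filter over similarity values that have already been computed in step c1). The function name `assign_categories`, its parameters, and the choice to return the matching categories ordered by decreasing similarity are assumptions made for this illustration; the default of 0.7 is the preferred threshold value given in the description.

```python
def assign_categories(similarities, categories, threshold=0.7):
    """Return the categories of the stored resources whose similarity to the
    new resource exceeds the threshold, most similar first, without duplicates.

    similarities[i] is the similarity between the new resource and stored
    resource i; categories[i] is that resource's category (None if it has none).
    """
    ranked = sorted(zip(similarities, categories), key=lambda pair: pair[0], reverse=True)
    assigned = []
    for score, category in ranked:
        if score > threshold and category is not None and category not in assigned:
            assigned.append(category)
    return assigned
```

An empty result means that no stored resource is similar enough, so the new resource datum would remain without a category in this sketch.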

[00036] Preferably, in step d), the recommendation model is a hybrid model which uses the methodology of a content-based model and a collaborative filtering type model, known in the field. In particular, for example, the recommendation model is based on a deep model, such as the neural collaborative filtering model or a model based on one or more decision trees.
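One common way of combining a content-based model with a collaborative filtering model is a weighted blend of their scores. The description does not state how the two methodologies are combined, so the sketch below is an assumption throughout: the function names, the weight `alpha`, and the convention that both models return scores between 0 and 1 are all hypothetical.

```python
def hybrid_scores(content_scores, collaborative_scores, alpha=0.5):
    """Blend two score dictionaries (resource id -> score in [0, 1]).

    alpha is the weight of the content-based score; a resource missing
    from one of the two dictionaries contributes 0.0 for that model.
    """
    resource_ids = set(content_scores) | set(collaborative_scores)
    return {
        rid: alpha * content_scores.get(rid, 0.0)
        + (1.0 - alpha) * collaborative_scores.get(rid, 0.0)
        for rid in resource_ids
    }


def recommend(content_scores, collaborative_scores, top_k=3, alpha=0.5):
    """Return the ids of the top_k resources by blended score."""
    blended = hybrid_scores(content_scores, collaborative_scores, alpha)
    # Sort by descending score, breaking ties by resource id for determinism.
    ranked = sorted(blended.items(), key=lambda item: (-item[1], item[0]))
    return [rid for rid, _ in ranked[:top_k]]
```

Setting `alpha` to 1.0 or 0.0 reduces the blend to the purely content-based or purely collaborative ranking, which makes the contribution of each model easy to inspect.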

[00037] In accordance with an embodiment, step c1) comprises the following sub-steps of:

[00038] c11) processing the resource metadata set of the new resource datum to construct a weighted matrix Mp by means of a TF-IDF coefficient 24, where said weighted matrix Mp has n rows equal to the number of processed resource data and m columns equal to the number of all the metadata of the resource metadata set (in other words, the cell of Mp in position ij contains the TF-IDF value for the resource datum i and the metadatum j of the resource metadata set);

[00039] c12) using the Singular Value Decomposition algorithm 25 to construct latent representations for each stored resource datum and for the new resource datum which is to be inserted; preferably, such a representation corresponds to a vector of smaller size than the initial rows of the matrix Mp;

[00040] c13) identifying the latent representation of the new resource datum as the identification profile thereof and calculating the statistical similarity measurement with the other latent representations of the resource data already stored in the ontological database by means of a cosine similarity algorithm.

[00041] Preferably, the predetermined threshold value of the statistical similarity measurement by means of cosine similarity is about 0.7.

[00042] In accordance with an embodiment, step c11) is implemented by a sequence of steps of which a program code example is shown below:

    import pickle

    import numpy as np

    # tfidf() is a helper that is not reproduced in this listing.
    def compute_vectorization_tfidf(metadata):
        unique_feature_list, tfidf_feature_list = [], []
        for feature in metadata:
            print(feature)
            # Compute the TF-IDF coefficients for one type of metadatum.
            unique_feature, tfidf_feature, df_tf_idf = tfidf(feature)
            unique_feature_list.append(np.sort(unique_feature))
            tfidf_feature_list.append(tfidf_feature)
            print(df_tf_idf.head())
            # Persist the three results for this metadatum.
            print("Saving {}".format(feature))
            with open('unique_{}.pickle'.format(feature), 'wb') as handle:
                pickle.dump(unique_feature, handle)
            with open('tfidf_{}.pickle'.format(feature), 'wb') as handle:
                pickle.dump(tfidf_feature, handle)
            with open('df_tf_idf_{}.pickle'.format(feature), 'wb') as handle:
                pickle.dump(df_tf_idf, handle)
        data = {index: [] for index in range(0, len(tfidf_feature_list[0]))}
        for i in range(0, len(tfidf_feature_list)):
            for ii in range(0, len(tfidf_feature_list[i])):
                pass  # the listing is truncated at this point in the source

[00043] In the aforesaid program code, the compute_vectorization_tfidf(metadata) function preprocesses the metadata and invokes the calculation of the TF-IDF coefficient for each one. In particular, for each single metadatum (e.g., one or more keywords) provided as input, the calculation returns a list of the unique values of the metadatum, a list of triples where each triple comprises a metadatum value, an index and a TF-IDF coefficient value, and a dataframe with the TF-IDF coefficient values.

[00044] In other words, the compute_vectorization_tfidf (metadata) function preprocesses the metadata associated with the resources and computes the matrix of TF-IDF weights for each type of metadatum.
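The TF-IDF calculation that is invoked for each metadatum is not reproduced in the description. The following is a minimal sketch of what such a helper might look like, under several assumptions: the name `tfidf` is borrowed from the call in the listing, the input is assumed to be one list of values per resource, the weight is assumed to be the raw count multiplied by log(N / document frequency), and a plain nested list stands in for the dataframe.

```python
import math
from collections import Counter


def tfidf(feature_values):
    """Compute TF-IDF weights for a single type of metadatum.

    feature_values holds one list of values per resource. Returns
    (unique_values, triples, matrix), where each triple is
    (value, resource_index, weight) and matrix[i][j] is the weight of
    unique_values[j] for resource i.
    """
    n_resources = len(feature_values)
    unique_values = sorted({v for values in feature_values for v in values})
    column = {value: j for j, value in enumerate(unique_values)}

    # Document frequency: number of resources in which each value occurs.
    doc_freq = Counter(v for values in feature_values for v in set(values))

    matrix = [[0.0] * len(unique_values) for _ in range(n_resources)]
    triples = []
    for i, values in enumerate(feature_values):
        for value, count in Counter(values).items():
            # Assumed weighting: term count times log inverse document frequency.
            weight = count * math.log(n_resources / doc_freq[value])
            matrix[i][column[value]] = weight
            triples.append((value, i, weight))
    return unique_values, triples, matrix
```

With this weighting, a value shared by every resource receives a weight of zero, while a value that occurs in a single resource receives the largest weight, which is the behavior that makes rare metadata values discriminative.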

[00045] Figure 2 shows an extract of an example of a TF-IDF coefficient matrix, which also indicates the initial size of the TF-IDF matrix for the preprocessing performed on a MERLOT repository. There are 1726 resources (rows) and 15516 values for the metadata. In this case, the resource with ID 4 for the metadatum "max_age" with value 28 has a weight (i.e., coefficient) equal to 3.571339.

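[00046] In accordance with an embodiment, step c12) is implemented by a sequence of steps of which a program code example is shown below:

    from sklearn.decomposition import TruncatedSVD

    # df_total (the TF-IDF weight matrix) and n_comp (the number of
    # components) are defined elsewhere.
    svd = TruncatedSVD(n_components=n_comp)
    df_total_dummy = df_total
    svd.fit(df_total_dummy)
    # transform() returns a NumPy array with one dense row per resource.
    result = svd.transform(df_total_dummy)
    print("Shape matrix after dimensionality reduction: ", result.shape)
    print("Head matrix after dimensionality reduction: ", result[:5])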

[00047] In the aforesaid program code, the TruncatedSVD(n_components=n_comp) call invokes the Singular Value Decomposition algorithm. Given the number of components, it computes the dense matrix from the TF-IDF weight matrix. This new matrix has a dense vector for each resource.

[00048] Figure 3 shows an output example of step c12) in which, by way of example, the first 5 resources and the new associated vector representation are shown. In this case, for visual clarity, only the first 10 columns are shown; as can be noted, all the values are different from zero. Thus, a dense vector representation is obtained.
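The dimensionality reduction of step c12) can also be illustrated directly with NumPy. This is a sketch under stated assumptions rather than the embodiment's code: the function name `latent_representations` is hypothetical, and the latent vector of each resource is taken to be the corresponding row of U_k multiplied by the top k singular values, which is the usual truncated-SVD projection.

```python
import numpy as np


def latent_representations(weights, n_components):
    """Project each row of a weight matrix onto its top singular directions.

    Returns an array with one row per resource and n_components columns.
    """
    matrix = np.asarray(weights, dtype=float)
    # Thin SVD: matrix = U @ diag(S) @ Vt, singular values in descending order.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Keep the first n_components directions and scale them by their singular values.
    return u[:, :n_components] * s[:n_components]
```

When `n_components` is at least the rank of the weight matrix, the dot products between resources are preserved exactly; with fewer components they are approximated, which is the trade-off accepted in exchange for shorter, dense vectors.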

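[00049] In accordance with an embodiment, step c13) is implemented by a sequence of steps of which a program code example is shown below:

    from scipy.spatial.distance import pdist, squareform

    def calcul_similarity(result):
        # Note: SciPy's 'cosine' metric is the cosine distance,
        # i.e. 1 - cosine similarity.
        total_list_dist = squareform(pdist(result, metric='cosine'))
        return total_list_dist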

[00050] In the aforesaid program code, the calcul_similarity(result) function performs the cosine similarity calculation between all the classified elements and those which have yet to be classified. All of these elements are contained within the result matrix. The function output is an N×N matrix, where N indicates the number of resources analyzed.
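For reference, the cosine similarity between two latent vectors u and v is their dot product divided by the product of their norms. The sketch below computes the N×N matrix with the standard library only; the function names and the convention of returning 0.0 for an all-zero vector are choices made for this illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """N x N matrix whose entry (i, j) is the similarity of vectors i and j."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
```

A cosine distance d, such as the value produced by SciPy's `pdist` with `metric='cosine'`, converts to this similarity as 1 - d, so a similarity threshold of 0.7 corresponds to a distance threshold of 0.3.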

[00051] In accordance with an embodiment, the metric used for the calculation of the similarities is of the Euclidean type. In alternative but equally advantageous embodiments, the metric for calculating similarity is of the Mahalanobis, standardized Euclidean (seuclidean) or Canberra type. Preferably, the similarity calculation is performed using the Python library SciPy.

[00052] Figure 4 shows an output example of step c13) in which, by way of example, five resources which have been classified using cosine similarity are shown. Using a target resource belonging to the Business category, the similarity with other resources was calculated and only the first two belonging to the Business category exceeded the similarity threshold set at 0.7. Therefore, the first two resources are correctly classified.

[00053] As can be seen from the aforesaid steps of the method, from a) to e), from a methodological point of view, in order to solve the proposed problem it is important that the ontological database has a series of complete fields, related to the metadata of the teaching resources present therein. Precisely, the proposed methodology is based on the completeness of the information structure, so as to respond to certain needs in the field of research and in particular e-learning, i.e., educational needs of potential users (teachers) who must create the courses in the most personalized manner possible, optimizing the use of resources.

[00054] Preferably, the ontological database, of which a block diagram is shown in figure 1a, is built according to the IEEE LOM (Learning Object Metadata) standard, which is a fundamental standard suited to e-learning platforms, by creating, inserting and adapting an appropriate ontology, unified with respect to the various information sources of the resources.

[00055] Preferably, the ontology created introduces a hierarchical tree structure which allows precisely classifying the different resources belonging to various disciplinary or educational domains, thus generating a cohesive and coherent knowledge base. It is initially defined manually by the team members and the experts of the various subjects covered. Thereby, resources are collected, classified and integrated, even highly heterogeneous and coming from different information sources, with respect to the different metadata and features, within a single complete information structure.

[00056] Preferably, the ontological database is then expanded in two different manners:

[00057] with an iterative ETL (Extract, Transform, Load) process, taking the resources recently added in the initial sources or adding new sources to the initial set; in the latter case, in addition to the resources, the ontology of the categories to which they belong is extracted as well;

[00058] or with the manual addition of a resource by the teacher.

In accordance with an embodiment, the computer-implemented method also comprises a series of steps for reconstructing and storing the resource metadata set for each resource.

For example, in accordance with an embodiment, a new resource datum is classified using the different metadata of the ontology, managing them appropriately and thus enriching the knowledge base. If there are missing metadata, they are extrapolated by means of different techniques, used and applied with respect to the structure of the resource. In particular, Natural Language Processing (NLP) and Machine Learning (ML) algorithms are used to manage the various cases.

[00059] In accordance with an embodiment, in the case of a textual document in which the keywords are missing, a TextRank algorithm 26 is used, which returns the most significant words; this algorithm is based on the concept of graph-based ranking, i.e., each vertex represents a lexical unit, and the arcs are the links between these units.

[00060] In this example, the algorithm starts from a descriptive text and generates a graph structure which allows identifying the most significant lexical units. The algorithm can work with single-word or multi-word keywords.
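The graph-based ranking idea can be sketched as follows for single-word keywords. This is a simplified illustration and not the embodiment's implementation: the function name, the small stop-word list, the co-occurrence window of 2, the damping factor of 0.85 and the fixed number of iterations are all assumptions, and no part-of-speech filtering is applied.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with", "by"}


def textrank_keywords(text, top_k=5, window=2, damping=0.85, iterations=50):
    """Rank words with a PageRank-style iteration over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Vertices are words; an undirected edge links two different words
    # that occur within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Each vertex repeatedly receives a share of its neighbors' scores.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1.0 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Words that co-occur with many different words accumulate the highest scores, which is how the most significant lexical units of a description emerge without any labeled training data.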

[00061] In accordance with an embodiment, in the case of video-type teaching resources, neural networks are used for the transformation from video to textual format 27, so as to obtain the keywords required for the correct classification of the resource datum; initially, the video is taken and the transformation from video to audio is performed through the Python library MoviePy. A speech-to-text API, for example from Google, is then used to obtain the text from the video recordings 28, which is then input to the TextRank algorithm.

[00062] With reference to the inclusion of a new resource datum in the ontological database, if the resource has the category datum among its metadata and such a value is already present in the ontological database as a category datum, then a direct association and storage 29 is carried out.

[00063] If the new resource belongs to a new source, provided with its own ontology, a matching is performed between the ontological database and the new source from which the resource comes. Such merging is performed using the SAMBO or TF-IDF algorithm 30, proposed in various works of the prior art, which allows the categorization of resources to be unified again in a single directed acyclic graph (DAG).

[00064] According to an aspect of the present invention, the ontological database created by the method according to the present invention is also effectively used to identify categories aimed at creating or training intelligent models on specific disciplinary domains, to support teachers and experts in the creation of specialized training courses, or even on more general and less circumscribed domains. Furthermore, depending on the type of task or algorithm to be formalized and addressed, a subset of the knowledge base and ontology can be used.

[00065] Innovatively, the method according to the present invention brilliantly overcomes the typical problems of the prior art.

[00066] In particular, by virtue of the automatic categorization mechanism of the new resource data, which ensures a correct integration between proprietary and open datasets, it is possible to obtain a complete personalized ontological database, i.e., one complete with all the necessary metadata and with heterogeneous educational resources in each format. Furthermore, the method according to the present invention makes it possible to improve research in the field of e-learning since the completeness with which it is provided allows validating, for example, various intelligent systems and in particular recommendation systems.

[00067] Furthermore, the method according to the present invention reduces the efforts required to achieve an adequate classification of the information content, since the classification of poly-thematic resources becomes almost immediate, and a greater efficiency is achieved in the classification of large amounts of information.

[00068] Furthermore, with the method according to the present invention, a greater number of properly categorized resources is made available to the recommendation model. As a result, the recommendation model is more effective in proposing the correct resources to the user creating a course.

[00069] It is apparent that, in order to meet contingent needs, those skilled in the art can make changes to the invention, all contained within the scope of protection as defined by the following claims.
