Sound Research WIKINDX


Liu, H. (2019). Automatic idea generation and analysis using NLP and ML techniques. Unpublished PhD thesis, University of Nottingham, United Kingdom. 
Added by: Mark Grimshaw-Aagaard (8/1/22, 11:15 AM)   
Resource type: Thesis/Dissertation
BibTeX citation key: Liu2019
Categories: AI/Machine Learning
Keywords: Creativity
Creators: Brailsford, Goulding, Houghton, Liu, Maul
Publisher: University of Nottingham (United Kingdom)
"Ideas are the fundamental way in which information is conveyed in written text.
This research investigates the discovery and extraction of ideas from corpuses of
scientific literature. There are several elements to this work: (1) the functional
definition of ideas; (2) the computation of novel ideas; (3) the representation of
ideas; (4) the construction of a ground truth dataset; and (5) the use of citations
as an idea container.

Ideas are defined as a <problem, solution> pair, where the problem and solution
are represented by noun phrases, or a sequence of words. As a result of this, the
task of idea detection is broken down to problem and solution extraction. The
task of idea extraction is similar to Named Entity Recognition (NER), where the
problems and solutions may be seen as special entities. These techniques worked
well, although the results contained a lot of noise that needs to be removed.

Automatic idea generation was conducted using a dataset from the Journal of
Science. Old ideas were defined as the existing <problem, solution> pairs in
the same abstract and new ideas were generated by predicting new links between
problems and solutions that do not occur together in one abstract. Evaluation
was performed using metrics that are widely used in information retrieval. The
F1 scores (higher than 0.90) provide good evidence that the proposed method is
capable of generating useful ideas.

A ground truth data set that contained <problem, solution> pairs was constructed
from the publications of the International Conference on Neural Information Processing
Systems and the Journal of Machine Learning Research. This data was
annotated by human volunteers, and it was used for training idea detection models
using Conditional Random Field (CRF) and Long Short-Term Memory (LSTM).
To evaluate the performance of the models, the precision and recall were computed.

Idea analysis was studied by analyzing citations, which are considered to be containers
for ideas. Word vectors were used to represent the citations for the purpose
of classifying citation sentiment, and a method was developed to measure
the sequence of citation sentiment. This method for analyzing internal citation
sentiment sequence worked well (with F1 measure 0.86)."
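
Illustrative note (not part of the thesis): the abstract describes ideas as <problem, solution> pairs, treats pairs that co-occur in one abstract as old ideas, and evaluates with information-retrieval metrics. The Python sketch below shows one way that setup could look. The function names, the input layout (one tuple of problem and solution sets per abstract), and the exhaustive pairing of problems with solutions are assumptions made for illustration; the thesis predicts links with a model, which is not reproduced here.

```python
from itertools import product


def split_old_and_candidate_ideas(abstracts):
    """Separate existing <problem, solution> pairs from unseen candidates.

    `abstracts` is a hypothetical input: a list of (problems, solutions)
    tuples, one per abstract, each element being a set of phrases.
    """
    old_ideas = set()
    all_problems, all_solutions = set(), set()
    for problems, solutions in abstracts:
        all_problems |= set(problems)
        all_solutions |= set(solutions)
        # Pairs that co-occur in the same abstract count as old ideas.
        old_ideas |= set(product(problems, solutions))
    # Candidates are pairs that never co-occur in a single abstract.
    # A real system would rank or filter these with a link predictor.
    candidates = set(product(all_problems, all_solutions)) - old_ideas
    return old_ideas, candidates


def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall, and F1 for predicted idea pairs."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, invented purely for this example.
    toy_abstracts = [
        ({"overfitting"}, {"dropout"}),
        ({"vanishing gradients"}, {"LSTM"}),
    ]
    old, new = split_old_and_candidate_ideas(toy_abstracts)
    print(sorted(old))
    print(sorted(new))
```

F1 is the harmonic mean of precision and recall, so a score above 0.90 requires both to be high at the same time.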
