List of text mining methods
Different text mining methods are used depending on their suitability for a data set. Text mining is the process of extracting data from unstructured text and finding patterns or relations. Below is a list of text mining methods.

Centroid-based Clustering: Unsupervised learning method. Clusters are determined based on data points.
  Fast Global KMeans: Designed to accelerate Global KMeans.
  Global KMeans: Begins with one cluster and then divides into multiple clusters based on the number required.
  KMeans: Requires two parameters: the number of clusters K and a set of data.
  FW-KMeans: Used with the vector space model. Uses weighting to decrease noise.
  Two-Level-KMeans: The regular KMeans algorithm runs first. Clusters that do not reach the threshold are then selected for subdivision into subclasses.
Cluster Algorithm
Hierarchical Clustering
  Agglomerative Clustering: Bottom-up approach. Each cluster starts small, and clusters are then aggregated to form larger clusters.
  Divisive Clustering: Top-down approach. Large clusters are split into smaller clusters.
Density-based Clustering: A structure is determined by the density of data points.
  DBSCAN
Distribution-based Clustering: Clusters are formed by fitting statistical distributions to the data.
  Expectation-maximization algorithm
Collocation
Stemming Algorithm
  Truncating Methods: Remove the suffix or prefix of a word.
    Lovins Stemmer: Removes the longest suffix.
    Porter Stemmer: Strips suffixes through a fixed sequence of rule-based steps. Its companion Snowball framework allows programmers to write stemmers based on their own criteria.
  Statistical Methods: Rely on a statistical procedure and typically result in affixes being removed.
    N-Gram Stemmer: Based on n-grams, sets of n consecutive characters taken from a word.
    Hidden Markov Model (HMM) Stemmer: Transitions between states are governed by probability functions.
    Yet Another Suffix Stripper (YASS) Stemmer: Hierarchical approach to creating clusters. Clusters are then treated as sets of elements in classes, and their centroids are the stems.
  Inflectional & Derivational Methods
    Krovetz Stemmer: Changes words to word stems that are valid English words.
    Xerox Stemmer: Removes prefixes.
Term Frequency
Term Frequency Inverse Document Frequency
Topic Modeling
  Latent Semantic Analysis (LSA)
  Latent Dirichlet Allocation (LDA)
  Non-Negative Matrix Factorization (NMF)
  Bidirectional Encoder Representations from Transformers (BERT)
Wordscores: First estimates scores for word types based on reference texts. Then applies the word scores to texts that are not reference texts to obtain document scores. Lastly, the scores of the non-reference documents are rescaled so they can be compared with the reference texts.
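The KMeans entry in the list names only its two inputs, K and a set of data. The sketch below shows one common way the algorithm is run from those inputs (Lloyd's iteration: assign each point to its nearest centroid, then move each centroid to the mean of its members). The function names, the random initialisation, the `iterations` cap, and the `seed` parameter are illustrative assumptions, not part of the list.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (a list of equal-length numeric tuples) into k groups.

    Returns (centroids, assignment), where assignment[i] is the index of
    the centroid that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Assumption: start from k data points picked at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignment
```

For text data the points would typically be document vectors (for example term weights), so the same routine applies once documents have been turned into numeric tuples.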
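The list gives Term Frequency and Term Frequency Inverse Document Frequency without a description. As a minimal sketch of the idea, the function below weights each term by how often it occurs in a document, scaled down by how many documents contain it. The specific formulas used here (count divided by document length for tf, and log(N / df) for idf) are one common variant chosen for illustration; other variants exist, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    `documents` is a list of token lists. The weight is tf * idf, with
    tf = term count / document length and idf = log(N / df), where N is
    the number of documents and df the number of documents containing
    the term. These are assumed variants, not the only definitions.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that appears in every document receives a weight of zero, which is the intended effect: it does not help distinguish one document from another.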
Provided by Wikipedia