My Research Interests
Applying my research to real-world problems has been, and will remain, an important goal of my career. I also enjoy interdisciplinary work because it broadens my knowledge. During my postgraduate degree and my professional work, I have had the opportunity to pursue applied research projects in Information Extraction, Information Retrieval, data mining, text mining, and NLP. Over the last six years I have developed skills across the subfields of data mining, ranging from theoretical topics such as data mining algorithms, through machine learning issues such as feature selection, feature discovery, and SVMs, to applications in text mining, Information Retrieval, and Information Extraction. My courses and publications focused on text mining and Information Retrieval, and I gained experience in these areas through my postgraduate dissertation and professional research. I have also published three papers at international conferences.
I want to pursue research problems that build upon my past experience and education but are not necessarily limited to those subjects. A healthy research career requires the ability to adapt to changing problems and tools while building upon a solid foundation. In the past I have expanded into new areas either by applying my analytical tools to a new domain or by taking on a new problem within a domain I already knew. Moreover, current tools, techniques, and frameworks for Information Retrieval, text mining, semantic reasoning, and knowledge discovery are still far from satisfactory. I would therefore like to explore one of the following problems using the knowledge I have accumulated in data mining, machine learning, and Information Retrieval. My research interests are outlined below.
Text Classification
Today's organizations face a vast volume of knowledge and information. Most of
the explicit knowledge is stored in different types of documents, but only a few people
(often only the authors of the documents) know where to locate them. A major approach
to organizing information is to classify the collected information according to a pre-defined
set of classes and to retrieve relevant information by browsing the list of classes.
The enormous increase in the amount of digital information available and the
demand for retrieval tools to manage the information overload have led to an interest in
automatic classification, with the expectation of reducing human labor to a significant
extent or even replacing part of it. The objective of document classification is
to reduce the detail and diversity of data, and the resulting information overload, by
grouping similar documents together.
An effective method for document classification is content-based classification, which
classifies documents based on their contents. Such a method proceeds as follows.
First, keywords and terms are extracted using information retrieval techniques and
simple manual analysis. Second, concept hierarchies of keywords and terms are obtained
from available term classes such as WordNet, from expert knowledge, or from keyword
classification systems. Documents in the training set can also be classified into class
hierarchies. An analysis method can then be applied to discover sets of associated terms
that maximally distinguish one class of documents from the others, and these terms are
used to classify new documents. Many research areas in document classification remain
to be explored. Some of them are listed below.
- Feature selection for text classification
- Semantic indexing techniques and classification models
- Automatic learning of the classification structure (taxonomy)
- Multi-class and multi-label classification
- Integration of multiple sources for classification
- Classification with background information (e.g., with the help of an ontology)
- Hierarchical classification
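The content-based procedure described above (extract terms, find the terms that best separate one class from the others, then use them to label new documents) can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the stopword list, the smoothed log-odds weighting, the `top_k` cutoff, and the toy training set are hypothetical, and the concept-hierarchy step (for example, via WordNet) is left out.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}

def extract_terms(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords and short tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]

def discriminating_terms(training_docs, top_k=5):
    """For each class, keep up to top_k terms with the highest smoothed log-odds
    of occurring in that class's documents rather than in the other classes'."""
    doc_freq = defaultdict(Counter)  # class -> term -> number of documents containing it
    class_size = Counter()           # class -> number of documents
    for text, label in training_docs:
        class_size[label] += 1
        doc_freq[label].update(set(extract_terms(text)))

    total_docs = sum(class_size.values())
    vocabulary = set().union(*doc_freq.values())
    profiles = {}
    for label in class_size:
        n_in = class_size[label]
        n_out = total_docs - n_in
        weights = {}
        for term in vocabulary:
            in_count = doc_freq[label][term]
            out_count = sum(doc_freq[c][term] for c in class_size if c != label)
            p_in = (in_count + 1) / (n_in + 2)     # add-one smoothing
            p_out = (out_count + 1) / (n_out + 2)
            weights[term] = math.log(p_in / p_out)
        best = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        profiles[label] = {term: w for term, w in best if w > 0}
    return profiles

def classify(text, profiles):
    """Score each class by the summed weights of its profile terms found in the text."""
    terms = set(extract_terms(text))
    scores = {label: sum(w for t, w in profile.items() if t in terms)
              for label, profile in profiles.items()}
    return max(sorted(scores), key=scores.get), scores

# Toy training set, invented for this example.
TRAINING = [
    ("The court ruled on the contract dispute", "law"),
    ("The judge dismissed the appeal in court", "law"),
    ("The patient received a new drug treatment", "medicine"),
    ("Clinical trial of the drug showed fewer symptoms", "medicine"),
]

profiles = discriminating_terms(TRAINING)
label, scores = classify("The court heard the appeal", profiles)
print(label, scores)
```

A realistic system would replace the log-odds weighting with a tested feature selection method and a trained classifier; the sketch only shows how the three steps fit together.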
Text Information Retrieval
My experience in text information retrieval focuses on combining multiple resources, evidence, and criteria to incorporate domain knowledge into query expansion and result ranking. The query expansion module improves on existing techniques by using several term-weighting schemes to group and combine terms from different sources based on their characteristics, which proves more effective than the typical approach of treating all expansion terms equally. For result ranking, different scoring criteria evaluate evidence at the document, passage, and term-matching granularities, and these scores are then combined to produce a final ranking. The main challenges of this work are how to incorporate multiple models, how to design effective techniques for query expansion and term weighting, and how to perform ranking with multiple scoring criteria. I plan to conduct detailed research on effectively combining information from various resources and aspects at multiple stages of retrieval for domain-dependent applications (an intelligent resume processor, a legal document search engine, a talent management system, medical information retrieval, opinion mining).
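The two ideas in this paragraph, weighting expansion terms by their source and combining scores from several granularities, can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described above: the source names and weights in `SOURCE_WEIGHTS`, the fixed-size passage split, the linear `mix` of the three scores, and the sample documents are all hypothetical.

```python
# Hypothetical weights per term source; real values would be tuned on data.
SOURCE_WEIGHTS = {"original": 1.0, "thesaurus": 0.6, "feedback": 0.3}

def expand_query(original_terms, expansions):
    """Merge original and expansion terms; each term keeps the highest
    weight among the sources that proposed it."""
    weighted = {term: SOURCE_WEIGHTS["original"] for term in original_terms}
    for source, terms in expansions.items():
        for term in terms:
            weighted[term] = max(weighted.get(term, 0.0), SOURCE_WEIGHTS[source])
    return weighted

def match_score(weighted_query, text):
    """Fraction of the total query weight whose terms appear in the text."""
    tokens = set(text.lower().split())
    total = sum(weighted_query.values())
    matched = sum(w for term, w in weighted_query.items() if term in tokens)
    return matched / total if total else 0.0

def best_passage_score(weighted_query, text, size=8):
    """Split the text into fixed-size word windows and return the best window score."""
    words = text.split()
    passages = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
    return max((match_score(weighted_query, p) for p in passages), default=0.0)

def rank(original_terms, expansions, documents, mix=(0.5, 0.3, 0.2)):
    """Linearly combine document-level, passage-level, and exact term-match scores."""
    weighted_query = expand_query(original_terms, expansions)
    exact_query = {term: 1.0 for term in original_terms}
    doc_w, passage_w, exact_w = mix
    scored = []
    for doc_id, text in documents.items():
        score = (doc_w * match_score(weighted_query, text)
                 + passage_w * best_passage_score(weighted_query, text)
                 + exact_w * match_score(exact_query, text))
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Sample data invented for this example.
documents = {
    "d1": "resume of a software engineer with java experience",
    "d2": "curriculum vitae of a developer skilled in java and python programming",
    "d3": "recipe for apple pie with cinnamon",
}
expansions = {
    "thesaurus": ["cv", "vitae", "curriculum"],
    "feedback": ["developer", "engineer"],
}
ranking = rank(["resume", "java"], expansions, documents)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Because thesaurus and feedback terms carry lower weights than the original query terms, a document that matches only expansion terms (such as `d2`) is still retrieved but ranks below one that matches the query directly.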
Domain-Dependent and Task-Specific Information Access
Specialized search provides high-quality results for domain-dependent and task-specific information access, and greatly complements general-purpose search. What intrigues me most about specialized search is its potential to incorporate knowledge about domains and tasks to better capture the characteristics of the data and users, which can lead to considerably improved performance. I am interested in working towards general frameworks that incorporate and integrate information from multiple sources such as contents, prior knowledge, and external resources. These frameworks will need techniques more sophisticated than simple combination, especially where information is represented in a wide variety of forms or where implicit dependencies exist between different pieces of information. One important lesson from my previous work is that understanding the characteristics of the domain, task, and data first, and developing techniques accordingly, is far more effective than mechanically applying theoretically sound models without detailed data analysis. I will continue to use this general approach in designing appropriate methods for domain-dependent and task-specific solutions.
Methods developed for domain-dependent tasks often use specialized knowledge resources. In some cases these resources can be created by text mining of large or well-organized corpora. I am mostly interested in mining entity relations that are embedded in unstructured text. I plan to conduct research on learning and extracting typical relational patterns between entities in specific domains (e.g., genes, diseases, symptoms, and medicines in the biomedical domain) for active knowledge discovery, focusing on semi-supervised or unsupervised methods that require little training data. Furthermore, I will work on adapting and improving the techniques developed for simple entities so that they handle more complicated entities with multiple attributes or facets, which I believe will benefit many domain-dependent applications. Again, careful data analysis and attention to detail go a long way toward building the best solutions to particular problems.
Text Mining Techniques for Business Intelligence and CRM
Predictive text analytics makes true multi-channel customer
relationship management (CRM) a reality for an organization. By combining text
mining with predictive analysis, one can build detailed models of customer behavior and
preferences that can be used throughout the organization. Text mining for CRM
is an emerging field, and many more models are being developed to uncover
hidden knowledge about potential customers and their views. Some of
the important research topics in this field are listed below.
- Analyse call center transcripts to identify customer concerns, then tie that
information back to customer actions and segments.
- Predict the offers customers are most likely to accept, increasing up-selling and
cross-selling results whether in person, in the call center, or online.
- Improve customer retention by determining which customer complaints are most
likely to precede defection, and take action to prevent it.
- Discover what drives customers to the customer service call center and identify
areas for improvement.
- Identify common complaints from online customers by analyzing
customer e-mail and instant message transcripts, and use this information to identify
areas of the site that need improvement.