My Research Interests

Applying my research to real-world problems has been, and will always be, an important goal of my career. I also enjoy interdisciplinary work because it broadens my knowledge. During my postgraduate degree and my professional work, I have had the opportunity to pursue applied research projects in Information Extraction, Information Retrieval, data mining, text mining, and NLP. Over the last six years I have developed skills across the subfields of data mining, ranging from theoretical topics such as data mining algorithms, through machine learning issues such as feature selection, feature discovery, and SVMs, to applications in text mining, Information Retrieval, and Information Extraction. My courses and publications focused on text mining and Information Retrieval, and I gained experience in these areas through my postgraduate dissertation and my professional research. I have also published three papers at international conferences.

I want to pursue research problems that build upon my past experience and education but are not necessarily limited to those subjects. A healthy research career requires the ability to adapt to changing problems and tools while building upon a solid foundation. In the past I have expanded into new areas either by applying my analytical tools to a new domain or by taking on a new problem within a domain I already knew. Moreover, current tools, techniques, and frameworks for Information Retrieval, text mining, semantic reasoning, and knowledge discovery are still far from satisfactory. I would therefore like to explore any of the following problems, drawing on my accumulated knowledge of data mining, machine learning, and Information Retrieval. My main research interests are described below.

Text Classification

Today's organizations face a vast volume of knowledge and information. Most explicit knowledge is stored in documents of various types, but only a few people (often only the authors) know where to find them. A major approach to organizing information is to classify it according to a predefined set of classes and to retrieve relevant items by browsing those classes. The enormous growth in available digital information, and the demand for retrieval tools to manage the resulting overload, have led to interest in automatic classification, with the expectation of significantly reducing human labor or even replacing it for a limited portion of the work. The objective of document classification is to reduce the detail and diversity of data, and the resulting information overload, by grouping similar documents together.

An effective approach is content-based classification, which classifies documents based on their contents. Such a method proceeds as follows. First, keywords and terms are extracted using information retrieval techniques and simple manual analysis. Second, concept hierarchies of keywords and terms are obtained from available term classes such as WordNet, from expert knowledge, or from keyword classification systems. Documents in the training set can also be classified into class hierarchies. An analysis method is then applied to discover sets of associated terms that maximally distinguish one class of documents from the others, and these term sets are used to classify new documents. Many research areas in document classification remain to be explored. I list some of them below.

- Feature selection for text classification
- Semantic indexing techniques and classification models
- Automatic classification structure (taxonomy) learning for classification
- Multi-class and multi-label classification
- Integration of multiple sources for classification
- Classification with background information (e.g., with the help of an ontology)
- Hierarchical classification
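
The content-based classification procedure described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not a prescribed method: the stopword list, the function names (extract_terms, discriminative_terms, classify), and the smoothed log-odds score used to pick distinguishing terms are all hypothetical choices made for this example, and the concept-hierarchy step (for instance, a WordNet lookup) is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on", "with"}


def extract_terms(text):
    """Keyword extraction: lowercase, tokenize, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def discriminative_terms(training_docs, top_k=5):
    """For each class, keep the top_k terms with the highest smoothed
    log-odds of appearing in that class rather than in the other classes."""
    counts = {}
    for text, label in training_docs:
        counts.setdefault(label, Counter()).update(extract_terms(text))
    vocab = set().union(*counts.values())

    model = {}
    for label, in_class in counts.items():
        rest = Counter()
        for other, other_counts in counts.items():
            if other != label:
                rest.update(other_counts)
        # Add-one smoothing so unseen terms do not produce log(0).
        n_in = sum(in_class.values()) + len(vocab)
        n_rest = sum(rest.values()) + len(vocab)
        scores = {
            term: math.log((in_class[term] + 1) / n_in)
            - math.log((rest[term] + 1) / n_rest)
            for term in vocab
        }
        top = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
        model[label] = {t: scores[t] for t in top}
    return model


def classify(text, model):
    """Assign the class whose distinguishing terms best match the document."""
    terms = Counter(extract_terms(text))
    totals = {
        label: sum(weight * terms[t] for t, weight in weights.items())
        for label, weights in model.items()
    }
    return max(sorted(totals), key=totals.get)


# Toy training set (invented for illustration only).
training = [
    ("The court ruled on the contract dispute", "law"),
    ("The judge dismissed the contract appeal", "law"),
    ("The patient received a vaccine dose", "medicine"),
    ("The doctor examined the patient symptoms", "medicine"),
]
model = discriminative_terms(training)
print(classify("The judge reviewed the contract", model))
```

Keeping only a few high-scoring terms per class is one simple form of feature selection; a practical system would likely replace the log-odds score with a measure tuned to the collection.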

Text Information Retrieval

My experience in text information retrieval focuses on combining multiple resources, evidence, and criteria to incorporate domain knowledge for query expansion and result ranking. The query expansion module improves on existing techniques by using several term-weighting schemes to group and combine terms from different sources according to their characteristics, which proves more effective than the typical approach of treating all expansion terms equally. For result ranking, different scoring criteria evaluate evidence at the document, passage, and term-matching granularities, and these scores are then combined to produce a final ranking. The main challenges of this work are how to incorporate multiple models, how to expand queries effectively, how to weight terms, and how to rank with multiple scoring criteria. I plan to conduct detailed research on effectively combining information from various resources and aspects at multiple stages of retrieval for a domain-dependent application (for example, an intelligent resume processor, a legal document search engine, a talent management system, medical information retrieval, or opinion mining).
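
As a rough illustration of source-dependent term weighting and multi-granularity ranking, consider the following Python sketch. Everything in it is an assumption made for the example: the function names, the source names ("thesaurus", "cooccurrence"), the numeric weights, and the linear combination of document-level and passage-level scores. It does not reproduce the actual system described above.

```python
from collections import defaultdict


def expand_query(query_terms, sources, source_weights):
    """Merge expansion terms from several sources, weighting each term by
    the (hypothetical) reliability of the source that proposed it."""
    weights = defaultdict(float)
    for term in query_terms:
        weights[term] = 1.0  # original query terms keep full weight
    for name, terms in sources.items():
        w = source_weights.get(name, 0.0)
        for term in terms:
            weights[term] = max(weights[term], w)
    return dict(weights)


def weighted_match(tokens, term_weights):
    """Sum the weights of the expanded-query terms present in the tokens."""
    present = set(tokens)
    return sum(w for term, w in term_weights.items() if term in present)


def score_document(passages, term_weights, criterion_weights):
    """Linearly combine document-level and best-passage-level evidence."""
    doc_tokens = [tok for passage in passages for tok in passage]
    doc_score = weighted_match(doc_tokens, term_weights)
    passage_score = max(
        (weighted_match(p, term_weights) for p in passages), default=0.0
    )
    return (
        criterion_weights["document"] * doc_score
        + criterion_weights["passage"] * passage_score
    )


def rank(collection, term_weights, criterion_weights):
    """Return document ids ordered by combined score, best first."""
    scored = {
        doc_id: score_document(passages, term_weights, criterion_weights)
        for doc_id, passages in collection.items()
    }
    return sorted(scored, key=lambda d: (-scored[d], d))


# Toy data (invented for illustration only).
term_weights = expand_query(
    ["heart", "attack"],
    {"thesaurus": ["myocardial", "infarction"], "cooccurrence": ["chest", "pain"]},
    {"thesaurus": 0.8, "cooccurrence": 0.3},
)
collection = {
    "d1": [["myocardial", "infarction", "treatment"], ["patient", "recovery"]],
    "d2": [["chest", "pain", "after", "exercise"]],
    "d3": [["stock", "market", "attack"]],
}
print(rank(collection, term_weights, {"document": 0.6, "passage": 0.4}))
```

A linear combination is only the simplest way to merge scoring criteria; the weights here are fixed by hand, whereas in practice they would need to be tuned or learned for the target domain.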

Domain-Dependent and Task-Specific Information Access

Specialized search provides high-quality results for domain-dependent and task-specific information access, and greatly complements general-purpose search. What intrigues me most about specialized search is its potential to incorporate knowledge about domains and tasks to better capture the characteristics of the data and users, which can lead to considerably improved performance. I am interested in working towards general frameworks that incorporate and integrate information from multiple sources such as content, prior knowledge, and external resources. These frameworks will include techniques more sophisticated than simple combination, especially for cases where information is represented in a wide variety of forms or where implicit dependencies exist between pieces of information. One important lesson from my previous work is that understanding the characteristics of the domain, task, and data first, and developing techniques accordingly, is far more important and effective than mechanically applying theoretically sound models without detailed data analysis. I will continue to use this approach in designing methods for domain-dependent and task-specific solutions. Methods developed for domain-dependent tasks often use specialized knowledge resources. In some cases these resources can be created by text mining of large or well-organized corpora. I am mostly interested in mining entity relations embedded in unstructured text. I plan to conduct research on learning and extracting typical relational patterns between entities in specific domains (e.g., genes, diseases, symptoms, and medicines in the biomedical domain) for active knowledge discovery, focusing on semi-supervised or unsupervised methods that require little training data.
Furthermore, I will work on adapting and improving the techniques developed for simple entities so that they handle more complicated entities with multiple attributes or facets, which I believe will benefit many domain-dependent applications. Again, careful data analysis and attention to detail go a long way toward building the best solutions to particular problems.
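
To make the idea of learning relational patterns from little labeled data concrete, here is a minimal Python sketch of a single round of seed-based pattern bootstrapping. It is only an illustration under strong assumptions: entity mentions are taken as already recognized, a pattern is simply the literal text between two entities, and the function names and example sentences are invented for this sketch. A realistic semi-supervised method would iterate, generalize patterns, and score their reliability.

```python
import re


def learn_patterns(sentences, seed_pairs):
    """Collect the text that appears between the two entities of each
    known (seed) pair; each such string becomes a candidate pattern."""
    patterns = set()
    for sentence in sentences:
        for left, right in seed_pairs:
            match = re.search(
                re.escape(left) + r"\s+(.+?)\s+" + re.escape(right), sentence
            )
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(sentences, patterns, entities):
    """Find entity pairs that are connected by one of the learned patterns."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            for left in entities:
                for right in entities:
                    if left != right and f"{left} {pattern} {right}" in sentence:
                        found.add((left, right))
    return found


# Toy corpus and a single seed pair (invented for illustration only).
sentences = [
    "aspirin is used to treat headache",
    "metformin is used to treat diabetes",
    "insulin was discovered in 1921",
]
entities = {"aspirin", "headache", "metformin", "diabetes", "insulin"}
patterns = learn_patterns(sentences, {("aspirin", "headache")})
print(apply_patterns(sentences, patterns, entities))
```

Only one labeled pair is needed to start, which is the appeal of this family of methods when training data is scarce; the cost is that literal patterns are brittle and can drift without careful filtering.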

Text mining techniques for Business Intelligence and CRM

Predictive text analytics can make true multi-channel customer relationship management (CRM) a reality for an organization. By combining text mining with predictive analysis, one can build detailed models of customer behavior and preferences that can be used throughout the organization. Text mining for CRM is an emerging field, and many models are being developed to uncover hidden knowledge about potential customers and their views. Some of the important research topics in this field are listed below.

- Analyse call center transcripts to identify customer concerns, then tie that information back to customer actions and segments.
- Predict the offers customers are most likely to accept, increasing up-selling and cross-selling results whether in person, in the call center, or online.
- Improve customer retention by determining which customer complaints are most likely to precede defection, and take action to prevent it.
- Discover what drives customers to your customer service call center and identify areas for improvement.
- Identify common complaints from online customers by analyzing customer e-mail and instant message transcripts. Use this information to identify areas of your site that need improvement.