§ 04
FAQ

Frequently asked questions.

Common questions about MaTRiC: what it is, how it was built, how to access it, how to cite it, and what kinds of analysis you can conduct with it.

01

What is MaTRiC?

The Malaysia Tourism Review Corpus (MaTRiC) is a validated corpus of online tourist reviews of Malaysian tourism destinations. It was developed as part of the research project "A Discourse-based Framework of Tourists and Service Providers' Cross-cultural Understanding towards Tourist Destinations in Malaysia". The corpus was built from TripAdvisor reviews in three major tourism categories: accommodation, activities, and food. It comprises 14 country-based subcorpora, contains approximately 45.3 million words, and covers an eleven-year period from 1 January 2012 to 31 December 2022.

02

How was MaTRiC built?

MaTRiC was built through a combination of computational and manual procedures. Reviews were first collected using automated web-crawling techniques, which enabled the research team to gather large-scale online review data from TripAdvisor. The research team then manually cleaned, organised, and validated the collected data by checking for and addressing issues such as duplicated reviews, incomplete reviews, irrelevant reviews, and gibberish. This process helped ensure that the corpus was ready for research use.
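
As a rough illustration of the kind of checks described above, the following Python sketch filters a list of reviews for duplicates, very short (possibly incomplete) reviews, and gibberish. It is not the research team's actual procedure, which combined automated collection with manual validation. The function names, the five-word minimum, and the vowel-ratio heuristic are all assumptions made for this example, and the heuristic only makes sense for text written in the Latin alphabet.

```python
import re


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical copies compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_like_gibberish(text: str, min_vowel_ratio: float = 0.2) -> bool:
    """Crude, hypothetical heuristic: flag text whose letters include very few vowels."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return True
    vowels = sum(c in "aeiou" for c in letters)
    return vowels / len(letters) < min_vowel_ratio


def clean_reviews(reviews, min_words: int = 5):
    """Keep the first copy of each review; drop short reviews and gibberish."""
    seen = set()
    kept = []
    for review in reviews:
        key = normalise(review)
        if key in seen:                    # duplicated review
            continue
        if len(key.split()) < min_words:   # possibly incomplete review (assumed threshold)
            continue
        if looks_like_gibberish(key):      # gibberish
            continue
        seen.add(key)
        kept.append(review)
    return kept


sample = [
    "The hotel in Penang was clean and the staff were friendly.",
    "the hotel in Penang was clean   and the staff were friendly.",
    "Great",
    "xkcd qwrt zzzz pfft grrr bnmk",
]
cleaned = clean_reviews(sample)
print(cleaned)
```

Automated filters of this kind can only narrow down candidates. Judging whether a review is irrelevant still needs a human reader, which is why manual validation was part of the process.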

03

How do I access MaTRiC?

MaTRiC is available free of charge for research and educational purposes. You can access it through the Corpus Data page of this website, where each country-based subcorpus is listed separately with its own download link. You must sign in with a free account before downloading. Users are expected to acknowledge the corpus in any research output that draws on it, including presentations, theses, dissertations, reports, journal articles, conference papers, books, and book chapters.

04

What is the difference between the subcorpora?

Each subcorpus represents reviews written by tourists from a specific country. For example, the United Kingdom subcorpus contains reviews written by tourists from the United Kingdom, while the China subcorpus contains reviews written by tourists from China. The subcorpora differ mainly in terms of country group and corpus size.

05

How should I acknowledge MaTRiC?

Please use the following acknowledgement when using the corpus:

The data were obtained from the Malaysia Tourism Review Corpus (MaTRiC). Available at: [website link]. All rights in the corpus data are reserved.

06

How can I analyse MaTRiC?

MaTRiC can be analysed using corpus and text analysis software such as AntConc, Wmatrix, WordSmith Tools, #LancsBox X, and Sketch Engine, among others. After downloading the relevant subcorpus, users can import the text files into their chosen software for analysis. You can also use the Search Online feature of this website to run KWIC and Passage searches directly.
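
If you prefer to script your own searches, a KWIC (keyword in context) display can be approximated in a few lines of Python. The sketch below is a minimal illustration and does not reproduce the Search Online feature or any of the tools named above. The function name `kwic`, the simple word tokeniser, and the `width` parameter are assumptions, and the example runs on an in-memory string because the layout of the downloaded files is not described here.

```python
import re


def kwic(text: str, keyword: str, width: int = 4):
    """Return (left context, node, right context) for each hit of `keyword`.

    `width` is the number of tokens shown on each side of the node word.
    Matching is case-insensitive and on whole tokens only.
    """
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


text = "The beach was beautiful. We walked along the beach every morning."
hits = kwic(text, "beach", width=2)
for left, node, right in hits:
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>20}  {node}  {right}")
```

To run this over a downloaded subcorpus, you would read each text file into a string first and pass it to `kwic`.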

07

What kinds of analysis can I conduct with MaTRiC?

Researchers may use MaTRiC for different types of corpus-assisted analysis. Common analytical techniques include frequency analysis (identifying the most frequently used words or expressions), n-gram analysis (examining repeated word sequences), keyword analysis (identifying words that are statistically overused or underused in one subcorpus compared with another corpus), key semantic domain analysis (identifying meaning areas that are unusually prominent in a corpus), collocation analysis (examining words that frequently occur together), and concordance analysis (examining words or phrases in their surrounding context).
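
To make three of these techniques concrete, the Python sketch below computes word frequencies, bigram counts, and a log-likelihood keyness score for a word in a study corpus against a reference corpus. Log-likelihood is one widely used keyness statistic. Whether a given tool uses it, and with what settings, depends on the software, so treat this as an illustration rather than a description of how any particular package works. The two one-sentence "corpora", the function names, and the tokeniser are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenise(text: str):
    """Very simple tokeniser: lowercase alphabetic words, optional apostrophe."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n: int):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood keyness.

    a, b: frequency of the word in the study and reference corpus.
    c, d: total token counts of the study and reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


# Toy stand-ins for two subcorpora (invented text, not MaTRiC data).
study = tokenise("the food was great and the food was cheap")
reference = tokenise("the room was clean and the room was quiet")

freq_study = Counter(study)
freq_ref = Counter(reference)
bigrams = Counter(ngrams(study, 2))

print(freq_study.most_common(3))   # frequency analysis
print(bigrams.most_common(2))      # n-gram analysis
score = log_likelihood(freq_study["food"], freq_ref["food"], len(study), len(reference))
print(round(score, 2))             # keyword analysis for "food"
```

A higher score means the word's frequency differs more strongly between the two corpora. The score alone does not show the direction, so compare the relative frequencies to see whether the word is overused or underused in the study corpus.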

08

Can I use only one subcorpus?

Yes. Researchers may use the full MaTRiC or select one or more country-based subcorpora depending on their research aims. For example, a researcher may focus only on the China subcorpus, compare the United States and China subcorpora, or examine broader patterns across all available subcorpora.

09

Can I redistribute MaTRiC?

No. MaTRiC is provided for research and educational use. Users should not redistribute, republish, sell, or upload the corpus data to another public platform. Other researchers should be directed to this website to access the corpus.

If you have further questions about the corpus, please contact Associate Professor Dr. Ali Jalalian Daghigh or Associate Professor Dr. Sheena Kaur through the contact information provided on this website.

Still have a question?

Contact the project leads directly.