About Factotum

Factotum is a simple research tool that lets users search for text material quickly and effectively. Whereas traditional search engines start from short queries, Factotum matches entire bodies of text against even larger bodies, ideally including similar texts and potential source material for the query texts.

For example, a user may submit an entire novel in a digital format and find other texts with matching phrases. This may be used to track citations (legitimate or not), influences and plagiarism. If the user is willing and able to release the query text, it then becomes a candidate for future matches, so Factotum's coverage grows organically.
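As a rough illustration of what "matching phrases" can mean, the sketch below finds word sequences shared by two texts. It is not Factotum's actual algorithm, which is not described here; the function names, the sample sentences and the four-word window are assumptions chosen purely for illustration.

```python
import re

def word_ngrams(text, n=4):
    """Return the set of lowercase n-word sequences found in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(query, candidate, n=4):
    """Return the n-word phrases that occur in both texts, sorted."""
    common = word_ngrams(query, n) & word_ngrams(candidate, n)
    return sorted(" ".join(gram) for gram in common)

# Hypothetical sample texts, not drawn from any real corpus.
query = "The quick brown fox jumps over the lazy dog near the river."
candidate = "Yesterday a quick brown fox jumps over a fence."

print(shared_phrases(query, candidate))
# ['brown fox jumps over', 'quick brown fox jumps']
```

A real system would need an index over the larger body of texts rather than pairwise comparison, but the underlying idea of comparing overlapping word windows is the same.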

Research context

Factotum is at present a research tool, and its development supports research on two distinct fronts.

Sources for the Speculum Morale

The aim of the overall project is to determine, with great confidence and scope, the likely sources for elements of the Speculum Morale. As this type of research generally requires considerable human effort, any machine assistance is of practical value, and for this reason Factotum has been prototyped. It is intended that Factotum be useful as an apparatus to other projects as well, as there is no necessity to specialise it to the Speculum Morale or even the Latin language.

Techniques for advanced phrase similarity search

Though exact text matching is a very mature science, inexact text matching is far more challenging. Simple permutations and edits may be quantified with formal metrics, such as the various definitions of edit distance.
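One common definition of edit distance is the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a standard dynamic-programming implementation, given only to make the idea concrete; it is not claimed to be the metric Factotum uses, and the function name is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # previous[j] holds the distance between the prefix of a
    # processed so far and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]

print(edit_distance("kitten", "sitting"))     # 3
print(edit_distance("speculum", "speculam"))  # 1
```

The same recurrence applies to sequences of words instead of characters, which is usually closer to what phrase matching needs.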

However, in the later stages of Factotum's development, the intention is to support far looser definitions of similarity, such as similarity in conceptual structure that tolerates higher-level edits like substitution with related terms and phrases.

For example, it is reasonable to declare that one short story is a derivative of another if their events are nearly identical but one is written about a tiger and the other about a wolf. This may be obvious to a human, but computational natural language processing is a new field, and an effective and scalable system will likely be very difficult to produce.

Team

The overall project has four direct contributors, all of whom play a large part in Factotum's development. Roles are described in terms of their contributions to Factotum.

Dmitri Nikulin - Freelance software engineer

Sole Factotum software architect and developer.

Professor Constant Mews - Monash University, Arts Faculty

Chief Investigator, Arts facet. Active Factotum user and design contributor.

Doctor David Squire - Monash University, Information Technology Faculty

Chief Investigator, IT facet. Active Factotum design contributor.

Doctor Tomas Zahora - Monash University, Library Learning Skills Unit

Primary Factotum user and design contributor.

Research funding

All funding to date has come from the Australian Research Council, as part of a project titled "Ethics and encyclopaedic culture in 13th-century France: adaptation, diffusion and contexts of innovation in the Speculum morale and its sources". The project is listed in the ARC Network for Early European Research's 2010 funding success stories. This funding was secured by Professor Constant Mews and Doctor David Squire, and is administered by Monash University.