U-Learn

Finding optimal coding units in unsegmented sequential databases (e.g., finding the words in a continuous speech stream) can be approached by looking for the unit boundaries, often defined as the points where the predictability of the next element of the sequence is lowest. Other models, however, rely on a very different strategy: chunks are built progressively by concatenating the initial primitives, then selected through some form of competition between different segmentation modes.
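The boundary-based strategy can be illustrated with a minimal Python sketch. It is not part of U-Learn and does not reproduce either of the models described on this page. It assumes that predictability is measured by the forward transitional probability P(next | current), estimated from bigram counts, and that a boundary is placed wherever this probability is a strict local minimum. The function names and the toy syllable stream are hypothetical.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Return P(next | current) for each adjacent pair in the stream."""
    pairs = list(zip(stream, stream[1:]))
    pair_counts = Counter(pairs)
    # Count each element only where it has a successor.
    first_counts = Counter(stream[:-1])
    return [pair_counts[(a, b)] / first_counts[a] for a, b in pairs]


def segment_at_local_minima(stream):
    """Cut the stream wherever predictability is a strict local minimum."""
    tps = transitional_probabilities(stream)
    units, start = [], 0
    for i in range(1, len(tps) - 1):
        # tps[i] describes the transition from stream[i] to stream[i + 1].
        if tps[i] < tps[i - 1] and tps[i] < tps[i + 1]:
            units.append("".join(stream[start:i + 1]))
            start = i + 1
    units.append("".join(stream[start:]))
    return units


# Toy stream made of three hypothetical two-syllable words
# (ba-du, ti-ko, go-la) presented without any boundary marker.
stream = "ba du ti ko go la ba du go la ti ko ba du".split()
print(segment_at_local_minima(stream))
```

In this toy stream every within-word transition is fully predictable, whereas each word is followed by two different words, so the dips in predictability coincide with the word boundaries. Natural corpora are far noisier, which is one motivation for the chunk-based alternative.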

U-Learn is a Windows-based, user-friendly program that implements two representative chunk-based models: the MDLChunker (Robinet, Lemaire, & Gordon, 2011), which relies on a Minimum Description Length method of the kind used in standard compression algorithms, and PARSER (Perruchet & Vinter, 1998), which relies on basic principles of associative learning and memory. U-Learn can generate corpora from a list of items and the desired frequency of each item, with a large number of options, or alternatively start from an existing database (such as a corpus of child-directed language). There are two running modes. The step-by-step mode is designed for maximum transparency, giving access to all the operations performed by the models during a single run, while the 'normal' mode allows simulations over several runs to be performed and analyzed efficiently. Owing to its modular design, U-Learn can easily be extended with other models or with variants of the initial models.
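The idea of building a corpus from a list of items and their desired frequencies can be sketched in Python as follows. This is only an illustration under simple assumptions (each item is repeated according to its frequency, the tokens are shuffled, and they are concatenated without boundary markers); it is not U-Learn's own generation procedure and it ignores the program's additional options. The function name, its parameters, and the example items are hypothetical.

```python
import random


def generate_tokens(item_frequencies, seed=None):
    """Return a shuffled list in which each item occurs `frequency` times.

    Hypothetical helper: `item_frequencies` maps each item to its count.
    """
    rng = random.Random(seed)  # a seeded generator makes a run reproducible
    tokens = [
        item
        for item, frequency in item_frequencies.items()
        for _ in range(frequency)
    ]
    rng.shuffle(tokens)
    return tokens


# Hypothetical items and frequencies.
tokens = generate_tokens({"badu": 3, "tiko": 2, "gola": 1}, seed=0)

# The unsegmented corpus is the concatenation of the tokens, with no
# marker between successive items.
corpus = "".join(tokens)
print(corpus)
```

Keeping the token list alongside the concatenated string is convenient because the list provides the reference segmentation against which the units extracted by a model can later be compared.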

To use U-Learn, click here, and download the UserManual, U-Learn.exe, and at least one of the two model-specific modules: MDLCh.exe or Parser.exe.