Anagram finder in Rust – Lars Nordenström's blog

This is the Rust implementation of the anagram program. The problem and the solution is described in the main anagram article. Do read that article first.

Also, you may find the C++ article helpful, since it describes member data and functions in some detail, not repeated here. This article describes only the differences between the C++ implementation and this implementation.

The Rust implementation is similar, almost identical, to the other implementations; the differences are mostly syntactical.

It is assumed that the input is plain ASCII, or the sorting, which works on the raw bytes, will likely produce garbage.

There are two unsafe blocks for handling the global variable comparisons. Since there is only a single thread, the code is not actually unsafe, since there is no race condition.

There is another unsafe block where I use as_mut_vec() on a String. This is not a problem as long as the assumption that the input is ASCII is valid. A safe, but slower, alternative is to extract the characters as a Vec<> of characters.

The source code

#![allow(dead_code)]
#![allow(unused_must_use)]
#![allow(unused_parens)]
#![allow(non_upper_case_globals)]

use std::io::Read;
use std::cmp::Ordering;
use std::env;

const      MINIMUM     : usize = 3;
static mut comparisons : usize = 0;

//---------------------------------------------------------------------------------------- Word
#[derive(Debug,Clone,Eq)]
pub struct Word {
    offset  : usize,
    value   : String,
    unified : String,
}

impl Word {
    pub fn new(offset  : usize, value : &String, unified : String) -> Word {
        Word { offset : offset, value : value.clone(), unified : unified }
    }

    pub fn print(&self) { println!("{:4} {} {}", self.offset, self.value, self.unified); }
}


impl Ord for Word {
    fn cmp(&self, other: &Self) -> Ordering {
        let mut rval = self.unified.cmp(&other.unified);
        if (rval == Ordering::Equal) {
            rval = self.offset.cmp(&other.offset);
        }
        unsafe { comparisons += 1; }
        return rval;
    }
}

impl PartialOrd for Word {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Word {
    fn eq(&self, other: &Self) -> bool {
        return (self.unified == other.unified) && (self.offset == other.offset);
    }
}


//---------------------------------------------------------------------------------------- AnagramFinder
pub struct AnagramFinder {
    len        : usize,
    words      : Vec<Box<Word>>,
    accepted   : Vec<Word>,
    low_offset : usize
}

impl AnagramFinder {
    fn get_unified(s : &String) -> String {
        let mut s2 : String = s.clone();
        unsafe {
            s2.as_mut_vec().sort();
        }
        return s2;
    }

    fn new( text : &String, len : usize) -> AnagramFinder {
        let mut this = AnagramFinder {
            len        : len,
            words      : Vec::with_capacity(text.len()-len+1),
            accepted   : Vec::with_capacity(20),
            low_offset : 0
        };

        for n in (0..(text.len()-len+1)) {
            let b : String = text[n..(n+len)].to_string();
            this.words.push(Box::new(Word::new(n, &b, AnagramFinder::get_unified(&b))));
        }
        return this;
    }

    fn print(&self) {
        for (index, ref value) in self.words.iter().enumerate() {
            print!("Word {:2} = ", index);
            value.print();
        }
    }

    fn print_one_result(&self) {
        print!("{:3}:", self.accepted.len());
        for it in &self.accepted {
            print!(" {:4}:{}", it.offset, it.value);
        }
        println!();
    }

    fn add_candidate(&mut self, cand : &Word) {
        if (cand.offset < self.low_offset) {
            return;
        }
        for it in &self.accepted {
            if (it.value == cand.value) {
                return;
            }
        }
        self.accepted.push(Word::new(cand.offset, &cand.value, cand.unified.clone()));
        self.low_offset = cand.offset + self.len;
    }

    fn add_range(&mut self, begin : usize, end : usize) {
        if (end-begin >= MINIMUM) {
            self.accepted.clear();
            self.low_offset = 0;
            for n in (begin..end) {
                let w : Box<Word> = self.words[n].clone();
                self.add_candidate(&w);
            }
            if (self.accepted.len() >= MINIMUM) {
                self.print_one_result();
            }
        }
    }

    fn find_all(&mut self) {
        self.words.sort();

        let mut begin : usize = 0;
        let nwords : usize = self.words.len();
        for n in (1..nwords) {
            if (self.words[begin].unified != self.words[n].unified) {
                self.add_range(begin, n);
                begin = n;
            }
        }
        self.add_range(begin, nwords);
    }
}

//---------------------------------------------------------------------------------------- main()
fn main() {
    let     argv : Vec<String> = env::args().collect();
    let     argc : usize       = argv.len();
    let mut from : usize       =    4;    if (argc > 1) { from = argv[1].parse().unwrap(); }
    let mut to   : usize       = from;    if (argc > 2) { to   = argv[2].parse().unwrap(); }
    let mut text : String      = String::new();

    std::io::stdin().read_to_string(&mut text);
    text.retain(|ch| ch.is_alphabetic());
    text = text.to_uppercase();

    for len in (from..(to+1)) {
        let mut af : AnagramFinder = AnagramFinder::new(&text, len);
        if (len > from) {
            println!();
        }
        println!("len = {}", len);
        af.find_all();
    }
    unsafe { println!("string length = {:5}, comparisons = {}", text.len(), comparisons); }
}

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090

View source for the content of this page.

The source code

Cancel reply