Stemming, Part 18: Processing and Testing

All the pieces are in place; now here is the final one. I’ll also describe how
I tested this to make sure it was working correctly.

The stem Function

Everything that we’ve written so far happens under the hood. This is the one
function that other code will actually call. Without further ado, here it is.

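(defn stem
  "Returns the Porter stem of word as a string."
  [word]
  ;; Words of one or two letters are returned unchanged.
  (if (<= (count word) 2)
    word
    (apply str (-> word make-stemmer
                   step-1ab step-1c step-2 step-3 step-4 step-5
                   :word))))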

If the word has one or two letters, just return it. If it is longer, use the
-> macro to thread the word through make-stemmer and the steps, and then use
the :word keyword to pull out the vector of characters that holds the stem.
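The threading in stem can be hard to picture, so here is a minimal sketch with toy stand-ins. make-toy-stemmer, drop-final-s, and toy-stem are hypothetical names invented for this example. They are not the real make-stemmer and Porter steps, and I’m only assuming that the real stemmer is likewise a map that keeps its character vector under :word. The sketch uses update, which needs Clojure 1.7 or later.

```clojure
;; Hypothetical stand-ins, NOT the real Porter steps. They only mimic the
;; shape of the pipeline: a map that holds a character vector under :word.
(defn make-toy-stemmer [word]
  {:word (vec word)})

;; A toy "step": takes the map and returns an updated map.
(defn drop-final-s [stemmer]
  (if (= \s (peek (:word stemmer)))
    (update stemmer :word pop)
    stemmer))

(defn toy-stem [word]
  ;; (-> word make-toy-stemmer drop-final-s :word) expands to
  ;; (:word (drop-final-s (make-toy-stemmer word)))
  (apply str (-> word make-toy-stemmer drop-final-s :word)))

(toy-stem "ports") ;=> "port"
(toy-stem "port")  ;=> "port"
```

Each step takes the whole map and returns a new one, which is what lets -> chain them one after another.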

The word vector gets passed to the apply function. This is a special
higher-order function that takes a function and its arguments as a sequence.
It calls the function with those arguments and returns the result. Let’s look
at how it works.

user=> (+ 1 2 3)
6
user=> (apply + '(1 2 3))
6
user=> (apply + 1 '(2 3))
6

You can see that only the last argument to apply has to be a sequence of
arguments to pass to the function. The other arguments can be listed
individually before the final sequence, and they are passed to the function
ahead of the sequence’s elements. But the last argument does have to be a
sequence, so you can’t do this:

user=> (apply + 1 2 3)   
java.lang.IllegalArgumentException: Don't know how to create ISeq from: Integer

Of course, if you’re doing that, you already know how many arguments you’re
calling the function with, and in that case, you should just call it as is
(that is, just call (+ 1 2 3)).
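One way to check where the individual arguments end up is to apply a function that simply collects its arguments, such as list or vector, so the result shows the final order:

```clojure
;; The leading arguments to apply come first, in order, followed by the
;; elements of the final sequence.
(apply list 1 2 [3 4])     ;=> (1 2 3 4)
(apply vector :a :b '(:c)) ;=> [:a :b :c]
(apply str "st" [\e \m])   ;=> "stem"
```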

So in stem, we take a word vector and pass all of the characters in it to
the str function. str converts each of its arguments to a string and
concatenates them.

user=> (str \w \o \r \d)
"word"
user=> (apply str [\w \o \r \d])
"word"

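Going the other way is just as easy, because a string is already a sequence of characters. This sketch only illustrates the round trip between strings and character vectors; it doesn’t show how make-stemmer itself builds its vector, which was covered earlier in the series.

```clojure
;; vec turns a string into a vector of characters, and (apply str ...)
;; turns such a vector back into a string.
(vec "ports")                   ;=> [\p \o \r \t \s]
(apply str (vec "ports"))       ;=> "ports"
;; Removing the last character from the vector before rejoining it:
(apply str (pop (vec "ports"))) ;=> "port"
```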
Well, we have a new toy now, so let’s play with it:

porter=> (stem "porter")
"porter"
porter=> (stem "porting")
"port"
porter=> (stem "ports")
"port"
porter=> (stem "ported")
"port"
porter=> (stem "stemming")
"stem"

Testing

I’ve been presenting the code here as a finished product, perfect (I guess) as
written. But it didn’t begin that way. In fact, I originally wrote something
very close to the C version of the algorithm and made sure that worked right.
Then I gradually changed it to make it more lispy. This is the result I have
presented here.

To make sure it worked correctly, I downloaded the test input data
and expected output from the Porter Stemmer web site. The first
file contains 23,531 words for a test set. The second contains those same
words after they’ve been run through the stemmer.

Next, I wrote a function that reads from both files, stems the input, and
compares it to the output. I don’t always need to test every item in the test
set. Sometimes I can get by with testing only the first n words, so I’ve
included a parameter to limit how many words to test. Also, sometimes I may
want to see the output from every word in the test set, but most of the time,
I really only want to see the errors. Finally, the function returns the total
number of words tested, the number the stemmer got right, and the number it
got wrong.

(defn read-lines
  "Reads every line in filename and returns them as a sequence."
  [filename]
  ;; with-open closes reader when the body exits, so doall must force the
  ;; lazy line-seq to be read completely before that happens.
  (with-open [reader (new java.io.BufferedReader (new java.io.FileReader filename))]
    (doall (line-seq reader))))

(defn test-porter
  ([]
   (test-porter Integer/MAX_VALUE false))
  ([n output-all?]
   (loop [input (take n (read-lines "porter-test/voc.txt")),
          expected (take n (read-lines "porter-test/output.txt")),
          total 0, correct 0, error 0]
     ;; rest returns an empty sequence rather than nil, so use seq to test
     ;; whether both sequences still have items.
     (if (and (seq input) (seq expected))
       (let [i (first input), e (first expected), a (stem i)]
         (if (= a e)
           (do
             (when output-all?
               (println "OK:" (pr-str i)))
             (recur (rest input)
                    (rest expected)
                    (inc total)
                    (inc correct)
                    error))
           (do
             (println "ERROR:" (pr-str i)
                      "=> (" (pr-str a) "!=" (pr-str e) ")")
             (recur (rest input)
                    (rest expected)
                    (inc total)
                    correct
                    (inc error)))))
       [total correct error]))))

The highlights of this are:

  • read-lines is a utility that opens a file using a Java BufferedReader
    and assigns that to reader. with-open always calls (. reader close) when
    it exits. line-seq takes a reader and returns a lazy sequence of the lines
    in the reader, and doall forces Clojure to read all the items in a lazy
    sequence, which has to happen before with-open closes the reader.
    Basically, read-lines reads all the lines in a file and returns them in a
    sequence.

  • As we’ve seen before, take returns the first n items of a sequence, which
    limits the number of words to be tested.

  • The loop continues while both input and expected still have items.

  • The input is stemmed and stored as the variable a (short for actual).

  • If the actual is the same as the expected, optionally output that, and loop,
    incrementing the number of total words tested and the number of words stemmed
    correctly.

  • If the actual and expected are not the same, always write this out and loop,
    incrementing the number of total words tested and the number of errors.
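If you’d rather keep the counting separate from the file reading and the printing, the same tally can be sketched with map and filter. tally-stems is a hypothetical helper, not part of the stemmer. It takes the stemming function as a parameter so it can be tried without the test files, and it leaves out the OK and ERROR output.

```clojure
(defn tally-stems
  "Hypothetical helper, not part of the original code: stems each word in
  inputs with stem-fn, compares it to the matching word in expected, and
  returns [total correct error]."
  [stem-fn inputs expected]
  ;; map walks both sequences in step and stops at the shorter one,
  ;; just as the loop stops when either sequence runs out.
  (let [matches (map (fn [i e] (= (stem-fn i) e)) inputs expected)
        total   (count matches)
        correct (count (filter true? matches))]
    [total correct (- total correct)]))

;; identity stands in for a stemmer here: "port" matches, "ports" does not.
(tally-stems identity ["port" "ports"] ["port" "port"]) ;=> [2 1 1]
```

With the real stemmer, something like (tally-stems stem (take 100 (read-lines "porter-test/voc.txt")) (take 100 (read-lines "porter-test/output.txt"))) would check the first 100 words.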

Tomorrow, I’ll talk about how I tracked down bugs that cropped up during
testing.


Sep 13, Grand Remembrances

Today is Grandparents Day in the United States. Being a Grand is a special honor. I feel very blessed that my wife and I have two grandchildren. We were able to visit them today. Yes, we are still being cautious with the coronavirus, but we also find it very difficult to not see them when they live so close. So today we did drop by to visit Jacob (age 10) and Sophia (age 7) along with their parents. We brought donuts and caught up with them. Our grandchildren are still pretty young and this is a precious time in their lives – and ours!

I wish I had known my grandparents better. We never lived in the same place. Dad was a career Air Force pilot, so we moved around a lot. But we did get to see them once in a while when they would visit us, or we them.

A Plague of Giants

There are five known magical ‘kennings’ or types: air, water, fire, earth, and plants. Each nation specializes in one of these kennings, and the magic influences the society. There’s a big pitfall with this diversity of ability and locale–not everyone gets along.

Enter the Hathrim giants, or ‘lavaborn’ whose kenning is fire. Where they live the trees that fuel their fire are long gone, but the giants are definitely not welcome anywhere else. They’re big, they’re violent, and they’re ruthless. When a volcano erupts and they are forced to evacuate, they take the opportunity to relocate. They don’t care that it’s in a place where they aren’t wanted.

I first read Kevin Hearne’s Iron Druid books and loved them (also the quirky The Tales of Pell), so I was curious about this new venture, starting with A PLAGUE OF GIANTS. Think Avatar: The Last Airbender meets Jim Butcher’s Codex Alera series. Elemental magic, a variety of races, different lands. And it’s all thrown at you from page one.

But this story is told a little differently. It starts at the end of the war, after a difficult victory, and a bard with earth kenning uses his magic to re-tell the story of the war to a city of refugees. And it’s through this movement back and forth in time and between key players in this war that we get a singularly grand view of the war as a whole. Hearne uses this method to great effect.

There are so many interesting characters in this book that I can’t cover them all here. Often in books like this such a large cast of ‘main’ characters can make the storytelling suffer, especially since they don’t have a lot of interaction with each other for the first 3/4 of the book–but it doesn’t suffer, thankfully. And the characterization is good enough, despite these short bursts, that by the end we understand these people and care about what happens to them.

If there were a main character it would be Dervan, a historian who is assigned to record (also spy on?) the bard’s stories. He finds himself caught up in machinations he feels unfit to survive. Fintan is the bard from another country; at first he is rather mysterious, his true personality hidden by the stories he tells, and it takes a while to understand him. Gorin Mogen is the leader of the Hathrim giants who decide to find a new land to settle. He’s hard to like, but as far as villains go, you understand his motivations and he can even be a little convincing. There’s Abhi, the son of hunters, who decides hunting isn’t the life for him–and unexpectedly finds himself on a quest for the sixth kenning. And Gondel Vedd, a scholar of linguistics who finds himself tasked with finding a way to communicate with a race of giants never seen before (definitely not Hathrim) and stumbles onto a mystery no one could have guessed: there may be a seventh kenning.

There are other characters, but what makes them all interesting is that they’re regular people (well, maybe not Gorin Mogen or the viceroy–he’s a piece of work) who become heroes in their own little ways, from the teenage girl who isn’t afraid to share vital information, to the scholars who suddenly find how crucial their minds are to the survival of a nation, to the humble public servants who find bravery when they need it most. This is a story of loss, love, redemption, courage, unity, and refusing to give in to despair. All very human experiences of simple people who do extraordinary things.

Hearne’s worldbuilding is engaging. He doesn’t bottle-feed you; at first it feels like drinking from a hydrant, but then you settle in and pick up things along the way. Then he shows you stuff with a punch to the gut. This is no fluffy world with simple magic that comes without a price. All the magic has a price, and more often than not it leads you straight to death’s door. For most people, merely seeking the magic will kill them. I particularly enjoyed the scenes with Abhi and his discovery of the sixth kenning and everything associated with it. But giants? I mean, really? It isn’t bad enough fighting people who can control fire; they have to be twice the size of normal people, too? For Hearne, if it’s war, the stakes are pretty high, and it gets ugly.

The benefit of the storytelling style is that the book, despite its length, moves along steadily (Hearne is no novice here). The bits of story lead you along without annoying cliffhangers (mostly), and I never got bored with the switches between characters. It was easy to move between them, and they were recognizable enough that I never got lost or confused. The end of the novel felt a little abrupt, but I guess that has more to do with the fact that I was ready for the story to continue, despite the exciting climax.

If you’re looking for epic fantasy with fun storytelling and clever worldbuilding, check out A PLAGUE OF GIANTS.

The post A Plague of Giants appeared first on Elitist Book Reviews.

The Artwork Of Gary Choo

Gary Choo is a concept artist/illustrator based in Singapore. I’ve known Gary for a good many years (17, actually), having worked together in animation studios in Singapore like Silicon Illusions and Lucasfilm. Gary currently runs an art team at Mighty Bear Games, but when time allows he also draws covers for Marvel comics, and they’re amazing:

The Art Of Gary Choo

To see more of Gary’s work or to engage him for freelance work, head down to his ArtStation.

The post The Art Of Gary Choo appeared first on Halcyon Realms – Art Book Reviews – Anime, Manga, Film, Photography.
