Stemming, Part 14: Verb Endings

In today’s posting, we’ll take a look at the initial step in the Porter
stemmer.

An Overview

The first step in stemming tokens involves removing plural -s and verb
endings -ed and -ing. It also turns -y to -i, so it will be recognized
as a suffix in later steps. The documentation in the C source code lists some
examples:

caresses -> caress
ponies -> poni
ties -> ti
caress -> caress
cats -> cat
feed -> feed
agreed -> agree
disabled -> disable
matting -> mat
mating -> mate
meeting -> meet
milling -> mill
messing -> mess
meetings -> meet

Plurals

The first step is to strip off plurals. If the word ends in -sses or
-ies, this removes the -es (so -sses becomes -ss and -ies becomes -i);
otherwise, if it ends in -s but not -ss, it takes off the -s. If none of
those apply, it returns the original stemmer.

(defn stem-plural
  "This is part of step 1ab. It removes plurals (-s) from a stem."
  [stemmer]
  (if (= (last (:word stemmer)) \s)
    (cond-ends? st stemmer
      "sses" (reset-index (pop (pop (:word st))))   ; -sses -> -ss
      "ies"  (set-to st "i")                        ; -ies  -> -i
      :else  (if (and (>= (count (:word st)) 2)
                      (not= (nth (:word st) (- (count (:word st)) 2))
                            \s))
               (assoc st :word (pop (:word st)))    ; drop -s, but keep -ss
               st))
    stemmer))

It’s been a while since we’ve seen some of these functions. Here are a few
reminders about what they do:

reset-index returns a new stemmer from a word vector (here with the last two
letters removed from the word).

set-to replaces the ending that cond-ends? matched with a new string. Here,
it replaces the -ies at the end of the word with -i.

cond-ends? is the macro we created in the last few postings. I just wanted
to point that out.
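To see the plural rule on its own, here is a minimal sketch that applies it to a plain string instead of the stemmer record. The name strip-plural is hypothetical, not part of the series' code; like the original, it only drops a final -s from words of two or more letters.

```clojure
(defn strip-plural
  "Hypothetical plain-string version of the plural rule:
  -sses -> -ss, -ies -> -i, -ss unchanged, otherwise drop a final -s."
  [^String word]
  (let [n (count word)]
    (cond
      (.endsWith word "sses") (subs word 0 (- n 2))   ; caresses -> caress
      (.endsWith word "ies")  (subs word 0 (- n 2))   ; ponies -> poni
      (.endsWith word "ss")   word                    ; caress stays caress
      (and (.endsWith word "s") (>= n 2))
      (subs word 0 (dec n))                           ; cats -> cat
      :else word)))

;; The plural examples from the C source.
(map strip-plural ["caresses" "ponies" "ties" "caress" "cats" "feed"])
;; => ("caress" "poni" "ti" "caress" "cat" "feed")
```

Checking -sses and -ies before -ss and -s matters: a word like caresses also ends in -s, so the longer endings have to win.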

Verb Endings

Another part of step 1 involves removing the verb suffixes -ed and -ing
from the word. First, if the word ends in -eed and the part before it is long
enough (that is, if m is greater than zero, so it has at least one vowel
followed by a consonant), it removes the -d. Otherwise, if the word ends in
-ed or -ing and the rest of the word contains a vowel, it removes that ending.
In either of those last two cases, it also expands truncated suffixes.

Expanding suffixes just tests for certain endings and, if they are found,
appends an -e to the word. Specifically, it looks for -at, -bl, and
-iz. It also checks for a double-consonant ending. A few of these are all right
in English (-ll, -ss, and -zz), but for the others, one of the doubled letters
should be removed (matting becomes mat). Finally, if m is one and the word ends
in a consonant-vowel-consonant sequence (where the last consonant isn't w, x,
or y), it adds an -e, so filing becomes fil and then file. This process is
handled by the stem-expand-suffix function, which is listed first.

(defn stem-expand-suffix
  "This is part of step 1ab. It expands -at, -bl,
  and -iz by adding an -e in certain circumstances."
  [stemmer]
  (cond-ends? st stemmer
    "at" (set-to st "ate")
    "bl" (set-to st "ble")
    "iz" (set-to st "ize")
    :else (cond
            (double-c? st (dec (count (:word st))))
              (if (#{\l \s \z} (last (:word st)))   ; keep -ll, -ss, -zz
                st
                (assoc st :word (pop (:word st))))
            (and (= (m st) 1) (cvc? st (dec (count (:word st)))))
              (set-to st "e")
            :else
              st)))

(defn stem-verb-ending
  "This is part of step 1ab. It removes verb endings -ed
  and -ing."
  [stemmer]
  (cond-ends? st stemmer
    "eed" (if (pos? (m st))
            (assoc st :word (pop (:word st)))
            stemmer)
    "ed"  (if (vowel-in-stem? st)
            (stem-expand-suffix (assoc st :word (subword st)))
            stemmer)
    "ing" (if (vowel-in-stem? st)
            (stem-expand-suffix (assoc st :word (subword st)))
            stemmer)
    :else stemmer))

double-c? returns true if the word has a double consonant at the given index;
here, that means the word ends in something like -tt or -ss.

cvc? returns true if the word ends in a consonant-vowel-consonant sequence
and if the final consonant is not w, x, or y.

m returns the number of consonant sequences between the start of a word and
the index position. But if the word starts with a consonant sequence, it isn’t
counted.
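These helpers work on the stemmer record and an index. To make the definitions concrete, here is a minimal sketch of the same predicates on plain strings; the string-based signatures are assumptions for illustration, not the series' code. It uses Porter's rule that y counts as a consonant only at the start of a word or after a vowel, and it computes m by counting vowel-to-consonant transitions, which is why a leading consonant sequence never counts.

```clojure
;; Hypothetical string versions of the helpers; the series' own versions
;; take the stemmer record and an index instead.

(defn consonant?
  "a, e, i, o, u are vowels; y is a consonant only at the start of the
  word or right after a vowel."
  [^String w i]
  (let [c (.charAt w i)]
    (cond
      (#{\a \e \i \o \u} c) false
      (= c \y) (or (zero? i) (not (consonant? w (dec i))))
      :else true)))

(defn measure
  "Porter's m: counts vowel-to-consonant transitions, so a leading
  consonant sequence is never counted."
  [^String w]
  (->> (range (count w))
       (map #(consonant? w %))
       (partition 2 1)
       (filter (fn [[a b]] (and (not a) b)))
       count))

(defn double-c?
  "True if w ends in the same consonant twice, as in -tt or -ss."
  [^String w]
  (let [n (count w)]
    (and (>= n 2)
         (= (.charAt w (dec n)) (.charAt w (- n 2)))
         (consonant? w (dec n)))))

(defn cvc?
  "True if w ends consonant-vowel-consonant and the last letter is not
  w, x, or y."
  [^String w]
  (let [n (count w)]
    (and (>= n 3)
         (consonant? w (- n 3))
         (not (consonant? w (- n 2)))
         (consonant? w (dec n))
         (not (#{\w \x \y} (.charAt w (dec n)))))))

;; Porter's own examples for m = 0, 1, and 2.
(map measure ["tree" "trouble" "troubles"])
;; => (0 1 2)
```

Note that double-c? is true for -ll as well; keeping -ll, -ss, and -zz is the job of the caller, as in stem-expand-suffix.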

Step 1AB

The actual function for step 1AB (A calls stem-plural and B calls
stem-verb-ending) is simple. It just passes its input through the two
functions:

(defn step-1ab
  [stemmer]
  (stem-verb-ending (stem-plural stemmer)))

One thing about functional languages is that nested calls often have to be
read from the inside out, or right to left. The stemmer gets passed to
stem-plural first and stem-verb-ending second, but they're written in the
opposite order. That's counter-intuitive, and it makes code like this harder to
read.

Clojure provides an improvement: the -> macro, which builds nested expressions
like the one above. Let's spend some time understanding what this macro does,
first by using it and then by looking at the expressions it outputs.

The first parameter to -> is an expression. The remaining parameters are
functions. The expression parameter gets passed as the first parameter to the
first function, and the output of this gets passed as the first parameter to
the second function parameter. This continues until all the functions have
been chained together.
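The rule is small enough to write ourselves. Here is a simplified, hypothetical thread-first macro that follows the description above: it walks the forms left to right, turning a bare function f into (f acc) and a list (f a b) into (f acc a b). It only illustrates the idea; clojure.core's -> is the one to use in real code.

```clojure
(require '[clojure.string :as str])

(defmacro thread-first
  "Hypothetical, simplified version of ->: threads x into the first
  argument position of each form, left to right."
  [x & forms]
  (reduce (fn [acc form]
            (if (seq? form)
              `(~(first form) ~acc ~@(rest form)) ; (f a b) -> (f acc a b)
              (list form acc)))                   ; f       -> (f acc)
          x
          forms))

(thread-first "   ThIs NeEdS ClEaNiNg   " str/lower-case str/trim (subs 11))
;; => "cleaning"

(macroexpand-1 '(thread-first x f (g 1)))
;; => (g (f x) 1)
```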

To play with this, let’s define some functions that operate on strings.

porter=> (defn to-lower-case [string] (.toLowerCase string))
#'porter/to-lower-case
porter=> (defn trim [string] (.trim string))
#'porter/trim
porter=> (trim (to-lower-case "   ThIs NeEdS ClEaNiNg  "))
"this needs cleaning"

If we make that last call with ->, it looks like this:

porter=> (-> "   ThIs NeEdS ClEaNiNg   " to-lower-case trim)
"this needs cleaning"

This is really handy if there aren't any other arguments. But what if you want
to use a function that needs more than one argument? As long as the first
argument is the expression parameter or the result of the previous function in
the sequence, it's no problem. Just enclose the function and the remaining
parameters to that function in parentheses. For example, this defines a
wrapper for String.substring that returns everything from the given index
on.

porter=> (defn substring [string index] (.substring string index))
#'porter/substring
porter=> (-> "   ThIs NeEdS ClEaNiNg   " to-lower-case trim (substring 11))
"cleaning"

We can see what this does using macroexpand-1:

porter=> (macroexpand-1 '(-> "   ThIs NeEdS ClEaNiNg   " to-lower-case trim (substring 11)))
(clojure/-> (clojure/-> "   ThIs NeEdS ClEaNiNg   " to-lower-case) trim (substring 11))

Umm. It changed, but not by much. Let’s just take the inner, second -> and
try expanding it:

porter=> (macroexpand-1 '(-> "   ThIs NeEdS ClEaNiNg   " to-lower-case))
(to-lower-case "   ThIs NeEdS ClEaNiNg   ")

That seems like a normal function call. If we keep breaking it down, we get
this:

(substring (trim (to-lower-case "   ThIs NeEdS ClEaNiNg   ")) 11)

That's what we want and expect.
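The two-step expansion shown above comes from an early Clojure release, judging by the clojure/ prefix. In current Clojure releases, as far as I know, -> builds the whole nested form in a single expansion, so macroexpand-1 already shows the final call. Here is a small, self-contained check of that; the three helper definitions reproduce the REPL session above using clojure.string.

```clojure
(require '[clojure.string :as str])

;; Same helpers as the REPL session above, built on clojure.string.
(defn to-lower-case [s] (str/lower-case s))
(defn trim [s] (str/trim s))
(defn substring [s index] (subs s index))

;; One macroexpand-1 step is enough to see the fully nested call.
(def expanded
  (macroexpand-1
    '(-> "   ThIs NeEdS ClEaNiNg   " to-lower-case trim (substring 11))))

(prn expanded)
;; prints (substring (trim (to-lower-case "   ThIs NeEdS ClEaNiNg   ")) 11)

(prn (eval expanded))
;; prints "cleaning"
```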

Now, we can rewrite step-1ab to use this macro and be much more readable.

(defn step-1ab
  "step-1ab gets rid of plurals and -ed or -ing. E.g.,
    caresses -> caress
    ponies -> poni
    ties -> ti
    caress -> caress
    cats -> cat
    feed -> feed
    agreed -> agree
    disabled -> disable
    matting -> mat
    mating -> mate
    meeting -> meet
    milling -> mill
    messing -> mess
    meetings -> meet
  "
  [stemmer]
  (-> stemmer stem-plural stem-verb-ending))
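To check the whole of step 1ab against the docstring examples without the rest of the stemmer, here is a self-contained sketch on plain strings. It is not the record-based code above: every helper name and string signature here is hypothetical, and the rules are transcribed from the prose in this post (strip plurals, then -eed/-ed/-ing, then expand the truncated suffix).

```clojure
;; Hypothetical plain-string sketch of step 1ab.

(defn consonant?
  "a, e, i, o, u are vowels; y is a consonant only at the start of the
  word or right after a vowel."
  [^String w i]
  (let [c (.charAt w i)]
    (cond
      (#{\a \e \i \o \u} c) false
      (= c \y) (or (zero? i) (not (consonant? w (dec i))))
      :else true)))

(defn measure
  "Porter's m: the number of vowel-to-consonant transitions in w."
  [^String w]
  (->> (range (count w))
       (map #(consonant? w %))
       (partition 2 1)
       (filter (fn [[a b]] (and (not a) b)))
       count))

(defn vowel-in-stem? [^String w]
  (boolean (some #(not (consonant? w %)) (range (count w)))))

(defn double-c? [^String w]
  (let [n (count w)]
    (and (>= n 2)
         (= (.charAt w (dec n)) (.charAt w (- n 2)))
         (consonant? w (dec n)))))

(defn cvc? [^String w]
  (let [n (count w)]
    (and (>= n 3)
         (consonant? w (- n 3))
         (not (consonant? w (- n 2)))
         (consonant? w (dec n))
         (not (#{\w \x \y} (.charAt w (dec n)))))))

(defn chop
  "Drops the last n characters of w."
  [^String w n]
  (subs w 0 (- (count w) n)))

(defn strip-plural [^String w]
  (cond
    (.endsWith w "sses") (chop w 2)
    (.endsWith w "ies")  (chop w 2)
    (.endsWith w "ss")   w
    (and (.endsWith w "s") (>= (count w) 2)) (chop w 1)
    :else w))

(defn expand-suffix [^String w]
  (cond
    (some #(.endsWith w %) ["at" "bl" "iz"]) (str w "e")  ; mat -> mate
    (double-c? w) (if (#{\l \s \z} (.charAt w (dec (count w))))
                    w                                      ; mill stays
                    (chop w 1))                            ; matt -> mat
    (and (= 1 (measure w)) (cvc? w)) (str w "e")           ; fil -> file
    :else w))

(defn strip-verb-ending [^String w]
  (cond
    ;; -eed: drop the -d only when m > 0; otherwise leave the word alone.
    (.endsWith w "eed") (if (pos? (measure (chop w 3))) (chop w 1) w)
    (and (.endsWith w "ed") (vowel-in-stem? (chop w 2)))
    (expand-suffix (chop w 2))
    (and (.endsWith w "ing") (vowel-in-stem? (chop w 3)))
    (expand-suffix (chop w 3))
    :else w))

(defn step-1ab-str [w]
  (-> w strip-plural strip-verb-ending))

;; The examples from the step-1ab docstring.
(def examples
  {"caresses" "caress", "ponies" "poni", "ties" "ti", "caress" "caress",
   "cats" "cat", "feed" "feed", "agreed" "agree", "disabled" "disable",
   "matting" "mat", "mating" "mate", "meeting" "meet", "milling" "mill",
   "messing" "mess", "meetings" "meet"})

(doseq [[word expected] examples]
  (println word "->" (step-1ab-str word)
           (if (= expected (step-1ab-str word)) "ok" "MISMATCH")))
```

Note how mating and matting diverge: mat ends in -at and gains an -e, while matt loses one t to the double-consonant rule.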

Step 1C

The rest of step one just tests to see if the word ends in -y. If it does,
and if there is a vowel in the stem, it removes the -y and adds an -i to
the word.

(defn step-1c
  "Turns terminal y to i when there is another vowel in the stem."
  [stemmer]
  (if-ends? st stemmer "y"
            (if (vowel-in-stem? st)
              (reset-index (conj (pop (:word st)) i))
              stemmer)))

That’s pretty straightforward. It uses if-ends? to test for the -y, and
vowel-in-stem? looks for a vowel before the -y. If either of these is not
the case, the original stemmer is returned.
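Step 1c is easy to try on plain strings too. This is a minimal sketch, not the series' code: step-1c-str and its helpers are hypothetical names, and the vowel test uses Porter's rule that a y after a consonant counts as a vowel.

```clojure
(defn consonant?
  "a, e, i, o, u are vowels; y is a consonant only at the start of the
  word or right after a vowel."
  [^String w i]
  (let [c (.charAt w i)]
    (cond
      (#{\a \e \i \o \u} c) false
      (= c \y) (or (zero? i) (not (consonant? w (dec i))))
      :else true)))

(defn vowel-in-stem? [^String w]
  (boolean (some #(not (consonant? w %)) (range (count w)))))

(defn step-1c-str
  "Turns a terminal y into i when the rest of the word contains a vowel."
  [^String w]
  (let [stem (subs w 0 (max 0 (dec (count w))))]
    (if (and (.endsWith w "y") (vowel-in-stem? stem))
      (str stem "i")   ; happy -> happi
      w)))             ; sky stays sky: no vowel before the y

(map step-1c-str ["happy" "sky" "toy" "cat"])
;; => ("happi" "sky" "toi" "cat")
```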

Those two functions comprise step 1. In the next posting, we’ll look at the
next several steps.


Sep 13, Grand Remembrances

Today is Grandparents Day in the United States. Being a Grand is a special honor. I feel very blessed that my wife and I have two grandchildren. We were able to visit them today. Yes, we are still being cautious with the coronavirus, but we also find it very difficult to not see them when they live so close. So today we did drop by to visit Jacob (age 10) and Sophia (age 7) along with their parents. We brought donuts and caught up with them. Our grandchildren are still pretty young and this is a precious time in their lives – and ours!

I wish I had known my grandparents better. We never lived in the same place. Dad was a career Air Force pilot, so we moved around a lot. But we did get to see them once in a while when they would visit us, or we them.

A Plague of Giants

There are five known magical ‘kennings,’ or types: air, water, fire, earth, and plants. Each nation specializes in one of these kennings, and the magic influences the society. There’s a big pitfall with this diversity of ability and locale: not everyone gets along.

Enter the Hathrim giants, or ‘lavaborn,’ whose kenning is fire. Where they live the trees that fuel their fire are long gone, but the giants are definitely not welcome anywhere else. They’re big, they’re violent, and they’re ruthless. When a volcano erupts and they are forced to evacuate, they take the opportunity to relocate. They don’t care that it’s in a place where they aren’t wanted.

I first read Kevin Hearne’s Iron Druid books and loved them (also the quirky The Tales of Pell), so I was curious about this new venture, starting with A PLAGUE OF GIANTS. Think Avatar: The Last Airbender meets Jim Butcher’s Codex Alera series. Elemental magic, a variety of races, different lands. And it’s all thrown at you from page one.

But this story is told a little differently. It starts at the end of the war, after a difficult victory, and a bard with earth kenning uses his magic to re-tell the story of the war to a city of refugees. It’s through this movement back and forth in time, and between key players in the war, that we get a singularly grand view of the war as a whole. Hearne uses this method to great effect.

There are so many interesting characters in this book that I can’t cover them all here. Often in books like this, such a large cast of ‘main’ characters can make the storytelling suffer, especially since they don’t interact much with each other for the first 3/4 of the book, but thankfully that doesn’t happen here. And the characterization is good enough, despite these short bursts, that by the end we understand these people and care about what happens to them.

If there were a main character it would be Dervan, a historian who is assigned to record (also spy on?) the bard’s stories. He finds himself caught up in machinations he feels unfit to survive. Fintan is the bard from another country; he is rather mysterious at first, his true personality hidden by the stories he tells, and it takes a while to understand him. Gorin Mogen is the leader of the Hathrim giants who decide to find a new land to settle. He’s hard to like, but as far as villains go, you understand his motivations and he can be even a little convincing. There’s Abhi, the son of hunters, who decides hunting isn’t the life for him, and unexpectedly finds himself on a quest for the sixth kenning. And Gondel Vedd, a scholar of linguistics who finds himself tasked with finding a way to communicate with a race of giants never seen before (definitely not Hathrim) and stumbles onto a mystery no one could have guessed: there may be a seventh kenning.

There are other characters, but what makes them all interesting is that they’re regular people (well, maybe not Gorin Mogen or the viceroy; he’s a piece of work) who become heroes in their own little ways, from the teenage girl who isn’t afraid to share vital information, to the scholars who suddenly find how crucial their minds are to the survival of a nation, to the humble public servants who find bravery when they need it most. This is a story of loss, love, redemption, courage, unity, and refusing to give up in the face of despair. All very human experiences by simple people who do extraordinary things.

Hearne’s worldbuilding is engaging. He doesn’t bottle-feed you. At first it feels like drinking from a fire hydrant, but then you settle in and pick things up along the way. Then he shows you stuff with a punch to the gut. This is no fluffy world with simple magic that comes without a price. All the magic has a price, and more often than not it leads you straight to death’s door. For most people, just seeking the magic will kill them. I particularly enjoyed the scenes with Abhi and his discovery of the sixth kenning and everything associated with it. But giants? I mean, really? Isn’t it bad enough fighting people who can control fire without their also being twice the size of normal people? For Hearne, if it’s war, the stakes are pretty high, and it gets ugly.

The benefit of the storytelling style is that the book, despite its length, moves along steadily (Hearne is no novice here). The bits of story lead you along without annoying cliffhangers (mostly), and I never got bored with the switches between characters. It was easy to move between them, and they were distinct enough that I never got lost or confused. The end of the novel felt a little abrupt, but I suspect that has more to do with the fact that I was ready for the story to continue, despite the exciting climax.

If you’re looking for epic fantasy with fun storytelling and clever worldbuilding, check out A PLAGUE OF GIANTS.

The post A Plague of Giants appeared first on Elitist Book Reviews.

The Artwork Of Gary Choo

Gary Choo is a concept artist/illustrator based in Singapore. I’ve known Gary for a good many years (17, actually), working together in animation studios in Singapore like Silicon Illusions and Lucasfilm. Gary currently runs an art team at Mighty Bear Games, but when time allows he also draws covers for Marvel comics, and they’re amazing:

The Art Of Gary Choo

To see more of Gary’s work or to engage him for freelance work, head down to his ArtStation.

The post The Art Of Gary Choo appeared first on Halcyon Realms – Art Book Reviews – Anime, Manga, Film, Photography.
