First foray into topic modeling

(I spent two weeks at DHSI this year. During week 2 I took Liz Losh’s and Jacque Wernimont’s Feminist DH, which was incredible and which I highly recommend to everyone. Check out the #femdh stream on Twitter for details.)

During week 3 of DHSI, I took Neal Audenaert’s Topic Modeling course, which introduced us to R and then to running Mallet from R (following Matt Jockers’ book). I decided to try topic modeling the English Revised Standard Version (RSV) of the Bible because 1) I know the material and 2) it was easy to scrape.

I split the text into 1,000-character chunks (except for the teeny tiny books like Philemon and some of the other epistles). I chose 20 topics (too few, but hey, this was my first time out) and used Jockers’ stop list (which Neal gave us and which I’m guessing is online somewhere). The first thing I noticed, besides needing more topics, was that the stop list needs to be expanded: topic #13 below is basically junk because it is dominated by archaic pronouns like “thee” and “thy”. Thanks to Neal for the help this week!
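The chunking and stop-list steps can be sketched roughly like this. It is a minimal Python illustration, not the R/Mallet code from the workshop. The function names are hypothetical. `BASE_STOPWORDS` is a tiny stand-in for Jockers’ much longer list. The archaic forms in `ARCHAIC_STOPWORDS` are my guess at the kind of additions topic #13 calls for. Chunks break only at whitespace so words are never cut in half, and a book shorter than the chunk size comes back as a single chunk.

```python
import re

# Hypothetical stand-in for Jockers' stop list (the real one is much longer).
BASE_STOPWORDS = {
    "a", "all", "and", "for", "he", "his", "i", "in",
    "is", "it", "not", "of", "that", "the", "to", "with", "you",
}

# Hypothetical additions: archaic forms like those that swamped topic #13.
ARCHAIC_STOPWORDS = {"thee", "thy", "thou", "thine", "ye", "hath", "hast", "shalt", "unto"}

STOPWORDS = BASE_STOPWORDS | ARCHAIC_STOPWORDS


def chunk_text(text, size=1000):
    """Split text into chunks of at most `size` characters, breaking only at
    whitespace. A text shorter than `size` comes back as a single chunk.
    (A single word longer than `size` becomes its own oversized chunk.)"""
    chunks, current, length = [], [], 0
    for word in text.split():
        extra = len(word) + (1 if current else 0)  # +1 for the joining space
        if current and length + extra > size:
            # Adding this word would overflow the chunk: close it and start anew.
            chunks.append(" ".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def tokenize(chunk):
    """Lowercase a chunk, keep alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", chunk.lower())
    return [w for w in words if w not in STOPWORDS]
```

With the archaic forms on the stop list, `tokenize("And thou shalt love the LORD thy God with all thy heart.")` returns `['love', 'lord', 'god', 'heart']`. Without them, “thou”, “thy”, and “shalt” would stay in and pile up in a junk topic.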

Here are wordclouds of the top 100 words in each topic. Some make a lot of sense.

1. moses
2. earth
3. offering
4. jews, jesus
5. jesus, disciples
6. king
7. god, christ, faith
8. behold
9. david
10. lord, israel
11. father
12. city, house
13. thou, thy
14. tribe
15. woman, man, wife
16. house, solomon
17. gold
18. sons
19. land
20. wicked, righteous
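A word cloud only needs each topic’s heaviest words and their weights. Here is a minimal Python sketch of that step. It assumes you already have per-topic word weights (for example, word counts exported from the trained model) as one dictionary per topic. The function name and input shape are hypothetical, and this is not how the clouds above were actually generated. Weights are normalized over the whole topic before truncating to the top `n`, so each share is that word’s fraction of the topic.

```python
from typing import Dict, List, Tuple


def top_words(
    topic_word_weights: List[Dict[str, float]], n: int = 100
) -> List[List[Tuple[str, float]]]:
    """For each topic, return its n heaviest words as (word, share) pairs,
    where share is the word's fraction of that topic's total weight."""
    result = []
    for weights in topic_word_weights:
        total = sum(weights.values()) or 1.0  # guard against an empty topic
        # Sort by descending weight; break ties alphabetically for stable output.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        result.append([(word, w / total) for word, w in ranked])
    return result


# Toy input with made-up words and counts, just to show the shape.
toy = [{"alpha": 5, "beta": 3, "gamma": 2}, {"delta": 1, "epsilon": 1}]
for i, pairs in enumerate(top_words(toy, n=2), start=1):
    print(i, pairs)
```

Each list of `(word, share)` pairs can then go to whatever word-cloud tool you like, with `share` driving the font size.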
