winzip icon

Summary Generator (demonstrates cool algorithm)

Email
Submitted on: 1/7/2015 4:15:00 PM
By: James Vincent Carnicelli (from psc cd)  
Level: Advanced
User Rating: By 14 Users
Compatibility: VB 3.0, VB 4.0 (16-bit), VB 4.0 (32-bit), VB 5.0, VB 6.0, VB Script, ASP (Active Server Pages)
Views: 88
 
     Here's an easy-to-use utility that can take a chunk of plain text and generate a summary with up to a number of words you specify (e.g., 1000).

This code demonstrates an exceedingly cool algorithm I recently read about that basically works like this. Count all the occurrances of each word in the text. Score each sentence based mainly on how many of the most frequent words are in it (with a few other biases and ignoring dull words like "the"). Pick enough of the highest scoring sentences to meet the maximum word limit for the summary. Assemble these sentences into a summary.

Unbelievably simple, in theory. Not bad in practice, despite the fact that the engine doesn't really "understand" what it's summarizing. All it's doing is picking out the most "representative" sentences.

Included with this demo is an article from Seasoned Cooking magazine (seasoned.com). Try setting the maximum number of words to 100, 200, 300, and so on and see what you get. You can paste any text in you want. Be sure, though, that you help the engine out by putting one or more blank lines between any paragraphs, bullet points, etc.

Also, be aware that all periods are assumed to be end-of-sentence markers, even in abbreviations like "i.e.". This and a few other limitations make this algorithm imperfect, but still very illustrative of one kind of linguistic analysis engine.

I suspect this sort of code is unique on Planet Source Code, so I welcome and encourage your comments. Your vote is also appreciated.

----------------
Recent Updates:
8 November 2000: Engine improvement
- Added a list of "hot words" to bias in favor of sentences with words like "key" and "important"
9 July 2000: Engine improvements
- Ignores words with few characters
- Ignores topmost frequent words
- Ignores lower half of infrequent words
- Better bias towards beginning and end paragraphs


 
winzip iconDownload code

Note: Due to the size or complexity of this submission, the author has submitted it as a .zip file to shorten your download time. Afterdownloading it, you will need a program like Winzip to decompress it.Virus note:All files are scanned once-a-day by Planet Source Code for viruses, but new viruses come out every day, so no prevention program can catch 100% of them. For your own safety, please:
  1. Re-scan downloaded files using your personal virus checker before using it.
  2. NEVER, EVER run compiled files (.exe's, .ocx's, .dll's etc.)--only run source code.
  3. Scan the source code with Minnow's Project Scanner

If you don't have a virus scanner, you can get one at many places on the net including:McAfee.com


Other 20 submission(s) by this author

 


Report Bad Submission
Use this form to tell us if this entry should be deleted (i.e contains no code, is a virus, etc.).
This submission should be removed because:

Your Vote

What do you think of this code (in the Advanced category)?
(The code with your highest vote will win this month's coding contest!)
Excellent  Good  Average  Below Average  Poor (See voting log ...)
 

Other User Comments


 There are no comments on this submission.
 

Add Your Feedback
Your feedback will be posted below and an email sent to the author. Please remember that the author was kind enough to share this with you, so any criticisms must be stated politely, or they will be deleted. (For feedback not related to this particular code, please click here instead.)
 

To post feedback, first please login.