= Ohcount
The Ohloh source code line counter
Copyright (C) 2006-2008 Ohloh Corporation
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License Version 2 as
published by the Free Software Foundation.
Ohcount is specifically licensed under GPL v2.0, and no later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
== Overview
Ohcount is a library for counting lines of source code.
It was originally developed at Ohloh, and is used to generate
the reports at www.ohloh.net.
Ohcount supports multiple languages within a single file: for example,
a complex HTML document might include regions of both CSS and JavaScript.
Ohcount has two main components: a detector which determines the primary
language family used by a particular source file, and a parser which
provides a line-by-line breakdown of the contents of a source file.
Ohcount provides a command line script that allows you to count individual
files or whole directory trees. It also allows you to find source code
files by language family, or to create a detailed annotation of an
individual source file.
Ohcount includes a Ruby binding which allows you to directly access its
language detection features from a Ruby application.
== System Requirements
Ohcount is supported on Mac OS X 10.4 and Ubuntu 6.06 LTS. Other Linux
environments should also work, but your mileage may vary.
Ohcount does not support Windows.
Ohcount targets Ruby 1.8.6. The build script targets Rake 0.7.3. You will
also require a C compiler to build the native extensions.
Ohcount requires the pcre library (http://www.pcre.org).
== Download
The source code for ohcount is available as a tarball:
http://labs.ohloh.net/download/ohcount-1.0.0.tgz
You can also download the source code as a Git repository:
git clone http://git.ohloh.net/ohcount.git
== Installation
Ohcount is packaged as a RubyGem. To build and install the gem (you will need
root privileges for the install):
$ rake install
To uninstall the RubyGem:
$ gem uninstall ohcount
If you do not want to install the gem, you can simply build and run it like this:
$ rake
$ bin/ohcount
== First Steps
To measure the lines of code, simply pass filenames or directory names
to the +ohcount+ script:
$ ohcount helloworld.c
Directories will be probed recursively. If you do not pass any parameters,
the current directory tree will be counted.
You can use the ohcount +detect+ option to simply determine the language
family of each source file. The files will not be parsed or counted.
For example, to find all of the Ruby files in the current directory tree:
$ ohcount --detect | grep ^ruby
The +annotate+ option presents a line-by-line accounting
of the languages used in a source code file. For example:
$ ohcount --annotate ./test/src_dir/php1.php
== Loading ohcount from Ruby
If you have installed ohcount as a gem, you can load it like this:
require 'rubygems'
require 'ohcount'
If you have not installed the gem, you'll have to make sure that
ohcount is on your Ruby load path and then require:
require 'ohcount'
The bin/ohcount script shows examples of calling the ohcount
libraries from Ruby.
== How to Add a New Language to Ohcount
There are two steps required to add a new language to ohcount. First, you
must update the detector to identify files that use the new language. Then,
you must create a state machine capable of parsing that language.
For all changes to ohcount, you must provide unit tests if you want to submit
your new code to Ohloh.
==== Modifying the Detector
The Detector primarily uses filename extensions to identify languages.
A hash named EXTENSION_MAP is defined in lib/ohcount/detector.rb to map
extensions to their associated parsers.
If your filename extension is unique, you are in luck and can simply add
a new item to the hash which connects your filename extension to your language.
If your extension potentially conflicts with other file types in other languages,
then you will need to provide a Ruby method which determines which parser
applies. A small Ruby trick is at play in the EXTENSION_MAP: if your extension
maps to a string, that string is assumed to be the name of a parser. If your
extension maps to a Ruby symbol, that symbol is assumed to be the name of a
Ruby method which will return the name of a parser.
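The string-versus-symbol dispatch described above can be sketched as follows. This is a minimal illustration, not the contents of lib/ohcount/detector.rb: the module name +DetectorSketch+, the +detect+ method, the +disambiguate_h+ method, its heuristic, and the parser names are all assumptions made for the example.

```ruby
# Hypothetical sketch of string-vs-symbol dispatch in an extension map.
# None of these names are taken from the real ohcount detector.
module DetectorSketch
  # A String value names a parser directly. A Symbol value names a
  # method that inspects the file and returns a parser name.
  EXTENSION_MAP = {
    '.rb' => 'ruby',
    '.c'  => 'c',
    '.h'  => :disambiguate_h   # ambiguous: could be C or C++
  }

  # Illustrative heuristic only: call a header C++ if it declares a class.
  def self.disambiguate_h(filename, contents)
    contents =~ /^\s*class\s+\w+/ ? 'cpp' : 'c'
  end

  # Returns a parser name, or nil when the extension is unknown.
  def self.detect(filename, contents)
    entry = EXTENSION_MAP[File.extname(filename).downcase]
    case entry
    when String then entry
    when Symbol then send(entry, filename, contents)
    else nil
    end
  end
end
```

Under these assumptions, a unique extension needs only a new string entry, while a contested extension needs a symbol entry plus a method of the same name.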
Many source files do not have an extension. The Ohcount::Detector method +disambiguate_nil+
is responsible for determining the parser for these files. This determination is
delegated to the +file+ command line tool.
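As a rough sketch of how delegation to +file+ might look, the example below runs <tt>file -b</tt> and maps its description to a parser name. It is not the actual +disambiguate_nil+ implementation: the module name, method names, patterns, and parser names are all hypothetical, and the real descriptions printed by +file+ vary between versions.

```ruby
# Hypothetical sketch; the real disambiguate_nil may work differently.
module FileToolSketch
  # Assumed mapping from fragments of a `file` description to parser names.
  PATTERNS = [
    [/\bruby\b.*script/i, 'ruby'],
    [/\bperl\b.*script/i, 'perl'],
    [/\b(bourne|posix|bash)\b.*shell script/i, 'shell']
  ]

  # Pure function: maps a description string to a parser name or nil.
  def self.parser_for_description(description)
    PATTERNS.each { |regex, parser| return parser if description =~ regex }
    nil
  end

  # Runs `file -b PATH` (brief output) and classifies the result.
  def self.detect_without_extension(path)
    description = IO.popen(['file', '-b', path]) { |io| io.read }
    parser_for_description(description)
  end
end
```

Keeping the pattern matching separate from the shell call means the mapping can be unit tested without depending on the local +file+ installation.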
==== Testing the Detector
The directory test/detect_files contains test files for the detector.
These files are not used in testing the parser; they are strictly for detection.
To manually test an addition to the detector, rebuild ohcount and run it against
your test file:
rake
bin/ohcount --detect test/detect_files/my_file.ext
If the detector is working, you should see the name of your new language:
my_language test/detect_files/my_file.ext
To add the new detector abilities to the standard unit test suite, edit
test/unit/detector_test.rb and add a test for your language.
==== Monoglots and Polyglots
The source code parser is written in C. However, you will not write any C code to add
a new language. The C code is generated by another bit of code called a Monoglot or
Polyglot, written in Ruby.
At build time, the script ext/ohcount_native/generator.rb will be run.
This script loads all of the Monoglot and Polyglot files found in
ext/ohcount_native/glots.
These glots are used to generate the C file polyglots.c, which will define
all of the languages parsed by ohcount. Do not edit polyglots.c directly.
The parser for a single language is generated by a Monoglot. Most common source
code languages can be parsed by a Monoglot.
However, if your language's file format mashes together spans of code from several
different languages, you will need to create a Monoglot for each individual language,
and then a Polyglot to handle the transitions from one language to another. For example,
Ohcount::HTMLPolyglot handles transitions from the simple HTML Monoglot to the inline
CSS or JavaScript Monoglots and back.
==== Creating a New Monoglot
You may not need to create a fully custom Monoglot. If your language has a simple syntax,
you may be able to use Ohcount::CMonoglot. This is a flexible Monoglot that can generate
parsers for most C-like languages. In fact, most of the parsers in ohcount are instances
of Ohcount::CMonoglot, and you can see examples in +generator.rb+.
If you do need to create a custom Monoglot, you must create a new Ruby class that
derives from Ohcount::Monoglot, and save it in ext/ohcount_native/glots.
Whether you create a custom Monoglot or use a CMonoglot, you need to update
ext/ohcount_native/generator.rb. You must initialize an instance of your
glot and add it to the +polyglots+ array.
A Monoglot is simply an array of states and an array of transitions between those
states. When creating your own Monoglot, you can define as many states and transitions
as you require. The parser will be initialized in the first state listed.
As the source code is scanned, tokens which match those defined in the transitions
will cause the parser to advance to a new state. Each state is associated with
either code, comments, or blanks, so as the state machine advances through the source code,
the source code will be categorized accordingly.
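To make the states-and-transitions idea concrete, here is a small pure-Ruby sketch that classifies lines of a language with <tt>/* ... */</tt> comments. It does not use the Ohcount::Monoglot API and does not generate C. The module +GlotSketch+, its data layout, and the rule that delimiter tokens count toward the comment side are all assumptions made for illustration.

```ruby
require 'strscan'

# Hypothetical sketch of a two-state line classifier.
module GlotSketch
  # [state name, category]. The first state listed is the initial state.
  STATES = [[:code, :code], [:block_comment, :comment]]

  # [from state, token that triggers the transition, to state]
  TRANSITIONS = [
    [:code,          /\/\*/, :block_comment],
    [:block_comment, /\*\//, :code]
  ]

  def self.category_of(state)
    STATES.assoc(state)[1]
  end

  # Returns one of :code, :comment, or :blank for each line of source.
  def self.classify(source)
    state = STATES.first[0]
    source.lines.map do |line|
      seen = []
      scanner = StringScanner.new(line)
      until scanner.eos?
        move = TRANSITIONS.find { |from, token, _| from == state && scanner.scan(token) }
        if move
          # Assumption: a delimiter counts as comment if either side is a comment state.
          sides = [category_of(state), category_of(move[2])]
          seen << (sides.include?(:comment) ? :comment : :code)
          state = move[2]
        else
          char = scanner.getch
          seen << category_of(state) unless char =~ /\s/
        end
      end
      if seen.empty?
        :blank
      elsif seen.include?(:code)
        :code
      else
        :comment
      end
    end
  end
end
```

Note that the state persists across line boundaries, which is how a multi-line comment keeps being counted as comment until its closing token is seen.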
==== Testing the Parser
The directory test/src_dir contains source files used in parser tests. You can
add your own test files here.
To manually test your parser, rebuild ohcount and run it against your test file:
rake
bin/ohcount --annotate test/src_dir/my_file.ext
The +annotate+ option will emit your test file to the console, and each line will be
labeled as code, comment, or blank.
To add the new parser to the standard unit test suite, you must create some additional
files which describe your expected test results. This is a little bit cumbersome:
1. First, create a new directory in test/expected_dir with
the same name as your test source code file. For example,
test/expected_dir/my_file.ext/.
2. Within this directory, create directories for each language used in the test source code
file. For example, test/expected_dir/my_file.ext/my_language/.
3. In this language subdirectory, create three files called +code+, +comment+, and +blanks+.
The +code+ file should contain all of the lines from my_file.ext which are code lines.
The +comment+ file should contain all comment lines.
The +blanks+ file is a bit different: it should contain a single line with an integer
which is the count of blank lines in the original file.
There are numerous examples in the test directories to help you out.
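The directory layout from the three steps above can be produced with a short helper like the one below. This helper is hypothetical and is not part of ohcount. It assumes you already have each line of the test file labeled as +:code+, +:comment+, or +:blank+, and it writes lines unchanged; check the existing examples in test/expected_dir for the exact content the test suite expects.

```ruby
require 'fileutils'

# Hypothetical helper, not part of ohcount.
# annotated_lines is an array of [kind, text] pairs, where kind is
# :code, :comment, or :blank and text includes its trailing newline.
def write_expected(root, source_name, language, annotated_lines)
  dir = File.join(root, source_name, language)
  FileUtils.mkdir_p(dir)

  code     = annotated_lines.select { |kind, _| kind == :code }.map { |_, text| text }
  comments = annotated_lines.select { |kind, _| kind == :comment }.map { |_, text| text }
  blanks   = annotated_lines.count { |kind, _| kind == :blank }

  File.open(File.join(dir, 'code'), 'w')    { |f| f.write(code.join) }
  File.open(File.join(dir, 'comment'), 'w') { |f| f.write(comments.join) }
  # The blanks file holds a single integer, not the blank lines themselves.
  File.open(File.join(dir, 'blanks'), 'w')  { |f| f.puts(blanks) }
  dir
end
```

For example, calling <tt>write_expected('test/expected_dir', 'my_file.ext', 'my_language', lines)</tt> would create test/expected_dir/my_file.ext/my_language/ containing the three files.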
To run your tests, you can simply run
rake
which runs all unit tests by default.
== Contact Ohloh
For more information visit the Ohloh website:
http://labs.ohloh.net
You can reach Ohloh via email at:
info@ohloh.net