= Ohcount

The Ohloh source code line counter

Copyright (C) 2006-2008 Ohloh Corporation

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License Version 2 as published by the Free Software Foundation.

Ohcount is specifically licensed under GPL v2.0, and no later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

== Overview

Ohcount is a library for counting lines of source code. It was originally developed at Ohloh, and is used to generate the reports at www.ohloh.net.

Ohcount supports multiple languages within a single file: for example, a complex HTML document might include regions of both CSS and JavaScript.

Ohcount has two main components: a detector which determines the primary language family used by a particular source file, and a parser which provides a line-by-line breakdown of the contents of a source file.

Ohcount provides a command line script that allows you to count individual files or whole directory trees. It also allows you to find source code files by language family, or to create a detailed annotation of an individual source file.

Ohcount includes a Ruby binding which allows you to directly access its language detection features from a Ruby application.

== System Requirements

Ohcount is supported on Mac OS X 10.4 and Ubuntu 6.06 LTS. Other Linux environments should also work, but your mileage may vary.

Ohcount does not support Windows.

Ohcount targets Ruby 1.8.6. The build script targets Rake 0.7.3. You will also require a C compiler to build the native extensions.
Ohcount requires the pcre library (http://www.pcre.org).

== Download

The source code for ohcount is available as a tarball:

  http://labs.ohloh.net/download/ohcount-1.0.0.tgz

You can also download the source code as a Git repository:

  git clone http://git.ohloh.net/ohcount.git

== Installation

Ohcount is packaged as a RubyGem. To build and install the gem (you will need root privileges for the install):

  $ rake install

To uninstall the RubyGem:

  $ gem uninstall ohcount

If you do not want to install the gem, you can simply build and run it like this:

  $ rake
  $ bin/ohcount

== First Steps

To measure the lines of code, simply pass filenames or directory names to the +ohcount+ script:

  $ ohcount helloworld.c

Directories will be probed recursively. If you do not pass any parameters, the current directory tree will be counted.

You can use the ohcount +detect+ option to simply determine the language family of each source file. The files will not be parsed or counted. For example, to find all of the Ruby files in the current directory tree:

  $ ohcount --detect | grep ^ruby

The +annotate+ option presents a line-by-line accounting of the languages used in a source code file. For example:

  $ ohcount --annotate ./test/src_dir/php1.php

== Loading ohcount from Ruby

If you have installed ohcount as a gem, you can load it like this:

  require 'rubygems'
  require 'ohcount'

If you have not installed the gem, you'll have to make sure that ohcount is on your Ruby load path and then require:

  require 'ohcount'

The bin/ohcount script shows examples of calling the ohcount libraries from Ruby.

== How to Add a New Language to Ohcount

There are two steps required to add a new language to ohcount. First, you must update the detector to identify files that use the new language. Then, you must create a state machine capable of parsing that language.

For all changes to ohcount, you must provide unit tests if you want to submit your new code to Ohloh.
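
To make the first step more concrete, here is a minimal sketch of extension-based detection in which a string value names a parser directly and a symbol value names a method that chooses one. This is an illustration only: the module +DetectorSketch+, its map entries, and the +disambiguate_h+ heuristic are hypothetical and are not copied from lib/ohcount/detector.rb.

```ruby
# Hypothetical sketch; names and map entries are illustrative only and do
# not reproduce the real Ohcount::Detector.
module DetectorSketch
  EXTENSION_MAP = {
    '.rb' => 'ruby',          # unique extension: the string names the parser
    '.c'  => 'c',
    '.h'  => :disambiguate_h  # ambiguous extension: the symbol names a method
  }

  # Returns a parser name, or nil when the extension is unknown.
  def self.detect(filename, contents = '')
    entry = EXTENSION_MAP[File.extname(filename).downcase]
    case entry
    when String then entry
    when Symbol then send(entry, contents)
    end
  end

  # Made-up heuristic: treat a header as C++ if it declares a class.
  def self.disambiguate_h(contents)
    (contents =~ /^\s*class\s+\w+/) ? 'cpp' : 'c'
  end
end
```

With this sketch, <tt>DetectorSketch.detect('x.h', source_text)</tt> dispatches to the method named by the symbol, while a file with an unmapped or missing extension yields +nil+ and would need a separate fallback.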

==== Modifying the Detector

The Detector primarily uses filename extensions to identify languages. A hash named EXTENSION_MAP is defined in lib/ohcount/detector.rb to map extensions to their associated parsers.

If your filename extension is unique, you are in luck and can simply add a new item to the hash which connects your filename extension to your language.

If your extension potentially conflicts with other file types in other languages, then you will need to provide a Ruby method which determines which parser applies.

A small Ruby trick is at play in the EXTENSION_MAP: if your extension maps to a string, that string is assumed to be the name of a parser. If your extension maps to a Ruby symbol, that symbol is assumed to be the name of a Ruby method which will return the name of a parser.

Many source files do not have an extension. The Ohcount::Detector method +disambiguate_nil+ is responsible for determining the parser for these files. This determination is delegated to the +file+ command line tool.

==== Testing the Detector

The directory test/detect_files contains test files for the detector. These files are not used in testing the parser; they are strictly for detection.

To manually test an addition to the detector, rebuild ohcount and run it against your test file:

  rake
  bin/ohcount --detect test/detect_files/my_file.ext

If the detector is working, you should see the name of your new language:

  my_language test/detect_files/my_file.ext

To add the new detector abilities to the standard unit test suite, edit test/unit/detector_test.rb and add a test for your language.

==== Monoglots and Polyglots

The source code parser is written in C. However, you will not write any C code to add a new language. The C code is generated by another bit of code called a Monoglot or Polyglot, written in Ruby.

At build time, the script ext/ohcount_native/generator.rb will be run.
This script loads all of the Monoglot and Polyglot files found in ext/ohcount_native/glots. These glots are used to generate the C file polyglots.c, which will define all of the languages parsed by ohcount. Do not edit polyglots.c directly.

The parser for a single language is generated by a Monoglot. Most common source code languages can be parsed by a Monoglot. However, if your language's file format mashes together spans of code from several different languages, you will need to create a Monoglot for each individual language, and then a Polyglot to handle the transitions from one language to another.

For example, Ohcount::HTMLPolyglot handles transitions from the simple HTML Monoglot to the inline CSS or JavaScript Monoglots and back.

==== Creating a New Monoglot

You may not need to create a fully custom Monoglot. If your language has a simple syntax, you may be able to use Ohcount::CMonoglot. This is a flexible Monoglot that can generate parsers for most C-like languages. In fact, most of the parsers in ohcount are examples of Ohcount::CMonoglot, and you can see examples in +generator.rb+.

If you do need to create a custom Monoglot, you must create a new Ruby class that derives from Ohcount::Monoglot, and save it in ext/ohcount_native/glots.

Whether you create a custom Monoglot or use a CMonoglot, you need to update ext/ohcount_native/generator.rb. You must initialize an instance of your glot and add it to the +polyglots+ array.

A Monoglot is simply an array of states and an array of transitions between those states. When creating your own Monoglot, you can define as many states and transitions as you require. The parser will be initialized in the first state listed. As the source code is scanned, tokens which match those defined in the transitions will cause the parser to advance to a new state.
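
The states-and-transitions idea can be sketched in plain Ruby. The class below is a hypothetical illustration, not the Ohcount::Monoglot API: +TinyGlot+, +HASH_COMMENT_GLOT+, and the state names are invented for this example, and the sketch classifies lines at run time instead of generating C code. It assumes a toy language with <tt>#</tt> line comments and double-quoted strings, and it credits each transition token to the state being entered, which is adequate here but would not be for block comments.

```ruby
require 'strscan'

# Hypothetical sketch of a state-machine line classifier. Not the real
# Ohcount::Monoglot; all names are illustrative.
class TinyGlot
  # states:      array of [name, category]; the first entry is the start state
  # transitions: array of [from_state, token_regex, to_state]
  def initialize(states, transitions)
    @categories = Hash[states]
    @start = states.first.first
    @transitions = transitions
  end

  # Returns an array of [category, line] pairs.
  def annotate(source)
    state = @start
    source.each_line.map do |line|
      seen = {}
      scanner = StringScanner.new(line)
      until scanner.eos?
        hit = @transitions.find do |from, token, _to|
          from == state && scanner.match?(token)
        end
        if hit
          text = scanner.scan(hit[1]) # consume the token
          state = hit[2]              # and advance to the new state
        else
          text = scanner.getch        # ordinary character, state unchanged
        end
        # Only non-whitespace text counts toward a line's category.
        seen[@categories[state]] = true unless text.strip.empty?
      end
      category = seen[:code] ? :code : (seen[:comment] ? :comment : :blank)
      [category, line]
    end
  end
end

HASH_COMMENT_GLOT = TinyGlot.new(
  [[:code, :code], [:string, :code], [:comment, :comment]],
  [
    [:code,    /"/,   :string],
    [:code,    /#/,   :comment],
    [:string,  /\\./, :string],  # skip escaped characters inside strings
    [:string,  /"/,   :code],
    [:comment, /\n/,  :code]
  ]
)
```

Calling <tt>HASH_COMMENT_GLOT.annotate(text)</tt> yields one pair per line. A line containing both code and a trailing comment is reported as code, and a <tt>#</tt> inside a quoted string does not start a comment because no transition leaves the string state on that token.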
Each state is associated with either code, comments, or blanks, so as the state machine advances through the source code, the source code will be categorized accordingly.

==== Testing the Parser

The directory test/src_dir contains source files used in parser tests. You can add your own test files here.

To manually test your parser, rebuild ohcount and run it against your test file:

  rake
  bin/ohcount --annotate test/src_dir/my_file.ext

The +annotate+ option will emit your test file to the console, and each line will be labeled as code, comment, or blank.

To add the new parser to the standard unit test suite, you must create some additional files which describe your expected test results. This is a little bit cumbersome:

1. First, create a new directory in test/expected_dir with the same name as your test source code file. For example, test/expected_dir/my_file.ext/.

2. Within this directory, create directories for each language used in the test source code file. For example, test/expected_dir/my_file.ext/my_language/.

3. In this language subdirectory, create three files called +code+, +comment+, and +blanks+. The +code+ file should contain all of the lines from my_file.ext which are code lines. The +comment+ file should contain all comment lines. The +blanks+ file is a bit different: it should contain a single line with an integer which is the count of blank lines in the original file.

There are numerous examples in the test directories to help you out.

To run your tests, you can simply run +rake+, which runs all unit tests by default.

== Contact Ohloh

For more information visit the Ohloh website:

  http://labs.ohloh.net

You can reach Ohloh via email at:

  info@ohloh.net