3 The Ohloh source code line counter
5 Copyright (C) 2006-2008 Ohloh Corporation
7 This program is free software; you can redistribute it and/or modify
8 it under the terms of the GNU General Public License Version 2 as
9 published by the Free Software Foundation.
11 Ohcount is specifically licensed under GPL v2.0, and no later version.
13 This program is distributed in the hope that it will be useful,
14 but WITHOUT ANY WARRANTY; without even the implied warranty of
15 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16 GNU General Public License for more details.
18 You should have received a copy of the GNU General Public License
19 along with this program; if not, write to the Free Software
20 Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
24 Ohcount is a library for counting lines of source code.
25 It was originally developed at Ohloh, and is used to generate
26 the reports at www.ohloh.net.
28 Ohcount supports multiple languages within a single file: for example,
29 a complex HTML document might include regions of both CSS and JavaScript.
31 Ohcount has two main components: a detector which determines the primary
32 language family used by a particular source file, and a parser which
33 provides a line-by-line breakdown of the contents of a source file.
35 Ohcount provides a command line script that allows you to count individual
36 files or whole directory trees. It also allows you to find source code
37 files by language family, or to create a detailed annotation of an
38 individual source file.
40 Ohcount includes a Ruby binding which allows you to directly access its
41 language detection features from a Ruby application.
43 == System Requirements
45 Ohcount is supported on Mac OS X 10.4 and Ubuntu 6.06 LTS. Other Linux
46 environments should also work, but your mileage may vary.
48 Ohcount does not support Windows.
50 Ohcount targets Ruby 1.8.6. The build script targets Rake 0.7.3. You will
51 also require a C compiler to build the native extensions.
53 Ohcount requires the pcre library (http://www.pcre.org).
57 The source code for ohcount is available as a tarball:
59 http://labs.ohloh.net/download/ohcount-1.0.0.tgz
61 You can also download the source code as a Git repository:
63 git clone http://git.ohloh.net/ohcount.git
67 Ohcount is packaged as a RubyGem. To build and install the gem (you will need
68 root privileges for the install):
72 To uninstall the RubyGem:
74 $ gem uninstall ohcount
76 If you do not want to install the gem, you can simply build and run it like this:
83 To measure the lines of code, simply pass filenames or directory names
84 to the +ohcount+ script:
86 $ ohcount helloworld.c
88 Directories will be probed recursively. If you do not pass any parameters,
89 the current directory tree will be counted.
91 You can use the ohcount +detect+ option to simply determine the language
92 family of each source file. The files will not be parsed or counted.
93 For example, to find all of the Ruby files in the current directory tree:
95 $ ohcount --detect | grep ^ruby
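If you would rather post-process the +detect+ output in Ruby than with +grep+, a minimal sketch might look like the following. It assumes each output line has the form "language, whitespace, path", which is what the detector example elsewhere in this README suggests. The helper name +group_by_language+ and the sample paths are hypothetical and are not part of ohcount.

```ruby
# Hypothetical helper (not part of ohcount): group `ohcount --detect`
# output by language. Assumes each line is "<language> <path>".
def group_by_language(detect_output)
  groups = Hash.new { |hash, key| hash[key] = [] }
  detect_output.each_line do |line|
    # Split on the first run of whitespace only, so paths with spaces survive.
    language, path = line.strip.split(/\s+/, 2)
    next if path.nil? # skip blank or malformed lines
    groups[language] << path
  end
  groups
end

# Made-up sample output; in practice you might capture it with
# `ohcount --detect` via backticks or IO.popen.
sample = "ruby lib/a.rb\nc src/b.c\nruby bin/tool\n"
puts group_by_language(sample)["ruby"]
```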
97 The +annotate+ option presents a line-by-line accounting
98 of the languages used in a source code file. For example:
100 $ ohcount --annotate ./test/src_dir/php1.php
102 == Loading ohcount from Ruby
104 If you have installed ohcount as a gem, you can load it like this:
109 If you have not installed the gem, you'll have to make sure that
110 ohcount is on your ruby load path and then require:
114 The <tt>bin/ohcount</tt> script shows examples of calling ohcount from Ruby.
117 == How to Add a New Language to Ohcount
119 There are two steps required to add a new language to ohcount. First, you
120 must update the detector to identify files that use the new language. Then,
121 you must create a state machine capable of parsing that language.
123 For all changes to ohcount, you must provide unit tests if you want to submit
124 your new code to Ohloh.
126 ==== Modifying the Detector
128 The Detector primarily uses filename extensions to identify languages.
129 A hash named EXTENSION_MAP is defined in lib/ohcount/detector.rb to map
130 extensions to their associated parsers.
132 If your filename extension is unique, you are in luck and can simply add
133 a new item to the hash which connects your filename extension to your language.
135 If your extension potentially conflicts with other file types in other languages,
136 then you will need to provide a Ruby method that determines which
137 parser applies. A small Ruby trick is at play in the EXTENSION_MAP: if your extension
138 maps to a string, that string is assumed to be the name of a parser. If your
139 extension maps to a Ruby symbol, that symbol is assumed to be the name of a
140 Ruby method which will return the name of a parser.
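The string-versus-symbol convention can be sketched as follows. This is an illustration only: the class +SketchDetector+, its map entries, the method +disambiguate_h+, and the heuristic inside it are all hypothetical and are not copied from <tt>lib/ohcount/detector.rb</tt>.

```ruby
# Illustrative sketch only: entries and method names are hypothetical.
class SketchDetector
  EXTENSION_MAP = {
    ".rb" => "ruby",          # String: the parser name itself
    ".h"  => :disambiguate_h  # Symbol: a method that returns the parser name
  }.freeze

  # Returns a parser name, or nil when the extension is unknown.
  def parser_for(filename, contents)
    entry = EXTENSION_MAP[File.extname(filename)]
    case entry
    when String then entry
    when Symbol then send(entry, contents)
    end
  end

  private

  # Hypothetical heuristic: treat a header containing a class
  # declaration as C++, and anything else as C.
  def disambiguate_h(contents)
    contents =~ /^\s*class\s+\w+/ ? "cpp" : "c"
  end
end

detector = SketchDetector.new
puts detector.parser_for("widget.h", "class Widget {\n};\n")
```

Keeping the ambiguous cases in named methods leaves the map itself readable as a simple lookup table.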
142 Many source files do not have an extension. The Ohcount::Detector method +disambiguate_nil+
143 is responsible for determining the parser for these files. This determination is
144 delegated to the +file+ command line tool.
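As a rough sketch of how such a delegation could work, the code below maps the description printed by +file+ to a parser name. The pattern table, the parser names, and both helper names are assumptions made for illustration; the real +disambiguate_nil+ may work quite differently, and the exact wording of +file+ output varies between systems.

```ruby
# Hypothetical mapping from `file` descriptions to parser names.
# The patterns are assumptions about typical `file` output.
FILE_DESCRIPTION_PATTERNS = [
  [/ruby script/i,  "ruby"],
  [/perl script/i,  "perl"],
  [/shell script/i, "shell"]
].freeze

# Returns the first matching parser name, or nil if nothing matches.
def parser_from_description(description)
  _pattern, parser = FILE_DESCRIPTION_PATTERNS.find do |pattern, _name|
    description =~ pattern
  end
  parser
end

# Asks the `file` tool for a brief (-b) description of a path.
# Requires `file` to be installed; not called by this sketch itself.
def describe_with_file(path)
  IO.popen(["file", "-b", path], &:read).to_s.strip
end

puts parser_from_description("Ruby script, ASCII text executable")
```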
146 ==== Testing the Detector
148 The directory <tt>test/detect_files</tt> contains test files for the detector.
149 These files are not used in testing the parser; they are strictly for detection.
151 To manually test an addition to the detector, rebuild ohcount and run it against your test file:
155 bin/ohcount --detect test/detect_files/my_file.ext
157 If the detector is working, you should see the name of your new language:
159 my_language test/detect_files/my_file.ext
161 To add the new detector abilities to the standard unit test suite, edit
162 <tt>test/unit/detector_test.rb</tt> and add a test for your language.
164 ==== Monoglots and Polyglots
166 The source code parser is written in C. However, you will not write any C code to add
167 a new language. The C code is generated by another bit of code called a Monoglot or
168 Polyglot, written in Ruby.
170 At build time, the script <tt>ext/ohcount_native/generator.rb</tt> will be run.
171 This script loads all of the Monoglot and Polyglot files found in
172 <tt>ext/ohcount_native/glots</tt>.
173 These glots are used to generate the C file polyglots.c, which will define
174 all of the languages parsed by ohcount. Do not edit polyglots.c directly.
176 The parser for a single language is generated by a Monoglot. Most common source
177 code languages can be parsed by a Monoglot.
179 However, if your language's file format mashes together spans of code from several
180 different languages, you will need to create a Monoglot for each individual language,
181 and then a Polyglot to handle the transitions from one language to another. For example,
182 Ohcount::HTMLPolyglot handles transitions from the simple HTML Monoglot to the inline
183 CSS or JavaScript Monoglots and back.
185 ==== Creating a New Monoglot
187 You may not need to create a fully custom Monoglot. If your language has a simple syntax,
188 you may be able to use Ohcount::CMonoglot. This is a flexible Monoglot that can generate
189 parsers for most C-like languages. In fact, most of the parsers in ohcount are examples
190 of Ohcount::CMonoglot, and you can see examples in +generator.rb+.
192 If you do need to create a custom Monoglot, you must create a new Ruby class that
193 derives from Ohcount::Monoglot, and save it in <tt>ext/ohcount_native/glots</tt>.
195 Whether you create a custom Monoglot or use a CMonoglot, you need to update
196 <tt>ext/ohcount_native/generator.rb</tt>. You must initialize an instance of your
197 glot and add it to the +polyglots+ array.
199 A Monoglot is simply an array of states and an array of transitions between those
200 states. When creating your own Monoglot, you can define as many states and transitions
201 as you require. The parser will be initialized in the first state listed.
203 As the source code is scanned, tokens which match those defined in the transitions
204 will cause the parser to advance to a new state. Each state is associated with
205 either code, comments, or blanks, so as the state machine advances through the source code,
206 the source code will be categorized accordingly.
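The idea of states, transitions, and per-state categories can be illustrated with a small pure-Ruby sketch. It is not the Monoglot API and not the generated C parser: the constants +STATES+, +STATE_KIND+, and +TRANSITIONS+ and the method +classify_lines+ are hypothetical, and the sketch deliberately ignores string literals, so a comment token inside a string would be misread.

```ruby
# Hypothetical sketch of a state machine for a C-like language.
# The parser starts in the first state listed.
STATES = %i[code line_comment block_comment].freeze

# Each state is associated with a category.
STATE_KIND = { code: :code, line_comment: :comment, block_comment: :comment }.freeze

# Each transition is [from_state, token, to_state].
TRANSITIONS = [
  [:code,          "//", :line_comment],
  [:code,          "/*", :block_comment],
  [:block_comment, "*/", :code],
  [:line_comment,  "\n", :code]
].freeze

# Returns one label (:code, :comment, or :blank) per source line.
def classify_lines(source)
  state = STATES.first
  source.each_line.map do |line|
    seen = { code: false, comment: false }
    pos = 0
    while pos < line.length
      match = TRANSITIONS.find do |from, token, _to|
        from == state && line[pos, token.length] == token
      end
      if match
        _from, token, to = match
        seen[:comment] = true unless token == "\n" # comment delimiters count as comment
        state = to
        pos += token.length
      else
        seen[STATE_KIND[state]] = true unless line[pos] =~ /\s/
        pos += 1
      end
    end
    state = :code if state == :line_comment # last line may lack a newline
    if seen[:code] then :code
    elsif seen[:comment] then :comment
    else :blank
    end
  end
end

p classify_lines("int x; // note\n/* a\n b */\n\n")
```

A line that mixes code and comments is labeled as code here; that precedence is a choice made for this sketch.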
208 ==== Testing the Parser
210 The directory <tt>test/src_dir</tt> contains source files used in parser tests. You can
211 add your own test files here.
213 To manually test your parser, rebuild ohcount and run it against your test file:
216 bin/ohcount --annotate test/src_dir/my_file.ext
218 The +annotate+ option will emit your test file to the console, and each line will be
219 labeled as code, comment, or blank.
221 To add the new parser to the standard unit test suite, you must create some additional
222 files which describe your expected test results. This is a little bit cumbersome:
224 1. First, create a new directory in <tt>test/expected_dir</tt> with
225 the same name as your test source code file. For example,
226 <tt>test/expected_dir/my_file.ext/</tt>.
228 2. Within this directory, create directories for each language used in the test source code
229 file. For example, <tt>test/expected_dir/my_file.ext/my_language/</tt>.
231 3. In this language subdirectory, create three files called +code+, +comment+, and +blanks+.
232 The +code+ file should contain all of the lines from <tt>my_file.ext</tt> which are code lines.
233 The +comment+ file should contain all comment lines.
234 The +blanks+ file is a bit different: it should contain a single line with an integer
235 which is the count of blank lines in the original file.
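The three steps above can be sketched as a small Ruby helper that writes the expected files for one language. The helper name +write_expected+ and its input format (hand-labeled pairs of category and line text) are hypothetical. Whether the real test suite compares whitespace exactly is not stated here, so check the existing examples before relying on this.

```ruby
require "fileutils"

# Hypothetical helper: create <expected_root>/<source_name>/<language>/
# containing the files code, comment, and blanks.
# `labeled_lines` is an array of [kind, text] pairs prepared by hand,
# where kind is :code, :comment, or :blank and text keeps its newline.
def write_expected(expected_root, source_name, language, labeled_lines)
  dir = File.join(expected_root, source_name, language)
  FileUtils.mkdir_p(dir)

  code     = labeled_lines.select { |kind, _| kind == :code }.map { |_, text| text }
  comments = labeled_lines.select { |kind, _| kind == :comment }.map { |_, text| text }
  blanks   = labeled_lines.count { |kind, _| kind == :blank }

  File.write(File.join(dir, "code"), code.join)
  File.write(File.join(dir, "comment"), comments.join)
  File.write(File.join(dir, "blanks"), "#{blanks}\n") # a single integer line
  dir
end

# Example call (paths follow the layout described above):
# write_expected("test/expected_dir", "my_file.ext", "my_language",
#                [[:code, "x = 1\n"], [:comment, "# note\n"], [:blank, "\n"]])
```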
237 There are numerous examples in the test directories to help you out.
239 To run your tests, you can simply run
243 which runs all unit tests by default.
247 For more information visit the Ohloh website:
248 http://labs.ohloh.net
250 You can reach Ohloh via email at: