= Ohcount

The Ohloh source code line counter

Copyright (C) 2006-2008 Ohloh Corporation

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License Version 2 as
published by the Free Software Foundation.

Ohcount is specifically licensed under GPL v2.0, and no later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

== Overview

Ohcount is a library for counting lines of source code.
It was originally developed at Ohloh, and is used to generate
the reports at www.ohloh.net.

Ohcount supports multiple languages within a single file: for example,
a complex HTML document might include regions of both CSS and JavaScript.

Ohcount has two main components: a detector which determines the primary
language family used by a particular source file, and a parser which
provides a line-by-line breakdown of the contents of a source file.

Ohcount provides a command line script that allows you to count individual
files or whole directory trees. It also allows you to find source code
files by language family, or to create a detailed annotation of an
individual source file.

Ohcount includes a Ruby binding which allows you to directly access its
language detection features from a Ruby application.

== System Requirements

Ohcount is supported on Mac OS X 10.4 and Ubuntu 6.06 LTS. Other Linux
environments should also work, but your mileage may vary.

Ohcount does not support Windows.

Ohcount targets Ruby 1.8.6. The build script targets Rake 0.7.3. You will
also require a C compiler to build the native extensions.

Ohcount requires the PCRE library (http://www.pcre.org).

== Download

The source code for ohcount is available as a tarball:

  http://labs.ohloh.net/download/ohcount-1.0.0.tgz

You can also download the source code as a Git repository:

  git clone http://git.ohloh.net/ohcount.git

== Installation

Ohcount is packaged as a RubyGem. To build and install the gem (the install
step requires root privileges):

  $ rake install

To uninstall the RubyGem:

  $ gem uninstall ohcount

If you do not want to install the gem, you can simply build and run it like this:

  $ rake
  $ bin/ohcount

== First Steps

To measure the lines of code, simply pass filenames or directory names
to the +ohcount+ script:

  $ ohcount helloworld.c

Directories will be probed recursively. If you do not pass any parameters,
the current directory tree will be counted.

Use the <tt>--detect</tt> option to determine only the language family of
each source file; the files will not be parsed or counted.
For example, to find all of the Ruby files in the current directory tree:

  $ ohcount --detect | grep ^ruby

The <tt>--annotate</tt> option presents a line-by-line accounting
of the languages used in a source code file. For example:

  $ ohcount --annotate ./test/src_dir/php1.php

== Loading ohcount from Ruby

If you have installed ohcount as a gem, you can load it like this:

  require 'rubygems'
  require 'ohcount'

If you have not installed the gem, make sure that ohcount is on your
Ruby load path, and then require it:

  require 'ohcount'

The <tt>bin/ohcount</tt> script shows examples of calling the ohcount
libraries from Ruby.

== How to Add a New Language to Ohcount

There are two steps required to add a new language to ohcount. First, you
must update the detector to identify files that use the new language. Then,
you must create a state machine capable of parsing that language.

For all changes to ohcount, you must provide unit tests if you want to submit
your new code to Ohloh.

==== Modifying the Detector

The Detector primarily uses filename extensions to identify languages.
A hash named EXTENSION_MAP is defined in <tt>lib/ohcount/detector.rb</tt> to map
extensions to their associated parsers.

If your filename extension is unique, you are in luck: simply add a new
item to the hash that maps your filename extension to your language.

If your extension potentially conflicts with file types from other languages,
you will need to provide a Ruby method that determines which parser applies.
A small Ruby trick is at play in the EXTENSION_MAP: if your extension
maps to a string, that string is assumed to be the name of a parser. If your
extension maps to a Ruby symbol, that symbol is assumed to be the name of a
Ruby method which will return the name of a parser.
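
A minimal, self-contained Ruby sketch of this string-versus-symbol convention follows. It is an illustration, not the contents of <tt>lib/ohcount/detector.rb</tt>: the SketchDetector class, its map entries, the parser names, and the disambiguate_m rule (Objective-C versus MATLAB for the ambiguous <tt>.m</tt> extension) are all hypothetical.

```ruby
# Hypothetical illustration of the EXTENSION_MAP convention; the entries and
# the disambiguate_m rule are examples, not ohcount's real detector code.
class SketchDetector
  EXTENSION_MAP = {
    '.rb' => 'ruby',          # unique extension: a string names the parser
    '.c'  => 'c',
    '.m'  => :disambiguate_m  # ambiguous extension: a symbol names a method
  }

  def self.detect(filename, contents)
    entry = EXTENSION_MAP[File.extname(filename).downcase]
    case entry
    when String then entry                             # parser name directly
    when Symbol then send(entry, filename, contents)   # ask the named method
    else nil                                           # unknown extension
    end
  end

  # Hypothetical rule: Objective-C directives mean objective_c, else matlab.
  def self.disambiguate_m(_filename, contents)
    contents =~ /@(interface|implementation)\b/ ? 'objective_c' : 'matlab'
  end
end

puts SketchDetector.detect('hello.rb', '')                   # prints ruby
puts SketchDetector.detect('Foo.m', "@interface Foo\n@end")  # prints objective_c
puts SketchDetector.detect('plot.m', 'x = 1:10;')            # prints matlab
```

Keeping unique extensions as plain strings makes the common case a single hash lookup; only ambiguous extensions pay for a method call that inspects the file contents.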

Many source files do not have an extension. The Ohcount::Detector method +disambiguate_nil+
is responsible for determining the parser for these files. This determination is
delegated to the +file+ command line tool.
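
As a hedged sketch of what delegating to +file+ can look like, the Ruby below runs <tt>file -b</tt> (brief output, without the filename) and matches its description against a small pattern table. The method names, the FILE_PATTERNS table, and the parser names are hypothetical, and the real +disambiguate_nil+ may work differently. The describer is injectable so the matching can be tried without the +file+ tool; the code assumes a current Ruby (1.9 or later), not 1.8.6.

```ruby
require 'open3'

# Hypothetical table: substrings of `file` descriptions mapped to parser names.
FILE_PATTERNS = [
  [/python script/i, 'python'],
  [/ruby script/i,   'ruby'],
  [/shell script/i,  'shell'],
  [/perl script/i,   'perl']
]

# Default describer: run `file -b <path>` and return its description.
def run_file_command(path)
  out, _status = Open3.capture2('file', '-b', path)
  out
end

# Hypothetical stand-in for disambiguate_nil: first matching pattern wins.
def sketch_disambiguate_nil(path, describe = method(:run_file_command))
  description = describe.call(path)
  FILE_PATTERNS.each do |pattern, parser|
    return parser if description =~ pattern
  end
  nil # not recognized as source code
end

fake = ->(_path) { 'Python script, ASCII text executable' }
puts sketch_disambiguate_nil('bin/tool', fake)  # prints python
```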

==== Testing the Detector

The directory <tt>test/detect_files</tt> contains test files for the detector.
These files are not used in testing the parser; they are strictly for detection.

To manually test an addition to the detector, rebuild ohcount and run it against
your test file:

  rake
  bin/ohcount --detect test/detect_files/my_file.ext

If the detector is working, you should see the name of your new language:

  my_language  test/detect_files/my_file.ext

To add the new detector abilities to the standard unit test suite, edit
<tt>test/unit/detector_test.rb</tt> and add a test for your language.

==== Monoglots and Polyglots

The source code parser is written in C. However, you will not write any C code to add
a new language. The C code is generated by Ruby code called a Monoglot or
Polyglot.

At build time, the script <tt>ext/ohcount_native/generator.rb</tt> is run.
This script loads all of the Monoglot and Polyglot files found in
<tt>ext/ohcount_native/glots</tt>.
These glots are used to generate the C file <tt>polyglots.c</tt>, which defines
all of the languages parsed by ohcount. Do not edit <tt>polyglots.c</tt> directly.

The parser for a single language is generated by a Monoglot. Most common source
code languages can be parsed by a Monoglot.

However, if your language's file format mixes spans of code from several
different languages, you will need to create a Monoglot for each individual language,
and then a Polyglot to handle the transitions from one language to another. For example,
Ohcount::HTMLPolyglot handles transitions from the simple HTML Monoglot to the inline
CSS or JavaScript Monoglots and back.
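
To illustrate what handling these transitions means, here is a hedged, line-based approximation in Ruby that tags each line of an HTML document as html, css, or javascript by watching for script and style tags. It is not how Ohcount::HTMLPolyglot is implemented (that logic lives in the generated C state machine); the method name sketch_html_languages is hypothetical, lines containing both an opening and a closing tag are simply treated as HTML, and the code assumes a current Ruby rather than 1.8.6.

```ruby
# Hypothetical line-based approximation of a Polyglot: track which embedded
# language is active and tag each line with it.
def sketch_html_languages(source)
  current = 'html'
  source.lines.map do |line|
    if current == 'html'
      tagged = 'html'  # the opening-tag line itself counts as HTML
      if line =~ /<script\b[^>]*>/i && line !~ /<\/script>/i
        current = 'javascript'
      elsif line =~ /<style\b[^>]*>/i && line !~ /<\/style>/i
        current = 'css'
      end
    elsif (current == 'javascript' && line =~ /<\/script>/i) ||
          (current == 'css' && line =~ /<\/style>/i)
      tagged = 'html'  # the closing-tag line switches back to HTML
      current = 'html'
    else
      tagged = current # inside an embedded CSS or JavaScript block
    end
    [tagged, line.chomp]
  end
end

doc = "<html>\n<style>\np { color: red; }\n</style>\n<script>\nalert(1);\n</script>\n</html>\n"
sketch_html_languages(doc).each { |lang, text| puts "#{lang}\t#{text}" }
```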

==== Creating a New Monoglot

You may not need to create a fully custom Monoglot. If your language has a simple syntax,
you may be able to use Ohcount::CMonoglot. This is a flexible Monoglot that can generate
parsers for most C-like languages. In fact, most of the parsers in ohcount are instances
of Ohcount::CMonoglot, and you can see examples in +generator.rb+.

If you do need to create a custom Monoglot, you must create a new Ruby class that
derives from Ohcount::Monoglot, and save it in <tt>ext/ohcount_native/glots</tt>.

Whether you create a custom Monoglot or use a CMonoglot, you need to update
<tt>ext/ohcount_native/generator.rb</tt>: initialize an instance of your
glot and add it to the +polyglots+ array.

A Monoglot is simply an array of states and an array of transitions between those
states. When creating your own Monoglot, you can define as many states and transitions
as you require. The parser will be initialized in the first state listed.

As the source code is scanned, tokens that match those defined in the transitions
cause the parser to advance to a new state. Each state is associated with
either code, comments, or blanks, so as the state machine advances through the source code,
the source code is categorized accordingly.
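
The states-and-transitions idea can be sketched as a tiny line classifier in plain Ruby. This is not the Ohcount::Monoglot API: the STATE_KIND and TRANSITIONS tables and the classify_lines method are hypothetical, the language is assumed to be C-like with // and /* */ comments, and string literals are ignored for brevity. The machine starts in the first state, and a matching transition token moves it to a new state. A line counts as code if any non-blank character was seen in a code state, otherwise as comment if any was seen in a comment state, otherwise as blank.

```ruby
# Hypothetical tables: each state is tagged as code or comment.
STATE_KIND = { code: :code, line_comment: :comment, block_comment: :comment }

# [from_state, token, to_state]; tokens are matched at the current position.
TRANSITIONS = [
  [:code,          '//', :line_comment],
  [:code,          '/*', :block_comment],
  [:block_comment, '*/', :code]
]

def classify_lines(source)
  state = :code                      # start in the first state listed
  source.lines.map do |line|
    seen = { code: false, comment: false }
    i = 0
    while i < line.length
      rule = TRANSITIONS.find { |from, tok, _| from == state && line[i, tok.length] == tok }
      if rule
        seen[:comment] = true        # the delimiter itself counts as comment text
        state = rule[2]
        i += rule[1].length
        next
      end
      seen[STATE_KIND[state]] = true unless line[i] =~ /\s/
      i += 1
    end
    state = :code if state == :line_comment  # a // comment ends at the newline
    seen[:code] ? :code : (seen[:comment] ? :comment : :blank)
  end
end

src = "int x; // hi\n\n/* start\n   still comment */ int y;\n/* one-liner */\n"
p classify_lines(src)  # => [:code, :blank, :comment, :code, :comment]
```

A real Monoglot would also need states for string literals, so that comment delimiters inside strings do not trigger transitions.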

==== Testing the Parser

The directory <tt>test/src_dir</tt> contains source files used in parser tests. You can
add your own test files here.

To manually test your parser, rebuild ohcount and run it against your test file:

  rake
  bin/ohcount --annotate test/src_dir/my_file.ext

The <tt>--annotate</tt> option prints your test file to the console, with each line
labeled as code, comment, or blank.

To add the new parser to the standard unit test suite, you must create some additional
files that describe your expected test results. This is a little cumbersome:

1. First, create a new directory in <tt>test/expected_dir</tt> with
   the same name as your test source code file. For example,
   <tt>test/expected_dir/my_file.ext/</tt>.

2. Within this directory, create a directory for each language used in the test source code
   file. For example, <tt>test/expected_dir/my_file.ext/my_language/</tt>.

3. In this language subdirectory, create three files called +code+, +comment+, and +blanks+.
   The +code+ file should contain all of the lines from <tt>my_file.ext</tt> that are code lines.
   The +comment+ file should contain all comment lines.
   The +blanks+ file is a bit different: it should contain a single line with an integer,
   the count of blank lines in the original file.

There are numerous examples in the test directories to help you out.
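
If you already know how each line should be classified (for example, after checking <tt>bin/ohcount --annotate</tt> output by hand), a small Ruby helper can write the layout from steps 1-3. The write_expected helper below is hypothetical and not part of ohcount. It assumes each line in +code+ and +comment+ is stored verbatim with a trailing newline, so compare its output against the existing files in <tt>test/expected_dir</tt> before relying on it. It also assumes a current Ruby (1.9.3 or later) rather than 1.8.6.

```ruby
require 'fileutils'

# Hypothetical helper. lines: array of [language, kind, text],
# where kind is :code, :comment or :blank.
def write_expected(expected_root, source_name, lines)
  lines.group_by(&:first).each do |language, entries|
    dir = File.join(expected_root, source_name, language)  # steps 1 and 2
    FileUtils.mkdir_p(dir)
    code     = entries.select { |_, kind, _| kind == :code }.map(&:last)
    comments = entries.select { |_, kind, _| kind == :comment }.map(&:last)
    blanks   = entries.count  { |_, kind, _| kind == :blank }
    # Step 3: code and comment lines verbatim, blanks as a single integer.
    File.write(File.join(dir, 'code'),    code.map { |l| "#{l}\n" }.join)
    File.write(File.join(dir, 'comment'), comments.map { |l| "#{l}\n" }.join)
    File.write(File.join(dir, 'blanks'),  "#{blanks}\n")
  end
end
```

For example, calling write_expected('test/expected_dir', 'my_file.ext', lines) with entries such as ['my_language', :code, 'puts "hi"'] would create the +code+, +comment+, and +blanks+ files under <tt>test/expected_dir/my_file.ext/my_language/</tt>.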

To run your tests, simply run

  rake

which runs all unit tests by default.

== Contact Ohloh

For more information, visit the Ohloh website:

  http://labs.ohloh.net

You can reach Ohloh via email at:

  info@ohloh.net