Peter Degen-Portnoy [Thu, 14 Aug 2014 15:44:08 +0000 (11:44 -0400)]
Merge pull request #41 from blackducksw/ubuntu_14
Update Ohcount to work with Ubuntu 14
Peter Degen-Portnoy [Thu, 14 Aug 2014 15:42:23 +0000 (15:42 +0000)]
Update README for compatibility with Ubuntu 14.04
Peter Degen-Portnoy [Wed, 6 Aug 2014 10:50:52 +0000 (10:50 +0000)]
Update Ohcount to work with Ubuntu 14
The most significant change is that the Perl script
that generates HTML, especially with the "<!DOCTYPE html" heading,
is now identified as HTML. This is due to differences in the
libmagic database between CentOS 5 and Ubuntu 14.
It may be possible to generate a different database, or modify the
existing database to change the identification order. However,
it is also worth acknowledging that, in the years since this code
was first developed, it may be more correct to recognize a script
that generates a complete HTML document as 'HTML', even if that
script is written in a language like Perl.
Peter Degen-Portnoy [Tue, 22 Apr 2014 10:43:26 +0000 (06:43 -0400)]
Merge pull request #27 from nnsathish/master
Upgrade Ohcount to support ruby 1.9.x
Abhay Mujumdar [Mon, 30 Dec 2013 15:37:11 +0000 (10:37 -0500)]
Removes whitespace from expected output file
Abhay Mujumdar [Fri, 20 Dec 2013 15:37:17 +0000 (07:37 -0800)]
Merge pull request #30 from chris-morgan/rust
Add support for Rust
Abhay Mujumdar [Wed, 18 Dec 2013 04:59:11 +0000 (23:59 -0500)]
Adds a Javascript test file with Java Emacs mode
Chris Morgan [Fri, 13 Dec 2013 11:50:39 +0000 (22:50 +1100)]
Disambiguate RenderScript .rs files from Rust.
RenderScript isn't implemented, so if a file matches the RenderScript
detection it is just left as undetected.
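The commit does not say how RenderScript is recognized. The sketch below shows the shape of the rule: any .rs file that looks like RenderScript gets no language, and everything else is treated as Rust. It assumes that RenderScript sources carry `#pragma version` or `#pragma rs` lines. Both that marker check and the names `disambiguate_rs_sketch` and `has_line_prefix` are hypothetical and do not come from ohcount.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if any line in buf (len bytes) starts with prefix. */
static int has_line_prefix(const char *buf, size_t len, const char *prefix) {
  size_t plen = strlen(prefix);
  size_t i = 0;
  while (i < len) {
    /* Does the line beginning at offset i start with prefix? */
    if (len - i >= plen && memcmp(buf + i, prefix, plen) == 0)
      return 1;
    /* Move to the start of the next line, if there is one. */
    const char *nl = memchr(buf + i, '\n', len - i);
    if (nl == NULL)
      break;
    i = (size_t)(nl - buf) + 1;
  }
  return 0;
}

/* Hypothetical .rs disambiguation: the "#pragma" markers are an assumed
 * RenderScript signature. RenderScript has no parser, so it yields NULL
 * (undetected); anything else is reported as Rust. */
const char *disambiguate_rs_sketch(const char *buf, size_t len) {
  if (has_line_prefix(buf, len, "#pragma version") ||
      has_line_prefix(buf, len, "#pragma rs "))
    return NULL;
  return "rust";
}
```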
Sathishkumar Natesan [Wed, 11 Dec 2013 09:54:42 +0000 (15:24 +0530)]
OTWO-2506 Update lang detector to prefer extension over emacs mode line
File extensions seem more reliable than the Emacs mode line,
as a JavaScript file can have a Java Emacs mode.
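The new precedence can be sketched as a two-step lookup. The extension is consulted first, and the Emacs mode line is used only when the extension gives no answer. The two lookup helpers below are toy stand-ins, not ohcount's gperf-generated tables, and every function name here is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Toy extension table (hypothetical; ohcount's real mapping is generated). */
static const char *language_from_extension(const char *path) {
  const char *base = strrchr(path, '/');        /* ignore dots in directories */
  const char *dot = strrchr(base ? base : path, '.');
  if (dot == NULL) return NULL;
  if (strcmp(dot, ".js") == 0) return "javascript";
  if (strcmp(dot, ".java") == 0) return "java";
  return NULL;
}

/* Toy mode-line check for "-*- mode: ... -*-". "javascript" is tested
 * first because "mode: java" is a prefix of "mode: javascript". */
static const char *language_from_emacs_mode(const char *first_line) {
  if (strstr(first_line, "mode: javascript") != NULL) return "javascript";
  if (strstr(first_line, "mode: java") != NULL) return "java";
  return NULL;
}

/* Extension wins; the mode line is only a fallback. */
const char *detect_language_sketch(const char *path, const char *first_line) {
  const char *lang = language_from_extension(path);
  if (lang != NULL) return lang;
  return language_from_emacs_mode(first_line);
}
```

With this ordering, a file named `app.js` whose first line is `// -*- mode: java -*-` is still reported as JavaScript.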
Chris Morgan [Fri, 15 Nov 2013 12:12:37 +0000 (23:12 +1100)]
Update Rust example code for Rust 0.8.
Chris Morgan [Fri, 15 Nov 2013 07:05:32 +0000 (18:05 +1100)]
Fix Rust numbers and update keywords.
Sathishkumar Natesan [Mon, 7 Oct 2013 13:05:56 +0000 (18:35 +0530)]
Remove 1.8.7 backward compatibility patches
We are OK with ending support for 1.8.7.
Sathishkumar Natesan [Thu, 3 Oct 2013 13:58:16 +0000 (19:28 +0530)]
Update README for ruby 1.9.3 upgrade
*** Breaks BACKWARD COMPATIBILITY ***
because ruby 1.9.3 requires SWIG 2.0, which is not
backward compatible with older SWIG versions.
Sathishkumar Natesan [Thu, 3 Oct 2013 12:23:32 +0000 (17:53 +0530)]
Fix deprecations and errors for SWIG 2.0.11
Extending a `struct` with its `typedef` name is deprecated as of SWIG 2.0.5.
We should use the struct name directly, even for constructors and destructors.
This does not apply to anonymous structs, of course.
Sathishkumar Natesan [Tue, 1 Oct 2013 04:51:25 +0000 (10:21 +0530)]
ruby 1.8.7 BC patch - Force encoding only for ruby 1.9
Sathishkumar Natesan [Mon, 30 Sep 2013 09:49:38 +0000 (15:19 +0530)]
Update build script to include appropriate ruby header file
* With ruby 1.9.x the locations of ruby.h and config.h have changed
* This patch tries to include the appropriate file paths when
compiling the C code, based on the installed ruby version
* Tested on Ubuntu 12.04 with ruby 1.8.7 and 1.9.3
* Remove reference to the deprecated `Config` module
Robin Luckey [Mon, 9 Jan 2012 16:44:25 +0000 (08:44 -0800)]
OTWO-1213 Works around lost encoding in Ruby/C binding layer
When a Ruby 1.9.2 string is passed to the C code, the associated
encoding metadata is lost. When this same string is then returned from C
back to Ruby, an arbitrary, mismatched encoding is applied to replace
the lost one.
This means that a string becomes garbled in the round trip. The bits
don't change, but the encoding is lost.
The correct fix would be to preserve the encoding metadata in the C
layer.
The easier fix is to replace the lost encoding with a more likely match,
which is what I've done in this patch. When the C code returns a string,
we apply the Ruby runtime's current default encoding, which is highly likely to
be the encoding originally discarded.
Robin Luckey [Sat, 7 Jan 2012 00:12:28 +0000 (16:12 -0800)]
OTWO-1206 Updates SWIG bindings to Ruby 1.9.2
- Requires SWIG 2.0.4 (SWIG 1.3 is not compatible with Ruby 1.9.2)
This updates the SWIG bindings only. This code tree does not build:
- Build fails for me when using RVM Ruby 1.9.2 because 'ruby.h' and
'config.h' headers cannot be found. I am currently working around
this by manually placing absolute paths in the build script.
- test/unit/ruby/source_file_test.rb is currently failing because the
round trip from Ruby -> C -> Ruby causes the string encoding
metadata to be lost. UTF-8 input strings are output with identical
content bits, but are marked as ASCII-8BIT.
Sébastien Crozet [Mon, 16 Sep 2013 15:16:29 +0000 (17:16 +0200)]
Add support for the Rust language.
Rust is a programming language developed by Mozilla: www.rust-lang.org
Abhay Mujumdar [Fri, 21 Dec 2012 15:42:29 +0000 (10:42 -0500)]
Fixes Genie support and adds a test case
The patch for Genie support was missing a mapping in the src/hash/parsers.gperf
file. It was also missing tests.
Abhay Mujumdar [Fri, 21 Dec 2012 15:26:57 +0000 (10:26 -0500)]
Removes failing PP disambiguation tests
Abhay Mujumdar [Thu, 13 Dec 2012 20:45:23 +0000 (15:45 -0500)]
Adds test files from dcsobral
Abhay Mujumdar [Thu, 13 Dec 2012 20:44:27 +0000 (15:44 -0500)]
Adds tests and test files from dcsobral
Daniel C. Sobral [Fri, 16 Mar 2012 01:25:09 +0000 (22:25 -0300)]
Detect file import on pp disambiguation
It's a convention used by some Puppet modules to have nothing in "init.pp"
except imports of class and definition files. Though some Pascal versions
seem to use import as well, Puppet's imports are followed by glob patterns
inside double quotes, while Pascal's imports are followed by identifiers.
Use imports followed by double quotes to detect these files as Puppet
files.
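The rule above can be expressed as a per-line check. After optional indentation, the line must contain the `import` keyword, whitespace, and then an opening double quote. The function name `is_puppet_import_line` is hypothetical. This is a hand-written scan for illustration and may differ from the matching ohcount actually uses.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the line (len bytes, need not be NUL-terminated) looks like
 * a Puppet import: optional blanks, "import", one or more blanks, then '"'.
 * Pascal's `import Foo;` fails the last test because an identifier follows. */
int is_puppet_import_line(const char *line, size_t len) {
  size_t i = 0;
  while (i < len && (line[i] == ' ' || line[i] == '\t')) i++;
  if (len - i < 6 || memcmp(line + i, "import", 6) != 0) return 0;
  i += 6;
  size_t ws_start = i;
  while (i < len && (line[i] == ' ' || line[i] == '\t')) i++;
  if (i == ws_start) return 0;      /* e.g. "imports" is not the keyword */
  return i < len && line[i] == '"';
}
```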
Abhay Mujumdar [Wed, 12 Dec 2012 23:00:47 +0000 (18:00 -0500)]
Rebase with master
Abhay Mujumdar [Wed, 12 Dec 2012 22:41:58 +0000 (14:41 -0800)]
Merge pull request #14 from silene/coq-parser
Add support for Coq .v files (based on the OCaml parser).
Abhay Mujumdar [Wed, 12 Dec 2012 21:48:11 +0000 (13:48 -0800)]
Merge pull request #21 from raphink/dev/mystrnlen_segfault
Avoid segfault on empty files
Abhay Mujumdar [Wed, 12 Dec 2012 21:56:50 +0000 (16:56 -0500)]
Merges pull/21, adds a test
Raphaël Pinson [Sat, 2 Jun 2012 05:59:45 +0000 (07:59 +0200)]
Return NULL when NULL is passed to disambiguate_pp
Raphaël Pinson [Fri, 1 Jun 2012 22:02:42 +0000 (00:02 +0200)]
Prevent segfault on empty files
Arc Riley [Mon, 13 Aug 2012 17:27:42 +0000 (13:27 -0400)]
Added support for Genie and Vala .vapi files
Raphaël Pinson [Sat, 2 Jun 2012 05:59:45 +0000 (07:59 +0200)]
Return NULL when NULL is passed to disambiguate_pp
Raphaël Pinson [Fri, 1 Jun 2012 22:02:42 +0000 (00:02 +0200)]
Prevent segfault on empty files
Robin Luckey [Fri, 1 Jun 2012 16:00:02 +0000 (09:00 -0700)]
Merge pull request #12 from raphink/puppet
Improve Puppet/Pascal disambiguation
Raphaël Pinson [Sat, 26 May 2012 18:26:45 +0000 (20:26 +0200)]
Puppet/Pascal: strncmp calls are really not helping, functionally or timewise
Raphaël Pinson [Sat, 26 May 2012 17:51:23 +0000 (19:51 +0200)]
Puppet/Pascal: try harder to find Puppet keywords
Raphaël Pinson [Sat, 26 May 2012 11:07:36 +0000 (13:07 +0200)]
Puppet parser: improve regex matching for caret to work
Raphaël Pinson [Fri, 25 May 2012 16:30:14 +0000 (18:30 +0200)]
Recognize Puppet node definitions
Raphaël Pinson [Fri, 25 May 2012 16:13:23 +0000 (18:13 +0200)]
Avoid detecting Pascal code as Puppet
Raphaël Pinson [Fri, 25 May 2012 14:32:56 +0000 (16:32 +0200)]
Puppet parser: detect classes and defines with colons
Guillaume Melquiond [Tue, 24 Apr 2012 14:16:13 +0000 (16:16 +0200)]
Add support for Coq .v files (based on the OCaml parser).
Robin Luckey [Wed, 18 Apr 2012 22:14:39 +0000 (15:14 -0700)]
Merge branch 'master' of github.com:robinluckey/ohcount
Robin Luckey [Wed, 18 Apr 2012 22:16:59 +0000 (15:16 -0700)]
Merge pull request #10 from raphink/master
Add support for the Augeas language (based on Ocaml)
Robin Luckey [Wed, 18 Apr 2012 22:16:45 +0000 (15:16 -0700)]
Merge pull request #11 from raphink/tex
Add sty and cls for LANG_TEX, add dtx and LANG_TEX_DTX
Abhay Mujumdar [Fri, 13 Apr 2012 19:12:00 +0000 (12:12 -0700)]
Merge pull request #12 from blackducksw/libmagic
Libmagic
Raphaël Pinson [Thu, 12 Apr 2012 08:33:17 +0000 (10:33 +0200)]
Add .cls to LANG_TEX
Raphaël Pinson [Wed, 11 Apr 2012 22:02:56 +0000 (00:02 +0200)]
Make DTX a separate language derived from TeX.
Raphaël Pinson [Wed, 11 Apr 2012 21:46:16 +0000 (23:46 +0200)]
Add dtx and sty for LANG_TEX
Raphaël Pinson [Wed, 11 Apr 2012 15:50:39 +0000 (17:50 +0200)]
Add support for the Augeas language (based on Ocaml)
Robin Luckey [Mon, 9 Apr 2012 20:50:12 +0000 (13:50 -0700)]
Fixes recursion bug in disambiguate_in().
The basic strategy of disambiguate_in() is to strip the trailing *.in
extension from the filepath, and then to disambiguate the file as if it
originally had that name. Thus, given file "foo.in", disambiguate_in()
will disambiguate "foo".
disambiguate_in() achieves this while re-using the exact same file on
disk. This is possible because a SourceFile struct has both a `filepath`
(the name we use for disambiguation purposes) and the `diskpath` (the
actual name on disk).
So disambiguate_in() instantiates a new SourceFile with a stripped
filepath, yet the same diskpath and same file contents.
The bug is that the code did this incorrectly: when assigning the
diskpath of the new SourceFile, it would mistakenly assign it the
previous SourceFile's *filepath* instead of the previous SourceFile's
diskpath.
If disambiguate_in() runs just once (when the file has just a single
*.in extension, the usual case), this mistake does not matter because
the filepath and diskpath are the same.
But if disambiguate_in() recurses on itself (when the file has multiple
*.in.in extensions), then during the second pass the filepath and
diskpath will not be equal -- they will differ by one missing *.in
extension. Thus the diskpath of the new SourceFile will refer to a
(probably) non-existent file.
The bug is hard to explain but was simple to correct.
In addition to correcting the diskpath assignment, I've fixed a memory
leak: it was possible to allocate a new SourceFile, and then immediately
return NULL, which fails to free the SourceFile. I've moved the
allocation *after* the NULL return check to avoid this.
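Both fixes can be illustrated with a cut-down struct. The new object's diskpath is copied from the old object's *diskpath*, and the ".in" check runs before anything is allocated. `SketchFile`, `strip_in_extension` and `free_sketch_file` are hypothetical stand-ins. The real SourceFile has many more fields, and its constructor is not reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for SourceFile: only the two paths discussed above. */
typedef struct {
  char *filepath;  /* name used for disambiguation */
  char *diskpath;  /* actual name on disk */
} SketchFile;

static char *dup_n(const char *s, size_t n) {
  char *p = malloc(n + 1);
  if (p != NULL) { memcpy(p, s, n); p[n] = '\0'; }
  return p;
}

/* Strips one trailing ".in" from filepath, keeping the original diskpath. */
SketchFile *strip_in_extension(const SketchFile *sf) {
  size_t len = strlen(sf->filepath);
  if (len < 3 || strcmp(sf->filepath + len - 3, ".in") != 0)
    return NULL;                       /* checked before allocating: no leak */
  SketchFile *out = malloc(sizeof *out);
  if (out == NULL) return NULL;
  out->filepath = dup_n(sf->filepath, len - 3);
  out->diskpath = dup_n(sf->diskpath, strlen(sf->diskpath)); /* not filepath */
  if (out->filepath == NULL || out->diskpath == NULL) {
    free(out->filepath); free(out->diskpath); free(out);
    return NULL;
  }
  return out;
}

void free_sketch_file(SketchFile *sf) {
  if (sf == NULL) return;
  free(sf->filepath); free(sf->diskpath); free(sf);
}
```

Applied twice to "foo.in.in", the sketch yields filepaths "foo.in" and then "foo", while diskpath stays "foo.in.in" at every level. That is the invariant the original bug broke on the second pass.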
Robin Luckey [Thu, 8 Mar 2012 06:50:47 +0000 (22:50 -0800)]
Removes unused escape_path() function
Robin Luckey [Thu, 8 Mar 2012 00:10:11 +0000 (16:10 -0800)]
Use libmagic instead of spawning a process to run `file`
Robin Luckey [Tue, 6 Mar 2012 22:05:06 +0000 (14:05 -0800)]
Change README to use Github flavored Markdown
Robin Luckey [Tue, 6 Mar 2012 22:02:18 +0000 (14:02 -0800)]
README updates and corrections
Robin Luckey [Tue, 6 Mar 2012 21:46:21 +0000 (13:46 -0800)]
Merge pull request #11 from dcsobral/forth
Initial support for Forth
Robin Luckey [Tue, 6 Mar 2012 21:39:15 +0000 (13:39 -0800)]
Merge pull request #9 from haraldkl/master
Fixing Bug in Fortran disambiguation
Daniel C. Sobral [Thu, 23 Feb 2012 20:05:15 +0000 (18:05 -0200)]
Initial support for Forth
This is based on the Scala parser, which is actually quite
incorrect -- assumes existence of single-quote strings (which
will cause problems on any file with symbols), doesn't know
multiline strings, doesn't handle nested comments: all of which
made it a pretty good starting point for Forth.
Parsing Forth is impossible, but this will recognize comments,
strings and blank lines on most projects. Tested against FreeBSD
source.
Abhay Mujumdar [Mon, 13 Feb 2012 19:13:36 +0000 (11:13 -0800)]
Merge pull request #10 from blackducksw/OTWO-1300
OTWO-1300 Improves *.pl disambiguation to ignore smileys :-)
Robin Luckey [Wed, 8 Feb 2012 15:24:28 +0000 (10:24 -0500)]
OTWO-1300 Improves *.pl disambiguation to ignore smileys :-)
Smiley faces in Perl strings and comments look similar to Prolog
rule syntax. This patch makes two improvements:
- Better detection of perl shebangs (#!%PERL% now recognized)
- A prolog ':-' token must be followed by a space or a newline
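The second rule can be sketched as a scan for ":-" whose next byte is a space or a newline. A Perl smiley such as ":-)" is then no longer counted as evidence of Prolog. The function name `has_prolog_rule_token` is hypothetical. The sketch covers only the ':-' rule and leaves out the shebang change.

```c
#include <stddef.h>

/* Returns 1 if buf contains ":-" immediately followed by ' ' or '\n'.
 * A ":-" at the very end of the buffer does not count. */
int has_prolog_rule_token(const char *buf, size_t len) {
  for (size_t i = 0; i + 2 < len; i++) {
    if (buf[i] == ':' && buf[i + 1] == '-' &&
        (buf[i + 2] == ' ' || buf[i + 2] == '\n'))
      return 1;
  }
  return 0;
}
```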
Harald Klimach [Tue, 3 Jan 2012 10:45:10 +0000 (11:45 +0100)]
Update Fortran extensions to cover the list supported by gfortran
(.FPP, .F, .FOR, .FTN, .F90, .F95, .F03 or .F08), see
http://gcc.gnu.org/onlinedocs/gfortran/Preprocessing-Options.html
Harald Klimach [Tue, 3 Jan 2012 09:27:44 +0000 (10:27 +0100)]
Changed the logic to disambiguate free and fixed formatted Fortran
Assume fixed-format code and indicate free format as soon as
any line breaks this assumption
(it is easier to check for fixed-form constraints).
Rules for fixed format are taken from the standard, see
ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1830.pdf p. 47.
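The strategy can be sketched as follows: walk the lines and report free form on the first line that cannot be fixed form. The per-line test here is a deliberate simplification of the standard's rules. A 'C', 'c', '*' or '!' in column 1 marks a comment, columns 1-5 may hold only digits or blanks, and a leading tab is accepted as a common vendor extension. Column-72 limits and column-6 continuation are not checked. `FortranForm`, `line_fits_fixed_form` and `fortran_form_sketch` are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { FORTRAN_FIXED, FORTRAN_FREE } FortranForm;

/* Simplified fixed-form test for one line (len bytes, no newline). */
static int line_fits_fixed_form(const char *line, size_t len) {
  if (len == 0) return 1;                         /* blank line */
  char c0 = line[0];
  if (c0 == 'C' || c0 == 'c' || c0 == '*' || c0 == '!') return 1; /* comment */
  for (size_t i = 0; i < len && i < 5; i++) {
    if (line[i] == '\t') return 1;                /* tab-format extension */
    if (line[i] != ' ' && !isdigit((unsigned char)line[i]))
      return 0;                                   /* not a label column */
  }
  return 1;
}

/* Assume fixed form; switch to free form at the first violating line. */
FortranForm fortran_form_sketch(const char *buf, size_t len) {
  size_t start = 0;
  while (start < len) {
    size_t end = start;
    while (end < len && buf[end] != '\n') end++;
    if (!line_fits_fixed_form(buf + start, end - start))
      return FORTRAN_FREE;
    start = end + 1;
  }
  return FORTRAN_FIXED;
}
```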
Harald Klimach [Mon, 2 Jan 2012 00:33:38 +0000 (01:33 +0100)]
More typical Fortran free formatted test file.
Harald Klimach [Sat, 31 Dec 2011 11:16:11 +0000 (12:16 +0100)]
Return free format in the Fortran disambiguation
if the code is definitely not fixed.
Robin Luckey [Thu, 22 Dec 2011 16:07:09 +0000 (08:07 -0800)]
Adds unit test for escape_path()
Robin Luckey [Wed, 21 Dec 2011 17:46:35 +0000 (09:46 -0800)]
OTWO-1137 Escapes single quotes in file paths
Robin Luckey [Fri, 16 Dec 2011 19:19:19 +0000 (11:19 -0800)]
Adds additional comment styles for MS-DOS batch files
In addition to 'REM', we now accept '@REM' and '::'.
Note that test/expected_dir/bat1.bat should be tab-delimited (not
space-delimited), so this patch also corrects that.
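The three accepted comment forms can be sketched as a single line classifier. The case-insensitive match for REM is an assumption based on cmd.exe being case-insensitive. Treating a line as a comment only when REM is followed by whitespace or end of line is a simplification. The name `is_batch_comment` is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if a batch-file line is a comment: after optional blanks it
 * starts with "::", or with "REM" / "@REM" followed by whitespace or EOL. */
int is_batch_comment(const char *line, size_t len) {
  size_t i = 0;
  while (i < len && (line[i] == ' ' || line[i] == '\t')) i++;
  if (len - i >= 2 && line[i] == ':' && line[i + 1] == ':') return 1;
  if (i < len && line[i] == '@') i++;               /* optional '@' */
  if (len - i < 3) return 0;
  if (toupper((unsigned char)line[i]) != 'R' ||
      toupper((unsigned char)line[i + 1]) != 'E' ||
      toupper((unsigned char)line[i + 2]) != 'M') return 0;
  i += 3;
  return i == len || isspace((unsigned char)line[i]); /* rejects "REMOVE" */
}
```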
Robin Luckey [Thu, 15 Dec 2011 23:11:20 +0000 (15:11 -0800)]
Merge branch 'master' of git://github.com/pfusik/ohcount
Robin Luckey [Thu, 15 Dec 2011 22:48:16 +0000 (14:48 -0800)]
Corrections to Logtalk unit tests
Robin Luckey [Thu, 15 Dec 2011 22:50:32 +0000 (14:50 -0800)]
Merge branch 'master' of git://github.com/pmoura/ohcount
Robin Luckey [Thu, 15 Dec 2011 22:30:59 +0000 (14:30 -0800)]
Fixes crash in disambiguate_r() when source file is empty
Thanks to ehsan for discovering this bug.
Ehsan Akhgari [Sun, 9 Oct 2011 21:06:59 +0000 (17:06 -0400)]
Enable building on Mac, which lacks the strnlen function, by using memchr instead
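The standard way to replace strnlen with memchr is to search for the terminating NUL within the first maxlen bytes. The name `portable_strnlen` is hypothetical, and the commit's actual helper may be named or structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Length of s, but never examining more than maxlen bytes: memchr finds
 * the first NUL inside the bound, or returns NULL if there is none. */
size_t portable_strnlen(const char *s, size_t maxlen) {
  const char *nul = memchr(s, '\0', maxlen);
  return nul ? (size_t)(nul - s) : maxlen;
}
```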
Robin Luckey [Thu, 15 Dec 2011 21:51:55 +0000 (13:51 -0800)]
Merge pull request #6 from cmarcelo/qml
Add support for Qt's QML language
Robin Luckey [Thu, 15 Dec 2011 21:42:47 +0000 (13:42 -0800)]
Merge pull request #5 from koraktor/ruby
Added more filenames and extensions for Ruby
Robin Luckey [Thu, 15 Dec 2011 21:42:33 +0000 (13:42 -0800)]
Merge pull request #4 from koraktor/mustache
Treat Mustache templates as HTML
Caio Marcelo de Oliveira Filho [Sat, 22 Oct 2011 05:08:48 +0000 (02:08 -0300)]
Add support for Qt's QML language
Reusing the JS parser, since QML is 'almost' JavaScript. The
approximation is good enough for the line counting purposes.
Piotr Fusik [Mon, 29 Aug 2011 12:42:53 +0000 (14:42 +0200)]
Check if *.def files are Modula-2.
Sebastian Staudt [Thu, 11 Aug 2011 12:50:54 +0000 (14:50 +0200)]
Treat Mustache templates as HTML
Mustache introduces only a small amount of additional syntax, so treating
its templates as pure HTML shouldn't hurt.
Sebastian Staudt [Thu, 11 Aug 2011 12:47:42 +0000 (14:47 +0200)]
Added more filenames and extensions for Ruby
Paulo Moura [Tue, 9 Aug 2011 16:11:04 +0000 (11:11 -0500)]
Minor improvement for detecting Perl files.
Robin Luckey [Tue, 9 Aug 2011 15:39:34 +0000 (08:39 -0700)]
Merge branch 'ecere'
Robin Luckey [Tue, 9 Aug 2011 15:38:16 +0000 (08:38 -0700)]
Completes eC parser
- Adds parse_ec() to the list of parsers
- Adds a test to ensure that the line counter works
Piotr Fusik [Thu, 15 Jul 2010 18:26:21 +0000 (20:26 +0200)]
Add file extensions "asx" and "as8" - 6502 assembler.
Robin Luckey [Mon, 8 Aug 2011 22:03:22 +0000 (15:03 -0700)]
OTWO-922 Adds CoffeeScript parser
Robin Luckey [Mon, 8 Aug 2011 20:19:06 +0000 (13:19 -0700)]
Merge https://github.com/bytbox/ohcount into jam
Robin Luckey [Mon, 8 Aug 2011 20:12:02 +0000 (13:12 -0700)]
Merge branch 'adding_racket' of https://github.com/jbclements/ohcount into racket
Conflicts:
src/hash/languages.gperf
src/hash/parsers.gperf
src/languages.h
test/unit/parser_test.h
Robin Luckey [Mon, 8 Aug 2011 20:06:24 +0000 (13:06 -0700)]
Merge branch 'master' of https://github.com/earl/ohcount
Robin Luckey [Mon, 8 Aug 2011 20:02:06 +0000 (13:02 -0700)]
Merge branch 'rebol' of https://github.com/earl/ohcount into rebol
Robin Luckey [Mon, 8 Aug 2011 19:45:59 +0000 (12:45 -0700)]
Merge https://github.com/ecere/ohcount into ecere
Paulo Moura [Sun, 7 Aug 2011 01:32:11 +0000 (02:32 +0100)]
Added basic unit tests for Prolog parsing.
Paulo Moura [Sun, 7 Aug 2011 00:29:14 +0000 (01:29 +0100)]
Added basic unit tests for Logtalk parsing.
Paulo Moura [Sat, 6 Aug 2011 23:39:39 +0000 (00:39 +0100)]
Added basic support for Logtalk and Prolog (missing parsers in previous commit!).
Paulo Moura [Sat, 6 Aug 2011 23:10:31 +0000 (00:10 +0100)]
Added basic support for Logtalk and Prolog.
John Clements [Wed, 6 Jul 2011 19:01:20 +0000 (12:01 -0700)]
Adding Racket, re-using Lisp parser, following Clojure's lead
Robin Luckey [Mon, 20 Jun 2011 15:28:44 +0000 (11:28 -0400)]
OTWO-803 Fixes disambiguate_pp() performance sink
disambiguate_pp() failed to execute in a reasonable time for extremely
large (1MB+) files.
The reason is that a regular expression is evaluated for each line of
the file, and this regular expression is scoped from the beginning of
the line to the end of the file. When the file is extremely large,
the regular expression evaluation runs away with the CPU.
By limiting the scope of the regular expression evaluation to no more
than 100 characters from its start point, we can avoid the runaway
performance sink. This is a reasonable change since the expression we
are looking to match should almost always fit within 100 chars anyway.
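The bounded-scope idea can be sketched with POSIX `<regex.h>`, which is used here purely for illustration. ohcount's own regex library and pattern are not reproduced. The subject string handed to the matcher is capped at 100 bytes from the start point, so the cost per line no longer grows with the size of the file. The names `MATCH_WINDOW` and `match_in_window` are hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MATCH_WINDOW 100  /* the 100-character limit described above */

/* Runs a compiled POSIX regex on at most MATCH_WINDOW bytes starting at p,
 * rather than on everything from p to end. Copying into a fixed buffer gives
 * regexec() a short NUL-terminated subject. Returns 1 on a match. */
int match_in_window(const regex_t *re, const char *p, const char *end) {
  char window[MATCH_WINDOW + 1];
  size_t n = (size_t)(end - p);
  if (n > MATCH_WINDOW) n = MATCH_WINDOW;
  memcpy(window, p, n);
  window[n] = '\0';
  return regexec(re, window, 0, NULL, 0) == 0;
}
```

The trade-off matches the one stated in the commit. A construct that starts more than 100 characters past the start point is missed, and in exchange runaway scans on very large files are avoided.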
Andreas Bolka [Wed, 1 Jun 2011 23:40:11 +0000 (01:40 +0200)]
Fix filename in Go parser attribution line
Signed-off-by: Andreas Bolka <a@bolka.at>
Andreas Bolka [Wed, 1 Jun 2011 23:39:25 +0000 (01:39 +0200)]
Fix ragel include in parser example skeleton
Signed-off-by: Andreas Bolka <a@bolka.at>
Andreas Bolka [Wed, 1 Jun 2011 23:35:01 +0000 (01:35 +0200)]
Implement parsing of REBOL multi-line strings
Signed-off-by: Andreas Bolka <a@bolka.at>
Andreas Bolka [Wed, 1 Jun 2011 21:21:23 +0000 (23:21 +0200)]
Add REBOL detection and (basic) parsing
Also adds a simple .r disambiguation to discern REBOL and R sources. R
is the default, REBOL is used if "rebol" is found anywhere in the
contents.
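The .r rule can be sketched as a substring search that defaults to R. The commit does not say whether "rebol" is matched case-sensitively. This sketch assumes a case-insensitive match, because REBOL headers are conventionally written as `REBOL [`. The name `disambiguate_r_sketch` is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns "rebol" if the buffer contains "rebol" anywhere (compared
 * case-insensitively, an assumption), otherwise the default "r". */
const char *disambiguate_r_sketch(const char *buf, size_t len) {
  static const char needle[] = "rebol";
  const size_t n = strlen(needle);
  for (size_t i = 0; i + n <= len; i++) {
    size_t k = 0;
    while (k < n && tolower((unsigned char)buf[i + k]) == needle[k]) k++;
    if (k == n) return "rebol";
  }
  return "r";
}
```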
The REBOL parser currently does not handle multi-line strings ({...}),
which could (in rare cases) lead to string parts being classified as
comments.
Signed-off-by: Andreas Bolka <a@bolka.at>
Jerome St-Louis [Sat, 21 May 2011 08:15:39 +0000 (04:15 -0400)]
Added missing detector test files
Jerome St-Louis [Sat, 21 May 2011 08:00:54 +0000 (04:00 -0400)]
Added support for the eC language (www.ecere.com)