PART I Introduction

“From a drop of water . . . a logician could infer the possibility of an Atlantic or a Niagara without having seen or heard of one or the other. So all life is a great chain, the nature of which is known whenever we are shown a single link of it. Like all other arts, the Science of Deduction and Analysis is one which can only be acquired by long and patient study nor is life long enough to allow any mortal to attain the highest possible perfection in it. Before turning to those moral and mental aspects of the matter which present the greatest difficulties, let the enquirer begin by mastering more elementary problems.”

-Sherlock Holmes in A Study in Scarlet

This excerpt is from the book, ‘The Software IP Detective’s Handbook: Measurement, Comparison, and Infringement Detection’ by Bob Zeidman, published by Pearson/Prentice Hall Professional, ISBN 0137035330, May 2011, Copyright 2011 Pearson Education, Inc. For more info please visit the publisher site: www.informit.com/title/0137035330

What is intellectual property and why is it important? According to the World Intellectual Property Organization (WIPO):

Intellectual property (IP) refers to creations of the mind: inventions, literary and artistic works, and symbols, names, images, and designs used in commerce.

Intellectual property is divided into two categories: Industrial property, which includes inventions (patents), trademarks, industrial designs, and geographic indications of source; and Copyright, which includes literary and artistic works such as novels, poems and plays, films, musical works, artistic works such as drawings, paintings, photographs and sculptures, and architectural designs. Rights related to copyright include those of performing artists in their performances, producers of phonograms in their recordings, and those of broadcasters in their radio and television programs.

Software is obviously a creation of the mind; simply ask any programmer who has spent numerous hours racking his brain trying to implement some complex function or trying to debug a piece of computer code that just will not work. Software embodies intellectual property in the human-readable source code. Software also embodies intellectual property in the machine-executable binary code that is produced from the source code.

Intellectual property is intangible property, but nonetheless it is valuable. Like other property, IP can be sold, traded, or simply held on to. It can also be stolen. Just as you would not want someone to take the fruits of your physical labor (for example, the boat that you built or the blueprints for that water-powered combustion engine), so you would not want someone to take the fruits of your mental labor. That constitutes theft. Similarly, if you paid for your television, it belongs to you and taking it is theft, just as taking software source code, software object code, or software patents that you purchased is theft, though not intellectual property theft.

At the beginning of each major part of this book is a short description of the content to be found in the chapters. You will also find two headings in each part-opening description: Objectives and Intended Audience. Under the Objectives heading you will find a very brief summary of the content of the chapters in the part. Under the Intended Audience heading you will find the type of reader who will find those chapters most valuable to her area of expertise. Feel free to read any part you want, though; no one is keeping track (at least that I am aware of).

Objectives

The objective of this book is to define ways of measuring and analyzing software intellectual property in order to value that intellectual property or to determine, in as precise and objective a manner as possible, whether software intellectual property has been misappropriated or infringed.

The objective of this part is to introduce you to the book, explain why the concepts described within it are important, and guide you through the chapters to understand which ones will be most useful to you.

Intended Audience

This book is intended for computer scientists, computer programmers, corporate managers, lawyers, technical consultants and expert witnesses for litigation, and software entrepreneurs.

Computer scientists will be able to use the mathematical framework described in this book for measuring and comparing software IP. They will be able to use the mathematical definitions to quantify software IP, which until now has often been described in vague and sometimes conflicting terms. The mathematical framework described in this book can be expanded to take additional forms of software IP into account. This set of mathematical axioms and equations is certainly applicable to other aspects of software, and computer scientists should be able to find new and interesting applications of them.

Computer programmers will find the chapters on implementation and application of the mathematical theories particularly useful. They will be able to create programs to measure and analyze software IP. They will be able to optimize existing applications, and they will find new areas in which to apply these algorithms in programs for other purposes. This book will also give programmers an idea of the different forms of intellectual property, how each is embodied in their programs, how that IP is protectable, and how to avoid unintentional IP infringement or misappropriation.

Corporate managers will get a better understanding of the ongoing debate in Congress, in the courts, and among software developers and high-tech companies regarding software patents, a dispute that promises to affect the form and extent of protection of software intellectual property. These managers will also understand the legal concept of software IP and how the legal protections afforded to it can be used to protect their companies from competition and outright IP theft. Obviously the same concepts allow managers to avoid unintentional IP infringement or misappropriation and to understand how to protect themselves from false accusations of IP infringement and theft.

Lawyers will better understand how to detect software IP theft or measure software IP changes and how to best present their positive or negative findings in court. Because software IP issues are changing very quickly, with courts defining and redefining IP specifics and IP protections all the way to the U.S. Supreme Court, this book cannot impart in-depth legal knowledge. What it can do, however, is to define the basics and the precedents up until now. This book also gives lawyers the ability to understand the mathematical concepts involved in determining software IP in order to do a better job of choosing consultants and expert witnesses, contemplating the potential for success in any litigation, and interrogating and cross-examining expert witnesses for the opposing party.

Technical consultants and expert witnesses for litigation will be able to further their careers with a better understanding of how to detect software IP theft and infringement and how to best present their positive or negative findings in court. This book gives them a solid method for drawing conclusions about IP infringement and misappropriation that has been tested in court and in peer-reviewed studies. While many software IP cases still rely simply on a subjective contest of credibility between the expert witnesses of parties to litigation, it is my hope that the information in this book will create a more fair, standard, and objective means of reaching a decision in court. Those experts who understand the methods and tools described in this book will be in high demand.

Software entrepreneurs will be able to leverage all of the advantages just described for the other categories of readers, creating new software programs and successful new businesses that incorporate the methods, algorithms, and implementations described herein.

This book focuses on intellectual property and intellectual property rights in the United States. Much of this is applicable to other countries as well, though the specific laws and specific remedies may differ. In some places throughout the book I introduce information regarding other countries, but unless specifically stated otherwise, the legal issues refer to the practice and understanding of software intellectual property in the United States at the time of this writing.

About This Book

This book crosses a number of different fields of computer science, mathematics, and law. Not all readers will want to delve into every chapter. This is the place to start, but from this point onward each reader’s experience will be different. In this chapter I describe each of the parts and chapters of the book to help you determine which chapters will be useful and appealing for your specific needs and interests.

I should make clear that I am not a lawyer, have never been one, and have never even played one on TV. All of the issues I discuss in this book are my understanding based on my technical consulting and expert witness work on nearly 100 intellectual property cases to date. My consulting company, Zeidman Consulting, has been growing over the years, and now the work is split between my employees and me. When I refer in the book to my experiences, in most cases that is firsthand information, but in other cases it may be information discovered and tested by an employee and related and explained to me.

In this book I also refer to forensic analysis tools that I have used to analyze software, in particular the CodeSuite tool that is produced and offered for sale by my software company, Software Analysis and Forensic Engineering Corporation (S.A.F.E. Corporation), and can be downloaded from the company website at www.SAFE-corp.biz. The CodeSuite set of tools currently consists of the following functions: BitMatch, CodeCLOC, CodeCross, CodeDiff, and SourceDetective. Functions are being continually added and updated. Each of these functions uses one or more of the algorithms described in later chapters.

Also, the CodeMeasure program uses the CLOC method to measure software evolution, which is explained in Chapter 12. It is also produced and sold by S.A.F.E. Corporation and can be downloaded from its own site at www.CodeMeasure.com.

Table 1.1 should help you determine which chapters will be the most helpful and relevant to you. Find your occupation at the top of the table and read downward to see the chapters that will be most relevant to your background and your job.

Part I: Introduction

The introduction to the book is just that: an introduction, intended to give you a broad overview of the book and help you determine why you want to read it and which chapters you will find most in line with your own interests and needs. This part includes a description of the other parts and chapters in the book. It also gives information and statistics about intellectual property crime, to give you an understanding of why this book is useful and important.

Part II: Software

In this part I describe source code, object code, interpreted code, macros, and synthesis code, which are the blueprints for software. This part describes these important concepts, which are well known to computer scientists and programmers but may not be understood, or may not be understood in sufficient depth, by attorneys involved in software IP litigation. This part will be valuable for lawyers to help them understand how different kinds of software code relate to each other, and how these different kinds of software code can affect a software copyright infringement, software trade secret, or software patent case.

Part III: Intellectual Property

In this part I describe intellectual property, in particular copyrights, patents, and trade secrets. I have found that many of these concepts are unclear or only partially understood by many computer scientists, programmers, and corporate managers. In this part I define these terms in ways that I believe will be comprehensible to those with little or no legal background.

I also define the field of software forensics in this part. When I am asked to work on a case, there is sometimes confusion about the fields of software forensics and digital forensics. In some cases, engineers practicing digital forensics claim to practice software forensics and sometimes use the tools of digital forensics to attempt to draw conclusions about software IP, yielding incorrect or inconclusive results. Software forensics requires the specialized tools of the field and expertise in the field to extract relevant information from the tools, reach appropriate conclusions, and opine on those conclusions. In this part I offer definitions of the two fields. In fact, the definition of software forensics has, to this point, been somewhat vague. My explanation in this part will clarify the practice of software forensics, show how it fits into the field of forensic science, and differentiate it from digital forensics.

Part IV: Source Code Differentiation

This part describes source code differentiation, a very basic method of comparing and measuring software source code. Source code differentiation is especially useful for finding code that has been directly copied from one program to another and for determining a percentage of direct copying. While there are many metrics for measuring qualities of software, source code differentiation has some unique abilities to measure development effort, software changes, and software intellectual property changes that are particularly useful for determining software intellectual property value for such applications as transfer pricing calculations.

In this part I introduce the mathematics of the theory of source code differentiation and explain implementations of source code differentiation for programmers who want to understand how to implement it. I also describe the “changing lines of code” or “CLOC” method of measuring software growth that is based on source code differentiation, and I compare it to traditional methods like “source lines of code” or “SLOC.” I then discuss various applications of source code differentiation, though I believe that many more applications of this metric will be found in the future.
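As a rough illustration of measuring change by comparing lines rather than simply counting them, the following Python sketch uses the standard library's difflib module to count the lines kept, added, and removed between two versions of a file. It is a minimal sketch under stated assumptions, not the CLOC method or the differentiation mathematics developed later in the book: the function names, the whitespace normalization, and the "percent kept" figure are all hypothetical choices made for this example.

```python
import difflib


def normalize(lines):
    """Collapse whitespace and drop blank lines (an assumption of this sketch)."""
    cleaned = (" ".join(line.split()) for line in lines)
    return [line for line in cleaned if line]


def changed_line_counts(old_lines, new_lines):
    """Count lines kept, added, and removed between two versions."""
    old, new = normalize(old_lines), normalize(new_lines)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    # Matching blocks are runs of identical lines that appear in the same order.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return {"kept": kept, "added": len(new) - kept, "removed": len(old) - kept}


def percent_kept(old_lines, new_lines):
    """Share of the new version's lines that also appear, in order, in the old one."""
    new = normalize(new_lines)
    if not new:
        return 0.0
    return 100.0 * changed_line_counts(old_lines, new_lines)["kept"] / len(new)


old_version = ["int a = 1;", "int b = 2;", "return a + b;"]
new_version = ["int a = 1;", "int c = 3;", "return a + b;", "// done"]

print(changed_line_counts(old_version, new_version))
# {'kept': 2, 'added': 2, 'removed': 1}
print(percent_kept(old_version, new_version))
# 50.0
print(len(new_version) - len(old_version))
# 1 (a plain line count sees one line of growth)
```

In this small example a plain line count reports growth of one line, while the line-by-line comparison shows that two lines were added and one was removed. That gap is the kind of difference a change-based measure is meant to expose.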

Part V: Source Code Correlation

This part starts by exploring the various methods and algorithms for “software plagiarism detection” that have been developed over the last few decades. I describe the origins of these methods and algorithms, and I explain their limitations. In particular, there have been no standard definitions and no supporting theory for this work, so I introduce the theory of source code correlation and definitions for characterizing source code. This characterization of software source code is practical for determining correlation and, ultimately, for determining whether copying occurred. While the theory and definitions are broad enough to be useful in various areas of computer science, they are particularly valuable in litigation.

In this part I also describe practical implementations of the theory for those programmers who want to understand how to implement the algorithms. Additionally, I describe applications of the theory in the real world. This part is highly mathematical, though the chapter on source code characterization will be useful for lawyers in understanding how elements of software source code can be categorized, how these various elements relate, and how the elements can affect a software copyright infringement, software trade secret, or software patent case.

Part VI: Object and Source/Object Code Correlation

In this part I introduce the theory and mathematics of object code correlation, which is used to compare object code to object code to find signs of copying. I also introduce the theory of source/object code correlation, which is used to compare source code to object code to find signs of copying. Both of these correlation measures are helpful before litigation when there is no access to source code from at least one party’s software. I also describe practical implementations of the theory for those programmers who want to understand how to implement these correlation measures, and I describe applications of the methods and algorithms in the real world.

Part VII: Source Code Cross-Correlation

In this part I introduce the theory and mathematics of source code cross-correlation, which is specifically used to compare functional source code statements to nonfunctional source code comments to find signs of copying. This correlation measure is effective, in certain cases, for finding copied code that has been disguised enough to avoid detection with one of the other correlation measures. I describe some ways of effectively implementing code to measure source code cross-correlation for those programmers who want to understand how to implement this measure, and I describe applications of source code cross-correlation in the real world.
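To make the idea of comparing functional statements with comments concrete, here is a small Python sketch that looks for statements in one file that reappear verbatim inside the comments of another, as can happen when copied code is commented out rather than deleted. It is only an illustration under stated assumptions, not the cross-correlation measure defined later in the book: the function names are hypothetical, only C-style `//` line comments are recognized, and a `//` inside a string literal would be split incorrectly.

```python
def normalize(text):
    """Collapse runs of whitespace so spacing differences do not hide a match."""
    return " ".join(text.split())


def split_statements_and_comments(source):
    """Separate each line into its code part and its '//' comment part."""
    statements, comments = [], []
    for line in source.splitlines():
        code, marker, comment = line.partition("//")
        if code.strip():
            statements.append(normalize(code))
        if marker and comment.strip():
            comments.append(normalize(comment))
    return statements, comments


def statements_found_in_comments(functional_source, commented_source):
    """Return statements of the first file that appear as comments in the second."""
    statements, _ = split_statements_and_comments(functional_source)
    _, comments = split_statements_and_comments(commented_source)
    comment_set = set(comments)
    return [statement for statement in statements if statement in comment_set]


file_a = "total = total + price;  // add item\nreturn total;\n"
file_b = "// total = total + price;\nsum += price;\n"

print(statements_found_in_comments(file_a, file_b))
# ['total = total + price;']
```

An exact-match test like this is deliberately crude. Any hit it produces is a prompt for closer review, not a conclusion that copying occurred.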

Part VIII: Detecting Software IP Theft and Infringement

All of the correlation measures described in previous parts are useful for detecting software intellectual property theft; however, expert review is still required. Previously developed algorithms often produced a measure that claimed to show whether code was copied or not. In reality, a mathematical measure in and of itself is not enough to make this determination, and that is one of the problems with previous work in this area. In this part I describe detailed, precise steps to be taken once correlation has been calculated. These steps are as important to the standardization and objectivity required for determining intellectual property theft and infringement as are the various correlation measurements described in the previous parts.

Part IX: Miscellaneous Topics

This part covers areas that have come up in my involvement with intellectual property litigation. These subjects were also suggested by some of the experienced reviewers of this book who felt they deserved discussion. The issues described in this part often arise in software intellectual property cases and are also important for code developers and managers to understand. In particular, I discuss procedures for implementing a software clean room, I explain open source code, and I describe the Digital Millennium Copyright Act.

Part X: Past, Present, and Future

The topics discussed in this book are cutting-edge, and I find them to be very interesting and exciting. A lot of work remains to be done, including extending the theories, advancing the mathematics, standardizing the definitions, and promoting the methodologies. In this part I discuss what has been done to date, speculate on areas of future research that build on the concepts in this book, and look toward new applications in various aspects of law and computer science.