METHOD AND APPARATUS FOR SOCIOLOGICAL DATA ANALYSIS

US 2009 287 685A1

A method to enable improved analysis and use of sociological data, the method comprising identifying causal relationships between a plurality of documents; identifying a plurality of characteristics of a communication, including a modality used, actors involved, and proximate events of relevance; and enabling a user to query based on available characteristics.

Claims

1. A method to enable improved analysis and use of sociological data, the method comprising:
identifying causal relationships between a plurality of documents;
identifying a plurality of characteristics of a communication, including a modality used, actors involved, proximate events of relevance, and a tone used by an actor in the communication; and
enabling a user to query based on available characteristics.

11. A method comprising:
identifying causal relationships between a plurality of documents;
identifying a plurality of characteristics of a communication, including a modality used, actors involved, and proximate events of relevance;
enabling a user to query based on available characteristics; and
identifying the modality of the communication when displaying a discussion.

16. A method comprising:
identifying causal relationships between a plurality of documents;
identifying a plurality of characteristics of a communication, including a modality used, actors involved, and proximate events of relevance;
enabling a user to query based on available characteristics;
evaluating actor behavior over time, to create a threshold behavior for the actor; and
identifying deviations from the threshold behavior.

Description

This application is a continuation of U.S. patent application Ser. No. 11/497,199, filed Jul. 31, 2006, the full disclosure of which is herein incorporated by reference for all purposes.

U.S. patent application Ser. No. 11/497,199, filed Jul. 31, 2006, is a continuation-in-part of U.S. patent application Ser. No. 10/358,759, filed Feb. 4, 2003, now U.S. Pat. No. 7,143,091, which claims priority to Provisional Patent Application Ser. No. 60/354,403, filed Feb. 4, 2002, the full disclosures of which are herein incorporated by reference for all purposes.

FIELD OF THE INVENTION

This application relates to data analysis, and more particularly to sociological data analysis.

BACKGROUND

This application addresses an invention to substantially improve the complex effort of responding to a discovery request, and the demands of performing an investigation. The two halves, which are often performed in parallel, we will call review and investigation, respectively.

Common legal practice in responding to a discovery request often requires that data pertinent to a matter be reviewed for relevance and privilege. In a common review method, reviewers annotate items with one or more tags indicating how the content should be categorized. Based on these reviewer categorizations, each item is either produced to the counter party, noted in a privilege log but (generally) not produced, or not produced because of irrelevance to the discovery request. The traditional process of handling a discovery request is time and labor intensive, and as a result has a high cost. Furthermore, it is extremely difficult to obtain consistent and accurate results among reviewers, which is a significant problem in itself, but especially so when a large number of reviewers are working to meet a discovery request.

The continuing increase in the amount of corporate data that is necessary to reasonably meet a discovery request is creating an extra burden on the existing art. Therefore it has become common practice to use keyword culling to reduce the number of items reviewed. However, keyword culling is extremely inaccurate, and other well-known automated categorization techniques have therefore been attempted. Unfortunately, these automated categorization methods are usually overly simplistic and can introduce real risks. Relevance to a discovery request cannot be judged only by the presence of keywords or simple analyses of the data. For example, consider the simple case of an email that in its entirety reads "Yes, let's proceed," which could be an authorization to commit fraud or something that is completely innocuous. Nor can relevance be adjudged accurately by statistical categorization methods, since very slight differences in content can determine whether an item is produced or not produced; matters hinging on jurisdictional issues are one of many excellent examples of this.
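The weakness of keyword culling described above can be illustrated with a minimal sketch. The keyword list, the sample emails, and the function name `keyword_cull` are all hypothetical and chosen only for illustration; they do not come from the application.

```python
import re

# Hypothetical keyword list, for illustration only.
KEYWORDS = {"fraud", "kickback", "backdate"}

def keyword_cull(items, keywords=KEYWORDS):
    """Keep only the items whose text contains at least one keyword."""
    kept = []
    for item in items:
        # Lowercase the text and split it into simple word tokens.
        words = set(re.findall(r"[a-z']+", item["text"].lower()))
        if words & keywords:
            kept.append(item)
    return kept

# Made-up corpus: item 2 is the reply to item 1, item 3 is innocuous.
emails = [
    {"id": 1, "text": "Can we backdate the invoice so it lands in Q3?"},
    {"id": 2, "text": "Yes, let's proceed."},
    {"id": 3, "text": "Reminder: fraud awareness training is on Friday."},
]

kept_ids = [item["id"] for item in keyword_cull(emails)]
print(kept_ids)  # [1, 3]
```

In this toy corpus the short reply (item 2) is culled even though it may be the most significant item, while the innocuous training reminder (item 3) is retained. The reply only acquires meaning when it is read together with the message it answers.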

To improve upon the existing art in a realistic and comprehensive manner, many factors must be taken into account, including:

Requirements for accuracy and completeness are very strict. The consequences of failing to remove material containing confidential or privileged material may be severe. The courts also frown upon dumping large numbers of documents that are non-responsive to the original request, and can even impose sanctions on this basis.

The categorization requirements are varied and can include hard constraints such as conformance to relevant date ranges or custodial ownership, as well as broad references to a general topic, and all points on the continuum in between.

Corpora very often contain multiple foreign languages.

It is very difficult, and sometimes nearly impossible, to quickly and effectively train large numbers of document reviewers on how to interpret detailed and often highly industry specific data.

The task of document review is an extraordinarily tedious one, and reviewers can easily become bored and have their attention drift.

It is therefore necessary to have an objective and rapid means of assessing reviewer accuracy and providing feedback.

Large data files such as spreadsheets or dumps of database contents can confound most automated categorization techniques.

Short format items such as email responses or IMs can be sufficiently lacking in content that they require other related items, such as those identified by discussions, in order to accurately assign any meaning to them.

Large corpora are heterogeneous and distributed over items of many different types, from emails and different kinds of short message formats, to typical office and business documents to very large data files.

The invention documented herein, and in the parent application, accounts for all of these factors in order to help users meet the stringent requirements of a discovery request as efficiently and effectively as possible.

A first step of handling a discovery request often involves an investigative effort where the party served with a discovery request is interested in drawing its own conclusions about the matter at hand. It is often important for both review and investigation tasks to be done in parallel for the simple reason that the investigation effort may in some instances dictate that a case should simply be dropped, or that an attempt should be made to settle it based on bad fact patterns. While review and categorization of individual items is necessary in order to determine which items must ultimately be produced, it is a much different task than trying to analyze the collective meaning of the data.

Analyzing corporate data for its meaning can quickly provide information about exactly what happened, and who might be important to an investigation effort. In order to support the investigative task, the present invention provides visualization, analysis, and a powerful query engine for many dimensions of actor behavior, with special attention given to how these different dimensions change over time, and may be correlated to one another. In addition, factors such as the emotive tones present in communication, and the apparent avoidance of written communication media are analyzed and visualized.
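One way to picture the analysis of how a dimension of actor behavior changes over time is a simple baseline-and-deviation check. The sketch below is an assumption made for illustration, not the method disclosed in the application: it treats the mean and standard deviation of an actor's early weekly email counts as the threshold behavior and flags later weeks that fall far outside it. The function name `find_deviations`, its parameters, and the sample counts are hypothetical.

```python
from statistics import mean, stdev

def find_deviations(weekly_counts, baseline_weeks=8, z_threshold=3.0):
    """Return (week_index, count) pairs that deviate from the actor's baseline.

    The baseline is the mean and sample standard deviation of the first
    `baseline_weeks` observations. Both parameters are illustrative.
    """
    baseline = weekly_counts[:baseline_weeks]
    mu = mean(baseline)
    sigma = stdev(baseline)  # requires at least two baseline observations
    deviations = []
    for week, count in enumerate(weekly_counts[baseline_weeks:], start=baseline_weeks):
        if sigma == 0:
            # Perfectly flat baseline: any change counts as a deviation.
            is_outlier = count != mu
        else:
            is_outlier = abs(count - mu) / sigma > z_threshold
        if is_outlier:
            deviations.append((week, count))
    return deviations

# Made-up weekly email counts for one actor: steady, then a sudden drop.
emails_per_week = [40, 42, 38, 41, 39, 43, 40, 41, 42, 39, 5, 4, 40]
print(find_deviations(emails_per_week))  # [(10, 5), (11, 4)]
```

A sudden drop of this kind is the sort of signal that might accompany an apparent avoidance of written communication media, although a real system would need to rule out ordinary explanations such as vacations.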

SUMMARY OF THE INVENTION

A method to enable improved analysis and use of sociological data, the method comprising identifying causal relationships between a plurality of documents; identifying a plurality of characteristics of a communication, including a modality used, actors involved, and proximate events of relevance; and enabling a user to query based on the available characteristics.
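As a rough illustration of the summarized method, the following sketch records the named characteristics of a communication (modality, actors, proximate events, and a tone label) and lets a caller query on whichever characteristics are available. The class `Communication`, the function `query`, the field names, and the sample data are hypothetical; the step of identifying causal relationships between documents is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Communication:
    modality: str                 # e.g. "email", "im", "phone"
    actors: frozenset             # names of the actors involved
    sent_at: datetime
    tone: str = "neutral"         # illustrative tone label
    proximate_events: tuple = ()  # labels of nearby events of relevance

def query(communications, modality=None, actor=None, event=None, tone=None):
    """Return the communications matching every characteristic that is given."""
    results = []
    for c in communications:
        if modality is not None and c.modality != modality:
            continue
        if actor is not None and actor not in c.actors:
            continue
        if event is not None and event not in c.proximate_events:
            continue
        if tone is not None and c.tone != tone:
            continue
        results.append(c)
    return results

# Made-up corpus of three communications.
corpus = [
    Communication("email", frozenset({"alice", "bob"}), datetime(2002, 3, 1, 9, 0),
                  tone="angry", proximate_events=("board meeting",)),
    Communication("im", frozenset({"bob", "carol"}), datetime(2002, 3, 1, 9, 30)),
    Communication("phone", frozenset({"alice", "carol"}), datetime(2002, 3, 2, 14, 0),
                  proximate_events=("board meeting",)),
]

hits = query(corpus, actor="alice", event="board meeting")
print([c.modality for c in hits])  # ['email', 'phone']
```

Leaving a characteristic as `None` means it is not constrained, so the same function serves queries on any subset of the characteristics.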

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a flowchart displaying one embodiment of a high-level system overview.

FIG. 1B is a screenshot of one embodiment of the main navigation window.

FIG. 2 is a diagram of one embodiment displaying key concepts related to sociological data mining.

FIG. 3A is a screenshot of one embodiment of a user interface design to represent discussions.

FIG. 3B is a screenshot of one embodiment of a timeline shaded in vertical sections by color to indicate nights and days.

FIG. 3C is a screenshot of one embodiment of a vertical timeline color display indicating start and end of relevant time interval.

FIG. 3D is a screenshot of one embodiment of a discussion showing a PDA icon.

FIG. 3E is a screenshot of one embodiment of an audit trail.

FIG. 3F is a screenshot of one embodiment of a cluster header at the discussion and message levels.

FIG. 3G is a screenshot of one embodiment of displaying Cluster contents.

FIG. 3H is a diagram of one embodiment of how a user can extend the query to search for any potential variance of a document.

FIG. 4A is a screenshot of one embodiment of an Actor Information Report.

FIG. 4B is a screenshot of one embodiment of an Actor Information Report.

FIG. 4C is a screenshot of one embodiment of an Actor Information Report.

FIG. 4D is a screenshot of one embodiment of an Actor Information Report.

FIG. 4E is a diagram of one embodiment of the different components available in an Actor Information Report.

FIG. 5 is a flow chart of one embodiment of the mechanism used to determine whether an individual control is set for an item.

FIG. 6A is a diagram of one embodiment of the behavior of a discussion when it contains items with different control settings.

FIG. 6B is a flow chart of one embodiment of how adding new data affects existing discussions.

FIG. 7 is a screenshot of one embodiment of a time-lapsed presentation.

FIG. 8 is a screenshot of one embodiment of thumbnail presentation options.

FIG. 9 is a diagram of one embodiment of different event types.

FIG. 10 is a screenshot of one embodiment of an Item Report.

FIG. 11 is a diagram of different types of communication graphs.

FIG. 12 is a screenshot of one embodiment of a communication graph displaying discussions.

FIG. 13 is a screenshot of one embodiment of a communication graph displaying the capability of performing graphical querying.

FIG. 14 is a flowchart of one embodiment of the use of graphical query to get query results or query controls.

FIG. 15 is a screenshot of one embodiment of a communication graph displaying discussions annotated with a phone icon to represent a phone call.

FIG. 16 is a screenshot of one embodiment of a communication graph displaying an icon for an N-way phone call.

FIG. 17 is a screenshot of one embodiment of a communication graph displaying discussions annotated with phone icons that designate whether phone records are available for the phone event or not.

FIG. 18 is a screenshot of another embodiment of a communication graph displaying discussions annotated with phone icons that designate whether phone records are available for the phone event or not.

FIG. 19 is a screenshot of another embodiment of a communication graph displaying discussions annotated with phone icons and a mouse-over capability that provides relevant additional information about the phone call.

FIG. 20 is a screenshot of one embodiment of a communication graph displaying discussions using different line styles and phone icons to designate whether a particular discussion has a phone event and whether phone records are available for the event.

FIG. 21 is a screenshot of another embodiment of a communication graph displaying discussions using different line styles to designate whether a particular discussion has a phone event and whether phone records are available for the event.

FIG. 22 is a screenshot of another embodiment of a communication graph displaying only the communications involving an actor selected by the user.

FIG. 23 is a flow chart of one embodiment of different instruction types.

FIG. 24 is a screenshot of one embodiment of the graph of instruction relaying.

FIG. 25 is a screenshot of another embodiment of the graph of instruction relaying displaying one embodiment of highlighting direct instructions by the use of a darker line.

FIG. 26 is a screenshot of another embodiment of the graph of instruction relaying displaying another embodiment of highlighting direct instructions by the use of an icon.

FIG. 27 is a screenshot of another embodiment of the graph of instruction relaying displaying rings for mere forwards and explicit instructions.

FIG. 28 is a screenshot of one embodiment of displaying actor proximity for both professional and personal communications.

FIG. 29 is a screenshot of one embodiment of displaying emotive content for communications between actors.

FIG. 30 is a screenshot of another embodiment illustrating the capability of graphical query.

FIG. 31 is a screenshot of one embodiment of showing changes in tone over time.

FIG. 32 is a screenshot of one embodiment of displaying actor proximity by the number of used contact channels.

FIG. 33 is a screenshot of one embodiment of a graph based on discussions.

FIG. 34 is a screenshot of one embodiment of communication in the context of the organization chart displaying a missing link and communication around the organization chart.

FIG. 35 is a screenshot of one embodiment of communication in the context of the organization chart displaying communication boundaries.

FIG. 36 is a screenshot of one embodiment of a graph displaying the spread of information.

FIG. 37 is a screenshot of one embodiment of sequentially displaying mixed type discussions.

FIG. 38 is a screenshot of one embodiment of displaying the probability of an unrecorded event.

FIG. 39 is a screenshot of one embodiment of displaying relevant comparative data about discussions including discussion length and the number of discussions that ended in a "call me" event.

FIG. 40 is a screenshot of one embodiment of sequentially displaying mixed type discussions along with pivotal events.

FIG. 41 is a screenshot of one embodiment of a tonal analysis of Actor communication.

FIG. 42 is a screenshot of one embodiment of a tonal analysis of actor to actor group communication.

FIG. 43 is a diagram of one embodiment of how a sentence is analyzed for tonal content.

FIG. 44 is a screenshot of one embodiment of a tonal analysis of actor communication illustrating an icon for quoted content.

FIG. 45 is a screenshot of one embodiment of an analysis of actor communication illustrating emotive content.

FIG. 46 is a screenshot of one embodiment of an analysis of actor communication illustrating the gauge for negative tonal content.

FIG. 47 is a screenshot of one embodiment of an analysis of actor communication illustrating the clustering presentation method.

FIG. 48 is a screenshot of one embodiment of emotive content.

FIG. 49 is a screenshot of one embodiment of Actor Heartbeat.

FIG. 50 is a screenshot of one embodiment of potential tampering with backups of archival formats.

FIG. 51 is a screenshot of one embodiment of a document lifecycle view.

FIG. 52 is a screenshot of one embodiment of a document lifecycle view in comparison to ad hoc workflow.

FIG. 53 is a diagram of one embodiment of how a burst of activity is determined for a document lifecycle.

FIG. 54 is a screenshot of one embodiment of an Actor Information Report.

FIG. 55 is a diagram of one embodiment of how a document's ancestral lineage is determined.

FIG. 56 is a screenshot of one embodiment of a privilege log.

FIG. 57 is a screenshot of one embodiment of the intersection of review decisions and clusters.
