Research

Python Packages and Project List:

1. PYDAR: Package to access, download, view, and manipulate Cassini RADAR data from the image flybys on Saturn’s moon Titan

2. centerline-width: Package using NetworkX and Voronoi diagrams to find the centerline and width of rivers from coordinates of the right and left bank

3. Interplanetary Field Enhancements: Automated Identification of Events and Search for the Source of the Event

4. Hydra: Named Entity Resolution in Python

PYDAR

A Python package to access, download, view, and manipulate Cassini RADAR images in one place

  • Find relevant flyby observation numbers/IDs for a feature, range of latitude/longitudes (or specific latitude/longitude), or a time range (or specific time)
  • Use flyby observation numbers/IDs to retrieve flyby observation data (.FMT, .TAB, .LBL, .IMG) from SBDR and BIDR data files by default
  • Access specific observation data from AAREADME and .LBL readme information
  • Display PDS image retrieved for flyby observation

Install from PyPI with pip: pypi.org/project/pydar/

Quickstart: PYDAR
All Cassini data for Titan is retrieved based on flyby observation numbers or IDs, so relevant flybys can be found based on a time range, a latitude/longitude position, or a known feature name.
import pydar
feature_name_example = "ontario lacus"
flyby_ids = pydar.retrieveIDSByFeatureName(feature_name=feature_name_example)

Returns a dictionary of the flyby IDs (and their relevant segments) in which Ontario Lacus can be found: {'T7': ['S01'], 'T36': ['S03'], 'T39': ['S06', 'S05', 'S01', 'S04'], 'T48': ['S04'], 'T49': ['S01'], 'T50': ['S02'], 'T55': ['S01', 'S03'], 'T56': ['S01'], 'T57': ['S01', 'S02'], 'T58': ['S01'], 'T59': ['S01'], 'T65': ['S04', 'S01', 'S05', 'S02', 'S03'], 'T71': ['S01'], 'T95': ['S03'], 'T98': ['S01', 'S04']}
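Before downloading, it can help to flatten this dictionary into individual (flyby ID, segment) pairs. The following is a minimal sketch in plain Python that uses a subset of the dictionary above; `flyby_segment_pairs` is a hypothetical helper written for illustration and is not part of pydar.

```python
def flyby_segment_pairs(flyby_ids):
    """Flatten {flyby_id: [segments]} into (flyby_id, segment) pairs.

    Segments are sorted within each flyby; flybys keep dictionary order.
    """
    pairs = []
    for flyby_id, segments in flyby_ids.items():
        for segment in sorted(segments):
            pairs.append((flyby_id, segment))
    return pairs


# Subset of the dictionary returned for "ontario lacus"
example_ids = {
    'T7': ['S01'],
    'T39': ['S06', 'S05', 'S01', 'S04'],
    'T65': ['S04', 'S01', 'S05', 'S02', 'S03'],
}

pairs = flyby_segment_pairs(example_ids)
for flyby_id, segment in pairs:
    print(flyby_id, segment)
```

Each pair can then be passed to the download step as a flyby ID and segment number.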

The relevant data files can be downloaded for any combination of these flyby IDs and segment numbers. For example, to download flyby ID ‘T65’, segment ‘S01’, at resolution ‘D’ (8 pixels/degree):

# Extract Flyby Data Files to pydar_results/ directory
pydar.extractFlybyDataImages(flyby_id='T65',
				resolution='D',
				segment_num="S01")

Note: extractFlybyDataImages() only needs to be run once per flyby to retrieve new data, but the download will take some time.

# Display all Images in pydar_results/ directory
pydar.displayImages(image_directory="pydar_results/CORADR_0211_V03_S01")

centerline-width

Find the centerline and width of rivers based on the latitude and longitude positions from the right and left bank

  • Convert raw data from Google Earth Pro to CSV
  • Find centerline and width of river

Install from PyPI with pip: pypi.org/project/centerline-width/

Quickstart: centerline-width

The core of centerline-width works with a .csv file of the left and right bank latitudes/longitudes. Starting from Google Earth Pro, the two .kml files (one per bank) must first be converted to a single .csv file:

import centerline_width
centerline_width.extractPointsToTextFile(left_kml="left_bank.kml",
					right_kml="right_bank.kml",
					text_output_name="river_coordinates_output.txt")
centerline_width.convertColumnsToCSV(text_file="river_coordinates_output.txt")

Then, to run the centerline-width functions, generate a river object from the river_coordinates_output.csv

river_object = centerline_width.riverCenterline(csv_data="river_coordinates_output.csv")

To plot the centerline, run the plotCenterline() function on the river_object created above

river_object.plotCenterline()

To plot the width of the river at intervals along the bank, run plotCenterlineWidth

While apply_smoothing, remove_intersections, and display_true_centerline are optional, they are recommended to generate a minimal width diagram

river_object.plotCenterlineWidth(apply_smoothing=True, remove_intersections=True, display_true_centerline=False)

Interplanetary Field Enhancements (IFEs)

Automating the Search for IFEs in ACE Magnetometer Data

Looking for Interplanetary Field Enhancements (IFEs) (Russell et al. 1985a) in ACE magnetometer data (GSE coordinates). IFEs were first identified in association with the passage of an asteroid with the Venusian orbit and are believed to be the result of charged dust interactions with the flowing solar wind. The evolution and geoeffectiveness of IFEs are still an area of active research, so fast and objective identification of IFEs at 1 AU is important.

Selection Criteria (Lai et al. 2017):

1. Total Magnetic field enhancement > 25% (relative to ambient |B|)

2. Duration of enhancement > 10 minutes

3. Current sheet is present at or within the peak of |B|
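The first two criteria can be sketched as a simple threshold search over a |B| time series. This is only an illustrative sketch, not the script described here: the function name `find_enhancements`, the use of the window median as the ambient field, and the synthetic data are all assumptions, and the current sheet check (criterion 3) is left out.

```python
from statistics import median


def find_enhancements(times_min, b_total, threshold=0.25, min_duration_min=10.0):
    """Return (start, end, peak) times of runs satisfying criteria 1 and 2.

    Criterion 1: |B| exceeds the ambient field by more than `threshold`.
    Criterion 2: the run lasts longer than `min_duration_min` minutes.
    Assumption: the ambient field is the median of |B| over the window.
    """
    ambient = median(b_total)
    limit = ambient * (1.0 + threshold)
    events = []
    start = None
    last = len(b_total) - 1
    for i, b in enumerate(b_total):
        above = b > limit
        if above and start is None:
            start = i  # a new run of enhanced field begins
        if start is not None and (not above or i == last):
            end = i if above else i - 1  # last sample still above the limit
            if times_min[end] - times_min[start] > min_duration_min:
                peak = max(range(start, end + 1), key=lambda k: b_total[k])
                events.append((times_min[start], times_min[end], times_min[peak]))
            start = None
    return events


# Synthetic one-hour window at 1-minute cadence (values in nT, made up)
times_min = list(range(60))
b_total = [5.0] * 60
for i in range(20, 36):
    b_total[i] = 7.0   # 40% enhancement lasting 15 minutes
b_total[27] = 8.0      # peak of the enhancement
for i in range(50, 54):
    b_total[i] = 7.0   # short spike, too brief to qualify

events = find_enhancements(times_min, b_total)
print(events)
```

In this synthetic window only the 15-minute enhancement is returned; the short spike fails the duration criterion.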

Using the original ACE magnetometer data, I wrote a Python script to automatically identify and generate graphs of potential events based on these criteria. This greatly improved the speed of finding potential events, which had previously been done by hand. The script can search through a year’s worth of data automatically and return all potential events with generated graphs and timestamps.

Example of an identified event:

 

Script available with examples and instructions on GitHub

Identifying the Possible Dust Source Correlated with an IFE

Interplanetary Field Enhancements (IFEs) were first discovered within the Venusian orbit and were believed to be generated by charged dust mass-loading the interplanetary magnetic field, with the asteroid 2201 Oljato originally proposed as the dust source (Russell 1987). However, the dust source hypothesis for IFEs remains controversial. This program attempts to correlate IFEs measured near Earth in the solar wind with small bodies that could be a source of dust.

A strong candidate dust source will be a small body that has an orbital inclination close to the XY plane (GSE) of the spacecraft and is in the region around the time the IFEs were found (i.e. has a small phase difference) for multiple orbits. Because of the large gyroradius of the charged dust, the cloud will travel approximately radially from the source.

The program first identifies possible dust sources for each event and then compares subsequent periods of the small bodies identified to find the percentage of the time IFEs are again seen.

Identification is broken into steps that the Python script follows to identify candidate dust sources:

1. Find all small bodies nearby in phase and inclination

2. Find orbital period

3. Check all subsequent periods for IFEs

4. Compare how often subsequent periods produce an IFE
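The four steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the script itself: the `SmallBody` fields, the inclination and phase thresholds, the matching tolerance, and the example bodies and event days are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmallBody:
    name: str
    inclination_deg: float  # orbital inclination relative to the XY plane
    phase_diff_deg: float   # phase difference from the spacecraft at event time
    period_days: float      # orbital period (step 2)


def candidate_sources(bodies, max_inclination_deg=5.0, max_phase_diff_deg=10.0):
    """Step 1: keep bodies close in both inclination and phase (assumed limits)."""
    return [
        body for body in bodies
        if abs(body.inclination_deg) <= max_inclination_deg
        and abs(body.phase_diff_deg) <= max_phase_diff_deg
    ]


def recurrence_fraction(event_day, period_days, ife_days, n_periods, tolerance_days=2.0):
    """Steps 3 and 4: fraction of subsequent periods with an IFE near the expected day."""
    hits = 0
    for k in range(1, n_periods + 1):
        expected_day = event_day + k * period_days
        if any(abs(day - expected_day) <= tolerance_days for day in ife_days):
            hits += 1
    return hits / n_periods


# Made-up bodies and IFE detection days (days since the first event)
bodies = [
    SmallBody("body A", inclination_deg=2.0, phase_diff_deg=4.0, period_days=400.0),
    SmallBody("body B", inclination_deg=20.0, phase_diff_deg=3.0, period_days=500.0),
]
ife_days = [0.0, 401.0, 799.0, 1500.0]

candidates = candidate_sources(bodies)
for body in candidates:
    fraction = recurrence_fraction(0.0, body.period_days, ife_days, n_periods=4)
    print(body.name, fraction)
```

Here only "body A" passes the inclination and phase filter, and an IFE is seen again in two of its four subsequent periods.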

Subsequent statistical analysis will be performed over the entire orbit of promising dust source candidates to determine how the frequency of IFEs changes at different points in the body’s orbit (see Russell 1987).

Data was collected from the NASA JPL Asteroid team’s NeoWs (Near Earth Object Web Service) API, and pre-processing and identification were done with a Python script I wrote.

Script available with examples and instructions on GitHub

Hydra

University of Colorado Boulder Honors Thesis – Hail Hydra: Named Entity Resolution, Extraction, and Linking of Lexically Similar Names

“Words, words, words (Hamlet 2.2 18) Characters and ideas in text are represented by names. A casual reader would have no trouble understanding that a passing reference to Mr. Holmes, Mr. Sherlock Holmes, Sherlock Holmes, and Holmes all trace back to the world’s most famous detective. Names are often shortened or rearranged with common abbreviation or elaborate titles. Each version of a character’s name can be understood as a single head on a multi-headed hydra, all tracing back to the same body. Raw text analysis requires more literary context about how English is structured and how words in a sentence interact to generate the most accurate named entities possible. Many intelligent-dependency parsers and natural language processing systems study text without accounting for how dynamic language can be. This thesis considers the entire body of a piece of literature to identify and relate entities within the same text, regardless of the fluid nature of the exact reference to an entity in literature. Once an entity has been identified, lexically similar names, which refer to the same character, can be linked together to form a global named entity that represents all forms of the named entity referenced in the text. By utilizing raw text as opposed to labeled corpus, this thesis will generate named entities from the text.”

Examples of the Global Named Entities Found:

Captain Rollo Bickersteth of the Coldstream (My Man Jeeves)

Wicked Witch of the West (Wonderful Wizard of Oz)

Sydney Cecil Vivian Montmorency (Little Princess)

Superior of the Academy of the Presentation of the Blessed Virgin (Love in the Time of Cholera)

General Manager of the River Company of the Caribbean (Love in the Time of Cholera)

Eleventh Edition of the Newspeak Dictionary (1984)
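As a rough illustration of linking lexically similar names, the sketch below groups mentions whose words (ignoring honorifics) are a subset of a longer form of the name. It is a deliberately simplified stand-in, not the method used in the thesis: `HONORIFICS`, `name_tokens`, and `link_names` are hypothetical names, and the subset rule would wrongly merge two characters who share a surname.

```python
HONORIFICS = {"mr", "mrs", "miss", "ms", "dr", "sir", "lady"}


def name_tokens(name):
    """Lowercased words of a name with punctuation and honorifics removed."""
    tokens = [token.strip(".,").lower() for token in name.split()]
    return frozenset(t for t in tokens if t and t not in HONORIFICS)


def link_names(mentions):
    """Group each mention under the first longer form that contains all its words."""
    groups = {}  # head (longest form seen) -> list of linked mentions
    by_length = sorted(mentions, key=lambda m: len(name_tokens(m)), reverse=True)
    for mention in by_length:
        tokens = name_tokens(mention)
        for head in groups:
            if tokens and tokens <= name_tokens(head):
                groups[head].append(mention)
                break
        else:
            groups[mention] = [mention]  # no match: start a new entity
    return groups


mentions = ["Mr. Holmes", "Mr. Sherlock Holmes", "Sherlock Holmes", "Holmes", "Dr. Watson"]
linked = link_names(mentions)
for head, forms in linked.items():
    print(head, "->", forms)
```

All four Holmes variants end up under one head, while "Dr. Watson" forms its own entity.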

Networks of Interactions

Once the text was tagged and the global named entities were identified, I used this script to generate a network of interactions showing how characters and ideas in the text interact, including both frequency and sentiment.
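The frequency part of such a network can be sketched by counting how often two entities appear in the same sentence. This is an assumed, simplified definition of an "interaction" for illustration only; the function name and example sentences are made up, and sentiment is not modelled.

```python
from collections import Counter
from itertools import combinations


def interaction_counts(sentences, entities):
    """Count sentence-level co-occurrences for every pair of entities."""
    counts = Counter()
    for sentence in sentences:
        # Simple substring match; a real tagger would use the tagged entities
        present = sorted(entity for entity in entities if entity in sentence)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


sentences = [
    "Dorothy walked with the Scarecrow.",
    "The Scarecrow and the Tin Woodman followed Dorothy.",
    "Dorothy slept.",
]
entities = ["Dorothy", "Scarecrow", "Tin Woodman"]

edges = interaction_counts(sentences, entities)
for (first, second), weight in edges.items():
    print(first, "--", second, weight)
```

Each counted pair corresponds to an edge in the network, with the count as its weight.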

The Wonderful Wizard of Oz (Baum)

More networks of interactions for specific texts can be found here

Tagging text for Pronouns and Proper nouns

One morning, when <Gregor Samsa>_n0 woke from troubled dreams, <he>_p0 found <himself>_p1 transformed in <his>_p2 bed into a horrible vermin. <He>_p3 lay on <his>_p4 armour-like back, and if <he>_p7 lifted <his>_p5 head a little <he>_p8 could see <his>_p5 brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover <it>_p9 and seemed ready to slide off any moment. <His>_p10 many legs, pitifully thin compared with the size of the rest of <him>_p11, waved about helplessly as <he>_p12 looked.

<Scarlett O'Hara>_n0 was not beautiful, but men seldom realized <it>_p0 when caught by <her>_p1 charm as the <Tarleton>_n1 twins were. In <her>_p2 face were too sharply blended the delicate features of <her>_p3 mother, a <Coast>_n2 aristocrat of French descent, and the heavy ones of <her>_p3 florid Irish father.

Gender Name Classifier (Decision Trees)

To better identify features of interactions, I trained a decision tree to determine the gender of a name, which was updated in a later step to verify the gender based on gendered honorifics like ‘Mr’ or ‘Queen’.

The name ‘Atticus’ is most likely Male

Odds: Female (0.215384615385), Male (0.784615384615)

The name ‘Ishamel’ is most likely Male

Odds: Female (0.4), Male (0.6)
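The honorific verification step could look something like the sketch below, which overrides a predicted gender when the name starts with a gendered honorific. The honorific lists and the `verify_gender` function are assumptions for illustration; the classifier itself is not reproduced here.

```python
MALE_HONORIFICS = {"mr", "sir", "king", "lord"}
FEMALE_HONORIFICS = {"mrs", "miss", "ms", "queen", "lady"}


def verify_gender(full_name, predicted_gender):
    """Return the honorific's gender if one is present, else the prediction."""
    words = full_name.split()
    if not words:
        return predicted_gender
    first_word = words[0].strip(".").lower()
    if first_word in MALE_HONORIFICS:
        return "Male"
    if first_word in FEMALE_HONORIFICS:
        return "Female"
    return predicted_gender


# A gendered honorific overrides the prediction; a bare name keeps it
print(verify_gender("Mr John Turner", "Female"))
print(verify_gender("Atticus", "Male"))
```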

Identify Main Characters

Combining this information, I was able to generate a simple summary of the text and determine the main character, or the character the text focuses on. This distinction is particularly important for first-person texts, like Sherlock Holmes, where the story is narrated by Dr. Watson but is focused on the titular character.

The Wonderful Wizard of Oz (Baum)

IS FIRST PERSON TEXT: True

CHARACTER OF INTEREST: [(‘Dorothy’, 345)]

ADDITIONAL TOP CHARACTERS OF INTEREST: [(‘Wise Scarecrow’, 224), (‘Tin Woodman’, 180), (‘Cowardly Lion’, 176), (‘Wonderful City of Oz’, 159), (‘Wicked Witch of the West’, 126)]

Sherlock Holmes (Doyle)

IS FIRST PERSON TEXT: True

CHARACTER OF INTEREST: [(‘Mister Sherlock Holmes’, 453)]

ADDITIONAL TOP CHARACTERS OF INTEREST: [(‘Dr Watson’, 80), (‘City of London’, 51), (‘Mr Lestrade of Scotland Yard’, 48), (‘Mr John Turner’, 40), (‘Mr James Windibank’, 38)]

More examples can be found here

Differences in Sentiment between Female and Male Characters during the course of a story

With the gender and characters identified, I automated the generation of graphs for any novel in which sentiment differed by a statistically significant amount between male and female characters. From my list of novels, this left two texts, Princess of Mars and The Scarlet Letter; in both cases, the female characters were statistically more likely to be associated with negative words.
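One way to check whether two groups of polarity scores differ significantly is a permutation test on the difference in means. The text does not say which statistical test was used, so this choice is an assumption, and the polarity values below are invented for illustration.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        difference = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if difference >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented sentence polarity scores for two groups of characters
female_polarity = [-0.40, -0.30, -0.50, -0.20, -0.35, -0.45]
male_polarity = [0.10, 0.20, 0.05, 0.15, 0.30, 0.25]

p_value = permutation_p_value(female_polarity, male_polarity)
print(p_value < 0.05)
```

A small p-value indicates that a difference this large would rarely arise if gender labels were assigned at random.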

As an example, I have included the polarity for Princess of Mars below.

Script available with examples and instructions on GitHub