Time Series Analysis in Stingray.jl
From Stargazing Dreams to Code Stargazing :)
Ever since childhood, I have been passionate about the mysteries of the universe. Over time, that passion grew into something more technical — not just looking at the stars, but wanting to understand their signals.
This summer, through Google Summer of Code 2025 (GSoC), I had the opportunity to contribute to Stingray.jl, a Julia library for time series analysis in astronomy. What follows is both a story of my personal journey and an exploration of the science and code I worked on.
The Beginning – Switching Languages and First Steps
When the organization was announced, I immediately started exploring Stingray (the Python version). I wanted to understand the structure, so I made my first PRs there.
Once I understood Stingray a bit, I started working on Stingray.jl, the Julia port of Stingray initiated by Aman Pandey in 2022. That meant learning Julia from scratch! Thankfully, Julia is similar to Python in many ways, so I picked it up quickly.
During my early testing, I noticed outdated dependencies — for example, some packages were only compatible with Julia 1.7, while the project was moving towards Julia 1.11. My first clumsy PR (on power_colors) wasn’t great (more on that in a bit), but it gave me valuable lessons.
Later, I worked on updating fourier.jl. This was also my first time seriously using structs in Julia. My mentor @fjebaker guided me on abstract methods and struct design, while @matteobachetti helped me understand what kind of astronomical data the project needed.
At first, I was confused about telescopes and their instruments. To learn, I went to NASA HEASARC documentation, where I discovered FITS data (Flexible Image Transport System) — the standard format for astronomical data. That’s when things started making sense.
My First Big Task – Reading Event Data
@fjebaker opened a milestone and assigned me the readevent task. At the same time, I became good friends with fellow contributor Jawd Ahmad. We often helped each other understand the codebase.
I implemented the readevent function with a small struct called EventList, something like:
struct DictMetadata
headers::Vector{Dict{String,Any}}
end
struct EventList{T}
filename::String
times::Vector{T}
energies::Vector{T}
metadata::DictMetadata
end
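As a toy illustration of how such a struct gets used (the struct definitions are repeated here so the snippet runs on its own, and all values are made up, not real telescope data):

```julia
# Self-contained sketch: construct and inspect a toy EventList
struct DictMetadata
    headers::Vector{Dict{String,Any}}
end

struct EventList{T}
    filename::String
    times::Vector{T}
    energies::Vector{T}
    metadata::DictMetadata
end

# Hypothetical header and event data, just to show access patterns
meta = DictMetadata([Dict{String,Any}("TELESCOP" => "NICER")])
ev = EventList("toy.fits", [1.0, 2.5, 3.1], [0.7, 1.2, 5.4], meta)

nevents = length(ev.times)                      # number of events
telescope = ev.metadata.headers[1]["TELESCOP"]  # header lookup
```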
Initially, it was just reading .fits files (not .evt yet). After mentor suggestions and fixes, my first PR was finally merged into Stingray.jl. That moment felt amazing: it was my first PR ever merged into an open source project.
Recipes and Lightcurves
The next milestone was implementing recipes (plotting methods). Earlier, my worst PR, power_colors, plotted arrays directly inside functions (yes, it was a mess 😅). This time, I studied RecipesBase.jl and realized how it allows clean plotting through small recipe functions.
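For a flavor of how RecipesBase.jl makes this work, here is a minimal sketch (LightCurveSketch is a made-up stand-in, not the real Stingray.jl type): the package defines only the lightweight @recipe machinery, and Plots.jl later picks the recipe up so plot(lc) just works.

```julia
using RecipesBase

# Hypothetical stand-in for the real LightCurve type (illustration only)
struct LightCurveSketch
    time::Vector{Float64}
    counts::Vector{Int}
end

# A user recipe: sets default attributes (overridable with -->)
# and returns the data to plot
@recipe function f(lc::LightCurveSketch)
    seriestype --> :steppost   # a step plot suits binned counts
    xguide --> "Time (s)"
    yguide --> "Counts"
    lc.time, lc.counts
end
```

The key point is that the package defining the recipe never depends on Plots itself, which keeps plotting logic out of the analysis functions.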
I implemented lightcurve recipes, so one can now simply call plot(lc) in the Julia REPL.
At this stage, I was still generating my own FITS files with FITSIO.jl. While testing, I hit a table HDU error, which led me to file issues (and even attempt fixes) in that library.
Proposal Time
By then, I had collected dozens of reference links in my notepad and shaped them into my official GSoC proposal.
On May 8th, I got selected. That was one of the happiest moments of my life! 🎉
We had our first meeting with mentors, where we discussed our next plans. From then on, meetings were held almost every Friday at 3 PM (Rome time).
Building Functions Step by Step
Event Functions
I refined readevents and created better struct definitions. @fjebaker even made a PR to demonstrate how things are usually structured in Julia; it was extremely helpful.
The final results had these functions:
event1 = readevents("ni1050360108_0mpu7_cl.evt", load_gti=true, sort=true)
ev_lowE = filter_energy(x -> x >= 20 && x < 500, event1)
ev_highE = filter_energy(x -> x >= 500 && x <= 1500, event1)
println("Low energy events: ", length(ev_lowE))
println("High energy events: ", length(ev_highE))
#Output:
Low energy events: 28841910
High energy events: 2941370
As you can see, readevents takes a file plus keyword parameters and can read .evt, .fits, and similar files, and alongside it we have useful functions such as filter_energy, get_gti_info, and many more (reference PRs: readevent evolved into eventlist, which evolved into events).
Finally, we have a function that reads GTI info and all the other telescope metadata.
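Conceptually, filter_energy just keeps the time/energy pairs whose energy satisfies a predicate. A self-contained sketch of that idea, using plain vectors instead of the real EventList type (filter_energy_sketch is a hypothetical name):

```julia
# Sketch of predicate-based energy filtering on parallel vectors
function filter_energy_sketch(pred, times::Vector, energies::Vector)
    keep = pred.(energies)          # broadcast the predicate over energies
    times[keep], energies[keep]     # logical indexing keeps the pairs aligned
end

t = [1.0, 2.0, 3.0, 4.0]
e = [150.0, 600.0, 20.0, 900.0]
# same predicate style as the ev_lowE example above
t_low, e_low = filter_energy_sketch(x -> x >= 20 && x < 500, t, e)
```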
Reference: the ni1050360108_0mpu7_cl.evt file. In the NICER archive you will see two types of event files:
.ufa files are unfiltered, calibrated NICER event files containing all raw events, including bad or non-X-ray events, mainly for initial analysis.
.cl files are clean, calibrated event files that have been screened to remove bad or non-X-ray events, ready for scientific analysis.
For more detailed information on NICER data processing and file formats, you can refer to the official HEASARC documentation: HEASARC.
Lightcurve Functions:
LightCurve{T}
Main structure for binned time series with uncertainties and properties
lc = create_lightcurve(eventlist, 1.0) # 1-second bins
Key Functions:
Light Curve Creation
# Basic usage
lc = create_lightcurve(eventlist, binsize)
# With filtering
lc = create_lightcurve(eventlist, 0.1;
tstart=1000.0, tstop=2000.0,
energy_filter=(0.5, 10.0))
# Custom filtering (recommended)
filtered_events = eventlist |>
ev -> filter_time(t -> t > 1000.0, ev) |>
ev -> filter_energy(e -> e < 10.0, ev)
lc = create_lightcurve(filtered_events, 1.0)
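Under the hood, creating a light curve is essentially histogramming event arrival times into bins of width binsize. A minimal sketch of that idea (bin_events is a hypothetical helper, not the actual Stingray.jl implementation):

```julia
# Sketch: bin event arrival times into counts per bin of width `binsize`
function bin_events(times::Vector{Float64}, binsize::Float64)
    t0, t1 = minimum(times), maximum(times)
    nbins = max(1, ceil(Int, (t1 - t0) / binsize))
    counts = zeros(Int, nbins)
    for t in times
        # clamp so the last event lands in the final bin
        i = min(nbins, floor(Int, (t - t0) / binsize) + 1)
        counts[i] += 1
    end
    centers = t0 .+ (collect(1:nbins) .- 0.5) .* binsize  # bin centers
    centers, counts
end

centers, counts = bin_events([0.1, 0.2, 0.9, 1.5, 2.2], 1.0)
```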
Error Handling
# Set Poisson errors (default)
set_errors!(lc)
# Set custom Gaussian errors
set_errors!(lc, [0.5, 1.2, 0.8, 1.1])
# Recalculate errors after modifying counts
lc.counts .+= background_correction
calculate_errors!(lc)
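The Poisson default above amounts to σ = √counts, which is exactly why errors must be recomputed after modifying counts. A one-line sketch of that error model (poisson_errors is a hypothetical name):

```julia
# Sketch of the Poisson (counting-statistics) error model: sigma = sqrt(N)
poisson_errors(counts::Vector{<:Integer}) = sqrt.(Float64.(counts))

errs = poisson_errors([4, 9, 16])
```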
Rebinning
# Rebin to larger time bins
lc_10s = rebin(lc_1s, 10.0)
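Rebinning to a coarser time resolution just sums counts over groups of consecutive bins. A sketch for an integer rebin factor (the real rebin takes a new binsize in seconds; rebin_counts is a hypothetical helper):

```julia
# Sketch: combine `factor` consecutive bins by summing their counts
function rebin_counts(counts::Vector{Int}, factor::Int)
    n = div(length(counts), factor)   # drop any incomplete trailing group
    [sum(counts[(i-1)*factor+1:i*factor]) for i in 1:n]
end

rebinned = rebin_counts([1, 2, 3, 4, 5], 2)
```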
plot(events) now generates a basic light curve plot, and built-in options allow filtering and annotating the data (e.g., highlighting Good Time Intervals or applying energy/time filters). The snippets below show a NICER light curve with peaks at regular intervals, plotted with Plots.jl.
using Plots
events = readevents("ni1200120104_0mpu7_cl.evt", load_gti=true, sort=true)
Output:
Found GTI data: 16 intervals
GTI time range: 1.3253976595089495e8 to 1.3261337476368216e8
EventList with 21244574 times and energies
Simple plot:
plot(events)
Light curve with time range filtering
plot(events, tstart=1.32540e8, tstop=1.32546e8)
Light curves with show_gtis and show_btis:
plot(events, show_btis=true, bti_alpha=0.4, show_gtis=true, gti_alpha=0.3)
These capabilities come directly from the new lightcurve recipes. The examples show how a few keywords (like axis_limits, show_gtis, show_btis, etc.) can dramatically change the plot appearance without extra coding, making Stingray.jl much more user-friendly for exploratory plotting.
Midterm Success
By the midterm evaluation, I had multiple PRs merged, including the lightcurve recipes. I even optimized performance: initially, plotting a huge event file (34 million events, ~1.1 GB) took 1 minute 33 seconds. Using Julia broadcasting and matrix operations (suggested by @fjebaker), I reduced it to 11 seconds.
And I passed the midterm evaluation!
Moving towards GTI (Good Time Intervals) And BTI (Bad Time Intervals)
Refer to my previous blog
After discussions with @matteobachetti, I implemented GTI interfaces:
apply_gtis() (LightCurve version)
- Segments light curves by GTIs, creating independent LightCurve objects suitable for:
- Independent periodogram calculations
- Variability analysis per observation segment
- Maintaining uniform sampling requirements
create_filtered_lightcurve()
- Internal function that creates filtered LightCurve objects while preserving metadata and recalculating statistical properties.
Data Gap Filling Function
fill_bad_time_intervals!()
- Fills short gaps between GTIs with synthetic events to maintain temporal sampling continuity. This is used when:
- Periodogram analysis requires uniform sampling
- Short data gaps would introduce artifacts
- Statistical methods assume continuous time series
These functions ensure we only analyze valid parts of the data (avoiding bad intervals).
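The core of segmenting by GTIs is selecting, for each interval, the bins that fall inside it. A self-contained sketch with plain vectors (split_by_gtis is a hypothetical helper, not the real apply_gtis):

```julia
# Sketch: split binned times/counts into one segment per GTI
function split_by_gtis(times::Vector{Float64}, counts::Vector{Int},
                       gtis::Vector{Tuple{Float64,Float64}})
    segments = Vector{Tuple{Vector{Float64},Vector{Int}}}()
    for (gstart, gstop) in gtis
        mask = (times .>= gstart) .& (times .< gstop)  # bins inside this GTI
        push!(segments, (times[mask], counts[mask]))
    end
    segments
end

times  = [0.5, 1.5, 2.5, 3.5, 4.5]
counts = [3, 1, 4, 1, 5]
# the gap (2.0, 3.0) is a BTI: bin 2.5 is dropped entirely
segs = split_by_gtis(times, counts, [(0.0, 2.0), (3.0, 5.0)])
```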
Diving into Spectral Analysis :)
Next came the Power Spectrum and Averaged Power Spectrum implementations.
Power Spectrum
This decomposes lightcurves into frequencies (like listening to the rhythm of a star).
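For intuition, the Leahy-normalized periodogram of a binned light curve is P(f) = 2|F(f)|² / N_photons. A self-contained sketch using a naive O(n²) DFT (the real implementation uses an FFT; this only shows the normalization, and leahy_periodogram is a hypothetical name):

```julia
# Naive DFT; fine for a sketch, real code would use FFTW
dft(x) = [sum(x[n+1] * cis(-2pi * k * n / length(x)) for n in 0:length(x)-1)
          for k in 0:length(x)-1]

# Leahy-normalized periodogram of binned counts with bin size `dt`
function leahy_periodogram(counts::Vector{<:Real}, dt::Float64)
    n = length(counts)
    nphot = sum(counts)
    F = dft(counts)
    freqs = (1:n÷2) ./ (n * dt)                # positive Fourier frequencies
    powers = 2 .* abs2.(F[2:n÷2+1]) ./ nphot   # skip the zero-frequency term
    freqs, powers
end

freqs, powers = leahy_periodogram(Float64[2, 3, 1, 4, 2, 3, 1, 4], 0.1)
```

Note that the frequency resolution is df = 1/(n·dt), which is why the ps.df values printed below differ between the full light curve and the shorter averaged segments.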
Averaged Power Spectrum
Functions:
Events (unbinned):
events = readevents("ni1200120104_0mpu7_cl.evt", load_gti=true, sort=true)
ps = Powerspectrum(events, norm="leahy")
avg_ps = AveragedPowerspectrum(events, 1024.0, norm="leahy", dt=0.1)
Output: (omitted, it would be redundant here :))
Lightcurve (binned):
lc = create_lightcurve(events, 0.1)
ps_lc = Powerspectrum(lc, norm="leahy")
ps_avg_lc = AveragedPowerspectrum(lc, 1024.0, norm="leahy")
The output is the same type as the one above :)
println(ps.df)
println(avg_ps.df)
println(ps_lc.df)
println(ps_avg_lc.df)
Output:
1.3662135391761732e-5
0.0009765625
1.3662135391761732e-5
0.0009765625
println(ps.m)
println(avg_ps.m)
println(ps_lc.m)
println(ps_avg_lc.m)
Output:
1
7
1
7
Functions with plots:
Basic setup:
using Plots
events = readevents("ni1200120104_0mpu7_cl.evt", sort=true)
lc = create_lightcurve(events, 0.1)
Plotting:
plot(events, segment_size=256.0, dt=0.01, norm="frac", show_noise=true)
plot(events, segment_size=256.0, show_noise=true, axis_limits=[1,1/10])
# Create the power spectra objects
ps2 = AveragedPowerspectrum(events, 256.0; dt=0.01, norm="frac")
ps3 = AveragedPowerspectrum(events, 256.0; dt=0.01, norm="leahy")
ps4 = AveragedPowerspectrum(events, 256.0; dt=0.01, norm="abs")
plot([ps2, ps3, ps4],
labels=[ "Fractional", "Leahy", "Absolute"],
colors=[ :red, :green, :orange],
linewidth=2.0,
show_noise=true)
ps = AveragedPowerspectrum(events, 256.0, norm="frac")
plot(ps, show_noise=true, meanrate=ps.mean_rate)
plot(ps, show_noise=true, show_errors=true, subtract_noise=true, meanrate=ps.mean_rate)
plot(ps, freq_mult=true, show_noise=true, meanrate=ps.mean_rate)
ps1 = AveragedPowerspectrum(events, 128.0, norm="frac")
ps2 = AveragedPowerspectrum(events, 256.0, norm="frac")
ps3 = AveragedPowerspectrum(events, 512.0, norm="frac")
plot([ps1, ps2, ps3], labels=["128s", "256s", "512s"])
While building this, I ran into trouble with GTIs: when reading .evt files with the earlier functions, edge GTIs were being excluded, so @matteobachetti suggested using generate_indices_of_segment_boundaries_binned and generate_indices_of_segment_boundaries_unbinned.
This was a key feature for handling GTIs correctly. Thanks, Matteo!
Cross Spectrum – A Big Milestone
The most challenging task was implementing Cross-Spectrum, which compares two lightcurves to study phase lags and time lags.
At first, I faced countless issues: GTI misalignment, event loss, and overflow errors. My mentor reminded me: “You are not reading complete .evt files”. Once I corrected that, things started working.
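The underlying math is simple: the cross spectrum is C(f) = F₁(f)·conj(F₂(f)); its phase gives the phase lag, and dividing by 2πf gives the time lag. A self-contained sketch with a naive DFT and a signal delayed by a known amount (sign conventions vary between codes; with this convention a delayed second signal gives a positive lag, and time_lags is a hypothetical name):

```julia
# Naive DFT, fine for a sketch (real code would use an FFT)
dft(x) = [sum(x[n+1] * cis(-2pi * k * n / length(x)) for n in 0:length(x)-1)
          for k in 0:length(x)-1]

# Cross spectrum C = F1 .* conj(F2); time lag = phase / (2 pi f)
function time_lags(x1::Vector{Float64}, x2::Vector{Float64}, dt::Float64)
    n = length(x1)
    C = dft(x1) .* conj.(dft(x2))
    freqs = (1:n÷2) ./ (n * dt)          # positive Fourier frequencies
    lags = angle.(C[2:n÷2+1]) ./ (2pi .* freqs)
    freqs, lags
end

n, dt, k0, delay = 64, 1.0, 4, 2
x1 = [sin(2pi * k0 * t / n) for t in 0:n-1]
x2 = [sin(2pi * k0 * (t - delay) / n) for t in 0:n-1]  # x1 delayed by 2 bins
freqs, lags = time_lags(x1, x2, dt)
```

At the signal frequency (bin k0 = 4), the recovered lag equals the injected 2-bin delay.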
Extended FITSMetadata struct with timing keyword fields:
- mjd_ref: MJD reference time (supports both MJDREF and MJDREFI+MJDREFF)
- time_zero: Time zero offset (TIMEZERO)
- time_unit: Time unit conversion factor (TIMEUNIT)
- time_sys: Time system (TIMESYS)
- time_pixr: Time pixel reference (TIMEPIXR)
- time_del: Time resolution/bin size (TIMEDEL)
The implementation follows the HEASARC timing tutorial specifications and handles various edge cases commonly found in X-ray astronomy FITS files.
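These keywords combine in the usual HEASARC way: an event's absolute time is MJDREF + (TIMEZERO + TIME)/86400 when TIMEUNIT is seconds (MJDREF itself may be split into MJDREFI + MJDREFF). A sketch with illustrative numbers (the keyword values below are NICER-like but chosen for illustration, and met_to_mjd is a hypothetical helper):

```julia
# Convert a mission-elapsed time (seconds) to MJD using FITS timing keywords
met_to_mjd(t, mjd_ref, time_zero) = mjd_ref + (time_zero + t) / 86400.0

mjd_ref   = 56658 + 0.000777592592592593  # e.g. MJDREFI + MJDREFF
time_zero = 0.0                           # TIMEZERO
mjd = met_to_mjd(86400.0, mjd_ref, time_zero)  # one day past the reference
```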
I also thought units were mismatched, but it turned out Julia handled them automatically. Lesson learned! (This was after implementing events with these extensions.)
Once I got past that misunderstanding, things finally started aligning, and the cross-spectrum began working properly. The alignment was correct, our filter function worked, and my mentor suggested something super smart: use the same .evt file, filter it, and then use both filtered versions.
This was a relief, especially because I couldn’t find matching .evt files from different detectors. That trick really saved me. 🙌
Along the way, I started developing a secret interest in telescope observatories. Working on real telescope data made me curious about how things actually operate behind the scenes.
Finally, I submitted my biggest PR (~5.7k lines of code), sorry, mentors 😅. It includes:
- Cross-Spectrum implementation
- Averaged and single Cross-Spectrum
- Recipes for plotting
At first, I was building separate structures for the cross-spectrum and the averaged cross-spectrum. But my mentor asked: “Why not just make it one?” And then @matteobachetti suggested I could detect whether it was a cross-spectrum or an average by "m" and handle both within the same structure. That idea really helped reduce code size and made things much cleaner.
The best part? The recipes now work automatically, which feels amazing. (Okay, I’ll admit it: I’ve used a lot of if-else statements, but hey, it works!)
Methods and Plots:
lc1 = create_lightcurve(ev_lowE,0.01)
lc2 = create_lightcurve(ev_highE,0.01)
cs = CrossSpectrum(ev_lowE, ev_highE, 64.0, 0.01; norm="frac")
acs = AveragedCrossSpectrum(ev_lowE, ev_highE, 64.0, 0.01)
CrossSpectrum{Float64} (Single)
Frequencies: 3199
Normalization: frac
CrossSpectrum{Float64} (Averaged)
Frequencies: 3199
Segments averaged: 53
Segment size: 64.0
Mean rates: 8205.026827830188, 828.4949882075472
Normalization: frac
I am using these functions to determine whether it's an averaged or a single cross-spectrum
is_averaged(cs::CrossSpectrum) = cs.m > 1
is_single(cs::CrossSpectrum) = cs.m == 1
Now plotting:
Simple plots, directly with cs and acs:
plot(cs)
plot(acs)
or
plot(acs, plot_type=:amplitude)
plot(cs, plot_type=:amplitude)
You can add a frequency range too:
plot(cs, plot_type=:amplitude, freq_range=(0.1, 10.0), color=:green)
Time lag spectrum
plot(cs, plot_type=:time_lag, color=:orange)
plot(acs, plot_type=:time_lag, color=:orange)
Coherence function
plot(cs, plot_type=:coherence, color=:green)
plot(acs, plot_type=:coherence, color=:green)
Coherence with confidence levels
plot(cs, Val(:coherence_confidence))
plot(acs, Val(:coherence_confidence))
Time lag with error bars and reference values.
plot(cs, Val(:lag_frequency_errors),
freq_range=(1.0, 5.0),
reference_lag=Dict(:frequency => 3.0, :expected_lag => 0.15))
plot(acs, Val(:lag_frequency_errors),
freq_range=(1.0, 5.0),
reference_lag=Dict(:frequency => 3.0, :expected_lag => 0.15))
Phase lag with error bars and reference values
plot(cs, Val(:phase_frequency_errors),
freq_range=(2.0, 5.0),
reference_phase=Dict(:frequency => 3.0, :expected_phase => π/3))
plot(acs, Val(:phase_frequency_errors),
freq_range=(2.0, 5.0),
reference_phase=Dict(:frequency => 3.0, :expected_phase => π/3))
Time lag with rebinning.
plot(cs, Val(:lag_frequency_errors), rebin_factor=4, freq_range=(1.0, 10.0))
plot(acs, Val(:lag_frequency_errors), rebin_factor=4, freq_range=(1.0, 10.0))
Compare two light curves.
plot(lc1, lc2)
Normalization comparison (requires an array of differently normalized CrossSpectra)
cs_leahy = CrossSpectrum(lc1, lc2, norm="leahy")
cs_frac = CrossSpectrum(lc1, lc2, norm="frac")
cs_abs = CrossSpectrum(lc1, lc2, norm="abs")
plot([cs_leahy, cs_frac, cs_abs], Val(:normalization_comparison))
plot([cs_leahy, cs_frac, cs_abs], Val(:normalization_comparison),
norm_labels=["Leahy Norm", "Fractional RMS", "Absolute RMS"])
acs_leahy = AveragedCrossSpectrum(lc1, lc2, 62, norm="leahy")
acs_frac = AveragedCrossSpectrum(lc1, lc2, 62, norm="frac")
acs_abs = AveragedCrossSpectrum(lc1, lc2, 62, norm="abs")
plot([acs_leahy, acs_frac, acs_abs], Val(:normalization_comparison))
plot([acs_leahy, acs_frac, acs_abs], Val(:normalization_comparison),
norm_labels=["Leahy Norm", "Fractional RMS", "Absolute RMS"])
Rebinning comparison (original vs rebinned)
cs_rebinned = rebin(cs, 0.25) # Your rebinning function
plot(cs, cs_rebinned, Val(:rebinning_comparison), rebin_type="Linear")
acs_rebinned = rebin(acs, 0.25) # Your rebinning function
plot(acs, acs_rebinned, Val(:rebinning_comparison), rebin_type="Linear")
Log rebinning comparison
cs_log_rebinned = rebin_log(cs, f=0.02)
plot(cs, cs_log_rebinned, Val(:rebinning_comparison), rebin_type="Logarithmic")
acs_log_rebinned = rebin_log(acs, f=0.02)
plot(acs, acs_log_rebinned, Val(:rebinning_comparison), rebin_type="Logarithmic")
Noise comparison:
plot(cs, Val(:noise_comparison))
plot(acs, Val(:noise_comparison))
White noise:
plot(cs, Val(:white_noise), high_freq_fraction=0.3)
plot(acs, Val(:white_noise), high_freq_fraction=0.3)
(Output: same style as the amplitude plots.)
Phase lag errors
plot(cs, Val(:phase_lag_errors))
plot(acs, Val(:phase_lag_errors))
significant_detections
plot(acs, Val(:significant_detections), threshold=3.0)
plot(cs, Val(:significant_detections), threshold=3.0)
What’s Next:
After GSoC, I plan to continue working on:
- Dynamic Power Spectrum
- Cross-Spectrum improvements
- Combining with Power Colors
I also want to become a watcher of Stingray.jl and contribute to astronomy-related open source projects long term.
One of my mentors, @stefanocovino, also made a suggestion here. Since I haven't created any docs for this yet (I have many notebooks, but they are all scattered), I will make an unstable branch that includes all the changes I have made, and then document it, treating it like solving puzzles.
Summing Up :)
Looking back, my journey was full of:
- Struggles (debugging metadata, bad first PRs, HDU errors).
- Learning (structs in Julia, broadcasting, telescope instruments, GTIs).
- Wins (my first merge, lightcurve recipes, the big Cross Spectrum PR).
Most importantly, it connected my childhood passion for astronomy with my programming skills.
Working with @fjebaker and @matteobachetti, and collaborating with peers like Jawd Ahmad, made this journey unforgettable.
PR Number | Title / Description | Status |
---|---|---|
#65 | Cross-spectrum & recipes | Open |
#60 | Lightcurve recipes (GTI/BTI, plotting) | Open |
#59 | Power spectrum recipes | Open |
#58 | Spectra adjustments | Open |
#57 | Powerspectrum & AveragedPowerspectrum | Open |
#53 | GTI interface methods | Closed |
#52 | Good Time Interval enhancements | Closed |
#51 | Lightcurve creation & rebinning | Closed |
#49 | Mission support additions | Drafted |
#47 | `EventList` read event implementation | Closed |
#30 | Initial `readevent` groundwork | Closed |
#61 | Events feature | Closed |